A variational stochastic Dirichlet process-based autoencoder model for fine-grained music source separation
Yin ZHU, Jingqi LI, Cong JIN, Qiuqiang KONG, Hongming SHAN, Junping ZHANG
Front. Comput. Sci., 2027, Vol. 21, Issue (1): 2101308
Traditional source separation methods rely on coarse-grained categorical labels that group all vocals together without distinguishing the individual voices in an audio mixture, which inherently limits their ability to isolate single tracks. Fine-grained annotations could partially address this issue, but they demand substantial resources and still face challenges in extracting tracks from raw signals. To overcome these limitations, we propose to extract each track by decomposing how the data is generated. Specifically, we propose the Variational Stochastic Dirichlet Process-VAE, which refines the variational autoencoder framework by replacing the standard variational distribution with a variational stochastic Dirichlet process (VSDP). In this framework, the encoder uses stick-breaking constructions to adaptively partition the latent space into clusters, while the decoder, designed to recover each component, achieves implicit signal separation. The advantage of this design is that the reconstruction target shifts from the raw input to its individual components. Experiments demonstrate the method's efficacy in two scenarios: (1) under coarse-grained source definitions, it reaches near-state-of-the-art performance (SDR = 10.3); (2) for fine-grained track separation, it identifies 83% of individual vocal tracks with an average SDR of 7.8, a result that other state-of-the-art methods cannot obtain without annotations.
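To make the stick-breaking idea concrete, here is a minimal Python sketch of a truncated stick-breaking construction for Dirichlet process mixture weights. It assumes a truncation level `k` and a concentration parameter `alpha`, and it uses the helper names `stick_breaking_weights` and `sample_dp_weights`. All of these are hypothetical illustrations, not taken from the paper. Each break fraction is drawn from Beta(1, alpha), and the leftover stick goes to the last component. The sketch shows only how mixture weights over latent clusters can be built. It does not reproduce the authors' encoder or its variational parameterization.

```python
import random
from typing import List, Optional


def stick_breaking_weights(fractions: List[float]) -> List[float]:
    """Turn break fractions v_1..v_{K-1} in [0, 1] into K mixture weights.

    pi_k = v_k * prod_{j<k} (1 - v_j); the final weight takes whatever
    stick remains, so the K weights sum to 1 (truncated construction).
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for v in fractions:
        if not 0.0 <= v <= 1.0:
            raise ValueError("break fractions must lie in [0, 1]")
        weights.append(v * remaining)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    weights.append(remaining)  # last component absorbs the leftover stick
    return weights


def sample_dp_weights(k: int, alpha: float,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical helper: sample K truncated DP weights, v_k ~ Beta(1, alpha)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    fractions = [rng.betavariate(1.0, alpha) for _ in range(k - 1)]
    return stick_breaking_weights(fractions)


if __name__ == "__main__":
    # Deterministic example: fractions 0.5, 0.5 give weights 0.5, 0.25, 0.25.
    print(stick_breaking_weights([0.5, 0.5]))
    # Random example: smaller alpha puts more mass on the first components.
    w = sample_dp_weights(k=5, alpha=1.0, rng=random.Random(0))
    print(len(w), round(sum(w), 6))
```

Truncating at a finite `k` is a common way to make an otherwise infinite Dirichlet process tractable inside a neural network. The abstract does not state whether the paper truncates, or at what level.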
music source separation / variational stochastic process / generative model / fine-grained track separation
Higher Education Press