[1] ROWEIS S T. One microphone source separation. Advances in Neural Information Processing Systems, 2000, 13: 793–799.
[2] VINCENT E, VIRTANEN T, GANNOT S. Audio source separation and speech enhancement. New York: John Wiley & Sons, 2018.
[3] XU C, RAO W, XIAO X, et al. Single channel speech separation with constrained utterance level permutation invariant training using grid LSTM. Proc. of the International Conference on Acoustics, Speech and Signal Processing, 2018: 6–10.
[4] WANG Q, WOO W L, DLAY S S, et al. Informed single channel speech separation with time-frequency exemplar GMM-HMM model. Proc. of the International Conference on Digital Signal Processing, 2015: 1130–1134.
[5] YEMINY Y R, KELLER Y, GANNOT S. Single microphone speech separation by diffusion-based HMM estimation. EURASIP Journal on Audio, Speech, and Music Processing, 2016. DOI: 10.1186/s13636-016-0094-9.
[6] ZIBULEVSKY M, PEARLMUTTER B A. Blind source separation by sparse decomposition in a signal dictionary. Neural Computation, 2001, 13(4): 863–882. DOI: 10.1162/089976601300014385.
[7] GOWREESUNKER B V, TEWFIK A H. Blind source separation using monochannel overcomplete dictionaries. Proc. of the International Conference on Acoustics, Speech and Signal Processing, 2008: 33–36.
[8] QIAN Y M, WENG C, CHANG X K, et al. Past review, current progress, and challenges ahead on the cocktail party problem. Frontiers of Information Technology & Electronic Engineering, 2018, 19: 40–63.
[9] BAO C L, JI H, QUAN Y H, et al. Dictionary learning for sparse coding: algorithms and convergence analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2015, 38(7): 1356–1369.
[10] AHARON M, ELAD M, BRUCKSTEIN A. K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. on Signal Processing, 2006, 54(11): 4311–4322. DOI: 10.1109/TSP.2006.881199.
[11] LEE D D, SEUNG H S. Learning the parts of objects by non-negative matrix factorization. Nature, 1999, 401(6755): 788–791. DOI: 10.1038/44565.
[12] HOYER P O. Non-negative sparse coding. Proc. of the 12th Workshop on Neural Networks for Signal Processing, 2002: 557–565.
[13] YANG J C, WANG Z W, LIN Z, et al. Coupled dictionary training for image super-resolution. IEEE Trans. on Image Processing, 2012, 21(8): 3467–3478. DOI: 10.1109/TIP.2012.2192127.
[14] WEI X, SHEN H, LI Y X, et al. Reconstructible nonlinear dimensionality reduction via joint dictionary learning. IEEE Trans. on Neural Networks and Learning Systems, 2018, 30(1): 175–189.
[15] SCHMIDT M N, OLSSON R K. Single-channel speech separation using sparse non-negative matrix factorization. Proc. of the 9th International Conference on Spoken Language Processing, 2006: 2614–2617.
[16] KING B J, ATLAS L. Single-channel source separation using complex matrix factorization. IEEE Trans. on Audio, Speech, and Language Processing, 2011, 19(8): 2591–2597. DOI: 10.1109/TASL.2011.2156786.
[17] GRAIS E M, ERDOGAN H. Single channel speech music separation using nonnegative matrix factorization with sliding window and spectral masks. Proc. of the 12th Annual Conference of the International Speech Communication Association, 2011: 1773–1776.
[18] GRAIS E M, ERDOGAN H. Discriminative nonnegative dictionary learning using cross-coherence penalties for single channel source separation. Proc. of the 14th Annual Conference of the International Speech Communication Association, 2013: 808–812.
[19] GANG A, BIYANI P. On discriminative framework for single channel audio source separation. Proc. of the 17th Annual Conference of the International Speech Communication Association, 2016: 565–569.
[20] XU Y F, BAO G Z, XU X, et al. Single channel speech separation using sequential discriminative dictionary learning. Signal Processing, 2015, 106: 134–140. DOI: 10.1016/j.sigpro.2014.07.012.
[21] SUN L H, ZHAO C, SU M, et al. Single-channel blind source separation based on joint dictionary with common sub-dictionary. International Journal of Speech Technology, 2018, 21(1): 19–27. DOI: 10.1007/s10772-017-9469-2.
[22] WANG Z, SHA F. Discriminative non-negative matrix factorization for single-channel speech separation. Proc. of the International Conference on Acoustics, Speech and Signal Processing, 2014: 3749–3753.
[23] WENINGER F, ROUX J L, HERSHEY J R, et al. Discriminative NMF and its application to single-channel source separation. Proc. of the 15th Annual Conference of the International Speech Communication Association, 2014: 865–869.
[24] WANG Y, WANG D L. Towards scaling up classification-based speech separation. IEEE Trans. on Audio, Speech, and Language Processing, 2013, 21(7): 1381–1390. DOI: 10.1109/TASL.2013.2250961.
[25] WENINGER F, HERSHEY J R, ROUX J L, et al. Discriminatively trained recurrent neural networks for single-channel speech separation. Proc. of the Global Conference on Signal and Information Processing, 2014: 577–581.
[26] GRAIS E M, PLUMBLEY M D. Single channel audio source separation using convolutional denoising autoencoders. Proc. of the Global Conference on Signal and Information Processing, 2017: 1265–1269.
[27] HERSHEY J R, CHEN Z, ROUX J L, et al. Deep clustering: discriminative embeddings for segmentation and separation. Proc. of the International Conference on Acoustics, Speech and Signal Processing, 2016: 31–35.
[28] WRIGHT J, YANG A Y, GANESH A, et al. Robust face recognition via sparse representation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2009, 31(2): 210–227. DOI: 10.1109/TPAMI.2008.79.
[29] CHRISTENSEN H, BARKER J, MA N, et al. The CHiME corpus: a resource and a challenge for computational hearing in multisource environments. Proc. of the 11th Annual Conference of the International Speech Communication Association, 2010: 1918–1921.
[30] VINCENT E, GRIBONVAL R, FEVOTTE C. Performance measurement in blind audio source separation. IEEE Trans. on Audio, Speech, and Language Processing, 2006, 14(4): 1462–1469. DOI: 10.1109/TSA.2005.858005.
[31] MAJEED S A, HUSAIN H, SAMAD S A, et al. Mel frequency cepstral coefficients (MFCC) feature extraction enhancement in the application of speech recognition: a comparison study. Journal of Theoretical and Applied Information Technology, 2015, 79(1): 38–56.