The Python code implementing the deep learning baselines can be downloaded from Qiuqiang Kong's GitHub page:
Python code for Task 4: audio tagging (more about this task can be found here)
Key references: Q. Kong, I. Sobieraj, W. Wang and M. D. Plumbley, "Deep Neural Network Baseline for DCASE Challenge 2016," in DCASE 2016 Workshop. [PDF]
Hierarchical DNN for acoustic scene classification
Key references:
- Y. Xu, Q. Huang, W. Wang, M. D. Plumbley, "Hierarchical learning for DNN-based acoustic scene classification," in DCASE 2016 Workshop. [PDF]
Fully deep neural networks for audio tagging
Key references:
- Y. Xu, Q. Huang, W. Wang, P. J. B. Jackson and M. D. Plumbley, "Fully DNN-Based Multi-Label Regression for Audio Tagging," in DCASE 2016 Workshop. [PDF]
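The idea behind the fully DNN-based approach can be illustrated with a minimal sketch: a network whose output layer uses sigmoid units, so each audio tag gets an independent probability (multi-label regression rather than single-label classification). This is not the released code; the layer sizes, weights, and the `tag_probabilities` helper below are illustrative assumptions, shown as a plain NumPy forward pass.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tag_probabilities(features, W1, b1, W2, b2):
    """Forward pass of a tiny two-layer DNN: ReLU hidden layer, then a
    sigmoid output giving an independent probability for each tag.
    (Illustrative sketch only, not the released baseline code.)"""
    h = np.maximum(0.0, features @ W1 + b1)   # hidden-layer activations
    return sigmoid(h @ W2 + b2)               # per-tag probabilities in [0, 1]

# Hypothetical sizes: 40-dim acoustic features, 16 hidden units, 7 tags
rng = np.random.default_rng(0)
n_features, n_hidden, n_tags = 40, 16, 7
W1 = rng.standard_normal((n_features, n_hidden)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_tags)) * 0.1
b2 = np.zeros(n_tags)

x = rng.standard_normal(n_features)          # one frame of input features
p = tag_probabilities(x, W1, b1, W2, b2)     # probability per tag
active = p > 0.5                             # simple decision threshold
```

Because each output unit is an independent sigmoid, several tags can be active at once, which is what distinguishes multi-label tagging from ordinary softmax classification.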
Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
Key references:
- Y. Xu, Q. Huang, W. Wang, P. Foster, S. Sigtia, P. J. B. Jackson, and M. D. Plumbley, "Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging," IEEE/ACM Transactions on Audio Speech and Language Processing, February 2017. [PDF] (in press)
Deep learning for stereo speech separation
Key references: Y. Yu, W. Wang, and P. Han, "Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks", EURASIP Journal on Audio Speech and Music Processing, 2016:7, 18 pages, DOI 10.1186/s13636-016-0085-x, 2016. [PDF]
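The time-frequency masking at the heart of this approach can be sketched in a few lines: build spectrograms of the mixture, then keep only the bins where the target source dominates. The sketch below uses an oracle (ideal binary) mask on synthetic tones purely for illustration; the `stft` helper and all signal parameters are assumptions, and the actual method estimates the mask from localization cues and a DNN rather than from the clean sources.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Minimal STFT: Hann-windowed frames -> complex spectrogram.
    (Illustrative helper, not the paper's implementation.)"""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

t = np.arange(4096) / 16000.0
s1 = np.sin(2 * np.pi * 440 * t)     # stand-in for source 1
s2 = np.sin(2 * np.pi * 1500 * t)    # stand-in for source 2
X = stft(s1 + s2)                    # mixture spectrogram
S1, S2 = stft(s1), stft(s2)

mask = np.abs(S1) > np.abs(S2)       # ideal binary mask (oracle, for illustration)
S1_hat = mask * X                    # estimated source-1 spectrogram
```

In the actual system the binary decision per time-frequency bin comes from probabilistic localization cues, not from access to the clean sources as in this oracle demonstration.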
Particle Filtering, PHD Filtering, & Particle Flow Algorithms with Applications in Multimodal Fusion and Tracking
Adaptive particle filtering for audio-visual tracking of multiple speakers
Key references: V. Kilic, M. Barnard, W. Wang, and J. Kittler, "Audio assisted robust visual tracking with adaptive particle filtering", IEEE Transactions on Multimedia, vol. 17, no. 2, pp. 186-200, 2015. [PDF]
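The predict-weight-resample cycle that underlies particle filtering can be sketched for a toy 1-D tracking problem. This is a plain bootstrap particle filter, not the adaptive audio-visual filter of the paper; the state model, noise levels, and the `particle_filter_1d` helper are assumptions chosen to keep the example self-contained.

```python
import numpy as np

def particle_filter_1d(observations, n_particles=2000,
                       proc_std=0.5, obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D random-walk state observed in
    Gaussian noise. (Toy sketch; the paper's adaptive audio-visual
    filter is considerably more involved.)"""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 5.0, n_particles)   # diffuse prior
    estimates = []
    for z in observations:
        particles = particles + rng.normal(0.0, proc_std, n_particles)  # predict
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)             # likelihood
        w /= w.sum()
        estimates.append(np.sum(w * particles))                         # posterior mean
        idx = rng.choice(n_particles, n_particles, p=w)                 # resample
        particles = particles[idx]
    return np.array(estimates)

rng = np.random.default_rng(42)
true_state = np.cumsum(rng.normal(0.0, 0.5, 50))    # hidden random walk
obs = true_state + rng.normal(0.0, 1.0, 50)         # noisy observations
est = particle_filter_1d(obs)
```

Resampling after every update keeps the particle set concentrated where the posterior has mass; the adaptive variant in the paper additionally steers the proposal using audio cues.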
PHD filtering, Mean-Shift PHD filtering, and Sparse Sampling MS-PHD filtering for audio-visual tracking of multiple speakers
Key references: V. Kilic, M. Barnard, W. Wang, A. Hilton, and J. Kittler, "Mean-Shift and Sparse Sampling Based SMC-PHD Filtering for Audio Informed Visual Speaker Tracking", IEEE Transactions on Multimedia, vol. 18, no. 10, October 2016. [PDF]
Convolutive ICA, NMF, Time-Frequency Masking with applications in Blind Source Separation & Computational Auditory Scene Analysis
Underdetermined speech source separation based on sparse coding and dictionary learning
Key references:
- T. Xu, W. Wang, and W. Dai, "Sparse Coding with Adaptive Dictionary Learning for Underdetermined Blind Speech Separation", Speech Communication, vol. 55, no. 3, pp. 432-450, 2013. [PDF]
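The sparse-coding step used in this line of work can be illustrated with a generic iterative shrinkage-thresholding (ISTA) solver for the lasso problem: given a dictionary D, find coefficients that reconstruct the signal while staying sparse. The `ista` helper, dictionary sizes, and regularization value below are illustrative assumptions, not the paper's adaptive dictionary-learning algorithm.

```python
import numpy as np

def ista(x, D, lam=0.05, n_iter=200):
    """ISTA: minimise 0.5 * ||x - D a||^2 + lam * ||a||_1 over a.
    (Generic sparse-coding sketch, not the paper's algorithm.)"""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)         # gradient of the quadratic term
        a = a - grad / L                 # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(3)
D = rng.standard_normal((32, 64))        # overcomplete dictionary: 64 atoms in R^32
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
a_true = np.zeros(64)
a_true[[5, 20]] = [1.0, -0.8]            # signal uses only two atoms
x = D @ a_true
a_hat = ista(x, D)
```

With only two active atoms the recovered code is sparse and the dominant coefficients fall on the correct atoms, which is what makes sparse codes useful for separating underdetermined mixtures.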
Convolutive speech source separation based on probabilistic time-frequency masking
Key references:
- A. Alinaghi, P. Jackson, Q. Liu, and W. Wang, "Joint Mixing Vector and Binaural Model Based Stereo Source Separation", IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 22, no. 9, pp. 1434-1448, 2014. [PDF]
Spatial Audio
Sparse L1-Optimal Multi-Loudspeaker Panning
Key references:
- A. Franck, W. Wang, F. M. Fazi, "Sparse, L_1-Optimal Multi-Loudspeaker Panning and its Relation to Vector Base Amplitude Panning", IEEE/ACM Transactions on Audio Speech and Language Processing, February 2017. [PDF]
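The vector base amplitude panning (VBAP) formulation that this paper relates to can be sketched for a single 2-D loudspeaker pair: the loudspeaker direction vectors form a base, and the gains are the coefficients that express the source direction in that base. The `vbap_pair_gains` helper and its unit-energy normalization are an illustrative assumption of the standard 2-D VBAP construction, not code from the paper.

```python
import numpy as np

def vbap_pair_gains(source_az_deg, spk_az_deg):
    """Gains g for a 2-D loudspeaker pair such that g1*l1 + g2*l2 points
    at the source direction (vector base amplitude panning), normalised
    to unit energy. (Textbook 2-D VBAP sketch.)"""
    src = np.array([np.cos(np.radians(source_az_deg)),
                    np.sin(np.radians(source_az_deg))])
    # Columns of L are the loudspeaker unit vectors (the "vector base")
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))]
                  for a in spk_az_deg]).T
    g = np.linalg.solve(L, src)          # solve src = L @ g for the gains
    return g / np.linalg.norm(g)         # unit-energy normalisation

# Source straight ahead between a standard +/-30 degree stereo pair:
g_centre = vbap_pair_gains(0.0, [-30.0, 30.0])   # equal gains by symmetry
g_left = vbap_pair_gains(-30.0, [-30.0, 30.0])   # source on a loudspeaker
```

A centred source gets equal gains on both loudspeakers, and a source aligned with one loudspeaker gets all its energy there; the paper's sparse L1-optimal panning generalizes this pairwise construction to many loudspeakers.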
Last updated on 14 March 2018
First created on 14 March 2018