[Home] [Publications] [Research] [Teaching] [Short Bio] [Demo & Data] [Codes]


Wenwu Wang


Professor of Signal Processing and Machine Learning
Co-Director of A-Lab (Machine Audition Lab)
AI Fellow of Surrey Institute for People Centred Artificial Intelligence
Chair (2023-2024) & Vice Chair (2022) & Member (2021-) of IEEE SPS Machine Learning for Signal Processing Technical Committee
Board Member (2023-2024) of IEEE SPS Technical Directions Board
Senior Area Editor (2019-2023) & Associate Editor (2014-2018) of IEEE Transactions on Signal Processing
Associate Editor (2020-2025) of IEEE/ACM Transactions on Audio, Speech, and Language Processing
Vice Chair (2022-2024) of EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing
Elected Member (2021-2026) of IEEE Signal Processing Theory and Methods Technical Committee

Contact:
Centre for Vision, Speech and Signal Processing
Department of Electrical and Electronic Engineering
Faculty of Engineering and Physical Sciences
University of Surrey
Guildford GU2 7XH
United Kingdom

Phone: +44 (0) 1483 686039
Fax: +44 (0) 1483 686031
Email: w.wang AT surrey.ac.uk
Office: 04BB01 (Building BB, 1st floor)



News

[02/2024] We are involved in the organization of two tasks in the DCASE 2024 Challenge: Task 6 - Automated Audio Captioning, and Task 9 - Language-Queried Audio Source Separation.

[02/2024] Appointed Associate Editor for IEEE Transactions on Multimedia.

[02/2024] Special Session Co-Chair for MLSP 2024, London, 22-25 September, 2024.

[01/2024] Invited Speaker at the Undersea Defence Technology (UDT) conference, London, 2024.

[01/2024] Shortlisted for a Surrey Open Research Award at the Annual Open Research Culture Event 2024, University of Surrey, for the work "Advancing Open Research in Audio Processing" by H. Liu, M. Plumbley, and W. Wang.

[01/2024] Invited Keynote Speaker at the 2nd International Workshop on Signal Processing and Machine Learning (WSPML), virtual, 2024.

[12/2023] Awarded an IAS Fellowship project to host Associate Professor Dr Bingfeng Zhang, China University of Petroleum (East China), to work on leveraging audio and text information to improve image segmentation of rare objects.

[12/2023] Invited Speaker at the GHOST DAY Applied Machine Learning Conference, Poland, 5-6 April 2024.

[10/2023] Invited Keynote Speaker at the SoRAIM Winter School, Grenoble, France, 19-23 February 2024.

[10/2023] Invited Speaker at SANE 2023 (Speech and Audio in the Northeast), 26 October 2023, New York.

[09/2023] Won the Judges' Award at DCASE 2023, Tampere, Finland, for the work "Text-Driven Foley Sound Generation With Latent Diffusion Model", co-authored by Y. Yuan, H. Liu, X. Liu, X. Kang, M. D. Plumbley, and W. Wang. This is an improved version of AudioLDM, tuned for the DCASE 2023 Challenge Task 7 dataset. See the full paper, an earlier report, and the code.

[08/2023] Invited Survey Talk at the 24th INTERSPEECH Conference (INTERSPEECH 2023), 20-24 August, Dublin, Ireland. [Talk ppt slides download]

[07/2023] WavJourney: a new compositional text-to-audio generation model, released in August 2023. The WavJourney system can be used to create audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. WavJourney leverages large language models to connect various audio models for audio content generation, making it applicable across diverse real-world scenarios, including science fiction, education, and radio plays.

[07/2023] Following the release of the powerful text-to-audio model AudioLDM in February 2023, an even more powerful model, AudioLDM 2, was released in August 2023, with paper, code, and demos. AudioLDM 2 is a novel and versatile audio generation model capable of performing conditional audio, music, and intelligible speech generation. AudioLDM 2 achieves state-of-the-art (SoTA) performance in text-to-audio and text-to-music generation, while also delivering competitive results in text-to-speech generation, comparable to the current SoTA.

[07/2023] Satellite Workshop Co-Chair for ICASSP 2024 - the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, to be held in Seoul, South Korea.

[06/2023] Invited Talk on the Workshop of Advances in Neuromorphic AI and Electronics 2023, 26-29 June, Loughborough, UK.

[06/2023] Congratulations to Thomas Marshall for winning The BAE Systems Applied Intelligence Prize for his final-year project "Using Electroencephalogram and Machine Learning for Prosthetics", completed under my supervision.

[06/2023] Our system was ranked First in DCASE 2023 Challenge Task 7 (Foley Sound Synthesis). More details can be found in the results, paper, and code.

[06/2023] Honored to be an invited Perspective Talks Speaker at the 48th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), held in Rhodes, Greece, on June 4-10, 2023. This was one of the six Perspective Talks (three academic & three industrial) at the conference. ICASSP 2023 had 4000+ registered participants, 3500+ of whom attended onsite. [Talk ppt slides download]

[04/2023] Keynote Speaker at the 10th National Conference on Sound and Music Technology (CSMT 2023), held in Guangzhou, China, on June 2-4, 2023.

[03/2023] Guest Editor of IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on AI-Generated Content for Multimedia (submission deadline: 1st June 2023).

[03/2023] Our first text-to-audio generation method was selected as the baseline system for Task 7 (Foley Sound Synthesis) of the DCASE 2023 Challenge. This method, presented at IEEE MLSP 2021, was, to our knowledge, among the first in the field of general audio generation from text input, e.g. dog barking, people talking, baby crying. See also our earlier work on sound generation.

[02/2023] AudioLDM: a powerful state-of-the-art method for generating speech, sound effects, music, and beyond from a text description, e.g. "A hammer is hitting a wooden surface". Some on social media have called it "ChatGPT for audio". It is now one of the 25 most-liked machine learning apps on Hugging Face Spaces, among 25000+ apps. It can also be downloaded from Replicate and Zenodo. Since its release in early February 2023 by Haohe Liu (our second-year PhD student), it has attracted significant attention in the community and on social media. Searching for "AudioLDM" on Google brings up at least six pages of results discussing it. Examples of media attention about this work include: YouTube (1, 2, and more), MarkTechPost, Note, MachineHeart, and many posts on Twitter and LinkedIn. The tool has been integrated by others into their apps, such as AI Albums, Diffuser Library, and Image to Audio Generation. Please check out the project page for the paper, code, and demos of this fantastic method. See the University press release here.
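
If you would like to try AudioLDM from Python, a minimal text-to-audio sketch using the Hugging Face diffusers integration is shown below. The checkpoint name and the 16 kHz output rate are assumptions based on the public release; please refer to the project page for the official usage.

    # Minimal sketch: text-to-audio generation with AudioLDM through the
    # Hugging Face diffusers integration. The checkpoint name below is an
    # assumption based on the public release on the Hugging Face Hub.
    import torch
    import scipy.io.wavfile
    from diffusers import AudioLDMPipeline

    pipe = AudioLDMPipeline.from_pretrained(
        "cvssp/audioldm-s-full-v2", torch_dtype=torch.float16
    ).to("cuda")  # use "cpu" (and drop torch_dtype) if no GPU is available

    prompt = "A hammer is hitting a wooden surface"
    audio = pipe(prompt, num_inference_steps=200, audio_length_in_s=5.0).audios[0]

    # AudioLDM produces 16 kHz mono audio; save it as a WAV file.
    scipy.io.wavfile.write("hammer.wav", rate=16000, data=audio)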

[01/2023] Featured Article "Automated audio captioning: an overview of recent progress and new challenges" [PDF]

[01/2023] Board Member, IEEE SPS Technical Directions Board.

[12/2022] IEEE SPS Young Author Best Paper Award given to our former PhD graduates Qiuqiang Kong and Turab Iqbal for the paper co-authored with Yin Cao, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley, "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition", IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020. [PDF] [code] [IEEE award page]
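
The pretrained PANNs models can be tried directly from Python via the authors' panns-inference package; below is a minimal audio tagging sketch, with usage assumed from the package's public README.

    # Minimal sketch: audio tagging with pretrained PANNs via the
    # panns-inference package (pip install panns-inference). The usage is
    # an assumption based on the package's public README.
    import librosa
    import numpy as np
    from panns_inference import AudioTagging, labels

    # PANNs were trained on 32 kHz mono audio.
    waveform, _ = librosa.load("example.wav", sr=32000, mono=True)
    waveform = waveform[None, :]  # shape: (batch_size, num_samples)

    # checkpoint_path=None downloads the default pretrained checkpoint.
    tagger = AudioTagging(checkpoint_path=None, device="cpu")
    clipwise_output, embedding = tagger.inference(waveform)

    # Print the five highest-scoring AudioSet classes.
    for i in np.argsort(clipwise_output[0])[::-1][:5]:
        print(f"{labels[i]}: {clipwise_output[0][i]:.3f}")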

[08/2022] Invited Plenary Speaker at the 21st UK Workshop on Computational Intelligence (UKCI 2022), 7-9 September 2022, Sheffield.

[07/2022] Achieved excellent results in the DCASE 2022 Challenge: Second Place in Task 5 - Few-Shot Bioacoustic Event Detection (results, paper, and code), Second Place in Task 6b - Language-Based Audio Retrieval (results, paper, and code), and Third Place in Task 6a - Automated Audio Captioning (results, paper, and code).

[06/2022] Elected Vice Chair of the EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing (TAC-ASMSP). Thanks to the TAC-ASMSP members for voting for me.

[04/2022] Appointed Associate Editor for (Nature) Scientific Reports.

[03/2022] Awarded two projects by SAAB in the area of intelligent information fusion for distributed sensor networks and sensor swarms, with Prof Pei Xiao (CI), 5G/6G Innovation Centre, Institute of Communication Systems.

[02/2022] Plenary Speaker at the GHOST DAY Applied Machine Learning Conference, 24-26 March 2022. Video lecture available online.

[01/2022] Awarded an EPSRC iCASE project, titled "Differentiable particle filters for data-driven sequential inference", with Dr Yunpeng Li (PI), and industrial partner National Physical Laboratory. The university also provides a second project to match the iCASE award.

[01/2022] Appointed AI Fellow of the newly established Surrey Institute for People Centred Artificial Intelligence.

[11/2021] Elected Vice Chair of the IEEE Machine Learning for Signal Processing Technical Committee, to serve from January 2022. Thanks to the TC members for voting for me.

[11/2021] Proud to be on the Stanford University list of the World's Top 2% Scientists. See more information here.

[10/2021] Plenary Speaker at the International Workshop on Neuro-engineering and Signal Processing for Diagnostics and Control, Taiyuan, China, 15-17 October 2021.

[09/2021] Won the Best Paper Award at the 2021 International Conference on Autonomous Unmanned Systems (ICAUS 2021). See the full paper here.

[08/2021] Keynote Speaker, 8th International Conference on Signal Processing and Integrated Networks (SPIN 2021), Noida Delhi/NCR, India, August 26-27, 2021.

[07/2021] Achieved Third Place in the DCASE 2021 Challenge Task 6 - Automated Audio Captioning. Check the results here and the paper here.

[04/2021] Awarded a five-year EPSRC grant, under the Prosperity Partnership scheme, worth £15M (including industrial support), titled "BBC Prosperity Partnership: AI for Future Personalised Media Experiences". (Surrey investigators: Prof Adrian Hilton (PI, project lead); CIs: Dr Philip Jackson, Dr Armin Mustafa, Dr Jean-Yves Guillemaut, Dr Marco Volino, and Prof Wenwu Wang; Project manager: Mrs Elizabeth James) [The project is led by the University of Surrey, in collaboration with the BBC and Lancaster University, with support from 10+ industrial partners.] (See the project website for more information, and the press releases by Surrey, UKRI, and BBC)

[04/2021] Appointed Award Sub-Committee Chair of the IEEE Machine Learning for Signal Processing Technical Committee.

[03/2021] Invited to serve as Satellite Workshop Co-chair on the organising committee of INTERSPEECH 2022, to be held in Incheon, Korea, 18-22 September, 2022. INTERSPEECH is the flagship conference in speech/language science and technology with 1000+ attendees each year.

[03/2021] Keynote Speaker at the Robotics and Artificial Intelligence Virtual Conference (V-Robot2021), 27-28 March 2021.

[02/2021] Awarded a £250k two-year British Council grant (Newton Institutional Links Award), titled "Automated Captioning of Image and Audio for Visually and Hearing Impaired". (Surrey investigator: Prof Wenwu Wang (PI, project lead)) [Jointly with Izmir Katip Celebi University (IKCU) (Dr Volkan Kilic).]

[01/2021] Invited Keynote Speaker, Global Summit and Expo on "Robot Intelligence Technology and Applications" (GSERITA2021), Lisbon, Portugal, September 06-08, 2021.

[12/2020] Keynote Speaker, Workshop on Intelligent Navigation and Advanced Information Fusion Technology, Harbin, China, December 12-13, 2020. (1000+ attendees online)

[11/2020] Elected Member of two IEEE Technical Committees: the IEEE Signal Processing Theory and Methods Technical Committee & the IEEE Machine Learning for Signal Processing Technical Committee, both for a three-year term starting on 1st January 2021.

[11/2020] Won two awards at DCASE 2020, 2-4 November 2020, Tokyo, Japan. The paper "Incorporating Auxiliary Data for Urban Sound Tagging", authored by Turab Iqbal, Yin Cao, Mark D. Plumbley and Wenwu Wang, was given the Judges' Award, for "the method considered by the judges to be the most interesting or innovative". The paper "Event-Independent Network for Polyphonic Sound Event Localization and Detection", authored by Yin Cao, Turab Iqbal, Qiuqiang Kong, Zhong Yue, Wenwu Wang, and Mark D. Plumbley, was given the Reproducible System Award, for "the highest scoring method that is open-source and fully reproducible". Read the university newsletter here.

[10/2020] Awarded a $1.2M three-year DoD & MoD grant (UDRC phase 3 application theme on Signal and Information Processing for Decentralized Intelligence, Surveillance, and Reconnaissance), titled "SIGNetS: signal and information gathering for networked surveillance". (Surrey investigators: Prof Wenwu Wang (PI), and Prof Pei Xiao (CI)). [The project is led by University of Cambridge (Prof Simon Godsill), jointly with University of Surrey and University of Sheffield (Prof Lyudmila Mihaylova).] (project website)

[10/2020] Awarded a £500k MoD grant (DASA call on Countering Drones), titled "Acoustic surveillance", to develop AI technologies for drone detection with acoustic sensors. (Surrey investigators: Prof Wenwu Wang). [The project is led by Airspeed.]

[09/2020] Awarded a £2.3M three-year EPSRC grant (responsive mode), titled "Multimodal video search by examples". (Surrey investigators: Prof Josef Kittler (PI), Prof Miroslaw Bober (CI), Prof Wenwu Wang (CI), and Prof Mark Plumbley (CI)). [The project is led by Ulster University (Prof Hui Wang), jointly with University of Surrey and University of Cambridge (Prof Mark Gales).]

[08/2020] Keynote Speaker at the 6th International Conference on Machine Vision and Machine Learning (MVML 2020), Prague, Czech Republic, August 13-15, 2020.

[07/2020] Achieved First Place (with Turab Iqbal, Yin Cao, and Mark Plumbley) in the DCASE 2020 Challenge Task 5: "Urban Sound Tagging with Spatiotemporal Context".

[07/2020] Awarded a three-year industrial project by Tencent AI Lab (Seattle, US), titled "Particle flow PHD filtering for audio-visual multi-speaker speech tracking". Surrey Investigator: Prof Wenwu Wang (PI). Industry partner: Tencent (Dr Yong Xu).

[05/2020] Awarded an EPSRC Impact Acceleration Account (IAA) project titled "Audio tagging for metadata generation of media for programme recommendation". Surrey investigators: Prof Wenwu Wang (PI) and Prof Mark Plumbley (CI). Industry partner: BBC (Dr Chris Baume and Dr Chris Pike).

[02/2020] External PhD Examiner at Nanyang Technological University, Singapore.

[01/2020] Appointed Associate Editor (2020-) for IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[01/2020] Awarded a 2020 DUO-India Professor Fellowship, to study deep embedding techniques in audio scene classification and event detection.

[12/2019] Keynote Speaker at the IEEE International Conference on Signal, Information and Data Processing (ICSIDP 2019), Chongqing, China, 11-13 December 2019, with 1000+ attendees.

[12/2019] External PhD Examiner at University of Oxford, Imperial College London, Queen Mary University of London, and Newcastle University.

[11/2019] Elected Member of the International Steering Committee of Latent Variable Analysis and Signal Separation.

[10/2019] The CVSSP audio team (Yin Cao, Turab Iqbal, Qiuqiang Kong, Miguel Blanco Galindo, Wenwu Wang and Mark Plumbley) was given the Reproducible System Award at the DCASE 2019 Workshop for their system "Sound Event Localization and Detection". The award was given to recognize the quality, innovation and reproducibility of the work. Our system is described in this paper. See here for the source code that implements the system.

[07/2019] I gave 32 hours of invited lectures at a Summer School on Machine Learning at Beijing University of Posts and Telecommunications.

[07/2019] The CVSSP audio team (Yin Cao, Turab Iqbal, Qiuqiang Kong, Miguel Blanco Galindo, Wenwu Wang and Mark Plumbley) performed well in the DCASE 2019 Challenge Task 3 (Sound Event Localization and Detection). The team was ranked 2nd overall out of 23 teams, and was the top academic team. More details can be found here. See here for the paper that describes our proposed system, and here for the source code implementing the system.

[05/2019] ICASSP 2024 will be held in Seoul, South Korea. I will serve as Tutorial Chair on the organising committee.

[05/2019] ICASSP 2019 was successfully held during 12-17 May, in Brighton, UK, with 3100+ attendees from all over the world. I served as Publication Chair on the organising committee.

[05/2019] Congratulations to Qiuqiang Kong on the acceptance of our paper "Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks" at IJCAI 2019. Acceptance rate this year: 850/4752 = 17.9%. IJCAI is a flagship conference in AI, along with NeurIPS and AAAI.

[05/2019] Congratulations to Yang Liu for being selected as a Best Student Paper Award Finalist at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), Brighton, UK, for our paper: Y. Liu, Q. Hu, Y. Zou, and W. Wang, "Labelled non-zero particle flow for SMC-PHD filtering". [PDF]

[01/2019] Appointed Senior Area Editor (2019-) for IEEE Transactions on Signal Processing. TSP is the flagship journal in the area of signal processing.

[11/2018] Plenary Speaker at the 7th International Conference on Signal and Image Processing, November 28-30, 2018, Sanya, China.

[11/2018] Awarded a Guest Professorship by Qingdao University of Science and Technology, China.

[11/2018] Keynote Speaker at the 6th China Conference on Sound and Music Technology, November 24-26, 2018, Xiamen, China.

[11/2018] Congratulations to Turab Iqbal for winning the CVSSP Outstanding First Year PhD award.

[10/2018] Invited Keynote Speaker at the International Conference on Digital Image and Signal Processing, April 29-30, 2019, Oxford, UK.

[08/2018] Keynote Speaker at the China Computer Federation (CCF) Workshop on Sparse Representation and Deep Learning, Shenzhen, China.

[08/2018] We finished in 3rd place among the 558 teams worldwide that participated in the Kaggle "Freesound General-Purpose Audio Tagging Challenge" (Can you automatically recognize sounds from a wide range of real-world environments?). Congratulations to Turab Iqbal and Qiuqiang Kong for this great achievement. See here for the competition results, and here for the paper that describes our proposed system. Read the university press release here.

[07/2018] Best Student Paper Award at the 14th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA 2018). Congratulations to Lucas Rencker. Paper: "Consistent dictionary learning for signal declipping". The MATLAB code can be found on Lucas Rencker's GitHub page or personal page. Read the university news here.

[06/2018] Invited seminar at Oxford University, Machine Learning Group, "Deep learning for audio classification".

[12/2017] Plenary Speaker at the 3rd Intelligent Signal Processing Conference, London.

[11/2017] Plenary Speaker at the Alan Turing Institute Workshop on Data Science and Signal Processing.

[09/2017] The CVSSP audio team (Yong Xu, Qiuqiang Kong, Wenwu Wang and Mark D. Plumbley) won the 1st prize in the DCASE 2017 Challenge for the task "large-scale weakly supervised sound event detection for smart cars"! The DCASE 2017 Challenge was organized by TUT, CMU and INRIA, and sponsored by Google and Audio Analytic. The CVSSP team submitted four systems to the audio tagging subtask, which took all top four places on the results table among the 31 systems submitted. CVSSP's system was also ranked 3rd in the sound event detection subtask, among 17 systems. Competitors included CMU, New York University, Bosch, USC, TUT, Singapore A*STAR, Korea Advanced Institute of Science and Technology, Seoul National University, and National Taiwan University, among others. More details about the systems we submitted can be found here. The competition results can be found here. Read the university news here.

[02/2017] CVSSP was awarded a £1.5M five-year EPSRC platform grant entitled "Audio-Visual Media Research". The project is led by Prof Adrian Hilton, with co-investigators including Mark Plumbley, Josef Kittler, Wenwu Wang, John Collomosse, Philip Jackson, and Jean-Yves Guillemaut.

[11/2016] Congratulations to Lucas Rencker for winning the CVSSP Directors Award for Outstanding First Year PhD Performance after his PhD confirmation on "Sparse representations for audio restoration and inpainting".

[10/2016] S3A and BBC Research won the TVB Europe Award for Best Achievement in Sound for "The Turning Forest", a VR sound experience based on the spatial audio radio drama produced in S3A and integrated into an immersive audio-visual experience by the BBC. The award was presented at the European TV industry awards, with S3A winning against entries from Britain's Got Talent, Sky's The Five, and BBC TV programme coverage. More details can be found here.

[10/2015] Awarded a €2.98M three-year EC Horizon 2020 grant entitled "ACE-CReAte: Audio Commons - an Ecosystem for Creative Use of Audio Contents". The project is led by Universitat Pompeu Fabra (Spain), in collaboration with University of Surrey, Queen Mary University of London, Jamendo SA (Luxembourg), AudioGaming (France), and Waves Audio Ltd (Israel). The Surrey team is composed of Prof Mark Plumbley (PI), Dr Wenwu Wang, Dr Tim Brookes (Institute of Sound Recording), and Dr David Plans (School of Business).

[09/2015] Awarded a £1.3M three-year EPSRC grant entitled "Making Sense of Sounds". The project is led by University of Surrey, in collaboration with University of Salford. The Surrey team is composed of Prof Mark Plumbley (Lead and PI of the whole project), Dr Wenwu Wang, Dr Philip Jackson, and Prof David Frohlich (Digital World Research Centre).

[03/2014] Congratulations to Jing Dong for winning the IEEE Signal Processing Society Travel Grant to attend the ICASSP 2014 conference in Florence, Italy.

[01/2014] We were delighted to see that Figure 6 of our paper below was shown on the front page of IEEE Transactions on Signal Processing: Q. Liu, W. Wang, P. Jackson, M. Barnard, J. Kittler, and J.A. Chambers, "Source Separation of Convolutive and Noisy Mixtures using Audio-Visual Dictionary Learning and Probabilistic Time-Frequency Masking", IEEE Transactions on Signal Processing, vol. 61, no. 22, pp. 5520-5535, 2013. [PDF]

[10/2013] Congratulations to Volkan Kilic for winning the CVSSP Directors Award for Outstanding Performance in the First Year of his PhD. Research topic: Audio-visual tracking of multiple moving speakers.

[10/2013] Awarded an industrial project entitled "Enhancing Speech Quality Using Lip Tracking" by Samsung Electronics Research Institute (UK). Industry partner: Dr Holly Francis (Samsung).

[09/2013] Awarded a £5.4M (FEC: £6.5M) five-year EPSRC programme grant entitled "S3A: Future Spatial Audio for an Immersive Listener Experience at Home". The project is led by University of Surrey, in collaboration with University of Southampton, University of Salford and BBC. The Surrey team is composed of Prof Adrian Hilton (Lead and PI of the whole programme), Dr Philip Jackson, Dr Wenwu Wang and Dr Tim Brookes (Institute of Sound Recording).

[03/2013] IEEE Signal Processing Society Travel Grant. Congratulations to Volkan Kilic for winning this competitive award for attending the ICASSP 2013 conference in Vancouver, Canada.

[12/2012] Awarded a £4.4M (FEC) five-year project supported by the EPSRC and Dstl entitled "Signal Processing Solutions for the Networked Battlespace". Our consortium, as part of Phase II of the UDRC in Signal Processing, is composed of Loughborough, Surrey, Strathclyde and Cardiff (LSSC) universities, as well as six industry partners: QinetiQ, Selex-Galileo, Thales, Texas Instruments, PrismTech and Steepest Ascent. The Surrey team is composed of Dr Wenwu Wang (PI), Prof Josef Kittler and Dr Philip Jackson. The project was led by Prof Jonathon Chambers.

[09/2012] Best Solution Award at the DSTL Challenge Workshop for the signal processing challenge "Undersampled Signal Recognition", announced at the SSPD 2012 conference, London, September 25-27, 2012. Congratulations to Qingju Liu for this achievement.



Last updated in May 2023
First created in May 2007