High-fidelity audio generation and representation learning with guided adversarial autoencoder

Article

Haque, Kazi Nazmul, Rana, Rajib and Schuller, Bjorn W.. 2020. "High-fidelity audio generation and representation learning with guided adversarial autoencoder." IEEE Access. 8, pp. 223509-223528. https://doi.org/10.1109/ACCESS.2020.3040797

Related Output
Article Title	High-fidelity audio generation and representation learning with guided adversarial autoencoder
ERA Journal ID	210567
Article Category	Article
Authors	Haque, Kazi Nazmul (Author), Rana, Rajib (Author) and Schuller, Bjorn W. (Author)
Journal Title	IEEE Access
Journal Citation	8, pp. 223509-223528
Article Number	9272282
Number of Pages	20
Year	2020
Publisher	IEEE (Institute of Electrical and Electronics Engineers)
Place of Publication	Piscataway, NJ, United States
ISSN	2169-3536
Digital Object Identifier (DOI)	https://doi.org/10.1109/ACCESS.2020.3040797
Web Address (URL)	https://ieeexplore.ieee.org/document/9272282
Abstract	Generating high-fidelity conditional audio samples and learning representations from unlabelled audio data are two challenging problems in machine learning research. Recent advances in Generative Adversarial Neural Network (GAN) architectures show great promise in addressing these challenges. Learning a powerful representation with a GAN architecture requires superior sample generation quality, which in turn requires an enormous amount of labelled data. In this paper, we address this issue by proposing the Guided Adversarial Autoencoder (GAAE), which can generate superior conditional audio samples from unlabelled audio data using a small percentage of labelled data as guidance. A representation learned from unlabelled data without any supervision is not guaranteed to be usable for any downstream task. On the other hand, if the model is highly biased towards the downstream task during representation learning, it loses its generalisation capability. This makes the learned representation hardly useful for any other task that is not related to that downstream task. The proposed GAAE model also addresses these issues. Using this superior conditional generation, GAAE can learn a representation specific to the downstream task. Furthermore, GAAE learns another type of representation capturing the general attributes of the data, which is independent of the downstream task at hand. Experimental results involving the S09 and the NSynth datasets attest to the superior performance of GAAE compared to the state-of-the-art alternatives.
Keywords	audio generation, representation learning, generative adversarial neural network, guided generative adversarial autoencoder
Is part of	Guided Disentangled Representation Learning from Audio data for Transfer Learning
Contains Sensitive Content	Does not contain sensitive content
ANZSRC Field of Research 2020	460212. Speech recognition
	461104. Neural networks
	461103. Deep learning
Public Notes	This article is part of a UniSQ Thesis by publication. See Related Output.
Byline Affiliations	School of Sciences
	Imperial College London, United Kingdom
Institution of Origin	University of Southern Queensland

Permalink -

https://research.usq.edu.au/item/q63y2/high-fidelity-audio-generation-and-representation-learning-with-guided-adversarial-autoencoder

Download files

Published Version

	09272282.pdf
License: CC BY 4.0
File access level: Anyone

Related outputs

Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition

Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara and Schuller, Björn W.. 2024. "Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition." IEEE Access. 12, pp. 193101-193114. https://doi.org/10.1109/ACCESS.2024.3519761

A Systematic Review Protocol of Child Mental Health Screening Tools and Systems

Terlich, Rebecca Jane, Krishnamoorthy, Govind, March, Sonja, Fein, Erich, Rana, Rajib and Koirala, Ishwar. 2024. "A Systematic Review Protocol of Child Mental Health Screening Tools and Systems." OSF Registries. https://doi.org/10.17605/OSF.IO/YN6HB

Dual-Phase Neural Networks for Feature Extraction and Ensemble Learning for Recognizing Human Health Activities

Dhar, Joy, Rana, Kapil, Goyal, Puneet, Alavi, Azadeh, Rana, Rajib, Vo, Bao Quoc, Mishra, Sudeepta and Mistry, Sajib. 2024. "Dual-Phase Neural Networks for Feature Extraction and Ensemble Learning for Recognizing Human Health Activities." Applied Soft Computing. 169. https://doi.org/10.1016/j.asoc.2024.112550

Feasibility of Mental Health Triage Call Priority Prediction Using Machine Learning

Rana, Rajib, Higgins, Niall, Haque, Kazi Nazmul, Burke, Kylie, Turner, Kathryn and Stedman, Terry. 2024. "Feasibility of Mental Health Triage Call Priority Prediction Using Machine Learning." Nursing Reports. 14 (4), pp. 4162-4172. https://doi.org/10.3390/nursrep14040303

emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition

Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara, Sisman, Berrak, Schuller, Björn W. and Busso, Carlos. 2024. "emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition." IEEE Access. 12, pp. 110492-110503. https://doi.org/10.1109/ACCESS.2024.3439604

HGSOXGB: Hunger-Games-Search-Optimization-Based Framework to Predict the Need for ICU Admission for COVID-19 Patients Using eXtreme Gradient Boosting

Pinki, Farhana Tazmim, Awal, Md Abdul, Mumenin, Khondoker Mirazu, Hossain, Md. Shahadat, Faysal, Jabed Al, Rana, Rajib, Almuqren, Rajib, Ksibi, Amel and Samad, Md Abdus. 2023. "HGSOXGB: Hunger-Games-Search-Optimization-Based Framework to Predict the Need for ICU Admission for COVID-19 Patients Using eXtreme Gradient Boosting." Mathematics. 11 (18). https://doi.org/10.3390/math11183960

Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering

Siriwardhana, Shamane, Weerasekera, Rivindu, Wen, Elliott, Kaluarachchi, Tharindu, Rana, Rajib and Nanayakkara, Suranga. 2023. "Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering." Transactions of the Association for Computational Linguistics. 11, pp. 1-17. https://doi.org/10.1162/tacl_a_00530

A novel employability embedding framework for three-year bachelor’s programs

Rana, Rajib, Galligan, Linda, Fard, Rouz and McCredie, Tessa. 2023. "A novel employability embedding framework for three-year bachelor’s programs." Journal of Teaching and Learning for Graduate Employability. 14 (1), pp. 104-118. https://doi.org/10.21153/jtlge2023vol14no1art1604

Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review

Hossain, Elias, Rana, Rajib, Higgins, Niall, Soar, Jeffrey, Barua, Prabal Datta, Pisani, Anthony R. and Turner, Kathryn. 2023. "Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review." Computers in Biology and Medicine. 155. https://doi.org/10.1016/j.compbiomed.2023.106649

Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W.. 2023. "Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (4), pp. 3164-3176. https://doi.org/10.1109/TAFFC.2022.3221749

Speech Synthesis with Mixed Emotions

Zhou, Kun, Sisman, Berrak, Rana, R., Schuller, Bjorn W. and Li, Haizhou. 2023. "Speech Synthesis with Mixed Emotions." IEEE Transactions on Affective Computing. 14 (4), pp. 3120-3134. https://doi.org/10.1109/TAFFC.2022.3233324

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn. 2023. "Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (3), pp. 1912-1926. https://doi.org/10.1109/TAFFC.2022.3167013

Emotion Intensity and its Control for Emotional Voice Conversion

Zhou, Kun, Sisman, Berrak, Rana, Rajib, Schuller, Bjorn W. and Li, Haizhou. 2023. "Emotion Intensity and its Control for Emotional Voice Conversion." IEEE Transactions on Affective Computing. 14 (1), pp. 31-48. https://doi.org/10.1109/TAFFC.2022.3175578

A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara, Liu, Jiajun and Schuller, Bjorn W. 2022. "A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition." Abramson, David and Dinh, Minh Ngoc (ed.) 2022 Australasian Computer Science Week (ACSW 2022). Brisbane, Australia 14 - 17 Feb 2022 United States. https://doi.org/10.1145/3511616.3513104

SigRep: Towards Robust Wearable Emotion Recognition with Contrastive Representation Learning

Dissanayake, Vipula, Seneviratne, Sachith, Rana, Rajib, Wen, Elliot, Kaluarachchi, Tharindu and Nanayakkara, Suranga. 2022. "SigRep: Towards Robust Wearable Emotion Recognition with Contrastive Representation Learning." IEEE Access. 10, pp. 18105-18120. https://doi.org/10.1109/ACCESS.2022.3149509

Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja, Epps, Julien and Schuller, Bjorn W.. 2022. "Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition." IEEE Transactions on Affective Computing. 13 (2), pp. 992-1004. https://doi.org/10.1109/TAFFC.2020.2983669

Survey of Deep Representation Learning for Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja, Qadir, Junaid and Schuller, Bjorn. 2023. "Survey of Deep Representation Learning for Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (2), pp. 1634-1654. https://doi.org/10.1109/TAFFC.2021.3114365

Towards a Compressive-Sensing-Based Lightweight Encryption Scheme for the Internet of Things

Xue, Wanli, Luo, Chengwen, Shen, Yiran, Rana, Rajib, Lan, Guohao, Jha, Sanjay, Seneviratne, Aruna and Hu, Wen. 2021. "Towards a Compressive-Sensing-Based Lightweight Encryption Scheme for the Internet of Things." IEEE Transactions on Mobile Computing. 20 (10), pp. 3049-3065. https://doi.org/10.1109/TMC.2020.2992737

Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data

Haque, Kazi Nazmul, Rana, Rajib, Liu, Jiajun, Hansen, John H. L., Cummins, Nicholas, Busso, Carlos and Schuller, Bjorn W.. 2021. "Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data." IEEE ACM Transactions on Audio, Speech, and Language Processing. 29, pp. 2575-2590. https://doi.org/10.1109/TASLP.2021.3098764

Development Data of Mood Inference Engine

Rana, Rajib. Development Data of Mood Inference Engine. Springfield. https://doi.org/10.26192/PV1J-E485

Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W.. 2020. "Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition." 21st Annual Conference of the International Speech Communication Association: Cognitive Intelligence for Speech Processing (INTERSPEECH 2020). Shanghai, China 25 - 29 Oct 2020 France. https://doi.org/10.21437/Interspeech.2020-3190

Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Latif, Siddique, Asim, Muhammad, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W.. 2020. "Augmenting Generative Adversarial Networks for Speech Emotion Recognition." 21st Annual Conference of the International Speech Communication Association: Cognitive Intelligence for Speech Processing (INTERSPEECH 2020). Shanghai, China 25 - 29 Oct 2020 France. https://doi.org/10.21437/Interspeech.2020-3194

Evaluating the performance of BSBL methodology for EEG source localization on a realistic head model

Saha, Sajib, Rana, Rajib, Nesterets, Yakov, Tahtali, Murat, de Hoog, Frank and Gureyev, Timur. 2017. "Evaluating the performance of BSBL methodology for EEG source localization on a realistic head model." International Journal of Imaging Systems and Technology. 27 (1), pp. 46-56. https://doi.org/10.1002/ima.22209

Federated Learning for Speech Emotion Recognition Applications

Latif, Siddique, Khalifa, Sara, Rana, Rajib and Jurdak, Raja. 2020. "Federated Learning for Speech Emotion Recognition Applications." 19th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2020). Sydney, Australia 21 - 24 Apr 2020 United States. https://doi.org/10.1109/IPSN48710.2020.00-16

Direct modelling of speech emotion from raw speech

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Epps, Julien. 2019. "Direct modelling of speech emotion from raw speech." 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language (INTERSPEECH 2019). Graz, Austria 15 - 19 Sep 2019 France. https://doi.org/10.21437/Interspeech.2019-3252

Variational Autoencoders to Learn Latent Representations of Speech Emotion

Latif, Siddique, Rana, Rajib, Qadir, Junaid and Epps, Julien. 2018. "Variational Autoencoders to Learn Latent Representations of Speech Emotion." 19th Annual Conference of the International Speech Communication Association: Speech Research for Emerging Markets in Multilingual Societies (INTERSPEECH 2018). Hyderabad, India 02 - 06 Sep 2018 France. https://doi.org/10.21437/Interspeech.2018-1568

Transfer learning for improving speech emotion classification accuracy

Latif, Siddique, Rana, Rajib, Younis, Shahzad, Qadir, Junaid and Epps, Julien. 2018. "Transfer learning for improving speech emotion classification accuracy." 19th Annual Conference of the International Speech Communication Association: Speech Research for Emerging Markets in Multilingual Societies (INTERSPEECH 2018). Hyderabad, India 02 - 06 Sep 2018 France. https://doi.org/10.21437/Interspeech.2018-1625

A novel framework for distress detection through an automated speech processing system

Rana, Rajib, Gururajan, Raj, Mackenzie, Geraldine, Dunn, Jeff, Gray, Anthony, Zhou, Xujuan, Barua, Prabal Datta, Epps, Julien and Humphris, Gerald Michael. 2018. "A novel framework for distress detection through an automated speech processing system." 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2018). Santiago, Chile 03 - 06 Dec 2018 Los Alamitos, CA, United States. https://doi.org/10.1109/WI.2018.00-29

Automated screening for distress: A perspective for the future

Rana, Rajib, Latif, Siddique, Gururajan, Raj, Gray, Anthony, Mackenzie, Geraldine, Humphris, Gerald and Dunn, Jeff. 2019. "Automated screening for distress: A perspective for the future." European Journal of Cancer Care. 28 (4). https://doi.org/10.1111/ecc.13033

Phonocardiographic sensing using deep learning for abnormal heartbeat detection

Latif, Siddique, Usman, Muhammad, Rana, Rajib and Qadir, Junaid. 2018. "Phonocardiographic sensing using deep learning for abnormal heartbeat detection." IEEE Sensors Journal. 18 (22), pp. 9393-9400. https://doi.org/10.1109/JSEN.2018.2870759

Kryptein: A Compressive-Sensing-Based Encryption Scheme for the Internet of Things

Xue, Wanli, Luo, Chengwen, Lan, Guohao, Rana, Rajib, Hu, Wen and Seneviratne, Aruna. 2017. "Kryptein: A Compressive-Sensing-Based Encryption Scheme for the Internet of Things." Zhang, Pei, Dutta, Prabal and Xing, Guoliang (ed.) 16th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2017). Pittsburgh, Pennsylvania, USA 18 - 21 Apr 2017 United States. https://doi.org/10.1145/3055031.3055079

Demo Abstract: CScrypt - A Compressive-Sensing-Based Encryption Engine for the Internet of Things

Xue, Wanli, Luo, Chengwen, Rana, Rajib, Hu, Wen and Seneviratne, Aruna. 2016. "Demo Abstract: CScrypt - A Compressive-Sensing-Based Encryption Engine for the Internet of Things." 14th ACM Conference on Embedded Network Sensor Systems (SenSys '16). Stanford, Calif, USA 14 - 16 Nov 2016 https://doi.org/10.1145/2994551.2996525

IEEE Access special section editorial: health informatics for the developing world

Qadir, Junaid, Mujeeb-U-Rahman, Muhammad, Rehmani, Mubashir Husain, Pathan, Al-Sakib Khan, Imran, Muhammad Ali, Hussain, Amir, Rana, Rajib and Luo, Bin. 2017. "IEEE Access special section editorial: health informatics for the developing world." IEEE Access. 5, pp. 27818-27823. https://doi.org/10.1109/ACCESS.2017.2783118

EEG source localization using a sparsity prior based on Brodmann areas

Saha, Sajib, Nesterets, Yakov, Rana, Rajib, Tahtali, Murat, de Hoog, Frank and Gureyev, Timur. 2017. "EEG source localization using a sparsity prior based on Brodmann areas." International Journal of Imaging Systems and Technology. 27 (4), pp. 333-344. https://doi.org/10.1002/ima.22236

Mobile health in the Developing World: review of literature and lessons from a case study

Latif, Siddique, Rana, Rajib, Qadir, Junaid, Ali, Anwaar, Imran, Muhammad Ali and Younis, Muhammad Shahzad. 2017. "Mobile health in the Developing World: review of literature and lessons from a case study." IEEE Access. 5, pp. 11540-11556. https://doi.org/10.1109/ACCESS.2017.2710800

Guiding Ebola patients to suitable health facilities: an SMS-based approach

Trad, Mohamad-Ali, Jurdak, Raja and Rana, Rajib. 2015. "Guiding Ebola patients to suitable health facilities: an SMS-based approach." F1000Research. 4 (43). https://doi.org/10.12688/f1000research.6105.1

Context-driven mood mining

Rana, Rajib. 2016. "Context-driven mood mining." 14th International Conference on Mobile Systems, Applications, and Services (MobiSys 2016). Singapore 25 - 30 Jun 2016 United States. https://doi.org/10.1145/2938559.2938601

Gait velocity estimation using time-interleaved between consecutive passive IR sensor activations

Rana, Rajib, Austin, Daniel, Jacobs, Peter G., Karunanithi, Mohanraj and Kaye, Jeffrey. 2016. "Gait velocity estimation using time-interleaved between consecutive passive IR sensor activations." IEEE Sensors Journal. 16 (16), pp. 6351-6358. https://doi.org/10.1109/JSEN.2016.2577708

wHealth - transforming telehealth services

Rana, Rajib, Hume, Margee, Reilly, John and Soar, Jeffrey. 2016. "wHealth - transforming telehealth services." EAI Endorsed Transactions on Scalable Information Systems. 3 (8). https://doi.org/10.4108/eai.9-8-2016.151635

Real-time classification via sparse representation in acoustic sensor networks

Wei, Bo, Yang, Mingrui, Shen, Yiran, Rana, Rajib, Chou, Chun Tung and Hu, Wen. 2013. "Real-time classification via sparse representation in acoustic sensor networks." 11th ACM Conference on Embedded Networked Sensor Systems (SenSys 2013). Rome, Italy 11 - 15 Nov 2013 United States. https://doi.org/10.1145/2517351.2517357

Feasibility analysis of using humidex as an indoor thermal comfort predictor

Rana, Rajib, Kusy, Brano, Jurdak, Raja, Wall, Josh and Hu, Wen. 2013. "Feasibility analysis of using humidex as an indoor thermal comfort predictor." Energy and Buildings. 64, pp. 17-25. https://doi.org/10.1016/j.enbuild.2013.04.019

Ear-phone: a context-aware noise mapping using smart phones

Rana, Rajib, Chou, Chun Tung, Bulusu, Nirupama, Kanhere, Salil and Hu, Wen. 2015. "Ear-phone: a context-aware noise mapping using smart phones." Pervasive and Mobile Computing. 17 (Part A), pp. 1-22. https://doi.org/10.1016/j.pmcj.2014.02.001

Opportunistic and context-aware affect sensing on smartphones: the concept, challenges and opportunities

Rana, Rajib, Hume, Margee, Reilly, John, Jurdak, Raja and Soar, Jeffrey. 2016. "Opportunistic and context-aware affect sensing on smartphones: the concept, challenges and opportunities." IEEE Pervasive Computing. 15 (2), pp. 60-69. https://doi.org/10.1109/MPRV.2016.36

SimpleTrack: adaptive trajectory compression with deterministic projection matrix for mobile sensor networks

Rana, Rajib, Yang, Mingrui, Wark, Tim, Chou, Chun Tung and Hu, Wen. 2015. "SimpleTrack: adaptive trajectory compression with deterministic projection matrix for mobile sensor networks." IEEE Sensors Journal. 15 (1), pp. 365-373. https://doi.org/10.1109/JSEN.2014.2335210

Optimal sampling strategy enabling energy-neutral operations at rechargeable wireless sensor networks

Rana, Rajib, Hu, Wen and Chou, Chun Tung. 2015. "Optimal sampling strategy enabling energy-neutral operations at rechargeable wireless sensor networks." IEEE Sensors Journal. 15 (1), pp. 201-208. https://doi.org/10.1109/JSEN.2014.2337334

Nonuniform compressive sensing for heterogeneous wireless sensor networks

Shen, Yiran, Hu, Wen, Rana, Rajib and Chou, Chun Tung. 2013. "Nonuniform compressive sensing for heterogeneous wireless sensor networks." IEEE Sensors Journal. 13 (6), pp. 2120-2128. https://doi.org/10.1109/JSEN.2013.2248253

Novel activity classification and occupancy estimation methods for intelligent HVAC (heating, ventilation and air conditioning) systems

Rana, Rajib, Kusy, Brano, Wall, Josh and Hu, Wen. 2015. "Novel activity classification and occupancy estimation methods for intelligent HVAC (heating, ventilation and air conditioning) systems." Energy. 93 (1), pp. 245-255. https://doi.org/10.1016/j.energy.2015.09.002

This item comprises data from the following sources: Scopus
