Deep Representation Learning for Speech Emotion Recognition

PhD by Publication

Latif, Siddique. 2022. Deep Representation Learning for Speech Emotion Recognition. PhD by Publication Doctor of Philosophy (DPHD). University of Southern Queensland. https://doi.org/10.26192/w8w00

Title	Deep Representation Learning for Speech Emotion Recognition
Type	PhD by Publication
Authors	Latif, Siddique
Supervisor
1. First	Prof Rajib Rana
2. Second	Jiabao Zhang
3. Third	Bjorn W. Schuller
3. Third	Sara Khalifa
Institution of Origin	University of Southern Queensland
Qualification Name	Doctor of Philosophy (DPHD)
Number of Pages	104
Year	2022
Publisher	University of Southern Queensland
Place of Publication	Australia
Digital Object Identifier (DOI)	https://doi.org/10.26192/w8w00
Abstract	The success of machine learning (ML) algorithms generally depends on the quality of the data representation, or features. Good representations make it easier to develop ML predictors and deep learning (DL) classifiers. In speech emotion recognition (SER) research, emotion classifiers depend heavily on hand-engineered acoustic features, which are typically crafted using human domain knowledge. Automatically learning emotional representations from speech is challenging because speech carries attributes of the speaker (e.g., gender, age, and emotion) along with the linguistic message. Recent advances in DL have fuelled the area of deep representation learning from speech. The prime goal of deep representation learning is to learn complex relationships from input data, usually through nonlinear transformations. Research on deep representation learning has evolved significantly; however, very few studies have investigated emotional representation learning from speech using advanced DL techniques. In this thesis, I explore different deep representation learning techniques for SER to improve the performance and generalisation of SER systems. I broadly address two major problems: (1) how deep representation learning can be used to improve the performance of SER by exploiting unlabelled, synthetic, and augmented data; and (2) how deep representation learning can be applied to design generalised and robust SER systems. To address these problems, I propose deep representation learning techniques that learn from unlabelled, synthetic, and augmented data. I found that injecting additional unlabelled, augmented, and synthetic data into SER systems helps improve their performance. I also show that adversarial self-supervised learning can improve cross-language SER and that deeper architectures learn robust, generalised representations for SER in noisy conditions.
Keywords	deep representation learning; multi-task learning; semi-supervised learning; self-supervised learning; adversarial machine learning; speech emotion recognition
Related Output
Has part	Survey of Deep Representation Learning for Speech Emotion Recognition
Has part	Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition
Has part	Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
Has part	Augmenting Generative Adversarial Networks for Speech Emotion Recognition
Has part	Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition
Has part	Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition
Contains Sensitive Content	Does not contain sensitive content
ANZSRC Field of Research 2020	461103. Deep learning
	461106. Semi- and unsupervised learning
	461104. Neural networks
	460208. Natural language processing
	461101. Adversarial machine learning
	461102. Context learning
Public Notes	File reproduced in accordance with the copyright policy of the publisher/author.
Byline Affiliations	School of Mathematics, Physics and Computing

Permalink - https://research.usq.edu.au/item/w8w00/deep-representation-learning-for-speech-emotion-recognition

Download files

Published Version

	Siddique Latif - Thesis_Redacted.pdf
License: CC BY-NC-ND 4.0
File access level: Anyone

204 total views
141 total downloads
4 views this month
5 downloads this month

Related outputs

Medicine's New Rhythm: Harnessing Acoustic Sensing via the Internet of Audio Things for Healthcare

Pervez, Farrukh, Shoukat, Moazzam, Suresh, Varsha, Farooq, Muhammad Umar Bin, Sandhu, Moid, Qayyum, Adnan, Usama, Muhammad, Girardi, Adnan, Latif, Siddique and Qadir, Junaid. 2024. "Medicine's New Rhythm: Harnessing Acoustic Sensing via the Internet of Audio Things for Healthcare." IEEE Open Journal of the Computer Society. 5, pp. 491-510. https://doi.org/10.1109/OJCS.2024.3462812

SSMD-UNet: semi-supervised multi-task decoders network for diabetic retinopathy segmentation

Ullah, Zahid, Akram, Muhammad, Latif, Siddique, Khan, Asifullah and Gwak, Jeonghwan. 2023. "SSMD-UNet: semi-supervised multi-task decoders network for diabetic retinopathy segmentation." Scientific Reports. 13 (1). https://doi.org/10.1038/s41598-023-36311-0

Densely attention mechanism based network for COVID-19 detection in chest X-rays

Ullah, Zahid, Usman, Muhammad, Latif, Siddique and Gwak, Jeonghwan. 2023. "Densely attention mechanism based network for COVID-19 detection in chest X-rays." Scientific Reports. 13 (1). https://doi.org/10.1038/s41598-022-27266-9

Selective Deeply Supervised Multi-Scale Attention Network for Brain Tumor Segmentation

Rehman, Azka, Usman, Muhammad, Shahid, Abdullah, Latif, Siddique and Qadir, Junaid. 2023. "Selective Deeply Supervised Multi-Scale Attention Network for Brain Tumor Segmentation." Sensors. 23 (4). https://doi.org/10.3390/s23042346

Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W. 2023. "Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (4), pp. 3164-3176. https://doi.org/10.1109/TAFFC.2022.3221749

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn. 2023. "Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (3), pp. 1912-1926. https://doi.org/10.1109/TAFFC.2022.3167013

A survey on deep reinforcement learning for audio‑based applications

Latif, Siddique, Cuayahuitl, Heriberto, Pervez, Farrukh, Shamshad, Fahad, Ali, Hafiz Shehbaz and Cambria, Erik. 2023. "A survey on deep reinforcement learning for audio‑based applications." Artificial Intelligence Review: an international survey and tutorial journal. 56 (3), pp. 2193-2240. https://doi.org/10.1007/s10462-022-10224-2

Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja, Epps, Julien and Schuller, Bjorn W. 2022. "Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition." IEEE Transactions on Affective Computing. 13 (2), pp. 992-1004. https://doi.org/10.1109/TAFFC.2020.2983669

Privacy Enhanced Speech Emotion Communication using Deep Learning Aided Edge Computing

Ali, Hafiz Shehbaz, Hassan, Fakhar ul, Latif, Siddique, Manzoor, Habib Ullah and Qadir, Junaid. 2021. "Privacy Enhanced Speech Emotion Communication using Deep Learning Aided Edge Computing." IEEE International Conference on Communications Workshops (2021). Montreal, Canada 14 - 23 Jun 2021 United States. https://doi.org/10.1109/ICCWorkshops50388.2021.9473669

Controlling Prosody in End-to-End TTS: A Case Study on Contrastive Focus Generation

Latif, Siddique, Kim, Inyoung, Calapodescu, Ioan and Besacier, Laurent. 2021. "Controlling Prosody in End-to-End TTS: A Case Study on Contrastive Focus Generation." 25th Conference on Computational Natural Language Learning (CoNLL 2021). Punta Cana, Dominican Republic 10 - 11 Nov 2021 Stroudsburg, Pennsylvania. https://doi.org/10.18653/v1/2021.conll-1.42

Survey of Deep Representation Learning for Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja, Qadir, Junaid and Schuller, Bjorn. 2023. "Survey of Deep Representation Learning for Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (2), pp. 1634-1654. https://doi.org/10.1109/TAFFC.2021.3114365

Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W. 2020. "Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition." 21st Annual Conference of the International Speech Communication Association: Cognitive Intelligence for Speech Processing (INTERSPEECH 2020). Shanghai, China 25 - 29 Oct 2020 France. https://doi.org/10.21437/Interspeech.2020-3190

Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Latif, Siddique, Asim, Muhammad, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W. 2020. "Augmenting Generative Adversarial Networks for Speech Emotion Recognition." 21st Annual Conference of the International Speech Communication Association: Cognitive Intelligence for Speech Processing (INTERSPEECH 2020). Shanghai, China 25 - 29 Oct 2020 France. https://doi.org/10.21437/Interspeech.2020-3194

Federated Learning for Speech Emotion Recognition Applications

Latif, Siddique, Khalifa, Sara, Rana, Rajib and Jurdak, Raja. 2020. "Federated Learning for Speech Emotion Recognition Applications." 19th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2020). Sydney, Australia 21 - 24 Apr 2020 United States. https://doi.org/10.1109/IPSN48710.2020.00-16

Direct modelling of speech emotion from raw speech

Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Epps, Julien. 2019. "Direct modelling of speech emotion from raw speech." 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language (INTERSPEECH 2019). Graz, Austria 15 - 19 Sep 2019 France. https://doi.org/10.21437/Interspeech.2019-3252

Variational Autoencoders to Learn Latent Representations of Speech Emotion

Latif, Siddique, Rana, Rajib, Qadir, Junaid and Epps, Julien. 2018. "Variational Autoencoders to Learn Latent Representations of Speech Emotion." 19th Annual Conference of the International Speech Communication Association: Speech Research for Emerging Markets in Multilingual Societies (INTERSPEECH 2018). Hyderabad, India 02 - 06 Sep 2018 France. https://doi.org/10.21437/Interspeech.2018-1568

Transfer learning for improving speech emotion classification accuracy

Latif, Siddique, Rana, Rajib, Younis, Shahzad, Qadir, Junaid and Epps, Julien. 2018. "Transfer learning for improving speech emotion classification accuracy." 19th Annual Conference of the International Speech Communication Association: Speech Research for Emerging Markets in Multilingual Societies (INTERSPEECH 2018). Hyderabad, India 02 - 06 Sep 2018 France. https://doi.org/10.21437/Interspeech.2018-1625

Automated screening for distress: A perspective for the future

Rana, Rajib, Latif, Siddique, Gururajan, Raj, Gray, Anthony, Mackenzie, Geraldine, Humphris, Gerald and Dunn, Jeff. 2019. "Automated screening for distress: A perspective for the future." European Journal of Cancer Care. 28 (4). https://doi.org/10.1111/ecc.13033

Phonocardiographic sensing using deep learning for abnormal heartbeat detection

Latif, Siddique, Usman, Muhammad, Rana, Rajib and Qadir, Junaid. 2018. "Phonocardiographic sensing using deep learning for abnormal heartbeat detection." IEEE Sensors Journal. 18 (22), pp. 9393-9400. https://doi.org/10.1109/JSEN.2018.2870759

Mobile health in the Developing World: review of literature and lessons from a case study

Latif, Siddique, Rana, Rajib, Qadir, Junaid, Ali, Anwaar, Imran, Muhammad Ali and Younis, Muhammad Shahzad. 2017. "Mobile health in the Developing World: review of literature and lessons from a case study." IEEE Access. 5, pp. 11540-11556. https://doi.org/10.1109/ACCESS.2017.2710800
