Deep Representation Learning for Speech Emotion Recognition
PhD by Publication
Title | Deep Representation Learning for Speech Emotion Recognition |
---|---|
Type | PhD by Publication |
Authors | Latif, Siddique |
Supervisor | |
1. First | Prof Rajib Rana |
2. Second | Jiabao Zhang |
3. Third | Bjorn W. Schuller |
3. Third | Sara Khalifa |
Institution of Origin | University of Southern Queensland |
Qualification Name | Doctor of Philosophy (DPHD) |
Number of Pages | 104 |
Year | 2022 |
Publisher | University of Southern Queensland |
Place of Publication | Australia |
Digital Object Identifier (DOI) | https://doi.org/10.26192/w8w00 |
Abstract | The success of machine learning (ML) algorithms generally depends on the quality of the data representation, or features. Good representations of the data make it easier to develop ML predictors and deep learning (DL) classifiers. In speech emotion recognition (SER) research, emotion classifiers depend heavily on hand-engineered acoustic features, which are typically crafted using human domain knowledge. Automatically learning emotional representations from speech is challenging because speech carries different attributes of the speaker (e.g., gender, age, and emotion) along with the linguistic message. Recent advances in DL have fuelled the area of deep representation learning from speech. The prime goal of deep representation learning is to learn complex relationships from input data, usually through nonlinear transformations. Research on deep representation learning has evolved significantly; however, very few studies have investigated emotional representation learning from speech using advanced DL techniques. In this thesis, I explore different deep representation learning techniques for SER to improve the performance and generalisation of SER systems. I broadly address two major problems: (1) how deep representation learning can be used to improve SER performance by exploiting unlabelled, synthetic, and augmented data; and (2) how deep representation learning can be applied to design generalised and robust SER systems. To address these problems, I propose different deep representation learning techniques that learn from unlabelled, synthetic, and augmented data. I found that injecting additional unlabelled, augmented, and synthetic data into SER systems helps improve their performance. I also show that adversarial self-supervised learning can improve cross-language SER and that deeper architectures learn robust, generalised representations for SER in noisy conditions. |
Keywords | deep representation learning; multi-task learning; semi-supervised learning; self-supervised learning; adversarial machine learning; speech emotion recognition |
Related Output | |
Has part | Survey of Deep Representation Learning for Speech Emotion Recognition |
Has part | Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition |
Has part | Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition |
Has part | Augmenting Generative Adversarial Networks for Speech Emotion Recognition |
Has part | Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition |
Has part | Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 461103. Deep learning |
461106. Semi- and unsupervised learning | |
461104. Neural networks | |
460208. Natural language processing | |
461101. Adversarial machine learning | |
461102. Context learning | |
Public Notes | File reproduced in accordance with the copyright policy of the publisher/author. |
Byline Affiliations | School of Mathematics, Physics and Computing |
https://research.usq.edu.au/item/w8w00/deep-representation-learning-for-speech-emotion-recognition
Download files
Published Version
Siddique Latif - Thesis_Redacted.pdf
License: CC BY-NC-ND 4.0
File access level: Anyone