Variational Autoencoders to Learn Latent Representations of Speech Emotion
Field | Value |
---|---|
Paper/Presentation Title | Variational Autoencoders to Learn Latent Representations of Speech Emotion |
Presentation Type | Paper |
Authors | Latif, Siddique (Author), Rana, Rajib (Author), Qadir, Junaid (Author) and Epps, Julien (Author) |
Journal or Proceedings Title | Proceedings of the 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018) |
Number of Pages | 5 |
Year | 2018 |
Place of Publication | France |
Digital Object Identifier (DOI) | https://doi.org/10.21437/Interspeech.2018-1568 |
Web Address (URL) of Paper | https://www.isca-speech.org/archive/interspeech_2018/latif18_interspeech.html |
Conference/Event | 19th Annual Conference of the International Speech Communication Association: Speech Research for Emerging Markets in Multilingual Societies (INTERSPEECH 2018) |
Event Details | Rank A |
Event Details | Event Date: 02 to 06 Sep 2018; Event Location: Hyderabad, India |
Abstract | Learning the latent representation of data in an unsupervised fashion provides relevant features for enhancing the performance of a classifier. For speech emotion recognition tasks, generating effective features is crucial. Currently, handcrafted features are mostly used for speech emotion recognition; however, features learned automatically using deep learning have shown strong success in many problems, especially in image processing. In particular, deep generative models such as Variational Autoencoders (VAEs) have gained enormous success in generating features for natural images. Inspired by this, we propose VAEs for deriving the latent representation of speech signals and use this representation to classify emotions. To the best of our knowledge, we are the first to propose VAEs for speech emotion classification. Evaluations on the IEMOCAP dataset demonstrate that features learned by VAEs can produce state-of-the-art results for speech emotion classification. |
Keywords | speech emotion classification, variational auto-encoders, deep learning, feature learning |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 460212. Speech recognition; 461106. Semi- and unsupervised learning; 461104. Neural networks; 461103. Deep learning |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | Information Technology University, Pakistan; Institute for Resilient Regions; University of New South Wales |
Institution of Origin | University of Southern Queensland |
https://research.usq.edu.au/item/q50z6/variational-autoencoders-to-learn-latent-representations-of-speech-emotion
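
The abstract above describes learning a latent representation of speech with a VAE and then using that representation to classify emotions. Below is a minimal, hypothetical PyTorch sketch of such a pipeline, not the authors' reported configuration: the class name, feature dimension, layer sizes, optimiser settings, and the random stand-in data are all illustrative assumptions. It shows the standard VAE ingredients the abstract relies on (an encoder producing a mean and log-variance, the reparameterisation trick, and a reconstruction-plus-KL loss), with the latent mean serving as the feature fed to a separate downstream emotion classifier.

```python
# Minimal VAE sketch (PyTorch). Dimensions, layer sizes, and the training loop
# are illustrative assumptions; random tensors stand in for IEMOCAP features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpeechVAE(nn.Module):  # hypothetical name, not from the paper
    def __init__(self, feat_dim=120, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)       # latent mean
        self.logvar = nn.Linear(hidden_dim, latent_dim)   # latent log-variance
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, with eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar


def vae_loss(x_hat, x, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(64, 120)      # stand-in for utterance-level speech features
    model = SpeechVAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(5):         # tiny demo loop, not a real training schedule
        opt.zero_grad()
        x_hat, mu, logvar = model(x)
        loss = vae_loss(x_hat, x, mu, logvar)
        loss.backward()
        opt.step()
    # After training, the latent mean can serve as the learned emotion feature,
    # passed to a separate classifier trained on labelled IEMOCAP utterances.
    with torch.no_grad():
        mu, _ = model.encode(x)
    print("latent features:", mu.shape)
```

The design point the sketch illustrates is that the VAE is trained without emotion labels; only the downstream classifier that consumes the latent features needs labelled data, which is what makes the representation learning unsupervised.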