emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition
Article
Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara, Sisman, Berrak, Schuller, Björn W. and Busso, Carlos. 2024. "emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition." IEEE Access. 12, pp. 110492-110503. https://doi.org/10.1109/ACCESS.2024.3439604
Article Title | emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition |
---|---|
ERA Journal ID | 210567 |
Article Category | Article |
Authors | Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara, Sisman, Berrak, Schuller, Björn W. and Busso, Carlos |
Journal Title | IEEE Access |
Journal Citation | 12, pp. 110492-110503 |
Number of Pages | 12 |
Year | 2024 |
Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
Place of Publication | United States |
ISSN | 2169-3536 |
Digital Object Identifier (DOI) | https://doi.org/10.1109/ACCESS.2024.3439604 |
Web Address (URL) | https://ieeexplore.ieee.org/document/10623665 |
Abstract | Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessments. Fortunately, Neural Architecture Search (NAS) provides a potential solution for automatically determining the best DL model. Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports the selection of CNN and LSTM coupling to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique adds a novel mechanism for selecting CNN and SeqNN operations in conjunction using DARTS. Unlike earlier work, we do not impose limits on the layer order of the CNN. Instead, we let DARTS choose the best layer order inside the DARTS cell. Evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets, we demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM. |
Keywords | DARTS; Speech emotion recognition; neural architecture search; deep learning |
ANZSRC Field of Research 2020 | 461103. Deep learning |
Byline Affiliations | School of Mathematics, Physics and Computing; Queensland University of Technology; University of Texas at Dallas, United States; University of Augsburg, Germany |
Permalink -
https://research.usq.edu.au/item/z9q1v/emodarts-joint-optimization-of-cnn-and-sequential-neural-network-architectures-for-superior-speech-emotion-recognition
Download files
Published Version
emoDARTS_Joint_Optimization_of_CNN_and_Sequential_Neural_Network_Architectures_for_Superior_Speech_Emotion_Recognition.pdf
License: CC BY 4.0
File access level: Anyone