A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

Paper


Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara, Liu, Jiajun and Schuller, Bjorn W. 2022. "A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition." Abramson, David and Dinh, Minh Ngoc (ed.) 2022 Australasian Computer Science Week (ACSW 2022). Brisbane, Australia 14 - 17 Feb 2022 United States. https://doi.org/10.1145/3511616.3513104
Paper/Presentation Title

A Novel Policy for Pre-trained Deep Reinforcement Learning for Speech Emotion Recognition

Presentation Type: Paper
Authors: Rajapakshe, Thejan (Author), Rana, Rajib (Author), Khalifa, Sara (Author), Liu, Jiajun (Author) and Schuller, Bjorn W (Author)
Editors: Abramson, David and Dinh, Minh Ngoc
Journal or Proceedings Title: Proceedings of the 2022 Australasian Computer Science Week (ACSW 2022)
Number of Pages: 10
Year: 2022
Place of Publication: United States
ISBN: 9781450396066
Digital Object Identifier (DOI): https://doi.org/10.1145/3511616.3513104
Web Address (URL) of Paper: https://dl.acm.org/doi/10.1145/3511616.3513104
Web Address (URL) of Conference Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3511616
Conference/Event: 2022 Australasian Computer Science Week (ACSW 2022)
Event Details
2022 Australasian Computer Science Week (ACSW 2022)
Event Date
14 to end of 17 Feb 2022
Event Location
Brisbane, Australia
Abstract

Deep Reinforcement Learning (deep RL) has achieved tremendous success in gaming, but it has rarely been explored for Speech Emotion Recognition (SER). In the RL literature, the policy used by the RL agent plays a major role in action selection; however, there is no RL policy tailored for SER. Also, an extended learning period is a general challenge for deep RL, which can impact the speed of learning for SER. In this paper, we introduce a novel policy, the 'Zeta policy', tailored for SER, and introduce pre-training in deep RL to achieve a faster learning rate. Pre-training with a cross dataset was also studied to discover the feasibility of pre-training the RL agent with a similar dataset in a scenario where real environmental data is not available. We use the 'IEMOCAP' and 'SAVEE' datasets for the evaluation, with the problem of recognising four emotions, namely happy, sad, angry, and neutral. The experimental results show that the proposed policy performs better than existing policies. Results also support that pre-training can reduce training time and is robust to a cross-corpus scenario.

Keywords: Speech Emotion Recognition, Reinforcement Learning, Deep Learning, Deep Reinforcement Learning, Machine Learning
Contains Sensitive Content: Does not contain sensitive content
ANZSRC Field of Research 2020: 460212. Speech recognition
461105. Reinforcement learning
461103. Deep learning
Public Notes

Files associated with this item cannot be displayed due to copyright restrictions.

Byline Affiliations: School of Mathematics, Physics and Computing
Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
Imperial College London, United Kingdom
Institution of Origin: University of Southern Queensland
Permalink: https://research.usq.edu.au/item/q7336/a-novel-policy-for-pre-trained-deep-reinforcement-learning-for-speech-emotion-recognition


Related outputs

emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition
Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara, Sisman, Berrak, Schuller, Björn W. and Busso, Carlos. 2024. "emoDARTS: Joint Optimization of CNN and Sequential Neural Network Architectures for Superior Speech Emotion Recognition." IEEE Access. 12, pp. 110492-110503. https://doi.org/10.1109/ACCESS.2024.3439604
HGSOXGB: Hunger-Games-Search-Optimization-Based Framework to Predict the Need for ICU Admission for COVID-19 Patients Using eXtreme Gradient Boosting
Pinki, Farhana Tazmim, Awal, Md Abdul, Mumenin, Khondoker Mirazu, Hossain, Md. Shahadat, Faysal, Jabed Al, Rana, Rajib, Almuqren, Rajib, Ksibi, Amel and Samad, Md Abdus. 2023. "HGSOXGB: Hunger-Games-Search-Optimization-Based Framework to Predict the Need for ICU Admission for COVID-19 Patients Using eXtreme Gradient Boosting." Mathematics. 11 (18). https://doi.org/10.3390/math11183960
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering
Siriwardhana, Shamane, Weerasekera, Rivindu, Wen, Elliott, Kaluarachchi, Tharindu, Rana, Rajib and Nanayakkara, Suranga. 2023. "Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering." Transactions of the Association for Computational Linguistics. 11, pp. 1-17. https://doi.org/10.1162/tacl_a_00530
A novel employability embedding framework for three-year bachelor’s programs
Rana, Rajib, Galligan, Linda, Fard, Rouz and McCredie, Tessa. 2023. "A novel employability embedding framework for three-year bachelor’s programs." Journal of Teaching and Learning for Graduate Employability. 14 (1), pp. 104-118. https://doi.org/10.21153/jtlge2023vol14no1art1604
Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review
Hossain, Elias, Rana, Rajib, Higgins, Niall, Soar, Jeffrey, Barua, Prabal Datta, Pisani, Anthony R. and Turner, Kathryn. 2023. "Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review." Computers in Biology and Medicine. 155. https://doi.org/10.1016/j.compbiomed.2023.106649
Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition
Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W. 2023. "Multitask Learning From Augmented Auxiliary Data for Improving Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (4), pp. 3164-3176. https://doi.org/10.1109/TAFFC.2022.3221749
Speech Synthesis with Mixed Emotions
Zhou, Kun, Sisman, Berrak, Rana, R., Schuller, Bjorn W. and Li, Haizhou. 2023. "Speech Synthesis with Mixed Emotions." IEEE Transactions on Affective Computing. 14 (4), pp. 3120-3134. https://doi.org/10.1109/TAFFC.2022.3233324
Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition
Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn. 2023. "Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (3), pp. 1912-1926. https://doi.org/10.1109/TAFFC.2022.3167013
Emotion Intensity and its Control for Emotional Voice Conversion
Zhou, Kun, Sisman, Berrak, Rana, Rajib, Schuller, Bjorn W. and Li, Haizhou. 2023. "Emotion Intensity and its Control for Emotional Voice Conversion." IEEE Transactions on Affective Computing. 14 (1), pp. 31-48. https://doi.org/10.1109/TAFFC.2022.3175578
SigRep: Towards Robust Wearable Emotion Recognition with Contrastive Representation Learning
Dissanayake, Vipula, Seneviratne, Sachith, Rana, Rajib, Wen, Elliot, Kaluarachchi, Tharindu and Nanayakkara, Suranga. 2022. "SigRep: Towards Robust Wearable Emotion Recognition with Contrastive Representation Learning." IEEE Access. 10, pp. 18105-18120. https://doi.org/10.1109/ACCESS.2022.3149509
Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition
Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja, Epps, Julien and Schuller, Bjorn W. 2022. "Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition." IEEE Transactions on Affective Computing. 13 (2), pp. 992-1004. https://doi.org/10.1109/TAFFC.2020.2983669
N-doped silk wadding-derived carbon/SnOx@reduced graphene oxide film as an ultra-stable anode for sodium-ion half/full battery
Sun, Yu, Yang, Yanling, Shi, Xiao-Lei, Suo, Guoquan, Xue, Fan, Liu, Jiajun, Lu, Siyu and Chen, Zhi-Gang. 2021. "N-doped silk wadding-derived carbon/SnOx@reduced graphene oxide film as an ultra-stable anode for sodium-ion half/full battery." Chemical Engineering Journal. https://doi.org/10.1016/j.cej.2021.133675
Survey of Deep Representation Learning for Speech Emotion Recognition
Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja, Qadir, Junaid and Schuller, Bjorn. 2023. "Survey of Deep Representation Learning for Speech Emotion Recognition." IEEE Transactions on Affective Computing. 14 (2), pp. 1634-1654. https://doi.org/10.1109/TAFFC.2021.3114365
Towards a Compressive-Sensing-Based Lightweight Encryption Scheme for the Internet of Things
Xue, Wanli, Luo, Chengwen, Shen, Yiran, Rana, Rajib, Lan, Guohao, Jha, Sanjay, Seneviratne, Aruna and Hu, Wen. 2021. "Towards a Compressive-Sensing-Based Lightweight Encryption Scheme for the Internet of Things." IEEE Transactions on Mobile Computing. 20 (10), pp. 3049-3065. https://doi.org/10.1109/TMC.2020.2992737
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data
Haque, Kazi Nazmul, Rana, Rajib, Liu, Jiajun, Hansen, John H. L., Cummins, Nicholas, Busso, Carlos and Schuller, Bjorn W. 2021. "Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data." IEEE ACM Transactions on Audio, Speech, and Language Processing. 29, pp. 2575-2590. https://doi.org/10.1109/TASLP.2021.3098764
Development Data of Mood Inference Engine
Rana, Rajib. Development Data of Mood Inference Engine. Springfield. https://doi.org/10.26192/PV1J-E485
Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition
Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W. 2020. "Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition." 21st Annual Conference of the International Speech Communication Association: Cognitive Intelligence for Speech Processing (INTERSPEECH 2020). Shanghai, China 25 - 29 Oct 2020 France. https://doi.org/10.21437/Interspeech.2020-3190
Augmenting Generative Adversarial Networks for Speech Emotion Recognition
Latif, Siddique, Asim, Muhammad, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Schuller, Bjorn W. 2020. "Augmenting Generative Adversarial Networks for Speech Emotion Recognition." 21st Annual Conference of the International Speech Communication Association: Cognitive Intelligence for Speech Processing (INTERSPEECH 2020). Shanghai, China 25 - 29 Oct 2020 France. https://doi.org/10.21437/Interspeech.2020-3194
High-fidelity audio generation and representation learning with guided adversarial autoencoder
Haque, Kazi Nazmul, Rana, Rajib and Schuller, Bjorn W. 2020. "High-fidelity audio generation and representation learning with guided adversarial autoencoder." IEEE Access. 8, pp. 223509-223528. https://doi.org/10.1109/ACCESS.2020.3040797
Evaluating the performance of BSBL methodology for EEG source localization on a realistic head model
Saha, Sajib, Rana, Rajib, Nesterets, Yakov, Tahtali, Murat, de Hoog, Frank and Gureyev, Timur. 2017. "Evaluating the performance of BSBL methodology for EEG source localization on a realistic head model." International Journal of Imaging Systems and Technology. 27 (1), pp. 46-56. https://doi.org/10.1002/ima.22209
Federated Learning for Speech Emotion Recognition Applications
Latif, Siddique, Khalifa, Sara, Rana, Rajib and Jurdak, Raja. 2020. "Federated Learning for Speech Emotion Recognition Applications." 19th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2020). Sydney, Australia 21 - 24 Apr 2020 United States. https://doi.org/10.1109/IPSN48710.2020.00-16
Direct modelling of speech emotion from raw speech
Latif, Siddique, Rana, Rajib, Khalifa, Sara, Jurdak, Raja and Epps, Julien. 2019. "Direct modelling of speech emotion from raw speech." 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language (INTERSPEECH 2019). Graz, Austria 15 - 19 Sep 2019 France. https://doi.org/10.21437/Interspeech.2019-3252
Variational Autoencoders to Learn Latent Representations of Speech Emotion
Latif, Siddique, Rana, Rajib, Qadir, Junaid and Epps, Julien. 2018. "Variational Autoencoders to Learn Latent Representations of Speech Emotion." 19th Annual Conference of the International Speech Communication Association: Speech Research for Emerging Markets in Multilingual Societies (INTERSPEECH 2018). Hyderabad, India 02 - 06 Sep 2018 France. https://doi.org/10.21437/Interspeech.2018-1568
Transfer learning for improving speech emotion classification accuracy
Latif, Siddique, Rana, Rajib, Younis, Shahzad, Qadir, Junaid and Epps, Julien. 2018. "Transfer learning for improving speech emotion classification accuracy." 19th Annual Conference of the International Speech Communication Association: Speech Research for Emerging Markets in Multilingual Societies (INTERSPEECH 2018). Hyderabad, India 02 - 06 Sep 2018 France. https://doi.org/10.21437/Interspeech.2018-1625
A novel framework for distress detection through an automated speech processing system
Rana, Rajib, Gururajan, Raj, Mackenzie, Geraldine, Dunn, Jeff, Gray, Anthony, Zhou, Xujuan, Barua, Prabal Datta, Epps, Julien and Humphris, Gerald Michael. 2018. "A novel framework for distress detection through an automated speech processing system." 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2018). Santiago, Chile 03 - 06 Dec 2018 Los Alamitos, CA, United States. https://doi.org/10.1109/WI.2018.00-29
Automated screening for distress: A perspective for the future
Rana, Rajib, Latif, Siddique, Gururajan, Raj, Gray, Anthony, Mackenzie, Geraldine, Humphris, Gerald and Dunn, Jeff. 2019. "Automated screening for distress: A perspective for the future." European Journal of Cancer Care. 28 (4). https://doi.org/10.1111/ecc.13033
Phonocardiographic sensing using deep learning for abnormal heartbeat detection
Latif, Siddique, Usman, Muhammad, Rana, Rajib and Qadir, Junaid. 2018. "Phonocardiographic sensing using deep learning for abnormal heartbeat detection." IEEE Sensors Journal. 18 (22), pp. 9393-9400. https://doi.org/10.1109/JSEN.2018.2870759
Kryptein: A Compressive-Sensing-Based Encryption Scheme for the Internet of Things
Xue, Wanli, Luo, Chengwen, Lan, Guohao, Rana, Rajib, Hu, Wen and Seneviratne, Aruna. 2017. "Kryptein: A Compressive-Sensing-Based Encryption Scheme for the Internet of Things." Zhang, Pei, Dutta, Prabal and Xing, Guoliang (ed.) 16th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2017). Pittsburgh, Pennsylvania, USA 18 - 21 Apr 2017 United States. https://doi.org/10.1145/3055031.3055079
Demo Abstract: CScrypt - A Compressive-Sensing-Based Encryption Engine for the Internet of Things
Xue, Wanli, Luo, Chengwen, Rana, Rajib, Hu, Wen and Seneviratne, Aruna. 2016. "Demo Abstract: CScrypt - A Compressive-Sensing-Based Encryption Engine for the Internet of Things." 14th ACM Conference on Embedded Network Sensor Systems (SenSys '16). Stanford, Calif, USA 14 - 16 Nov 2016 https://doi.org/10.1145/2994551.2996525
IEEE Access special section editorial: health informatics for the developing world
Qadir, Junaid, Mujeeb-U-Rahman, Muhammad, Rehmani, Mubashir Husain, Pathan, Al-Sakib Khan, Imran, Muhammad Ali, Hussain, Amir, Rana, Rajib and Luo, Bin. 2017. "IEEE Access special section editorial: health informatics for the developing world." IEEE Access. 5, pp. 27818-27823. https://doi.org/10.1109/ACCESS.2017.2783118
EEG source localization using a sparsity prior based on Brodmann areas
Saha, Sajib, Nesterets, Yakov, Rana, Rajib, Tahtali, Murat, de Hoog, Frank and Gureyev, Timur. 2017. "EEG source localization using a sparsity prior based on Brodmann areas." International Journal of Imaging Systems and Technology. 27 (4), pp. 333-344. https://doi.org/10.1002/ima.22236
Mobile health in the Developing World: review of literature and lessons from a case study
Latif, Siddique, Rana, Rajib, Qadir, Junaid, Ali, Anwaar, Imran, Muhammad Ali and Younis, Muhammad Shahzad. 2017. "Mobile health in the Developing World: review of literature and lessons from a case study." IEEE Access. 5, pp. 11540-11556. https://doi.org/10.1109/ACCESS.2017.2710800
Guiding Ebola patients to suitable health facilities: an SMS-based approach
Trad, Mohamad-Ali, Jurdak, Raja and Rana, Rajib. 2015. "Guiding Ebola patients to suitable health facilities: an SMS-based approach." F1000Research. 4 (43). https://doi.org/10.12688/f1000research.6105.1
Context-driven mood mining
Rana, Rajib. 2016. "Context-driven mood mining." 14th International Conference on Mobile Systems, Applications, and Services (MobiSys 2016). Singapore 25 - 30 Jun 2016 United States. https://doi.org/10.1145/2938559.2938601
Gait velocity estimation using time-interleaved between consecutive passive IR sensor activations
Rana, Rajib, Austin, Daniel, Jacobs, Peter G., Karunanithi, Mohanraj and Kaye, Jeffrey. 2016. "Gait velocity estimation using time-interleaved between consecutive passive IR sensor activations." IEEE Sensors Journal. 16 (16), pp. 6351-6358. https://doi.org/10.1109/JSEN.2016.2577708
wHealth - transforming telehealth services
Rana, Rajib, Hume, Margee, Reilly, John and Soar, Jeffrey. 2016. "wHealth - transforming telehealth services." EAI Endorsed Transactions on Scalable Information Systems. 3 (8). https://doi.org/10.4108/eai.9-8-2016.151635
Real-time classification via sparse representation in acoustic sensor networks
Wei, Bo, Yang, Mingrui, Shen, Yiran, Rana, Rajib, Chou, Chun Tung and Hu, Wen. 2013. "Real-time classification via sparse representation in acoustic sensor networks." 11th ACM Conference on Embedded Networked Sensor Systems (SenSys 2013). Rome, Italy 11 - 15 Nov 2013 United States. https://doi.org/10.1145/2517351.2517357
Feasibility analysis of using humidex as an indoor thermal comfort predictor
Rana, Rajib, Kusy, Brano, Jurdak, Raja, Wall, Josh and Hu, Wen. 2013. "Feasibility analysis of using humidex as an indoor thermal comfort predictor." Energy and Buildings. 64, pp. 17-25. https://doi.org/10.1016/j.enbuild.2013.04.019
Ear-phone: a context-aware noise mapping using smart phones
Rana, Rajib, Chou, Chun Tung, Bulusu, Nirupama, Kanhere, Salil and Hu, Wen. 2015. "Ear-phone: a context-aware noise mapping using smart phones." Pervasive and Mobile Computing. 17 (Part A), pp. 1-22. https://doi.org/10.1016/j.pmcj.2014.02.001
Opportunistic and context-aware affect sensing on smartphones: the concept, challenges and opportunities
Rana, Rajib, Hume, Margee, Reilly, John, Jurdak, Raja and Soar, Jeffrey. 2016. "Opportunistic and context-aware affect sensing on smartphones: the concept, challenges and opportunities." IEEE Pervasive Computing. 15 (2), pp. 60-69. https://doi.org/10.1109/MPRV.2016.36
SimpleTrack: adaptive trajectory compression with deterministic projection matrix for mobile sensor networks
Rana, Rajib, Yang, Mingrui, Wark, Tim, Chou, Chun Tung and Hu, Wen. 2015. "SimpleTrack: adaptive trajectory compression with deterministic projection matrix for mobile sensor networks." IEEE Sensors Journal. 15 (1), pp. 365-373. https://doi.org/10.1109/JSEN.2014.2335210
Optimal sampling strategy enabling energy-neutral operations at rechargeable wireless sensor networks
Rana, Rajib, Hu, Wen and Chou, Chun Tung. 2015. "Optimal sampling strategy enabling energy-neutral operations at rechargeable wireless sensor networks." IEEE Sensors Journal. 15 (1), pp. 201-208. https://doi.org/10.1109/JSEN.2014.2337334
Nonuniform compressive sensing for heterogeneous wireless sensor networks
Shen, Yiran, Hu, Wen, Rana, Rajib and Chou, Chun Tung. 2013. "Nonuniform compressive sensing for heterogeneous wireless sensor networks." IEEE Sensors Journal. 13 (6), pp. 2120-2128. https://doi.org/10.1109/JSEN.2013.2248253
Novel activity classification and occupancy estimation methods for intelligent HVAC (heating, ventilation and air conditioning) systems
Rana, Rajib, Kusy, Brano, Wall, Josh and Hu, Wen. 2015. "Novel activity classification and occupancy estimation methods for intelligent HVAC (heating, ventilation and air conditioning) systems." Energy. 93 (1), pp. 245-255. https://doi.org/10.1016/j.energy.2015.09.002