Emotion Intensity and its Control for Emotional Voice Conversion
Article Title | Emotion Intensity and its Control for Emotional Voice Conversion |
---|---|
ERA Journal ID | 200608 |
Article Category | Article |
Authors | Zhou, Kun (Author), Sisman, Berrak (Author), Rana, Rajib (Author), Schuller, Bjorn W. (Author) and Li, Haizhou (Author) |
Journal Title | IEEE Transactions on Affective Computing |
Journal Citation | 14 (1), pp. 31-48 |
Number of Pages | 18 |
Year | 2023 |
Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
Place of Publication | United States |
ISSN | 1949-3045 |
Digital Object Identifier (DOI) | https://doi.org/10.1109/TAFFC.2022.3175578 |
Web Address (URL) | https://ieeexplore.ieee.org/document/9778970 |
Abstract | Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity. In EVC, emotions are usually treated as discrete categories overlooking the fact that speech also conveys emotions with various intensity levels that the listener can perceive. In this work, we aim to explicitly characterize and control the intensity of emotion. We propose to disentangle the speaker style from linguistic content and encode the speaker style into a style embedding in a continuous space that forms the prototype of emotion embedding. We further learn the actual emotion encoder from an emotion-labelled database and study the use of relative attributes to represent fine-grained emotion intensity. To ensure emotional intelligibility, we incorporate emotion classification loss and emotion embedding similarity loss into the training of the EVC network. As desired, the proposed network controls the fine-grained emotion intensity in the output speech. Through both objective and subjective evaluations, we validate the effectiveness of the proposed network for emotional expressiveness and emotion intensity control. |
Keywords | Emotional voice conversion, emotion intensity, sequence-to-sequence, perceptual loss, limited data, relative attribute |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 460211. Speech production; 461103. Deep learning |
Byline Affiliations | National University of Singapore; Singapore University of Technology and Design; School of Mathematics, Physics and Computing; Imperial College London, United Kingdom |
Institution of Origin | University of Southern Queensland |
https://research.usq.edu.au/item/q757x/emotion-intensity-and-its-control-for-emotional-voice-conversion
Download files
Published Version
Emotion_Intensity_and_its_Control_for_Emotional_Voice_Conversion.pdf
License: CC BY 4.0
File access level: Anyone