Deep learning-based single-ended quality prediction for time-scale modified audio
Article
Article Title | Deep learning-based single-ended quality prediction for time-scale modified audio |
---|---|
ERA Journal ID | 10027 |
Article Category | Article |
Authors | Roberts, Timothy; Nicolson, Aaron; Paliwal, Kuldip K. |
Journal Title | Journal of the Audio Engineering Society |
Journal Citation | 69 (9), pp. 644-655 |
Number of Pages | 12 |
Year | 2021 |
Publisher | Audio Engineering Society |
Place of Publication | United States |
ISSN | 0004-7554, 1549-4950 |
Digital Object Identifier (DOI) | https://doi.org/10.17743/jaes.2021.0031 |
Web Address (URL) | https://aes2.org/publications/elibrary-page/?id=21461 |
Abstract | Objective evaluation of audio processed with Time-Scale Modification (TSM) has recently seen improvement with a labeled time-scaled audio dataset used to train an objective measure. This double-ended measure was an extension of Perceptual Evaluation of Audio Quality and required reference and test signals. In this paper, two single-ended objective quality measures for time-scaled audio are proposed that do not require a reference signal. Internal representations of spectrogram and speech features are learned by either a Convolutional Neural Network (CNN) or a Bidirectional Gated Recurrent Unit (BGRU) network and fed to a fully connected network to predict Subjective Mean Opinion Scores. The proposed CNN and BGRU measures respectively achieve average Root Mean Square Errors of 0.61 and 0.58 and mean Pearson Correlation Coefficients of 0.77 and 0.79 on the time-scaled audio dataset. The proposed measures are used to evaluate TSM algorithms, and comparisons are provided for 15 TSM implementations. A link to implementations of the objective measures is provided. |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 400607. Signal processing; 461104. Neural networks |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | Griffith University; Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia |
https://research.usq.edu.au/item/zz13y/deep-learning-based-single-ended-quality-prediction-for-time-scale-modified-audio