Deep learning-based single-ended quality prediction for time-scale modified audio

Article


Roberts, Timothy, Nicolson, Aaron and Paliwal, Kuldip K.. 2021. "Deep learning-based single-ended quality prediction for time-scale modified audio." Journal of the Audio Engineering Society. 69 (9), pp. 644-655. https://doi.org/10.17743/jaes.2021.0031
Article Title: Deep learning-based single-ended quality prediction for time-scale modified audio

ERA Journal ID: 10027
Article Category: Article
Authors: Roberts, Timothy, Nicolson, Aaron and Paliwal, Kuldip K.
Journal Title: Journal of the Audio Engineering Society
Journal Citation: 69 (9), pp. 644-655
Number of Pages: 12
Year: 2021
Publisher: Audio Engineering Society
Place of Publication: United States
ISSN: 0004-7554, 1549-4950
Digital Object Identifier (DOI): https://doi.org/10.17743/jaes.2021.0031
Web Address (URL): https://aes2.org/publications/elibrary-page/?id=21461
Abstract

Objective evaluation of audio processed with Time-Scale Modification (TSM) has recently seen improvement with a labeled time-scaled audio dataset used to train an objective measure. This double-ended measure was an extension of Perceptual Evaluation of Audio Quality and required reference and test signals. In this paper, two single-ended objective quality measures for time-scaled audio are proposed that do not require a reference signal. Internal representations of spectrogram and speech features are learned by either a Convolutional Neural Network (CNN) or a Bidirectional Gated Recurrent Unit (BGRU) network and fed to a fully connected network to predict Subjective Mean Opinion Scores. The proposed CNN and BGRU measures respectively achieve average Root Mean Square Errors of 0.61 and 0.58 and mean Pearson Correlation Coefficients of 0.77 and 0.79 to the time-scaled audio dataset. The proposed measures are used to evaluate TSM algorithms, and comparisons are provided for 15 TSM implementations. A link to implementations of the objective measures is provided.
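
The two figures of merit quoted in the abstract, Root Mean Square Error and Pearson Correlation Coefficient, compare predicted scores against subjective Mean Opinion Scores. The following is a minimal sketch of how such values could be computed. It is not the authors' evaluation code; the function names and the score lists are hypothetical and chosen only for illustration.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equal-length score sequences."""
    if len(predicted) != len(target) or len(target) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared_error / len(target))


def pearson(predicted, target):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(target) or len(target) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(target)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_t)


# Hypothetical scores, not taken from the paper or its dataset.
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.4]
predicted_mos = [2.1, 2.4, 3.4, 3.6, 4.6]

print("RMSE:", rmse(predicted_mos, subjective_mos))
print("PCC:", pearson(predicted_mos, subjective_mos))
```

A lower RMSE and a PCC closer to 1 both indicate predictions that track the subjective scores more closely.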

Contains Sensitive Content: Does not contain sensitive content
ANZSRC Field of Research 2020: 400607. Signal processing; 461104. Neural networks
Public Notes

Files associated with this item cannot be displayed due to copyright restrictions.

Byline Affiliations: Griffith University; Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
Permalink: https://research.usq.edu.au/item/zz13y/deep-learning-based-single-ended-quality-prediction-for-time-scale-modified-audio

Related outputs

Roberts, Timothy. 2021. Design of Objective Quality Measures for Time-Scale Modification of Audio. PhD Thesis Doctor of Philosophy. Griffith University. https://doi.org/10.25904/1912/4070