Deep learning-based single-ended quality prediction for time-scale modified audio

Article


Roberts, Timothy, Nicolson, Aaron and Paliwal, Kuldip K. 2021. "Deep learning-based single-ended quality prediction for time-scale modified audio." Journal of the Audio Engineering Society. 69 (9), pp. 644-655. https://doi.org/10.17743/jaes.2021.0031
Article Title: Deep learning-based single-ended quality prediction for time-scale modified audio
ERA Journal ID: 10027
Article Category: Article
Authors: Roberts, Timothy, Nicolson, Aaron and Paliwal, Kuldip K.
Journal Title: Journal of the Audio Engineering Society
Journal Citation: 69 (9), pp. 644-655
Number of Pages: 12
Year: 2021
Publisher: Audio Engineering Society
Place of Publication: United States
ISSN: 0004-7554, 1549-4950
Digital Object Identifier (DOI): https://doi.org/10.17743/jaes.2021.0031
Web Address (URL): https://aes2.org/publications/elibrary-page/?id=21461
Abstract

Objective evaluation of audio processed with Time-Scale Modification (TSM) has recently seen improvement with a labeled time-scaled audio dataset used to train an objective measure. This double-ended measure was an extension of Perceptual Evaluation of Audio Quality and required reference and test signals. In this paper, two single-ended objective quality measures for time-scaled audio are proposed that do not require a reference signal. Internal representations of spectrogram and speech features are learned by either a Convolutional Neural Network (CNN) or a Bidirectional Gated Recurrent Unit (BGRU) network and fed to a fully connected network to predict Subjective Mean Opinion Scores. The proposed CNN and BGRU measures respectively achieve average Root Mean Square Errors of 0.61 and 0.58 and mean Pearson Correlation Coefficients of 0.77 and 0.79 on the time-scaled audio dataset. The proposed measures are used to evaluate TSM algorithms, and comparisons are provided for 15 TSM implementations. A link to implementations of the objective measures is provided.
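
The two figures of merit named in the abstract, Root Mean Square Error and the Pearson Correlation Coefficient, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names `rmse` and `pearson` and the sample scores are assumptions made for illustration only, and the sample values are not taken from the paper or its dataset.

```python
import math


def rmse(predicted, target):
    """Root Mean Square Error between predicted and subjective scores."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared_error / len(predicted))


def pearson(predicted, target):
    """Pearson Correlation Coefficient between two score sequences."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    covariance = sum((p - mean_p) * (t - mean_t)
                     for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_p * var_t)


# Hypothetical scores on a 1 to 5 opinion scale (illustrative values only).
subjective_scores = [3.2, 4.1, 2.5, 3.8, 1.9]
predicted_scores = [3.0, 4.4, 2.9, 3.5, 2.3]

print("RMSE:", rmse(predicted_scores, subjective_scores))
print("PCC:", pearson(predicted_scores, subjective_scores))
```

A lower RMSE means predictions sit closer to the subjective scores, while a PCC closer to 1 means the predictions track the ordering and spread of those scores more faithfully.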

Contains Sensitive Content: Does not contain sensitive content
ANZSRC Field of Research 2020: 400607. Signal processing; 461104. Neural networks
Public Notes

Files associated with this item cannot be displayed due to copyright restrictions.

Byline Affiliations: Griffith University; Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
Permalink: https://research.usq.edu.au/item/zz13y/deep-learning-based-single-ended-quality-prediction-for-time-scale-modified-audio

Related outputs

Underground LoRa sensor node for bushfire monitoring
Herring, Ben, Sharp, Tony, Roberts, Tim, Fastier-Wooller, Jarred, Kelly, Greg, Sahin, Oz, Thiel, David, Dao, Dao and Woodfield, Peter L. 2022. "Underground LoRa sensor node for bushfire monitoring." Fire Technology. 58 (3), pp. 1087-1095. https://doi.org/10.1007/s10694-022-01224-3
An Objective Measure of Quality for Time-Scale Modification of Audio
Roberts, Timothy and Paliwal, Kuldip K. 2021. "An Objective Measure of Quality for Time-Scale Modification of Audio." The Journal of the Acoustical Society of America. 149 (3), pp. 1843-1854. https://doi.org/10.1121/10.0003753
Design of Objective Quality Measures for Time-Scale Modification of Audio
Roberts, Timothy. 2021. Design of Objective Quality Measures for Time-Scale Modification of Audio. PhD Thesis Doctor of Philosophy. Griffith University. https://doi.org/10.25904/1912/4070
A Time-Scale Modification Dataset with Subjective Quality Labels
Roberts, Timothy and Paliwal, Kuldip K. 2020. "A Time-Scale Modification Dataset with Subjective Quality Labels." The Journal of the Acoustical Society of America. 148 (1), pp. 201-210. https://doi.org/10.1121/10.0001567
Time-scale modification using fuzzy epoch-synchronous overlap-add (FESOLA)
Roberts, Timothy and Paliwal, Kuldip K. 2019. "Time-scale modification using fuzzy epoch-synchronous overlap-add (FESOLA)." 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, United States 20 - 23 Oct 2019 United States. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/waspaa.2019.8937258
Stereo time-scale modification using sum and difference transformation
Roberts, Timothy and Paliwal, Kuldip K. 2019. "Stereo time-scale modification using sum and difference transformation." 2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS 2018). Cairns, Australia 17 - 19 Dec 2018 Australia. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/ICSPCS.2018.8631776
Frequency Dependent Time-Scale Modification
Roberts, Timothy and Paliwal, Kuldip K. 2019. "Frequency Dependent Time-Scale Modification." 2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS 2018). Cairns, Australia 17 - 19 Dec 2018 Australia. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/ICSPCS.2018.8631764