Deep learning-based single-ended quality prediction for time-scale modified audio

Article

Roberts, Timothy, Nicolson, Aaron and Paliwal, Kuldip K. 2021. "Deep learning-based single-ended quality prediction for time-scale modified audio." Journal of the Audio Engineering Society. 69 (9), pp. 644-655. https://doi.org/10.17743/jaes.2021.0031

Article Title	Deep learning-based single-ended quality prediction for time-scale modified audio
ERA Journal ID	10027
Article Category	Article
Authors	Roberts, Timothy, Nicolson, Aaron and Paliwal, Kuldip K.
Journal Title	Journal of the Audio Engineering Society
Journal Citation	69 (9), pp. 644-655
Number of Pages	12
Year	2021
Publisher	Audio Engineering Society
Place of Publication	United States
ISSN	0004-7554
	1549-4950
Digital Object Identifier (DOI)	https://doi.org/10.17743/jaes.2021.0031
Web Address (URL)	https://aes2.org/publications/elibrary-page/?id=21461
Abstract	Objective evaluation of audio processed with Time-Scale Modification (TSM) has recently seen improvement with a labeled time-scaled audio dataset used to train an objective measure. This double-ended measure was an extension of Perceptual Evaluation of Audio Quality and required reference and test signals. In this paper two single-ended objective quality measures for time-scaled audio are proposed that do not require a reference signal. Internal representations of spectrogram and speech features are learned by either a Convolutional Neural Network (CNN) or a Bidirectional Gated Recurrent Unit (BGRU) network and fed to a fully connected network to predict Subjective Mean Opinion Scores. The proposed CNN and BGRU measures respectively achieve average Root Mean Square Errors of 0.61 and 0.58 and mean Pearson Correlation Coefficients of 0.77 and 0.79 to the time-scaled audio dataset. The proposed measures are used to evaluate TSM algorithms and comparisons are provided for 15 TSM implementations. A link to implementations of the objective measures is provided.
Contains Sensitive Content	Does not contain sensitive content
ANZSRC Field of Research 2020	400607. Signal processing
	461104. Neural networks
Public Notes	Files associated with this item cannot be displayed due to copyright restrictions.
Byline Affiliations	Griffith University
	Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
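
The abstract reports Root Mean Square Error and Pearson Correlation Coefficient between predicted and subjective Mean Opinion Scores. As a rough illustration of what those two figures measure, the sketch below computes both for a pair of score lists. The function names and the sample scores are hypothetical and are not taken from the paper or its released implementation.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equal-length score sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared_error / len(predicted))


def pearson(predicted, target):
    """Pearson correlation coefficient between two equal-length score sequences."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


# Made-up scores for illustration only; NOT data from the paper.
subjective_mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_mos = [1.5, 2.5, 3.5, 4.5, 5.5]

print(rmse(predicted_mos, subjective_mos))
print(pearson(predicted_mos, subjective_mos))
```

In this toy case every prediction is offset by the same amount, so the correlation is perfect while the error is non-zero, which is why the abstract quotes both figures rather than either one alone.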

Permalink: https://research.usq.edu.au/item/zz13y/deep-learning-based-single-ended-quality-prediction-for-time-scale-modified-audio

Related outputs

Underground LoRa sensor node for bushfire monitoring

Herring, Ben, Sharp, Tony, Roberts, Tim, Fastier-Wooller, Jarred, Kelly, Greg, Sahin, Oz, Thiel, David, Dao, Dao and Woodfield, Peter L. 2022. "Underground LoRa sensor node for bushfire monitoring." Fire Technology. 58 (3), pp. 1087-1095. https://doi.org/10.1007/s10694-022-01224-3

An Objective Measure of Quality for Time-Scale Modification of Audio

Roberts, Timothy and Paliwal, Kuldip K. 2021. "An Objective Measure of Quality for Time-Scale Modification of Audio." The Journal of the Acoustical Society of America. 149 (3), pp. 1843-1854. https://doi.org/10.1121/10.0003753

Design of Objective Quality Measures for Time-Scale Modification of Audio

Roberts, Timothy. 2021. Design of Objective Quality Measures for Time-Scale Modification of Audio. PhD Thesis Doctor of Philosophy. Griffith University. https://doi.org/10.25904/1912/4070

A Time-Scale Modification Dataset with Subjective Quality Labels

Roberts, Timothy and Paliwal, Kuldip K. 2020. "A Time-Scale Modification Dataset with Subjective Quality Labels." The Journal of the Acoustical Society of America. 148 (1), pp. 201-210. https://doi.org/10.1121/10.0001567

Time-scale modification using fuzzy epoch-synchronous overlap-add (FESOLA)

Roberts, Timothy and Paliwal, Kuldip K. 2019. "Time-scale modification using fuzzy epoch-synchronous overlap-add (FESOLA)." 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, United States, 20-23 Oct 2019. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/waspaa.2019.8937258

Stereo time-scale modification using sum and difference transformation

Roberts, Timothy and Paliwal, Kuldip K. 2019. "Stereo time-scale modification using sum and difference transformation." 2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS 2018). Cairns, Australia, 17-19 Dec 2018. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/ICSPCS.2018.8631776

Frequency Dependent Time-Scale Modification

Roberts, Timothy and Paliwal, Kuldip K. 2019. "Frequency Dependent Time-Scale Modification." 2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS 2018). Cairns, Australia, 17-19 Dec 2018. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/ICSPCS.2018.8631764