Design of Objective Quality Measures for Time-Scale Modification of Audio

PhD Thesis

Roberts, Timothy. 2021. Design of Objective Quality Measures for Time-Scale Modification of Audio. PhD Thesis Doctor of Philosophy. Griffith University. https://doi.org/10.25904/1912/4070

Title	Design of Objective Quality Measures for Time-Scale Modification of Audio
Type	PhD Thesis
Authors	Roberts, Timothy
Supervisor	Prof. Kuldip Paliwal
	Dr Andrew Busch
Institution of Origin	Griffith University
Qualification Name	Doctor of Philosophy
Number of Pages	260
Year	2021
Publisher	Griffith University
Place of Publication	Australia
Digital Object Identifier (DOI)	https://doi.org/10.25904/1912/4070
Web Address (URL)	http://hdl.handle.net/10072/401637
Abstract	This dissertation describes the design of effective objective measures of quality for Time-Scale Modification (TSM). TSM methods are single channel algorithms that give poor results when applied to multi-channel signals, as the phase relationship between channels must be maintained. This dissertation proposes a method and additional variant for maintaining the phase relationship between channels and retaining the presence in the centre of the stereo signal. The method involves pre- and post-processing the signal, with the variant processing each frame for real-time suitability. Sum and difference transformations of the stereo signal are used for TSM and result in a large improvement in stereo phase coherence, consequently maintaining the stereo field. The proposed method produces a high quality stereo output and greatly improves quality over the independent channel processing method. A modification to the Epoch-Synchronous Overlap-Add (ESOLA) TSM algorithm is proposed in this dissertation. The proposed method, Fuzzy Epoch-Synchronous Overlap-Add, improves on the previous ESOLA method through cross-correlation of time-smeared epochs before overlap-adding. This reduces distortion and artefacts while the speaker's fundamental frequency is stable, as well as reducing artefacts during pitch modulation. The proposed method is tested against well-known TSM algorithms. It is preferred over ESOLA and gives similar performance to other TSM algorithms for voice signals. It is also shown that this algorithm can work effectively with solo instrument signals containing strong fundamental frequencies. No effective objective measure of quality for TSM exists. This dissertation details the creation, subjective evaluation and analysis of a dataset, for use in the development of an objective measure of quality for TSM. 
Comprising two parts, the training subset contains 88 source files processed using six TSM methods at 10 time-scales, while the testing subset contains 20 source files processed using three additional methods at four time-scales. The source material contains speech, solo harmonic and percussive instruments, sound effects and a range of music genres. 42,529 ratings were collected from 633 sessions using laboratory and remote collection methods. Analysis of results shows no correlation between age and quality of rating; equivalence between expert and non-expert listeners; negligible differences between participants with and without hearing issues; and negligible differences between testing modalities. Comparison of published objective measures and subjective scores shows the objective measures to be poor indicators of subjective quality. Initial results for a retrained objective measure of quality are presented with results approaching average loss and correlation values of subjective sessions. An objective measure of quality for time-scaled audio is proposed that makes use of the previously developed dataset and improves on reported results. The measure uses hand-crafted features and a fully connected network to predict subjective mean opinion scores. Basic and Advanced Perceptual Evaluation of Audio Quality features are used in addition to nine features specific to TSM artefacts. Six methods of alignment are explored, with interpolation of the reference magnitude spectrum to the length of the test magnitude spectrum giving the best performance. The proposed measure achieves an average Root Mean Squared Error (RMSE) of 0.490 and a mean Pearson Correlation Coefficient (PCC) of 0.864, equivalent to the 97th and 82nd percentiles of subjective sessions respectively.
The proposed measure is used to evaluate TSM algorithms, finding that Elastique gives the highest objective quality for solo instrument and voice signals, while the Identity Phase-Locking Phase Vocoder gives the highest objective quality for music signals and the best overall quality. Two single-ended objective quality measures for time-scaled audio are also proposed. These measures do not require a reference signal, nor alignment. Data-driven features are created by either a convolutional neural network (CNN) or a bidirectional gated recurrent unit (BGRU) network, and are fed to a fully-connected network to predict subjective mean opinion scores. The proposed CNN and BGRU measures achieve an average RMSE of 0.608 and 0.576, and a mean PCC of 0.771 and 0.794, respectively. The proposed measures are used to evaluate TSM algorithms, and comparisons are provided for 16 TSM implementations. A literature review is included with required background knowledge. It includes the fundamentals of sound perception, sound capture, digital signal processing, time-scale modification methods used within research, and subjective and objective measures of quality. Full implementation of all proposed methods and measures can be found at github.com/zygurt/TSM, while the labelled dataset is available at http://ieee-dataport.org/1987.
Keywords	time scale modification ; objective ; subjective ; quality ; opinion score
Contains Sensitive Content	Does not contain sensitive content
ANZSRC Field of Research 2020	400607. Signal processing
	461104. Neural networks
Public Notes	Files associated with this item cannot be displayed due to copyright restrictions.
Byline Affiliations	Griffith University
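
Illustrative note on the stereo method summarised in the abstract: the sketch below shows one plausible reading of "sum and difference transformations of the stereo signal are used for TSM", in which the left/right channels are converted to sum and difference signals, each is time-scaled, and the result is converted back. It is not taken from the thesis code. The function names, the 0.5 scaling, the `alpha` convention (output length divided by input length) and the `resample_placeholder` stand-in are all assumptions made for illustration; a real TSM algorithm would replace the placeholder.

```python
import numpy as np


def to_sum_diff(stereo):
    """Convert an (n, 2) left/right array to sum and difference signals."""
    left, right = stereo[:, 0], stereo[:, 1]
    # The 0.5 scaling is an assumption; it makes the inverse a plain add/subtract.
    return 0.5 * (left + right), 0.5 * (left - right)


def from_sum_diff(s, d):
    """Inverse of to_sum_diff: rebuild an (n, 2) left/right array."""
    return np.stack([s + d, s - d], axis=1)


def stereo_tsm(stereo, tsm, alpha):
    """Pre-process to sum/difference, time-scale each, then post-process.

    `tsm` is any callable tsm(x, alpha) -> 1-D array (hypothetical interface).
    """
    s, d = to_sum_diff(stereo)
    s_out, d_out = tsm(s, alpha), tsm(d, alpha)
    n = min(len(s_out), len(d_out))  # guard against off-by-one output lengths
    return from_sum_diff(s_out[:n], d_out[:n])


def resample_placeholder(x, alpha):
    """Stand-in only: linear interpolation changes pitch, so it is NOT real TSM."""
    n_out = int(round(len(x) * alpha))
    return np.interp(np.linspace(0, len(x) - 1, n_out), np.arange(len(x)), x)
```

With this arrangement a centre-panned source (identical left and right channels) has a zero difference signal, so it remains identical in both output channels regardless of how the sum signal is processed, which is consistent with the abstract's aim of retaining the centre of the stereo image.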

Permalink -

https://research.usq.edu.au/item/zz0z0/design-of-objective-quality-measures-for-time-scale-modification-of-audio

Related outputs

Underground LoRa sensor node for bushfire monitoring

Herring, Ben, Sharp, Tony, Roberts, Tim, Fastier-Wooller, Jarred, Kelly, Greg, Sahin, Oz, Thiel, David, Dao, Dao and Woodfield, Peter L. 2022. "Underground LoRa sensor node for bushfire monitoring." Fire Technology. 58 (3), pp. 1087-1095. https://doi.org/10.1007/s10694-022-01224-3

Deep learning-based single-ended quality prediction for time-scale modified audio

Roberts, Timothy, Nicolson, Aaron and Paliwal, Kuldip K. 2021. "Deep learning-based single-ended quality prediction for time-scale modified audio." Journal of the Audio Engineering Society. 69 (9), pp. 644-655. https://doi.org/10.17743/jaes.2021.0031

An Objective Measure of Quality for Time-Scale Modification of Audio

Roberts, Timothy and Paliwal, Kuldip K. 2021. "An Objective Measure of Quality for Time-Scale Modification of Audio." The Journal of the Acoustical Society of America. 149 (3), pp. 1843-1854. https://doi.org/10.1121/10.0003753

A Time-Scale Modification Dataset with Subjective Quality Labels

Roberts, Timothy and Paliwal, Kuldip K. 2020. "A Time-Scale Modification Dataset with Subjective Quality Labels." The Journal of the Acoustical Society of America. 148 (1), pp. 201-210. https://doi.org/10.1121/10.0001567

Time-scale modification using fuzzy epoch-synchronous overlap-add (FESOLA)

Roberts, Timothy and Paliwal, Kuldip K. 2019. "Time-scale modification using fuzzy epoch-synchronous overlap-add (FESOLA)." 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). New Paltz, United States 20 - 23 Oct 2019 United States. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/waspaa.2019.8937258

Stereo time-scale modification using sum and difference transformation

Roberts, Timothy and Paliwal, Kuldip K. 2019. "Stereo time-scale modification using sum and difference transformation." 2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS 2018). Cairns, Australia 17 - 19 Dec 2018 Australia. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/ICSPCS.2018.8631776

Frequency Dependent Time-Scale Modification

Roberts, Timothy and Paliwal, Kuldip K. 2019. "Frequency Dependent Time-Scale Modification." 2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS 2018). Cairns, Australia 17 - 19 Dec 2018 Australia. IEEE (Institute of Electrical and Electronics Engineers). https://doi.org/10.1109/ICSPCS.2018.8631764