Guided Disentangled Representation Learning from Audio data for Transfer Learning

PhD by Publication

Haque, Kazi Nazmul. 2024. Guided Disentangled Representation Learning from Audio data for Transfer Learning. PhD by Publication Doctor of Philosophy. University of Southern Queensland. https://doi.org/10.26192/z9y75

Title	Guided Disentangled Representation Learning from Audio data for Transfer Learning
Type	PhD by Publication
Authors	Haque, Kazi Nazmul
Supervisor	1. First	Prof Rajib Rana
	2. Second	Prof Ji Zhang
Institution of Origin	University of Southern Queensland
Qualification Name	Doctor of Philosophy
Number of Pages	93
Year	2024
Publisher	University of Southern Queensland
Place of Publication	Australia
Digital Object Identifier (DOI)	https://doi.org/10.26192/z9y75
Abstract	In machine learning, disentangled representation learning seeks to map high-dimensional data into a low-dimensional space where the underlying factors of variation are disentangled and easily separable. This thesis investigates the application of such representations, derived from unlabelled data, to tasks where only limited labelled data is available. Specifically, I explore the domain of audio modelling, where the absence of supervision when learning from unlabelled data often yields representations that are not well suited to downstream tasks, leading to potential resource wastage. To address this issue, I introduce the Guided Generative Adversarial Neural Network (GGAN), a novel model that uses a modest amount of labelled data to guide the learning of relevant disentangled representations from a larger corpus of unlabelled data. While the representation learned through GGAN proves beneficial for the task at hand, its generalisation capabilities are limited, restricting the model's application to tasks similar to or closely related to the original one. To overcome this limitation, I propose a second model, the Guided Adversarial Autoencoder (GAAE), which not only learns representations tailored to a specific downstream task but also captures general attributes of the data that are independent of any particular task. Both GGAN and GAAE are founded on the Generative Adversarial Network (GAN) architecture, leveraging the audio generalisation prowess of GANs for representation learning. However, both models operate on 2D spectrograms rather than directly on 1D raw audio waveforms, a practice that recent research suggests may limit their ultimate performance and that represents a significant gap in the literature. This thesis addresses that gap.
Convolutional Neural Networks (CNNs), which form the structural backbone of both GGAN and GAAE, have historically struggled to generate raw audio waveforms via adversarial training. A foundational step towards overcoming this hurdle is a thorough examination of CNNs' ability to model raw audio waveforms in tasks such as classification. To this end, I propose two cosine filter-based CNN models: the Cosine Convolution Neural Network (CosCovNN) and the Vector Quantised Cosine Convolutional Neural Network with Memory (VQCCM). These models not only outperform traditional CNN architectures but also set a new benchmark in audio classification.
Keywords	Machine Learning; Deep Learning; Generative Adversarial Neural Networks; Convolutional Neural Networks; Guided Representation Learning; Transfer Learning
Related Output
Has part	Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data
Has part	High-fidelity audio generation and representation learning with guided adversarial autoencoder
Contains Sensitive Content	Does not contain sensitive content
ANZSRC Field of Research 2020	460302. Audio processing
	461106. Semi- and unsupervised learning
	461103. Deep learning
	461101. Adversarial machine learning
	461104. Neural networks
	461199. Machine learning not elsewhere classified
Public Notes	File reproduced in accordance with the copyright policy of the publisher/author/creator.
Byline Affiliations	School of Mathematics, Physics and Computing

Permalink	https://research.usq.edu.au/item/z9y75/guided-disentangled-representation-learning-from-audio-data-for-transfer-learning

Restricted files

Published Version

Under embargo until 18 Sep 2025

Related outputs

Feasibility of Mental Health Triage Call Priority Prediction Using Machine Learning

Rana, Rajib, Higgins, Niall, Haque, Kazi Nazmul, Burke, Kylie, Turner, Kathryn and Stedman, Terry. 2024. "Feasibility of Mental Health Triage Call Priority Prediction Using Machine Learning." Nursing Reports. 14 (4), pp. 4162-4172. https://doi.org/10.3390/nursrep14040303