Guided Disentangled Representation Learning from Audio data for Transfer Learning

PhD by Publication


Haque, Kazi Nazmul. 2024. Guided Disentangled Representation Learning from Audio data for Transfer Learning. PhD by Publication Doctor of Philosophy. University of Southern Queensland. https://doi.org/10.26192/z9y75
Title: Guided Disentangled Representation Learning from Audio data for Transfer Learning
Type: PhD by Publication
Authors: Haque, Kazi Nazmul
Supervisor:
1. First: Prof Rajib Rana
2. Second: Prof Ji Zhang
Institution of Origin: University of Southern Queensland
Qualification Name: Doctor of Philosophy
Number of Pages: 93
Year: 2024
Publisher: University of Southern Queensland
Place of Publication: Australia
Digital Object Identifier (DOI): https://doi.org/10.26192/z9y75
Abstract

In machine learning, disentangled representation learning seeks to map high-dimensional data into a low-dimensional space where the underlying factors of variation are disentangled and easily separable. This thesis investigates the application of such representations, derived from unlabelled data, to tasks where only limited labelled data is available. Specifically, I explore the domain of audio modelling, where the absence of supervision in learning representations from unlabelled data often yields representations that are not optimally useful for downstream tasks, leading to potential resource wastage. To address this issue, I introduce the Guided Generative Adversarial Neural Network (GGAN), a novel model that uses a modest amount of labelled data to guide the learning of relevant disentangled representations from a larger corpus of unlabelled data. While the representation learned through GGAN proves beneficial for the task at hand, its generalisation capabilities are limited, restricting the model's application to tasks similar to or closely related to the original one. To overcome this limitation, I propose a second model, the Guided Adversarial Autoencoder (GAAE), which not only learns representations tailored to a specific downstream task but also captures general attributes of the data that are independent of any particular task. Both GGAN and GAAE are founded on the Generative Adversarial Network (GAN) architecture, leveraging the audio generalisation prowess of GANs for representation learning. Nevertheless, the models do not work with 1D raw audio waveforms directly and instead use 2D spectrograms, a practice that recent research suggests may limit the models' ultimate performance. This represents a significant gap in the literature, which this thesis addresses.
Convolutional Neural Networks (CNNs), forming the structural backbone of both GGAN and GAAE, have historically faced challenges in generating raw audio waveforms via adversarial training. A foundational step in surmounting this hurdle is a thorough examination of CNNs' ability to model raw audio waveforms in tasks such as classification. As a step in this direction, I propose two cosine filter-based CNN models: the Cosine Convolution Neural Network (CosCovNN) and the Vector Quantised Cosine Convolutional Neural Network with Memory (VQCCM). These models have not only outperformed traditional CNN architectures but have also set a new benchmark in the field of audio classification.

Keywords: Machine Learning; Deep Learning; Generative Adversarial Neural Networks; Convolutional Neural Networks; Guided Representation Learning; Transfer Learning
Related Output:
Has part: Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data
Has part: High-fidelity audio generation and representation learning with guided adversarial autoencoder
Contains Sensitive Content: Does not contain sensitive content
ANZSRC Field of Research 2020:
460302. Audio processing
461106. Semi- and unsupervised learning
461103. Deep learning
461101. Adversarial machine learning
461104. Neural networks
461199. Machine learning not elsewhere classified
Public Notes

File reproduced in accordance with the copyright policy of the publisher/author/creator.

Byline Affiliations: School of Mathematics, Physics and Computing
Permalink: https://research.usq.edu.au/item/z9y75/guided-disentangled-representation-learning-from-audio-data-for-transfer-learning

Restricted files

Published Version
