Guided Disentangled Representation Learning from Audio Data for Transfer Learning
PhD by Publication
Title | Guided Disentangled Representation Learning from Audio Data for Transfer Learning |
---|---|
Type | PhD by Publication |
Authors | Haque, Kazi Nazmul |
Supervisors | First: Prof Rajib Rana; Second: Prof Ji Zhang |
Institution of Origin | University of Southern Queensland |
Qualification Name | Doctor of Philosophy |
Number of Pages | 93 |
Year | 2024 |
Publisher | University of Southern Queensland |
Place of Publication | Australia |
Digital Object Identifier (DOI) | https://doi.org/10.26192/z9y75 |
Abstract | In the field of machine learning, disentangled representation learning seeks to map high-dimensional data into a low-dimensional space in which the underlying factors of variation are disentangled and easily separable. This thesis investigates the application of such representations, derived from unlabelled data, to tasks where only limited labelled data is available. Specifically, I explore the domain of audio modelling, where learning representations from unlabelled data without supervision often yields representations that are not optimally useful for downstream tasks, leading to wasted resources. To address this issue, I introduce the Guided Generative Adversarial Neural Network (GGAN), a novel model that uses a modest amount of labelled data to guide the learning of relevant disentangled representations from a larger corpus of unlabelled data. While the representation learned through GGAN proves beneficial for the task at hand, its generalisation capabilities are limited, restricting the model's application to tasks similar to or closely related to the original one. To overcome this limitation, I propose a second model, the Guided Generative Adversarial Autoencoder (GAAE), which not only learns representations tailored to a specific downstream task but also captures general attributes of the data, making it independent of any particular task. Both GGAN and GAAE are built on the Generative Adversarial Network (GAN) architecture, leveraging the audio generation capabilities of GANs for representation learning. Nevertheless, both models eschew working directly with 1D raw audio waveforms, instead utilising 2D spectrograms, a practice that recent research suggests may curtail the models' ultimate performance and that represents a significant gap in the literature. This thesis confronts that gap head-on. Convolutional Neural Networks (CNNs), which form the structural backbone of both GGAN and GAAE, have historically struggled to generate raw audio waveforms via adversarial training. A foundational step towards overcoming this hurdle is a thorough examination of CNNs' ability to model raw audio waveforms in tasks such as classification. Moving in this direction, I propose two cosine filter-based CNN models: the Cosine Convolution Neural Network (CosCovNN) and the Vector Quantised Cosine Convolutional Neural Network with Memory (VQCCM). These models not only outperform traditional CNN architectures but also set a new benchmark in audio classification. |
Keywords | Machine Learning; Deep Learning; Generative Adversarial Neural Networks; Convolutional Neural Networks; Guided Representation Learning; Transfer Learning |
Related Output | |
Has part | Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data |
Has part | High-fidelity audio generation and representation learning with guided adversarial autoencoder |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Fields of Research 2020 | 460302. Audio processing; 461106. Semi- and unsupervised learning; 461103. Deep learning; 461101. Adversarial machine learning; 461104. Neural networks; 461199. Machine learning not elsewhere classified |
Public Notes | File reproduced in accordance with the copyright policy of the publisher/author/creator. |
Byline Affiliations | School of Mathematics, Physics and Computing |
https://research.usq.edu.au/item/z9y75/guided-disentangled-representation-learning-from-audio-data-for-transfer-learning
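
The abstract describes the central idea behind GGAN and GAAE: a small labelled subset guides what an otherwise unsupervised representation learner extracts from a much larger unlabelled corpus. The sketch below is a minimal, hypothetical illustration of that guidance signal in PyTorch; it substitutes a plain autoencoder reconstruction loss for the adversarial objective used in the thesis models, and the module names, dimensions, and the `guidance_weight` parameter are assumptions made for illustration, not the thesis implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions for the sketch (not taken from the thesis).
FEAT_DIM, LATENT_DIM, NUM_CLASSES = 128, 32, 10

encoder = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
classifier = nn.Linear(LATENT_DIM, NUM_CLASSES)  # guidance head on the latent code

params = list(encoder.parameters()) + list(decoder.parameters()) + list(classifier.parameters())
optimiser = torch.optim.Adam(params, lr=1e-3)


def training_step(unlabelled_x, labelled_x, labels, guidance_weight=1.0):
    """One step mixing an unsupervised loss over unlabelled data with a
    supervised 'guidance' loss over a small labelled batch."""
    optimiser.zero_grad()
    # Unsupervised term: plain reconstruction here, standing in for the
    # adversarial objective used by GGAN/GAAE.
    recon = decoder(encoder(unlabelled_x))
    unsup_loss = F.mse_loss(recon, unlabelled_x)
    # Supervised guidance term: a few labels push the latent code towards
    # encoding the factor relevant to the downstream task.
    logits = classifier(encoder(labelled_x))
    guide_loss = F.cross_entropy(logits, labels)
    loss = unsup_loss + guidance_weight * guide_loss
    loss.backward()
    optimiser.step()
    return loss.item()


# Toy usage with random tensors standing in for audio features.
unlabelled = torch.randn(256, FEAT_DIM)   # large unlabelled batch
labelled = torch.randn(16, FEAT_DIM)      # small labelled batch
labels = torch.randint(0, NUM_CLASSES, (16,))
print(training_step(unlabelled, labelled, labels))
```

In the thesis models the unsupervised term comes from adversarial training rather than reconstruction; the sketch only illustrates how a handful of labels can steer what the latent code captures.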
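
The abstract also mentions the cosine filter-based CNNs (CosCovNN and VQCCM) developed for modelling raw waveforms. Below is a minimal sketch, assuming a PyTorch-style layer in which each convolution kernel is a cosine with a learnable frequency, phase, and amplitude; the class name `CosineConv1d` and this particular parameterisation are illustrative assumptions, not the formulation used in the thesis.

```python
import torch
import torch.nn as nn


class CosineConv1d(nn.Module):
    """Illustrative 1D convolution whose kernels are parameterised cosines.

    Instead of learning every tap of each kernel, each filter is defined by a
    learnable frequency, phase and amplitude, so the layer has far fewer
    parameters than a standard Conv1d with the same kernel size.
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super().__init__()
        self.stride = stride
        # One frequency/phase/amplitude triple per (out, in) filter pair.
        self.freq = nn.Parameter(torch.rand(out_channels, in_channels) * 3.0)
        self.phase = nn.Parameter(torch.zeros(out_channels, in_channels))
        self.amp = nn.Parameter(torch.ones(out_channels, in_channels))
        # Fixed time axis for sampling the cosine, normalised to [0, 1].
        self.register_buffer("t", torch.linspace(0.0, 1.0, kernel_size))

    def forward(self, x):
        # Build kernels on the fly: amp * cos(2*pi*freq*t + phase).
        kernels = self.amp.unsqueeze(-1) * torch.cos(
            2 * torch.pi * self.freq.unsqueeze(-1) * self.t + self.phase.unsqueeze(-1)
        )  # shape: (out_channels, in_channels, kernel_size)
        return nn.functional.conv1d(x, kernels, stride=self.stride)


if __name__ == "__main__":
    layer = CosineConv1d(in_channels=1, out_channels=16, kernel_size=64, stride=8)
    waveform = torch.randn(4, 1, 16000)  # batch of 1-second raw audio at 16 kHz
    features = layer(waveform)
    print(features.shape)  # torch.Size([4, 16, 1993])
```

Parameterising kernels this way reduces a wide raw-audio front end from out_channels × in_channels × kernel_size learned weights to three scalars per filter, which is a common motivation for cosine- or sinc-style filters on raw waveforms.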