Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data
Article Title | Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation using Fewer Labelled Audio Data |
---|---|
ERA Journal ID | 34661 |
Article Category | Article |
Authors | Haque, Kazi Nazmul (Author), Rana, Rajib (Author), Liu, Jiajun (Author), Hansen, John H. L. (Author), Cummins, Nicholas (Author), Busso, Carlos (Author) and Schuller, Bjorn W. (Author) |
Journal Title | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Journal Citation | 29, pp. 2575-2590 |
Number of Pages | 16 |
Year | 2021 |
Place of Publication | Piscataway, United States |
ISSN | 1558-7916; 1558-7924; 2329-9290; 2329-9304 |
Digital Object Identifier (DOI) | https://doi.org/10.1109/TASLP.2021.3098764 |
Web Address (URL) | https://ieeexplore.ieee.org/document/9492807 |
Abstract | The generation power of Generative Adversarial Neural Networks (GANs) has shown great promise for learning representations from unlabelled data while guided by a small amount of labelled data. We aim to utilise the generation power of GANs to learn audio representations. Most existing studies are, however, focused on images. Some studies use GANs for speech generation, but they are conditioned on text or acoustic features, which limits their use for other audio, such as musical instruments, and even for speech where transcripts are scarce. This paper proposes a novel GAN-based model that we name the Guided Generative Adversarial Neural Network (GGAN), which can learn powerful representations and generate good-quality samples using a small amount of labelled data as guidance. Experimental results on a speech dataset [Speech Commands Dataset (S09)] and a non-speech dataset [musical instrument sounds (NSynth)] demonstrate that, using only 5% of the labelled data as guidance, GGAN learns significantly better representations than state-of-the-art models. |
Keywords | Generators, Generative adversarial networks, Spectrogram, Data models, Training, Task analysis, Speech processing |
Related Output | Is part of: Guided Disentangled Representation Learning from Audio data for Transfer Learning |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 460212. Speech recognition; 460302. Audio processing |
Public Notes | This article is part of a UniSQ Thesis by publication. See Related Output. |
Byline Affiliations | University of Southern Queensland; Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia; University of Texas, United States; King's College London, United Kingdom; Imperial College London, United Kingdom |
Institution of Origin | University of Southern Queensland |
https://research.usq.edu.au/item/q6915/guided-generative-adversarial-neural-network-for-representation-learning-and-audio-generation-using-fewer-labelled-audio-data
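
The abstract above describes, at a high level, a GAN that learns audio representations from mostly unlabelled data while a small labelled subset (about 5%) guides training. The sketch below illustrates that general guided-GAN idea only: it is not the paper's GGAN architecture (which this record does not specify), and every module name, layer size, and loss term in it is a hypothetical assumption. In this style of model, the discriminator's shared encoder doubles as the learned representation, and a classification head trained on the small labelled split supplies the guidance signal.

```python
# Minimal sketch of the "guidance" idea the abstract describes: a GAN whose
# discriminator is also trained as a classifier on a small labelled subset,
# so its hidden features become a learned representation of the audio
# (spectrogram) inputs. All names, shapes, and loss terms are hypothetical;
# this is NOT the paper's GGAN architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, NUM_CLASSES = 128, 10  # assumed sizes (e.g. 10 spoken commands)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Maps a noise vector to a flattened 64x64 "spectrogram" patch.
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 1024), nn.ReLU(),
            nn.Linear(1024, 64 * 64), nn.Tanh())

    def forward(self, z):
        return self.net(z).view(-1, 1, 64, 64)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared encoder; its output doubles as the learned representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(64 * 16 * 16, 256), nn.LeakyReLU(0.2))
        self.adv_head = nn.Linear(256, 1)            # real vs. fake
        self.cls_head = nn.Linear(256, NUM_CLASSES)  # guidance classifier

    def forward(self, x):
        h = self.encoder(x)
        return self.adv_head(h), self.cls_head(h), h

def train_step(G, D, opt_g, opt_d, x_unlab, x_lab, y_lab):
    """One adversarial step; only x_lab/y_lab (the ~5% labelled split)
    feed the classification head that 'guides' the representation."""
    z = torch.randn(x_unlab.size(0), LATENT_DIM)
    # --- discriminator update ---
    opt_d.zero_grad()
    real_logit, _, _ = D(x_unlab)
    fake_logit, _, _ = D(G(z).detach())
    _, cls_logit, _ = D(x_lab)
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit))
              + F.cross_entropy(cls_logit, y_lab))   # labelled "guidance" term
    d_loss.backward()
    opt_d.step()
    # --- generator update ---
    opt_g.zero_grad()
    fake_logit, _, _ = D(G(z))
    g_loss = F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Hypothetical usage with random tensors standing in for spectrogram batches.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
x_unlab = torch.randn(8, 1, 64, 64)                  # unlabelled batch
x_lab = torch.randn(8, 1, 64, 64)                    # small labelled batch
y_lab = torch.randint(0, NUM_CLASSES, (8,))
print(train_step(G, D, opt_g, opt_d, x_unlab, x_lab, y_lab))
```

In a setup like this, the encoder output `h` would be the feature vector evaluated on downstream tasks; the paper's actual networks, guidance mechanism, and objectives should be taken from the article itself via the DOI above.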