High-fidelity audio generation and representation learning with guided adversarial autoencoder
Article Title | High-fidelity audio generation and representation learning with guided adversarial autoencoder |
---|---|
ERA Journal ID | 210567 |
Article Category | Article |
Authors | Haque, Kazi Nazmul (Author), Rana, Rajib (Author) and Schuller, Bjorn W. (Author) |
Journal Title | IEEE Access |
Journal Citation | 8, pp. 223509-223528 |
Article Number | 9272282 |
Number of Pages | 20 |
Year | 2020 |
Publisher | IEEE (Institute of Electrical and Electronics Engineers) |
Place of Publication | Piscataway, NJ, United States |
ISSN | 2169-3536 |
Digital Object Identifier (DOI) | https://doi.org/10.1109/ACCESS.2020.3040797 |
Web Address (URL) | https://ieeexplore.ieee.org/document/9272282 |
Abstract | Generating high-fidelity conditional audio samples and learning representations from unlabelled audio data are two challenging problems in machine learning research. Recent advances in Generative Adversarial Neural Network (GAN) architectures show great promise in addressing these challenges. Learning a powerful representation with a GAN architecture requires superior sample generation quality, which in turn requires an enormous amount of labelled data. In this paper, we address this issue by proposing the Guided Adversarial Autoencoder (GAAE), which can generate superior conditional audio samples from unlabelled audio data using a small percentage of labelled data as guidance. A representation learned from unlabelled data without any supervision is not guaranteed to be usable for any downstream task. On the other hand, if the model is highly biased towards the downstream task during representation learning, it loses its generalisation capability. This makes the learned representation hardly useful for other tasks that are not related to that downstream task. The proposed GAAE model also addresses these issues. Using this superior conditional generation, GAAE can learn a representation specific to the downstream task. Furthermore, GAAE learns another type of representation that captures the general attributes of the data and is independent of the downstream task at hand. Experimental results on the S09 and NSynth datasets attest to the superior performance of GAAE compared to state-of-the-art alternatives. |
Keywords | audio generation, representation learning, generative adversarial neural network, guided generative adversarial autoencoder |
ANZSRC Field of Research 2020 | 460212. Speech recognition; 461104. Neural networks; 461103. Deep learning |
Byline Affiliations | School of Sciences; Imperial College London, United Kingdom |
Institution of Origin | University of Southern Queensland |
https://research.usq.edu.au/item/q63y2/high-fidelity-audio-generation-and-representation-learning-with-guided-adversarial-autoencoder