An Auto-Encoder with Genetic Algorithm for High Dimensional Data: Towards Accurate and Interpretable Outlier Detection
Article
Article Title | An Auto-Encoder with Genetic Algorithm for High Dimensional Data: Towards Accurate and Interpretable Outlier Detection |
---|---|
ERA Journal ID | 123944 |
Article Category | Article |
Authors | Li, Jiamu; Zhang, Ji; Bah, Mohamed Jaward; Wang, Jian; Zhu, Youwen; Yang, Gaoming; Li, Lingling; Zhang, Kexin |
Journal Title | Algorithms |
Journal Citation | 15 (11) |
Article Number | 429 |
Number of Pages | 22 |
Year | 2022 |
Publisher | MDPI AG |
Place of Publication | Switzerland |
ISSN | 1999-4893 |
Digital Object Identifier (DOI) | https://doi.org/10.3390/a15110429 |
Web Address (URL) | https://www.mdpi.com/1999-4893/15/11/429 |
Abstract | When dealing with high-dimensional data, such as in biometric, e-commerce, or industrial applications, it is extremely hard to capture abnormalities in the full space due to the curse of dimensionality. Furthermore, because of the large number of features, interpreting outlier detection results in high-dimensional space is becoming increasingly complicated yet remains essential. To alleviate these issues, we propose a new model based on a Variational AutoEncoder and Genetic Algorithm (VAEGA) for detecting outliers in subspaces of high-dimensional data. The proposed model employs a neural network to build a variational autoencoder (VAE) for probabilistic dimensionality reduction, which uses its low-dimensional hidden space to characterize the high-dimensional inputs. A hidden vector is then sampled randomly from the hidden space to reconstruct the data so that it closely matches the input data. The reconstruction error is computed to determine an outlier score, and samples exceeding the threshold are tentatively identified as outliers. In the second step, a genetic algorithm (GA) is used to examine and analyze the abnormal subspaces of the outlier set obtained by the VAE layer. After encoding the outlier dataset’s subspaces, the degree of anomaly of each candidate subspace is calculated using the redefined fitness function. Finally, the abnormal subspace of each detected point is obtained by selecting the subspace with the highest degree of anomaly. Clustering the abnormal subspaces helps filter out mislabeled outliers (false positives), and the VAE layer adjusts the network weights based on these false positives. In comparisons with other methods on five public datasets, the VAEGA outlier detection model produces highly interpretable results and outperforms or is competitive with contemporary methods. |
ANZSRC Field of Research 2020 | 460299. Artificial intelligence not elsewhere classified |
Byline Affiliations | Nanjing University of Aeronautics and Astronautics, China; School of Mathematics, Physics and Computing; Zhejiang Lab, China; Anhui University of Science and Technology, China; Zhengzhou University of Aeronautics, China |
https://research.usq.edu.au/item/z024x/an-auto-encoder-with-genetic-algorithm-for-high-dimensional-data-towards-accurate-and-interpretable-outlier-detection
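
The two-stage pipeline described in the abstract (score by reconstruction error, threshold, then search for the most anomalous subspace of each flagged point) can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: a linear PCA projection stands in for the VAE, the threshold is a score quantile, and `subspace_fitness` (mean absolute z-score over the selected features) is a hypothetical replacement for the paper's redefined fitness function, which this record does not specify. All function names are hypothetical. The clustering of abnormal subspaces and the weight feedback on false positives are omitted.

```python
import numpy as np


def reconstruction_scores(X, n_components):
    """Outlier score = squared reconstruction error.

    ASSUMPTION: a PCA projection is used as a linear stand-in for the VAE.
    """
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    basis = vt[:n_components]            # top principal directions, shape (k, d)
    X_hat = Xc @ basis.T @ basis         # project to k dims and reconstruct
    return np.sum((Xc - X_hat) ** 2, axis=1)


def flag_outliers(scores, quantile=0.95):
    """Tentatively flag samples whose score exceeds a quantile threshold."""
    return scores > np.quantile(scores, quantile)


def subspace_fitness(z_abs, mask):
    """HYPOTHETICAL fitness: mean |z-score| of the point over selected features."""
    return z_abs[mask].mean() if mask.any() else 0.0


def ga_abnormal_subspace(X, i, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search feature subsets (boolean masks) for the one where point i is
    most anomalous, using a small genetic algorithm. Requires >= 2 features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    z_abs = np.abs((X[i] - X.mean(axis=0)) / (X.std(axis=0) + 1e-12))
    pop = rng.random((pop_size, d)) < 0.5          # random initial masks
    for _ in range(generations):
        fit = np.array([subspace_fitness(z_abs, m) for m in pop])
        order = np.argsort(fit)[::-1]              # best individuals first
        parents = pop[order[: pop_size // 2]]      # truncation selection
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, d)               # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(d) < p_mut         # bit-flip mutation
            children.append(child)
        pop = np.vstack([pop[order[0]], *children])  # keep the elite unchanged
    fit = np.array([subspace_fitness(z_abs, m) for m in pop])
    return np.flatnonzero(pop[np.argmax(fit)]).tolist()
```

Calling `flag_outliers(reconstruction_scores(X, n_components=2))` on an `(n_samples, n_features)` array returns a boolean mask of tentative outliers, and `ga_abnormal_subspace(X, i)` returns the feature indices of the subspace that the sketch considers most anomalous for sample `i`. Because the hypothetical fitness is a mean, it favors small subspaces; a different fitness definition would change which subspaces are reported.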