Uncertainty assessment based on data decomposition and Boruta-driven extreme gradient boosting to predict spatiotemporal urban air dust heavy metal index
Article Title | Uncertainty assessment based on data decomposition and Boruta-driven extreme gradient boosting to predict spatiotemporal urban air dust heavy metal index |
---|---|
ERA Journal ID | 210171 |
Article Category | Article |
Authors | Seifi, Akram, Soltani-Gerdefaramarzi, Somayeh and Ali, Mumtaz |
Journal Title | Atmospheric Pollution Research |
Journal Citation | 16 (11) |
Number of Pages | 18 |
Year | 2025 |
Publisher | Elsevier |
Place of Publication | United Kingdom |
ISSN | 1309-1042 |
Digital Object Identifier (DOI) | https://doi.org/10.1016/j.apr.2025.102654 |
Web Address (URL) | https://www.sciencedirect.com/science/article/pii/S1309104225002569?via%3Dihub |
Abstract | Accurate prediction of urban air dust pollutants is essential for public health and environmental management, and reliable prediction of air pollution caused by heavy metals in urban areas is especially important. This study develops, for the first time, an ensemble approach based on multivariate variational mode decomposition (MVMD) and extreme gradient boosting (XGBoost), integrated with the Optuna Bayesian optimizer and different feature selection techniques, to predict the spatiotemporal distribution of the pollution load index (PLI) in the Yazd urban area, Iran. For comparison, gated recurrent unit (GRU) network, adaptive neuro-fuzzy inference system (ANFIS), and multilayer perceptron (MLP) models were also developed. Variables including meteorological data, heavy metal concentrations in roof dust, and distance to pollution sources were gathered. The seasonal data were analyzed using the Boruta feature selection approach (BFSA), SHapley Additive exPlanations (SHAP), and wavelet methods to identify valuable and easily accessible variables for predicting the PLI. The results confirmed that BFSA is more capable than the SHAP and wavelet techniques at selecting the most important features, providing a cost-effective input vector of readily available variables: Max WD, Min RH, Cd, and Zn. Moreover, the XGBoost model showed high prediction accuracy for the PLI, with R2 = 0.90, RMSE = 0.08, and MAE = 0.06. Furthermore, using the stationarity test of the MVMD method applied to all input variables, Max WD and Min RH were decomposed into three intrinsic mode functions (IMFs). These IMFs, along with Cd and Zn, were used as the input vector to XGBoost to create the final model for predicting temporal uncertainty and generating seasonal urban spatiotemporal maps. The uncertainty evaluation demonstrated that MVMD-XGBoost captured 83.33 %, 96.67 %, 63.33 %, and 68.97 % of the observed data within the 95 % confidence interval in the spring, summer, autumn, and winter seasons, respectively. The findings of this study allow decision-makers to reduce air pollution monitoring costs and enhance control measures by leveraging readily available variables. |
Keywords | Air pollution; Boruta feature selection; Readily available data; Uncertainty; Extreme gradient boosting |
Contains Sensitive Content | Does not contain sensitive content |
ANZSRC Field of Research 2020 | 461103. Deep learning; 460104. Applications in physical sciences |
https://research.usq.edu.au/item/zyy17/uncertainty-assessment-based-on-data-decomposition-and-boruta-driven-extreme-gradient-boosting-to-predict-spatiotemporal-urban-air-dust-heavy-metal-index
Download files
Published Version
Uncertainty assessment based on data decomposition and Boruta-driven extreme gradient boosting.pdf | ||
License: CC BY 4.0 | ||
File access level: Anyone |