Deep Ensemble Machine Learning Framework for the Estimation of PM2.5 Concentrations.

Citation metadata

From: Environmental Health Perspectives (Vol. 130, Issue 3)
Publisher: National Institute of Environmental Health Sciences
Document Type: Report
Length: 9,417 words
Lexile Measure: 1500L

Abstract:

BACKGROUND: Accurate estimation of historical PM2.5 (particulate matter with an aerodynamic diameter of less than 2.5 µm) is essential for environmental health risk assessment.

OBJECTIVES: The aim of this study was to develop a multiple-level stacked ensemble machine learning framework for improving the estimation of daily ground-level PM2.5 concentrations.

METHODS: A deep ensemble machine learning (DEML) framework was developed to estimate daily PM2.5 concentrations. The framework has a three-stage structure. At the first stage, four base models [gradient boosting machine (GBM), support vector machine (SVM), random forest (RF), and eXtreme gradient boosting (XGBoost)] were used to generate a new data set of PM2.5 concentrations for training the next-stage learners. At the second stage, three meta-models [RF, XGBoost, and generalized linear model (GLM)] were used to estimate PM2.5 concentrations from a combination of the original data set and the predictions of the first-stage models. At the third stage, a nonnegative least squares (NNLS) algorithm was employed to obtain the optimal weights for combining the second-stage estimates into the final PM2.5 estimate. As an example, we used data from 133 monitoring stations in Italy to implement DEML and predict daily PM2.5 for each 1 km x 1 km grid cell across Italy from 2015 to 2019. We evaluated model performance with 10-fold cross-validation (CV) and compared it with five benchmark algorithms [GBM, SVM, RF, XGBoost, and Super Learner (SL)].

RESULTS: The PM2.5 prediction performance of DEML [coefficient of determination (R²) = 0.87 and root mean square error (RMSE) = 5.38 µg/m³] was superior to that of every benchmark model (R² of 0.51, 0.76, 0.83, 0.70, and 0.83 for GBM, SVM, RF, XGBoost, and SL, respectively). DEML displayed reliable performance in capturing the spatiotemporal variations of PM2.5 in Italy.

DISCUSSION: The proposed DEML framework achieved strong performance in PM2.5 estimation (R² = 0.87 under 10-fold CV) and could be used as a tool for more accurate environmental exposure assessment.
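
The three-stage structure described in METHODS can be illustrated with a minimal Python sketch. This is not the authors' implementation: the data are synthetic (generated with scikit-learn's make_regression), the hyperparameters are arbitrary, a second gradient boosting model stands in for XGBoost so that the example needs only scikit-learn and SciPy, and ordinary linear regression stands in for the GLM. All variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the predictor matrix and daily PM2.5 target
# (assumption: NOT the Italian monitoring data used in the study).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# 5 folds keep the sketch fast; the study evaluated with 10-fold CV.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Stage 1: four base models produce out-of-fold predictions.
base_models = [
    GradientBoostingRegressor(n_estimators=50, random_state=0),   # GBM
    make_pipeline(StandardScaler(), SVR(C=10.0)),                 # SVM
    RandomForestRegressor(n_estimators=50, random_state=0),       # RF
    # Stand-in for XGBoost (assumption): stochastic gradient boosting.
    GradientBoostingRegressor(n_estimators=50, subsample=0.8, random_state=1),
]
stage1 = np.column_stack(
    [cross_val_predict(model, X, y, cv=cv) for model in base_models]
)

# Stage 2: meta-models see the original predictors plus stage-1 predictions.
X_aug = np.hstack([X, stage1])
meta_models = [
    RandomForestRegressor(n_estimators=50, random_state=0),       # RF
    GradientBoostingRegressor(n_estimators=50, subsample=0.8, random_state=1),
    LinearRegression(),  # Gaussian GLM with identity link
]
stage2 = np.column_stack(
    [cross_val_predict(model, X_aug, y, cv=cv) for model in meta_models]
)

# Stage 3: nonnegative least squares finds weights w >= 0 that minimise
# ||stage2 @ w - y||, then the weighted sum is the final estimate.
weights, _ = nnls(stage2, y)
y_hat = stage2 @ weights

r2 = r2_score(y, y_hat)
rmse = float(np.sqrt(mean_squared_error(y, y_hat)))
print("NNLS weights:", np.round(weights, 3))
print("R2:", round(r2, 3), "RMSE:", round(rmse, 3))
```

In this sketch the NNLS weights are fitted and scored on the same out-of-fold predictions, so the printed R² and RMSE are only a sanity check. A held-out evaluation comparable to the 10-fold CV reported above would wrap all three stages inside an outer cross-validation loop.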

Source Citation

Gale Document Number: GALE|A696826095