Automated Machine Learning to Evaluate the Information Content of Tropospheric Trace Gas Columns for Fine Particle Estimates Over India: A Modeling Testbed

Zhonghua Zheng; Arlene M. Fiore; Daniel M. Westervelt; George P. Milly; Jeff Goldsmith; Alexandra Karambelas; Gabriele Curci; Cynthia A. Randles; Antonio R. Paiva; Chi Wang; Qingyun Wu; Sagnik Dey

Journal of Advances in Modeling Earth Systems | March 2023

India is largely devoid of high-quality and reliable on-the-ground measurements of fine particulate matter (PM2.5). Ground-level PM2.5 concentrations are estimated from publicly available satellite Aerosol Optical Depth (AOD) products combined with other information. Prior research has largely overlooked the possibility of gaining additional accuracy and insights into the sources of PM using satellite retrievals of tropospheric trace gas columns. We evaluate the information content of tropospheric trace gas columns for PM2.5 estimates over India within a modeling testbed using an Automated Machine Learning (AutoML) approach, which selects from a menu of different machine learning tools based on the data set. We then quantify the relative information content of tropospheric trace gas columns, AOD, meteorological fields, and emissions for estimating PM2.5 over four Indian sub-regions on daily and monthly time scales. Our findings suggest that, regardless of the specific machine learning model assumptions, incorporating modeled trace gas columns improves PM2.5 estimates. We use the ranking scores produced by the AutoML algorithm and Spearman's rank correlation to infer the possible relative importance of primary versus secondary sources of PM2.5 as a first step toward estimating particle composition. Our comparison of AutoML-derived models to selected baseline machine learning models demonstrates that AutoML is at least as good as user-chosen models. The idealized pseudo-observations (chemical-transport model simulations) used in this work lay the groundwork for applying satellite retrievals of tropospheric trace gases to estimate fine particle concentrations in India and serve to illustrate the promise of AutoML applications in atmospheric and environmental research.
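
The abstract mentions Spearman's rank correlation as one of the tools for relating predictors to PM2.5. As a minimal sketch of that statistic only (not the authors' code), the snippet below ranks two series, averaging ranks across ties, and computes the Pearson correlation of the ranks. The function names are hypothetical, and any numbers passed in would be illustrative rather than taken from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because the statistic depends only on ranks, any strictly increasing relationship between a predictor (say, a hypothetical NO2 column series) and PM2.5 yields a value of 1 up to floating-point rounding, whether or not the relationship is linear.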

Papers and publication downloads

FLAML: A Fast Library for AutoML and Tuning

December 15, 2020

FLAML is a Python library designed to automatically produce accurate machine learning models with low computational cost. It frees users from selecting learners and hyperparameters for each learner. FLAML is powered by a new, cost-effective hyperparameter optimization and learner selection method invented by Microsoft Research.
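
To make the description above concrete, here is a minimal sketch of letting FLAML choose a learner and its hyperparameters for a regression task. It is not the configuration used in the paper: the synthetic predictors (stand-ins for AOD, NO2, and SO2 columns), the coefficients, the 5-second time budget, and the estimator list are all assumptions made for illustration, and the helper names are hypothetical. It assumes FLAML is installed with its AutoML dependencies.

```python
import numpy as np


def make_synthetic_testbed(n_samples=400, seed=0):
    """Build invented predictors and an invented PM2.5-like target (illustration only)."""
    rng = np.random.default_rng(seed)
    aod = rng.gamma(2.0, 0.3, n_samples)
    no2 = rng.gamma(2.0, 1.0, n_samples)
    so2 = rng.gamma(2.0, 0.5, n_samples)
    noise = rng.normal(0.0, 2.0, n_samples)
    pm25 = 40.0 * aod + 5.0 * no2 + 3.0 * so2 + noise
    return np.column_stack([aod, no2, so2]), pm25


def r_squared(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = float(np.sum((y_true - y_pred) ** 2))
    ss_tot = float(np.sum((y_true - np.mean(y_true)) ** 2))
    return 1.0 - ss_res / ss_tot


def fit_with_flaml(X, y, time_budget_s=5, seed=0):
    """Let FLAML search over learners and hyperparameters; return (best learner, test R^2)."""
    # Imported here so the helpers above remain usable without FLAML installed.
    from flaml import AutoML

    # Simple 80/20 split; the synthetic rows are i.i.d., so no shuffling is needed.
    n_train = int(0.8 * len(y))
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]

    automl = AutoML()
    automl.fit(
        X_train=X_train,
        y_train=y_train,
        task="regression",
        metric="r2",
        time_budget=time_budget_s,  # seconds of search; an assumed, deliberately small value
        estimator_list=["lgbm", "rf", "extra_tree"],  # assumed menu of learners
        seed=seed,
        verbose=0,
    )
    y_pred = automl.predict(X_test)
    return automl.best_estimator, r_squared(y_test, y_pred)
```

Calling `fit_with_flaml(*make_synthetic_testbed())` returns the name of the learner FLAML selected and its held-out R^2. The selected learner can differ between runs and machines because the search is bounded by wall-clock time.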
