Intraurban NO2 hotspot detection via clustering of in-situ, remote, and modeled air quality data products
- Anastasia Montgomery,
- Madeleine Daepp,
- Marah I Abdin,
- Pallavi Choudry,
- Sara Malvar,
- Scott Counts,
- Daniel E Horton
Novel air quality data sources promise unprecedented insight into intra-urban variations in air pollution, enabling stakeholders to identify and mitigate hotspots. However, sparse regulatory networks limit validation of these datasets, so the resulting pollutant exposure estimates are likely to be noisy and difficult to compare across platforms. In this study, we use the Getis-Ord Gi* statistic to identify and evaluate clusters of NO2 across Chicago, IL in three novel air quality datasets: (1) a two-way coupled WRF-CMAQ simulation performed at 1.3 km resolution; (2) the TropOMI satellite instrument; and (3) a high-density network of low-cost air quality sensors deployed through the Microsoft Eclipse project. We identify a large, statistically significant cluster of heightened exposures that appears in all three data sources, enabling us to report with high confidence the presence of a “true” hotspot despite a dearth of regulatory data in the affected area. Moreover, using the temporally fine-grained datasets (WRF-CMAQ and Eclipse), we observe that the hotspot persists across dominant wind directions. Where clusters disagree across datasets, we can systematically investigate the reasons for divergence. For example, a hotspot that emerges in the observational datasets but not in the modeled dataset enables us to interrogate model biases with respect to underlying emissions and meteorological performance. In contrast, hotspots simulated by WRF-CMAQ but not observed by sensors enable us to prioritize new locations for sensor deployment. This work offers an example of how researchers can use, and build confidence in, multiple sources of novel air quality data. These complementary tools can be used both to evaluate confidence in policy-relevant insights and to interrogate and resolve discrepancies across datasets.
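To illustrate the kind of local clustering statistic named above, the following is a minimal sketch of a Getis-Ord Gi* z-score computed with NumPy. It is not the authors' implementation: the function name `getis_ord_gi_star`, the binary fixed-distance weights, and the `radius` parameter are assumptions made here for illustration, and the study may have used different spatial weights or software.

```python
import numpy as np

def getis_ord_gi_star(values, coords, radius):
    """Return a Gi* z-score for each site (hypothetical helper, not from the paper).

    Assumes binary weights: w_ij = 1 if site j lies within `radius` of site i
    (including i itself, which is what distinguishes Gi* from Gi), else 0.
    """
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = x.size

    # Pairwise Euclidean distances between all sites
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    w = (dist <= radius).astype(float)

    mean = x.mean()
    std = np.sqrt((x ** 2).mean() - mean ** 2)  # population standard deviation

    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)

    # Local weighted sum compared with its expectation under spatial randomness
    numerator = w @ x - mean * w_sum
    denominator = std * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Synthetic example: a 10 x 10 grid with elevated values in one corner block
rows, cols = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
coords = np.column_stack([rows.ravel(), cols.ravel()])
values = np.full(100, 10.0)
values[(coords[:, 0] < 3) & (coords[:, 1] < 3)] = 30.0

z = getis_ord_gi_star(values, coords, radius=1.5)
hot = z > 1.96  # roughly the 95% two-sided threshold, before any multiple-testing correction
```

Sites with large positive z-scores sit in neighborhoods whose values are higher than expected by chance, which is one way to flag candidate hotspots. The sketch is undefined when all values are identical or when a neighborhood covers every site, since the denominator is then zero.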