Supporting clinicians to diagnose and assess COVID-19 severity using AI and chest X-rays
Overview
Microsoft Research’s Project InnerEye team in Cambridge (UK) worked with University Hospitals Birmingham NHS Foundation Trust to develop deep learning models that analyze anonymized chest X-rays and chest computed tomography (CT) scans, helping clinicians determine disease severity, aiding decision making, and improving our understanding of the disease. This collaboration was part of Microsoft’s Studies In Pandemic Preparedness program, one of our COVID-19 response efforts in which researchers at Microsoft worked with teams around the world to address the current situation and better prepare for future pandemics, supported by our AI for Health team.
Chest X-rays have been a recommended procedure for patient triaging and resource management in intensive care units (ICUs) throughout the COVID-19 pandemic. Machine learning efforts to augment this workflow have, however, long been challenged by deficiencies in reporting, model evaluation, and failure mode analysis. A recent study showed that 415 COVID-19 medical imaging projects had deficiencies that limited their use outside of the research lab.

To address some of these shortcomings, we worked closely with our clinical partners to model radiological features with a human-interpretable class hierarchy that aligns with the radiological decision process. A DenseNet-121 backbone was first pre-trained with BYOL self-supervision on the public NIH-CXR dataset; technical details, including how to do this with InnerEye-DeepLearning, are available here. The backbone was then fine-tuned with cross-validation on a private COVID-19 training dataset collected from four hospitals of the UHB group during the first COVID-19 wave (March to June 2020). The UHB team used the Azure Machine Learning DICOM image labelling tool to significantly speed up a labelling study in which several clinical annotators classified 400 chest X-rays. The collected labels were then used to benchmark the model’s performance.
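To illustrate what a human-interpretable class hierarchy means in practice, the sketch below combines per-node sigmoid outputs along a hypothetical two-level hierarchy (normal vs. abnormal at the root, then COVID-like vs. other conditional on abnormal). The node names and structure here are illustrative assumptions, not the hierarchy defined in the paper:

```python
import math

def sigmoid(x):
    """Standard logistic function, mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def hierarchical_probs(logits):
    """Combine per-node logits into leaf probabilities along a 2-level
    hierarchy. NOTE: the hierarchy ("abnormal" root, "covid_given_abnormal"
    leaf) is a hypothetical example, not the one from the paper."""
    p_abnormal = sigmoid(logits["abnormal"])
    p_covid_given_abnormal = sigmoid(logits["covid_given_abnormal"])
    # Leaf probabilities are products along the path from the root,
    # so they sum to 1 by construction and each node stays interpretable.
    return {
        "normal": 1.0 - p_abnormal,
        "abnormal_other": p_abnormal * (1.0 - p_covid_given_abnormal),
        "abnormal_covid": p_abnormal * p_covid_given_abnormal,
    }

probs = hierarchical_probs({"abnormal": 0.3, "covid_given_abnormal": -1.2})
```

Because each node corresponds to a radiological decision ("is the scan abnormal?", "does the abnormality look COVID-like?"), clinicians can inspect the model's intermediate judgments rather than a single opaque score.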
The developed model outperforms the clinicians across all defined sub-tasks with respect to the reference labels. To better understand the model’s failure patterns, we employed an error analysis tool in Azure Machine Learning. This tool trains a decision tree to identify partitions of the data on which the model underperforms, using the attributes that are most predictive of mistakes. It is based on work by our MSR colleagues Besmira Nushi, Ece Kamar, and Eric Horvitz: Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure – Microsoft Research.
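The core idea, fitting a decision tree to a binary "did the model make a mistake?" label so that tree splits surface error-prone cohorts, can be sketched with scikit-learn. The metadata attributes (patient age, portable-scanner flag) and the error pattern below are synthetic assumptions for illustration only, not findings from the study or the Azure Machine Learning tool's actual implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Synthetic metadata for 400 X-rays: patient age and a portable-scanner flag.
age = rng.integers(20, 90, size=400)
portable = rng.integers(0, 2, size=400)
# Hypothetical mistake indicator: suppose the model errs on portable
# scans of patients over 60 (an invented pattern, for illustration).
mistake = ((portable == 1) & (age > 60)).astype(int)

# Fit a shallow tree on the mistake indicator; its splits describe the
# data partitions where the model underperforms.
X = np.column_stack([age, portable])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, mistake)
print(export_text(tree, feature_names=["age", "portable"]))
```

Reading the printed tree from the root down recovers human-readable failure cohorts (here, "portable scan and age above ~60"), which is what makes this analysis actionable for clinicians and model developers.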
Although this kind of error analysis is not often found in healthcare-related ML studies, we believe it is crucial for providing transparency and actionable insights into a model’s behavior. The analysis may also prove useful after deployment if presented as reliability information alongside the model’s predictions.
You can read the full paper, “Hierarchical Analysis of Visual COVID-19 Features from Chest Radiographs”, which was presented at ICML’s 1st workshop on “Interpretable Machine Learning in Healthcare”.