OSAIRIS: Lessons Learned from the Hospital-Based Implementation and Evaluation of an Open-Source Deep-Learning Model for Radiotherapy Image Segmentation

  • Alexandra Constantinou
  • Andrew Hoole
  • David C. Wong
  • G. S. Sagoo
  • Tom Griffiths
  • Amy Edwards
  • Andrew Robinson
  • Liam Stubbington
  • Niall Bolger
  • Yvonne Rimmer
  • Thiraviyam Elumalai
  • K. T. Jayaprakash
  • Richard Benson
  • Ian Gleeson
  • Rebecca Sen
  • Louisa Stockton
  • Tian Wang
  • Stephanie Brown
  • E. Gatfield
  • C. Sanghera
  • Alexandros Mourounas
  • Barry Evans
  • Anita Anthony
  • Renteng Hou
  • Marian Toomey
  • K. Wildschut
  • Aviva Grisby
  • Gill Barnett
  • Rose McMullen
  • Raj Jena

Clinical Oncology

Several studies report the benefits and accuracy of autosegmentation for organ-at-risk (OAR) outlining in radiotherapy treatment planning. Evaluations typically focus on accuracy metrics, while other parameters such as perceived utility and safety are routinely ignored. Here we report our findings from the implementation and clinical evaluation of OSAIRIS, an open-source AI model for radiotherapy image segmentation, carried out as part of its development into a medical device. The device contours OARs in the head and neck and in the male pelvis (the latter referred to as the prostate model), and is designed to be used alongside a clinician as a time-saving workflow device. Unlike standard evaluation processes, which rely heavily on accuracy metrics alone, our evaluation sought to demonstrate tangible benefits, quantify utility and assess risk within a specific clinical workflow. We evaluated the time saving this device affords clinicians, how this time saving relates to accuracy metrics, and the clinicians’ assessment of the usability of the OSAIRIS contours compared with their colleagues’ contours and those from other commercial AI contouring devices. Our safety evaluation focused on whether clinicians notice and correct errors when they are included in the output of the device.
We found that OSAIRIS affords a significant time saving of 36% (5.4 ± 2.1 minutes) when used for prostate contouring and 67% (30.3 ± 8.7 minutes) for head and neck contouring. Combining editing-time data with accuracy metrics, we found that the Hausdorff distance correlated best with editing time, outperforming Dice, the industry standard, with a Spearman correlation coefficient of 0.70 and a Kendall coefficient of 0.52. Our safety and risk-mitigation exercise showed that anchoring bias is present when clinicians edit AI-generated contours, with the effect seemingly more pronounced for some structures than for others. Most errors, however, were corrected by clinicians, with 72% of the head and neck errors and 81% of the prostate errors removed in the editing step. Notably, our blinded clinician contour rating exercise showed that gold-standard clinician contours are not rated more highly than the AI-generated contours.
We conclude that evaluations of AI in a clinical setting must consider the clinical workflow in which the device will be used, and not rely on accuracy metrics alone, in order to reliably assess the benefits, utility and safety of the device. The effects of human-AI inter-operation must be evaluated to accurately assess the practical usability and potential uptake of the technology, as demonstrated in our blinded clinical utility review. The clinical risks posed by the use of the device must be studied and mitigated as far as possible, and our ‘Mystery Shopping’ experiment provides a template for such assessments in future.
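
The abstract relates two accuracy metrics (Dice and Hausdorff distance) to editing time through rank correlations. The sketch below illustrates those four quantities in plain Python. It is not the authors' analysis code: the function names, the toy masks and the toy editing times are all hypothetical, the Hausdorff distance is the brute-force version over full point sets (the study may have used a different variant), and the Kendall coefficient is the simple tau-a form with no tie correction.

```python
from itertools import combinations
from math import dist


def dice(a, b):
    """Dice coefficient between two sets of voxel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest point in q.
        return max(min(dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
    return score / len(pairs)


# Hypothetical toy data: an AI contour and its clinician-edited version,
# each given as a set of 2D pixel coordinates.
ai_contour = {(0, 0), (0, 1), (1, 0), (1, 1)}
edited_contour = {(0, 1), (1, 1), (0, 2), (1, 2)}

# Hypothetical per-case Hausdorff distances and editing times (minutes).
hd_per_case = [1.0, 2.5, 4.0, 3.0, 6.0]
edit_minutes = [2.0, 3.0, 5.5, 6.0, 9.0]

print("Dice:", dice(ai_contour, edited_contour))
print("Hausdorff:", hausdorff(ai_contour, edited_contour))
print("Spearman:", spearman(hd_per_case, edit_minutes))
print("Kendall tau-a:", kendall_tau(hd_per_case, edit_minutes))
```

Dice measures volume overlap, whereas the Hausdorff distance is driven by the single worst boundary deviation, which is one plausible reason a distance metric might track manual editing effort more closely. For real 3D masks, a library implementation would be a more practical choice than these brute-force loops.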

Publication Downloads

InnerEye – Deep Learning

September 22, 2020

This is a deep learning toolbox to train models on medical images (or more generally, 3D images). It integrates seamlessly with cloud computing in Azure.

InnerEye Inference API

May 14, 2021

InnerEye-Inference is an AppService web app written in Python that runs inference on medical imaging models trained with the InnerEye-DeepLearning toolkit. It can also be integrated with DICOM using the InnerEye-EdgeGateway.

InnerEye-CreateDataset

September 28, 2020

InnerEye-CreateDataset contains tools to convert medical datasets in DICOM-RT format to NIFTI. Datasets converted using this tool can be consumed directly by InnerEye-DeepLearning.

InnerEye-DICOM-RT

April 13, 2021

InnerEye-DICOM-RT contains tools to convert medical datasets in NIFTI format to DICOM-RT. Datasets converted using this tool can be consumed directly by InnerEye-DeepLearning. Most of the work is done by a .NET Core 2.1 project in RTConvert, written in C#. There is a very lightweight wrapper around this so that it can be consumed from Python. The wrapper relies on the PyPI package https://pypi.org/project/dotnetcore2/ which wraps up .NET Core 2.1.