
Predicting performance of artificial intelligence tools


Scientists at UHN’s Toronto General Hospital Research Institute (TGHRI) have developed an improved method for evaluating the performance of artificial intelligence (AI) models across various health care settings.

As health care datasets become larger and more complex, the use of AI for the analysis of these datasets is gaining traction. 

Medical information can take the form of unstructured data such as medical images, electrocardiograms (ECGs), and text from clinical notes. Despite advancements in AI that have produced tools capable of analyzing medical images and clinical language, it remains challenging to predict their effectiveness in different health care settings without testing on new and varied data from each setting.

“For AI tools to be truly safe and effective for patient care, they must perform reliably across different situations and patient groups, a concept known as generalizability, which requires accurate performance estimation,” says Cathy Ong Ly, doctoral student at TGHRI and co-first author of the study. 

“We sought to address this challenge of estimating AI model accuracy by analyzing 13 datasets across different modalities such as X-rays, CT scans, ECGs, clinical notes, and lung sound recordings.”

When the team tested various AI models on these datasets, they found that the models’ performance was overestimated by about 20 per cent on average.

“We propose that this overestimation is due to data acquisition bias (DAB), a natural occurrence when data for these studies is retrospectively collected from regular medical care,” says Dr. Chris McIntosh, a scientist at TGHRI and senior author of the study.

(L to R) Cathy Ong Ly and Balagopal Unnikrishnan are doctoral students in the lab of Dr. Chris McIntosh, a scientist at UHN’s Toronto General Hospital Research Institute. Photo: UHN Research Communications

“Generally speaking, AI might focus on irrelevant patterns in the data instead of what really matters for the task,” adds Dr. McIntosh, who is also an assistant professor in the Department of Medical Biophysics at the University of Toronto (U of T).

“Different hospital departments may use different equipment or settings and have different patient acquisition conditions,” says Dr. McIntosh, who also holds the Chair in Artificial Intelligence and Medical Imaging at the Joint Department of Medical Imaging at UHN and the Department of Medical Imaging at U of T. “These variations, which might be imperceptible to researchers and clinicians, can be detected by AI algorithms. 

“When models are trained on this data, they might rely on these subtle differences – like how a medical image was taken – rather than the actual medical content, to make predictions.”

An example of this bias is how patients suspected of having interstitial lung disease are often directed towards specific imaging techniques meant to confirm the diagnosis, while those without suspicion get more general scans.

An algorithm trained on such data will appear highly accurate at the hospital where its training data was collected, but when deployed for clinical care at another hospital with different scanners, its accuracy will drop, potentially putting patients at risk.
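The effect described above can be illustrated with a small synthetic sketch. This is not the study's PEst method and uses no real patient data. Everything in it is an assumption made for illustration: a weak "genuine" signal, an acquisition marker that matches the diagnosis 95 per cent of the time at the training hospital and only by chance elsewhere, and a plain logistic regression. The function names `make_cohort`, `fit_logistic` and `accuracy` are hypothetical.

```python
import numpy as np

def make_cohort(n, marker_match_rate, rng):
    """Simulate one hospital's cases (purely synthetic, illustrative only).

    marker_match_rate is the probability that the acquisition marker
    (e.g. which scan protocol was used) agrees with the true label.
    """
    y = rng.integers(0, 2, size=n)
    # Weak genuine medical signal: shifted by the label, but noisy.
    signal = y + rng.normal(0.0, 1.5, size=n)
    # Acquisition marker: agrees with the label at the given rate.
    matches = rng.random(n) < marker_match_rate
    marker = np.where(matches, y, 1 - y).astype(float)
    return np.column_stack([signal, marker]), y

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit logistic regression with plain batch gradient descent."""
    Xb = np.column_stack([X, np.ones(len(X))])  # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    """Fraction of cases classified correctly at a 0.5 threshold."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((Xb @ w > 0) == (y == 1)))

rng = np.random.default_rng(0)
# Training hospital: the marker is strongly tied to the diagnosis.
X_train, y_train = make_cohort(4000, 0.95, rng)
X_internal, y_internal = make_cohort(2000, 0.95, rng)
# External hospital: the marker carries no diagnostic information.
X_external, y_external = make_cohort(2000, 0.50, rng)

w = fit_logistic(X_train, y_train)
internal_acc = accuracy(w, X_internal, y_internal)
external_acc = accuracy(w, X_external, y_external)

print(f"internal held-out accuracy: {internal_acc:.2f}")
print(f"external accuracy:          {external_acc:.2f}")
```

In this toy setup the held-out accuracy at the training hospital looks strong because the model leans on the acquisition marker, while accuracy at the simulated external hospital falls sharply once that marker stops tracking the diagnosis.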

To address this issue, the researchers developed and proposed an open-source accuracy estimate called PEst that corrects for bias and provides more accurate estimates of a model’s external performance.

“Our method, which corrects for hidden patterns and biases in the data, predicts models’ performance on new datasets with an accuracy margin within four per cent of the actual results,” says Balagopal Unnikrishnan, doctoral student at TGHRI and co-first author of the study.

Given how crucial the accuracy of AI models is in health care, where recommendations can significantly impact patient outcomes, these findings will help enable safer and more widespread use of AI and support the development of new medical AI technology. 

This study was a truly multidisciplinary effort across UHN to measure the impact of these biases in a diverse array of modalities and diseases.

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), The Princess Margaret Cancer Foundation, and UHN Foundation. Data for this study was supported by foundation investments in the Digital Cardiovascular Health Platform including UHN’s Peter Munk Cardiac Centre and Ted Rogers Centre for Heart Research and MIRA through Cancer Digital Intelligence.

Dr. Chris McIntosh is an Assistant Professor in the Department of Medical Biophysics at the University of Toronto (U of T). He holds the Chair in Artificial Intelligence and Medical Imaging at the Joint Department of Medical Imaging at UHN and the Department of Medical Imaging at U of T.
