In routine clinical practice, after a person has had a heart attack or stroke, algorithmic risk models are used to calculate the patient's risk of death. These models draw on factors such as the patient's age, gender, medical history, family history and ethnicity.
Treatment is often guided by these models. A new study has shown that in many cases the models fail to predict risk accurately, which may lead to treatment choices that are unnecessary, ineffective or even risky for patients. The study, titled “Identifying unreliable predictions in clinical risk models”, was published in the journal npj Digital Medicine.
Senior author Collin Stultz, a professor of electrical engineering and computer science at MIT and a cardiologist at Massachusetts General Hospital, said, “Every risk model is evaluated on some dataset of patients, and even if it has high accuracy, it is never 100 percent accurate in practice. There are going to be some patients for which the model will get the wrong answer, and that can be disastrous.” Stultz is also a professor of health sciences and technology, a member of MIT’s Institute for Medical Engineering and Sciences and Research Laboratory of Electronics, and an associate member of the Computer Science and Artificial Intelligence Laboratory. He worked alongside other researchers from MIT, the MIT-IBM AI Lab, and the University of Massachusetts Medical School. MIT graduate student Paul Myers led the study. The team examined how well the model predicts risk, and said that validating such models can help physicians choose the appropriate treatment for their patients.
The team explains that these computer models can help predict the risk of adverse events in patients who have had a heart attack or stroke. They are machine-learning algorithms that use datasets and each patient's individual factors to assess health outcomes. Stultz explained that although these algorithms are usually accurate, “very little thought has gone into identifying when a model is likely to fail.” “We are trying to create a shift in the way that people think about these machine-learning models. Thinking about when to apply a model is really important because the consequence of being wrong can be fatal,” he added. He said that a high-risk patient who is misclassified may not receive the treatment he or she needs, while a low-risk patient who is misclassified may receive unnecessarily risky treatment.
For this study the team used the widely used GRACE (Global Registry of Acute Coronary Events) risk score and applied it to large databases of patients to see how accurate its predictions were. The model helps predict a patient's risk of death in the six months after an acute coronary event, using factors such as age, blood pressure, heart rate and other clinical signs and symptoms, the researchers explained. To judge whether a given prediction could be trusted, they generated an “unreliability score” for it. The score ranges from 0 to 1, and the closer it is to 1, the more unreliable the prediction. The score is based on a comparison of two risk models, the researchers said.
Stultz said that if the two risk models produce different predictions for a patient, the prediction for that patient is likely to be unreliable. He added, “What we show in this paper is, if you look at patients who have the highest unreliability scores, in the top 1 percent, the risk prediction for that patient yields the same information as flipping a coin. For those patients, the GRACE score cannot discriminate between those who die and those who don’t. It’s completely useless for those patients.”
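The idea of scoring a prediction by how much two models disagree can be illustrated with a short Python sketch. This is not the formula from the paper: the absolute difference between two predicted risks is used here only as a stand-in that happens to fall between 0 and 1, and the names `unreliability`, `most_unreliable`, `risk_a` and `risk_b`, as well as the patient records, are hypothetical.

```python
def unreliability(p_a: float, p_b: float) -> float:
    """Hypothetical disagreement score in [0, 1] for two predicted risks.

    Assumption: absolute difference of two probabilities. The study's
    actual unreliability score is defined differently.
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("predicted risks must be probabilities in [0, 1]")
    return abs(p_a - p_b)


def most_unreliable(patients, top_fraction=0.01):
    """Return the patients whose two predictions disagree the most."""
    ranked = sorted(
        patients,
        key=lambda r: unreliability(r["risk_a"], r["risk_b"]),
        reverse=True,
    )
    # Always flag at least one patient, even in a tiny dataset.
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[:k]


# Made-up records: risk_a and risk_b are the outputs of two different models.
patients = [
    {"id": "p1", "risk_a": 0.05, "risk_b": 0.07},
    {"id": "p2", "risk_a": 0.10, "risk_b": 0.60},
    {"id": "p3", "risk_a": 0.30, "risk_b": 0.25},
]
flagged = most_unreliable(patients, top_fraction=0.34)
print([p["id"] for p in flagged])
```

In this toy data the two models disagree most about the second patient, so that patient is the one flagged for closer review.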
The team wrote in conclusion, “Using data from more than 40,000 patients in the Global Registry of Acute Coronary Events (GRACE), we demonstrate that patients with high unreliability scores form a subgroup in which the predictive model has both decreased accuracy and decreased discriminatory ability.”
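Discriminatory ability is commonly summarized by the area under the ROC curve (AUC): the probability that a randomly chosen patient who died was given a higher risk score than a randomly chosen patient who survived. An AUC of 1.0 means perfect separation, and 0.5 is what a coin flip would achieve. The sketch below computes it by direct pairwise comparison. The data are made up purely to show the two extremes, and the function name `auc` is an illustrative choice, not something taken from the study.

```python
def auc(scores, died):
    """Chance that a patient who died scored higher than one who survived."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up subgroup where the score separates the outcomes perfectly.
good = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])

# Made-up subgroup where the score carries no information about outcome.
poor = auc([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0])

print(good, poor)
```

Here the first subgroup gives an AUC of 1.0 and the second gives 0.5, the coin-flip level that Stultz describes for the patients with the highest unreliability scores.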
The team found that several factors in the dataset could predict unreliable scores, including the patient's age and other heart disease risk factors. Stultz said that the method makes it possible to compare two risk models without building a new model to assess the patient's risk. He explained, “You don’t need access to the training dataset itself in order to compute this unreliability measurement, and that’s important because there are privacy issues that prevent these clinical datasets from being widely accessible to different people.”
As a next step, the team is redesigning the risk score model so that it is more reliable and predicts patient outcomes more accurately. The researchers say they plan to include more patients in the database so that the models can be retrained and put to better use. Stultz said, “If the model is simple enough, then retraining a model can be fast. You could imagine a whole suite of software integrated into the electronic health record that would automatically tell you whether a particular risk score is appropriate for a given patient, and then try to do things on the fly, like retrain new models that might be more appropriate.”
Myers, P.D., Ng, K., Severson, K. et al. Identifying unreliable predictions in clinical risk models. npj Digit. Med. 3, 8 (2020). https://doi.org/10.1038/s41746-019-0209-7, https://www.nature.com/articles/s41746-019-0209-7