November 2021 Investor Newsletter

Healthcare Friends,

Today, let’s talk about the Achilles’ heel of healthcare AI: quality standards. Artificial intelligence dramatically alters the way clinicians work, but quality standards for healthcare AI are confusing and disorganized at best, despite billions of dollars having been invested over the past decade. Two out of three health system AI initiatives fail, usually due to underperformance and excessive algorithm noise.

Underperformance stems from limited validation. An algorithm needs to perform well on the data set it was trained on, and that performance must be generalizable across patient populations. Validation on the training data set establishes the baseline, but validation on the health system’s specific patient population is often missed.
94% of published AI algorithms are not properly validated.
Algorithms that don’t perform well in the real world harm patients, and the noise from false positives causes clinicians to pull away from the entire technology category. When this happens too frequently, even responsibly developed and validated solutions fall under the same suspicion, which has a chilling effect on the field.

So as clinicians and health systems grow impatient and disenchanted with AI, how do we avoid the next healthcare AI winter?
Key takeaways:
Reactions to Epic

A STAT investigation on an Epic algorithm intended to predict sepsis in seriously ill patients revealed that it routinely triggered false alarms and failed to identify the condition in advance.

Here are the highlights from the article:
But it’s not entirely Epic’s fault. As far as many providers are concerned, if you don’t have AI, you aren’t relevant. The promises being made by AI companies battling for awareness in a crowded space can be misleading and extreme.

Healthcare’s ongoing AI Gold Rush has pressured vendors to churn out algorithms left and right, without sufficient objective review of the algorithms’ performance. This isn’t helped by the lack of standardized quality measures for healthcare AI.

Non-validated healthcare AI is also worsening healthcare inequality. A recent Science paper found that a UnitedHealth AI algorithm was prioritizing care for healthier white patients over sicker black patients because it was trained to predict healthcare costs rather than illness. The study found that black patients generate lower medical expenses because of reduced access to adequate medical care, regardless of how sick they are (measured by the number of chronic conditions).

Current Validation Standards

Leading healthcare AI vendors continuously produce algorithms, which learn from massive datasets, to help clinicians across diagnosis and peer review. These are the current metrics algorithm developers are concerned with...
While these metrics are highly technical and can verify an algorithm’s accuracy on the dataset it was built on, they carry little weight when it comes to the algorithm’s performance in real-world patient care. For example, it is the norm, not the exception, for a health system’s patient population to deviate significantly from the population the algorithm was trained on.

One particularly infamous example of this is IBM Watson’s oncology AI system, which was trained at Memorial Sloan Kettering and then deployed at MD Anderson without considering the significant differences between the two institutions’ oncology practice patterns. You can learn more about the resulting disaster in a multi-part 2017 STAT exposé that reads very similarly to this recent 2021 Epic sepsis coverage.

What Metrics Should Be Included

Healthcare already faces more pain points than most industries in adopting machine learning (ML) solutions. Between data security and lack of awareness of available tools, the last thing a health system needs is to realize that the algorithms it finally implemented are wasting clinician time with inaccurate results.
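As a rough illustration of how an algorithm that looked good in development can still waste clinician time after deployment, the sketch below shows how the share of alerts that are real (positive predictive value, or PPV) falls when the condition is rarer in the local population than in the development data. This is a minimal sketch, not any vendor's method: the function names `ppv` and `false_alarms_per_true_alert` are hypothetical, every number is made up, and it assumes sensitivity and specificity stay fixed across populations, which often does not hold in practice.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule.

    Hypothetical helper for illustration only; all inputs are fractions in (0, 1).
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


def false_alarms_per_true_alert(sensitivity: float, specificity: float,
                                prevalence: float) -> float:
    """Average number of false alerts a clinician sees for each true alert."""
    value = ppv(sensitivity, specificity, prevalence)
    return (1.0 - value) / value


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not taken from any study).
    sensitivity, specificity = 0.80, 0.90
    populations = [("development cohort", 0.20), ("local population", 0.05)]
    for label, prevalence in populations:
        print(
            f"{label}: prevalence={prevalence:.2f}, "
            f"PPV={ppv(sensitivity, specificity, prevalence):.2f}, "
            f"false alarms per true alert="
            f"{false_alarms_per_true_alert(sensitivity, specificity, prevalence):.1f}"
        )
```

With these made-up inputs, PPV drops from about two in three alerts being real to fewer than one in three, even though the algorithm itself has not changed. A check like this, run on the health system's own data, is one way to see alert burden before clinicians do.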
Answering these questions for ML adoption requires metrics that measure the success of algorithm deployment, not just development. So what validation metrics should health systems also be considering?
In the absence of industry, academic, or regulatory rigor in the space, health systems are entirely responsible for verifying that AI is effective for their patient population, a responsibility they readily acknowledge they are woefully unprepared to manage. Some notable thought leaders in the space (aside from Ferrum, of course) include the American College of Radiology, whose AI-LAB platform supports ongoing, local evaluation of commercial AI models.

Developing an AI implementation strategy has now become a universal component of health system IT viability. However, without first resolving the fundamental trust issue that divides health systems and AI vendors, I can't think of a path forward for meaningful AI use in mainstream healthcare.

Until next time,
Pelu Tran
CEO, Ferrum Health
Healthcare AI news you might have missed