We all know that 2023 was the year of generative AI. For those interested in a high-quality timeline of non-healthcare AI in 2023, I recommend this summary.
What did that mean for healthcare? The physicians at Machine Learning for MDs are some of the most knowledgeable experts at the intersection of healthcare and AI, and can speak from actual clinical and research experience about the technology. They were generous enough to add their thoughts below:
The Rise of AI Scribes
Matt Sakumoto, MD: “The area with the most eyeballs and the most attention has been the AI Scribes. Crowded space, but a strong commitment from most large health systems to pick something/anything to decrease the documentation burden for their clinicians.”
AI scribes, broadly described as technology that listens in on a conversation and converts it into a medical-note format, saw real-world deployment this year. You can hear the silent cheers of mostly clinic-based primary care doctors as their time spent typing in the EHR decreased. There’s still a lot of room for improvement with many of them: some still require copying and pasting the note into the EHR, the majority don’t place orders based on the physician’s conversation, and they’re often tethered to a physical computer, so they miss many of the phone calls and other physician interactions that need to be documented. Even so, the relief for most clinicians, and the screamingly obvious fit for generative AI, make it feel like a big win. Mayo Clinic’s partner claims a “10 percentage point increase in efficiency agreement among providers using speech recognition between surveys and >90% reduction in documentation created by transcription”. The Northern California Permanente Medical Group started using Nabla for outpatient visits. Anecdotal reports from friends who use these systems are glowing.
Flawed headlines about AI being better than doctors
AI passed the USMLE
We kicked off the year with announcements that doctors would be replaced by AI because ChatGPT had passed the USMLE exams, answering correctly about 60% of the time. Subsequent models, including GPT-4 and Med-PaLM 2, scored in the 85-90% range. As multiple experts and studies have pointed out, USMLE knowledge is really a small part of what doctors know and an even smaller part of what they actually do. As I like to point out, I have had plenty of medical students and residents who can quote exam-style facts but struggle to perform some of the basic physician skills.
Chatbots vs doctors
On April 28, JAMA published a study of chatbot vs. physician responses on Reddit. The conclusion in the actual article was that “using AI assistants might improve responses, lower clinician burnout, and improve patient outcomes”; the conclusion in the popular press was “Chatbots are more empathetic than doctors”. The study compared written responses from physicians and chatbots to real-world health questions posted on Reddit’s AskDocs forum. The evaluators preferred the chatbot responses 79% of the time and rated them as higher quality and more empathetic. Not surprisingly, physician responses were shorter than the chatbot’s, since physicians have limited time and actual patients to see. Notably, the study didn’t evaluate the accuracy of the responses, and inaccurate but empathetic responses don’t really help.
Health tech crashes
Health tech investment had some big years with several $1B+ acquisitions.
But current projections are that annual funding will fall from $29B in 2021 to $10B in 2023. Note: Bessemer’s report on healthtech is a great read.
Chatbots force us to confront inherent societal biases
LLMs reflect the information they’re trained with and therefore hold a mirror to our biases as a society. These biases can be devastating to many, and there are clear examples in the job market and the financial sector of AI reflecting and sometimes perpetuating these biases.
In one highly publicized example, ChatGPT interprets ambiguous sentences as the doctor being male and the nurse being female, even insisting there’s a typographical error when the logical conclusion is that the nurse is male.
The growing societal awareness of bias in general and specifically in AI has already led to changes in non-AI tools in healthcare. Recently the American College of Cardiology removed race from its atherosclerotic cardiovascular disease risk algorithm.
AI for administrative uses
Billing, coding and prior authorizations have been increasing use cases for AI. McKinsey estimates that 50-75% of prior authorization tasks could be handled by AI.
Though many celebrated the decreased time clinics spend on prior authorizations, insurance companies also started using AI to deny benefits. A lawsuit alleges that UnitedHealth “pressured its medical staff to cut off payments for seriously ill patients in lockstep with a computer algorithm’s calculations, denying rehabilitation care for older and disabled Americans as profits soared” and used a tool with a 90% error rate. We can expect to see similar cases in the coming years as companies push the limits of what is acceptable use of AI in service of profit.
My other expectation for AI-billing: the scribes and other AI-enabled billing services will increase the percentages of higher level visit codes, which will lead insurers to decrease reimbursements for those codes so that a level 2 code will be billed similarly to a current level 1, etc. You heard it here first!
The FDA issued guidance on AI in Software as a Medical Device
The FDA published guidance on AI/ML-enabled software as a medical device (SaMD), which in essence requires manufacturers to describe how and when an algorithm will change as it ‘learns’. For now it’s still a radiology-heavy field:
“79% of devices authorized in 2023 are in Radiology (85)
9% in Cardiovascular (10)
5% in Neurology (5)
4% in Gastroenterology/Urology (4)
2% in Anesthesiology (2)
1% each in Ear, Nose and Throat (1), and Ophthalmic (1).”
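The specialty shares in the quote above can be reproduced from the per-specialty device counts; a quick sketch (the counts come from the quote, the total is implied by summing them):

```python
# Reproduce the FDA 2023 specialty-share percentages from the quoted counts.
counts = {
    "Radiology": 85,
    "Cardiovascular": 10,
    "Neurology": 5,
    "Gastroenterology/Urology": 4,
    "Anesthesiology": 2,
    "Ear, Nose and Throat": 1,
    "Ophthalmic": 1,
}

total = sum(counts.values())  # 108 authorized devices across these specialties
shares = {name: round(100 * n / total) for name, n in counts.items()}

print(total, shares)
```

Rounding each share to the nearest whole percent recovers the quoted figures (85/108 rounds to 79%, 10/108 to 9%, and so on).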
As Patricia Raciti, MD notes, there are many non-radiology use cases for AI: “In 2023, Paige received FDA clearance for its Class II device, Paige Prostate Detect, for which the FDA created a new category of device: ‘software algorithm device to assist users in digital pathology is an in vitro diagnostic device intended to evaluate acquired scanned pathology whole slide images.’ The AI is assistive so its standalone performance numbers (while pretty darn good) don’t matter—it’s the human+AI that matters, ultimately, for the patient’s diagnosis. And showing that human’s diagnostic ability improved with the use of the AI showed that the user interface made sense, the AI found the right focus of cancer on a huge slide and displayed it to the human in a non-confusing and clear way, so that they could choose to accept or reject it.”
Governmental and major organizations make statements on healthcare AI
The two major governments to propose AI action were the UK and the US. The US issued an Executive Order on AI that ordered the Department of Health and Human Services to focus on (among many health-related issues) personalized medicine, healthcare data, and health equity:
"advance responsible AI innovation by a wide range of healthcare technology developers that promotes the welfare of patients and workers in the healthcare sector"
(i) collaborating with appropriate private sector actors through HHS programs that may support the advancement of AI-enabled tools that develop personalized immune-response profiles for patients, consistent with section 4 of this order;
(ii) prioritizing the allocation of 2024 Leading Edge Acceleration Project cooperative agreement awards to initiatives that explore ways to improve healthcare-data quality to support the responsible development of AI tools for clinical care, real-world-evidence programs, population health, public health, and related research; and
(iii) accelerating grants awarded through the National Institutes of Health Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity (AIM-AHEAD) program and showcasing current AIM-AHEAD activities in underserved communities.
It also ordered the Department of Veterans Affairs to:
“advance the development of AI systems that improve the quality of veterans’ healthcare, and in order to support small businesses’ innovative capacity"
"(i) host two 3-month nationwide AI Tech Sprint competitions;”
Note that the VA recently announced it recruited its millionth veteran for its Million Veteran Program, which will help researchers determine how “genes, lifestyle, military experiences, and exposures affect health and wellness in Veterans.”
The WHO issued a statement on healthcare AI listing key ethical principles for the use of AI in healthcare.
The AMA came out with a statement similarly focused on the ethical implementation of AI and the importance of keeping physicians in the loop. Some interesting excerpts:
Health systems “should not utilize AI systems or technologies that introduce overall or disparate risk that is beyond their capabilities to mitigate”, which I absolutely agree with, though I wonder how a health system will be able to identify that threshold.
They also state that “implementation and utilization of AI should avoid exacerbating clinician burden”, which is a real concern. Hopefully we all learned from EHR implementation.
AI predicts previously unrecognized patterns
Similar to the finding a few years ago that AI could predict sex based on a retinal exam, AI tools have found patterns that were previously unrecognized:
Parkinson’s disease from blood samples, images of patient-derived stem cells, and voice, as well as disease severity from videos of finger tapping
Diabetes based on voice
Overall a great year for healthcare AI with lots of opportunities to come. Next week I’ll review the advances ML for MDs is most excited about for 2024!