All the data we cannot capture
How can AI get it right if we don't give it the most helpful information?
“I just keep burping, doc,” he said as he burped. He was scheduled for an EGD, the 15th of 20 patients I had on my schedule as an anesthesiologist that day at a busy academic medical center. I was eight months pregnant, exhausted, and my feet were so swollen that they somehow enveloped my Danskos.
Most of the patients in the GI suite had similar complaints of dyspepsia and abdominal pain, and this patient’s story was familiar. He had been admitted two days before with a kidney infection but had been complaining about his burping so much that the team gave in and ordered an EGD. His labs were pretty normal, and he had the expected constellation of ailments for a gentleman his age.
As we were talking, I started to feel uncomfortable. He just looked…bad. Objectively, he seemed better off than many of my other patients. He was awake and talking, his vitals were stable, and his only concern was burping. But I just had a bad feeling about him, based on…nothing that I could actually identify.
I called his attending and had an awkward conversation that mostly consisted of me saying I had absolutely no evidence for anything bad, but I was more worried than I should be about taking this patient to the OR. I had been out of residency about five years by then and had much less concern about possibly looking like an idiot than I would have when I had just finished training.
“Is there anything else you’re worried about with him?” I asked, “Maybe something that’s not in his chart?”
I could hear the mouse click five stories above me as he pulled up the patient’s chart.
“Well, we did send a troponin last night but it wasn’t back this morning,” he said. More mouse clicks. “Uh oh, that’s bad, it came back just after he was sent for his EGD and it’s positive. Sorry about that, we’ll cancel the EGD order.”
Whether you call it luck or clinical intuition, I’m positive an AI model with the same information I had wouldn’t have flagged this patient as concerning. Moreover, if I had an AI assistant giving me a low risk score for him, I might have overridden my intuition and not made that phone call.
Clinical Intuition
My experience in the GI suite was not unusual. Clinical intuition is common, not very well studied, and fairly predictive. One study of general practitioners noted that their “feeling ‘there is something wrong’ is the best predictor among all signs and symptoms” of serious pediatric infections.
Intuition is attributed to the non-analytical type of decision-making, often referred to as System 1, which is fast and automatic. In contrast, analytical reasoning, or System 2, is deliberative and rational. These modes of thinking were popularized by the behavioral economist Daniel Kahneman. Physicians need to recognize patterns all the time: of patient histories, symptoms, and responses. The more data they hold as tacit knowledge, the more nuanced their System 1 decision-making becomes.
Kannengiesser, Udo. Design Thinking, Fast and Slow: A Framework for Kahneman's Dual-System Theory in Design. doi:10.1017/dsj.2019.9
When I saw the patient burping, something about him didn’t fit in my expected pattern, activating System 1. My analytical side, System 2, searched for evidence to support my decision and couldn’t find any in his normal labs and vital signs. But some nuance - perhaps his voice, his tone, or some other small detail - pushed me to override my analytical system.
Crucial clinical data is lost in the EHR
It would be impossible to write down every detail about a patient, and much of it isn’t written down for a variety of legitimate reasons. Clinicians worry about malpractice litigation and about conveying vague or unhelpful information (like “he looks bad”). The advent of open medical records means there’s increased concern about inadvertently insulting the patient when writing something in the EHR.
Additionally, uncaptured is distinct from unstructured: about 80% of healthcare information has traditionally been functionally uncaptured from a data science standpoint, because digging information out of the EHR’s unstructured text was so time-consuming and expensive.
Doc-to-doc communication
How many times have you been given a heads-up about a patient? I bet it’s more than you can count. I’ve had other doctors give me a heads-up about patient behavior issues, sensitive social context they didn’t want to put in the chart, and everything from “her husband is a malpractice lawyer” to “her whole family is now on opioids from different providers” to “I doubt he’ll make it much longer but the family feels guilty”.
Nurse-to-doc communication
The gap between what’s in the chart and what’s actually important is even greater with nursing-physician communication. For some reason we’ve decided that the people who spend the most time with patients should describe them in a series of checkboxes. Narrative nursing summaries are almost non-existent outside fields like hospice or a few multidisciplinary conditions.
Any good physician will tell you how much they depend on nursing assessments to make decisions. There’s a saying that you can tell who the bad residents will be in the first week: they’re the ones who ignore or disparage the nurses. It’s simply not possible to be a great physician without listening to nurses. AI models, though, don’t have access to the tremendous amount of data nurses share verbally with physicians. Patients with identical checkboxes can look wildly different once the nurse describes them.
Data was lost with the dissolution of the housecall
One of my favorite parts of my hospice fellowship was getting to go into people’s homes. I got to see their neighbors, their pets, and how hard it would be for them to climb their bedroom stairs as the cancer grew further into their bones. The era of housecalls ended as Western medicine focused on efficiency, and there was no concern at the time that less context would lead to worse care.
Can we capture some of this precious information?
AI models are giant pattern-recognition machines; they should theoretically be great at an analogous version of System 1 “thinking”. But right now we’re training them with only part of a patient’s story. I worry about predictions being made by AI models that don’t have access to this kind of informal data. When I used to run quality reviews, we always talked to the team as the first step, which was a tacit acknowledgement that most of the important data points probably weren’t in the EHR. As long as that informal knowledge is missing, I seriously doubt AI models will be as accurate as they could be, especially with big-picture tasks like predicting readmissions and pain scores.
So much of a patient’s outcome relies on subtle details that are lost in a blunt instrument like the EHR. As I mentioned last week, AI scribes have a great opportunity to capture some of this data in the form of patient affect and voice quality. They may also capture subtle details from the patient interview that would otherwise have been lost or glossed over. The challenge, as we start to rely more on AI models in healthcare, is how to capture some of this data non-invasively and privately to make better predictions, while continuing to rely on our intuition and judgment. And continuing to acknowledge that our clinical intuition, in its very human form, can save lives.