AI Performance: Transparency in ECG-interpretation companies
Physician-friendly AI concepts by Machine Learning for MDs
A quick review of our month on AI Performance:
In Week 1, we discussed some metrics that are commonly used to evaluate AI performance
In Week 2, we looked at two papers on AI algorithms for ECGs.
This week, we’ll look at six companies that use AI algorithms for ECG analysis. We’ll focus mostly on metrics related to performance:
Primary user focus: Who is this product for?
Knowing the primary user will help you decide how high the performance needs to be; if the user is an expert rather than a novice, for example, you may not demand as high a level of performance
Data transparency: Can you tell what data the model was trained on?
The amount and variety of data the model was trained on will affect its performance in different practice settings
Performance: Can you tell how well the algorithm performs, ideally in a variety of settings, and with metrics like sensitivity, specificity, PPV, NPV, etc.?
Explainability: Can providers see and understand why the algorithm is giving them that result?
Knowing why the model is performing as it is will affect how it’s used
Integration: Can you tell if the product can be easily integrated into your existing workflow and if it is compatible with your existing systems?
If you can’t get results easily, it won’t be used and the performance doesn’t matter
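As a concrete refresher on the performance metrics above, here's a minimal sketch in plain Python (all counts are made up for illustration) computing sensitivity, specificity, PPV, and NPV from a confusion matrix:

```python
# Hypothetical confusion-matrix counts for a binary ECG classifier
# (e.g. "atrial fibrillation" vs "not"); all numbers are made up.
tp, fn = 90, 10   # true positives, false negatives
tn, fp = 180, 20  # true negatives, false positives

sensitivity = tp / (tp + fn)  # of all true cases, how many were caught
specificity = tn / (tn + fp)  # of all non-cases, how many were cleared
ppv = tp / (tp + fp)          # of all positive calls, how many were right
npv = tn / (tn + fn)          # of all negative calls, how many were right

print(f"Sens={sensitivity:.2f} Spec={specificity:.2f} "
      f"PPV={ppv:.2f} NPV={npv:.2f}")
```

Note that PPV and NPV shift with disease prevalence even when sensitivity and specificity stay fixed, which is one reason the training population matters so much.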
Let’s take a big picture view of these companies/products
Metric transparency table
Now, for simplicity, let's score each company on these metrics using the following rubric
Rubric:
0 = No mention at all
1 = Mentioned in passing
2 = Very minimal information
3 = Some information provided, but no details
4 = Reasonable amount of information, but missing key details
5 = Comprehensive information provided
Detailed analysis of companies/products:
CardioAI
Primary user focus: “The web platform to help physicians or similarly qualified health care professionals to evaluate ECG findings.”
Data transparency: “The data from a ten-year population study has been used to make CardioAI® real.” Doesn’t say what the characteristics of these patients are.
Model transparency: “All records were initially labelled semi-automatically using the patent pending technology, and then verified by qualified and experienced cardiologists. Whereupon true labels were used to train the neural networks.” It sounds like the model was trained with supervised learning on cardiologist-verified labels. “The software will mark all the required data on the tracing. The software also allows the physician to suggest a correction, e.g. to classify the event as the dominant beat.” They're essentially using cardiologists' corrections as continued human-in-the-loop feedback.
Performance: “CardioAI® provides accurate and detailed annotation of stress, rest, and Holter electrocardiograms.” No information about what ‘accurate’ means.
Explainability: “The decision-making intelligence allows a physician to see a rationale behind the AI’s decision for a cardiac event as easy as clicking on the info button.” That’s actually pretty cool - so maybe the doctor can learn from the software too, or feel more comfortable overruling it.
Integration/compatibility: “Uploading can be performed automatically if CardioAI® has been integrated into an EHR system, or a device with ECG data acquisition software.” Those are some good compatibility options.
Anumana
Primary user focus: “Using state-of-the-art AI to decode subtle electrical signals in ECGs for earlier disease detection”. Not clear; the intended user may be broader than just physicians.
Data transparency: “Access to multi-modal raw data from 11M+ patients through our parent company nference”. Nference collaborates with Mayo Clinic for their deidentified data. No information about patient characteristics in that group.
Model transparency: “Anumana develops and validates SaMD solutions with strategic partnerships for commercial deployment.” It sounds like Mayo researchers develop promising algorithms and Anumana commercializes them, but it doesn't directly say whether these are the same algorithms used in the published papers, or how and in what population they're validated.
Performance: “State-of-the-art AI algorithms analyze standard ECGs to screen for silent cardiac changes that indicate a multitude of underlying cardiovascular disease, to power earlier diagnosis and intervention.” No direct performance data, but academic partners have published multiple papers on ECG algorithms, including detection of low EF, like the paper here reporting that an “artificial intelligence-enhanced electrocardiogram to detect left ventricular systolic dysfunction had an AUC of 0.903 for the total cohort.” Anumana also has some pre-submission-phase algorithms for cardiac amyloidosis and pulmonary HTN.
Explainability: Nothing.
Integration/compatibility: “Deployment infrastructure is easily accessible for the clinical care team directly from the existing clinical workflow, and maintains zero-footprint status to enhance efficiency and reduce costs.” Not a lot of detail there but sounds like a good idea
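As a refresher, an AUC like the 0.903 cited above can be read as the probability that the model scores a randomly chosen true case higher than a randomly chosen non-case. A minimal sketch in plain Python (toy scores made up for illustration, not Anumana's model or data):

```python
# Toy example: AUC = probability that a randomly chosen positive case
# gets a higher model score than a randomly chosen negative case.
# Scores and labels below are made up for illustration.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,   0]   # 1 = low EF, 0 = normal

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# Count positive/negative pairs where the positive outscores the
# negative (ties count as half).
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))
print(f"AUC={auc:.2f}")  # 8 of 9 pairs ranked correctly -> 0.89
```

An AUC of 0.5 means the model ranks cases no better than a coin flip; 1.0 means every true case outscores every non-case.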
PMCardio
Primary user focus: “PMcardio is an AI-powered, certified class IIb medical device enabling you to accurately diagnose and treat 38 cardiovascular diseases like an expert cardiologist.” Focus on EMTs/triage and primary care. I’d strongly disagree with the “treat” part of that statement.
Data transparency: They provided detailed information on each of the 6 datasets they used including number of patients in each, country of origin, age of patient, etc. “All participating physicians prospectively labeled every ECG.” I had to fill out a form including my email address to get a copy of their clinical validation stats.
Model transparency: “Participating physicians prospectively labeled all ECG test cases…details on annotation are available in the appendix.” Fairly comprehensive information about the labeling; less information about the algorithm itself.
Performance: By far the most comprehensive performance metrics of any company. Uses the Matthews Correlation Coefficient (MCC) as a summative metric, and includes PPV, NPV, sensitivity, specificity, and MCC for each disorder (STEMI, NSTEMI, etc.). It also makes clinical recommendations based on answers to some questions in the app (like whether PCI is available), but provides no performance data on that aspect.
Explainability: Nothing
Integration/compatibility: “Using image recognition, PMcardio can interpret ECG from any device. Simply take a photo of an ECG or upload it.” Seems very handy, though it's not clear how it would integrate into an EHR.
Of note, PMCardio has an OMI algorithm “coming soon”, likely based on or similar to the paper we reviewed last week!
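Since the Matthews Correlation Coefficient is less familiar to most clinicians than sensitivity or specificity, here's a minimal sketch in plain Python (made-up counts, not PMcardio's data) showing how it combines all four confusion-matrix cells into a single number between -1 and 1:

```python
import math

# Hypothetical confusion-matrix counts; all numbers are made up.
tp, fp, fn, tn = 90, 20, 10, 180

# MCC balances all four cells, unlike accuracy, which can look good
# on imbalanced data (e.g. rare STEMIs) just by always saying "no".
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
print(f"MCC={mcc:.2f}")
```

An MCC of 1 is perfect agreement, 0 is no better than chance, and -1 is perfect disagreement, which makes it a reasonable single summary number across disorders with very different prevalences.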
MyoVista
Primary user focus: Primary care providers for screening of impaired LV relaxation via ECG
Data Transparency: Nothing
Model transparency: “HeartSciences uses continuous wavelet transform mathematics to extract the frequency content from the input ECG signal. This additional valuable information is used as part of the input in developing the AI models and helps improve overall model performance.” So…no idea what data they’re using or anything about the algorithm
Performance: In May 2023, HeartSciences announced “completion of patient enrollment for its MyoVista wavECG pivotal study for FDA De Novo submission.” Presumably there will be some kind of performance data once those results are analyzed.
Explainability: “The wavECG LV Relaxation Algorithm Result is provided in both indicator and statement form and displays the result as either “Highly Negative”, “Negative”, “Borderline”, “Positive” or “Highly Positive” for LV relaxation abnormality.” That describes the output format rather than a rationale, so it does not sound very explainable.
Integration/compatibility: Standalone device that performs the “wavECG analysis” for LV dysfunction plus a standard ECG interpretation. “EHR and HL7 compatible”. So can integrate into EHR.
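For the curious, a continuous wavelet transform convolves the signal with scaled copies of a small oscillating "wavelet" to pull out frequency content at different scales. A heavily simplified sketch in Python with NumPy (a toy sine wave and a Morlet-like wavelet; this illustrates the general idea only, not HeartSciences' implementation):

```python
import numpy as np

# Toy "ECG": a 5 Hz sine wave sampled at an assumed 250 Hz.
fs = 250
t = np.arange(0, 2, 1 / fs)          # 2 seconds -> 500 samples
signal = np.sin(2 * np.pi * 5 * t)

def wavelet(length, scale):
    # Real-valued Morlet-like wavelet: a cosine under a Gaussian
    # envelope; larger scales respond to lower frequencies.
    x = np.linspace(-4, 4, length)
    return np.cos(5 * x / scale) * np.exp(-x**2 / (2 * scale**2))

# Convolve the signal with the wavelet at several scales; each row
# of coefficients shows where that scale's frequency content lives.
scales = [0.5, 1.0, 2.0]
cwt = np.array([np.convolve(signal, wavelet(101, s), mode="same")
                for s in scales])
print(cwt.shape)  # one row of coefficients per scale
```

The claimed advantage of this kind of time-frequency representation is that it exposes frequency information a standard time-domain ECG trace doesn't show directly.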
Cardiomatics
Primary user focus: “Leading cardiologists and practitioners across the globe have successfully implemented the Cardiomatics AI-based platform”. Looks like mostly non-emergent use for cardiologists and PCPs doing Holter monitors; turnaround time is a few hours.
Data Transparency: “algorithms…have analyzed more than 3.5 million hours of ECG recordings.” No information on what kinds of patients.
Model transparency: Nothing
Performance: No data that I could find.
Explainability: Nothing.
Integration/compatibility: “Our solution is a truly hardware-agnostic platform for arrhythmia diagnostics, compatible with more than 25 common ECG recording devices.” Have to upload the ECG, send it to the platform, then receive a report back.
Idoven
Primary user focus: “The Willem ECG Analysis Platform is a cloud-based software application intended for use for the assessment of arrhythmias by qualified healthcare professionals using ECG data in subjects of 18 years of age and over.” Looks like they’re marketing to other health tech companies to integrate into products, for clinical trials, and for remote monitoring products including for insurers.
Data Transparency: “1.25M ECG hours in database”; “10,200+ patients monitored in clinical trials”
Model transparency: “manually annotated for the sole purpose of AI training”
Performance: “Idoven applied its AI algorithms to a dataset of heart patients of over two years and demonstrated a reduction in false positive detections by 98%, with a sensitivity of 97% and specificity of 95%.” This is a very vague statement - reduction in false positive detections of what?
Explainability: Nothing
Integration/compatibility: “Our solution is device neutral and made available to customer via API and non-API tools to enable bi-directional data exchange.” That’s pretty good compatibility.
Conclusions
Now let’s draw some general conclusions about transparency for ECG analysis products:
The companies are not that focused on showing where they’re getting their data or how their products work.
Yes, they’re proprietary models, but knowing what kinds of patients the models were trained on is really important
The companies are not focused on showing how well their products work
This may be in part because they're not FDA approved and don't want to make specific claims, but several of the companies are approved in Europe and still have very limited or non-specific performance data
Explainability is really not a priority
Interoperability and compatibility seem “easier” in many ways because they’ve been solved in other contexts
Summary
We’ve now seen that most companies do a pretty bad job at being transparent about many of the important aspects you’d want to know about how the AI actually works. I guess you could attribute this lack of reporting to the lack of a clear standard, though likely there’s just not a strong enough demand and market forces to make transparency worth it. Yet.
Next week, we’ll discuss the future of AI performance metrics.
I am very excited that this newsletter exists.