Healthcare and Data Brokers

The Sacklers show up in another unsavory business

Mar 05, 2024

Data Brokers

The increasing value of personal data has been well publicized in the case of social media companies and others, whose confusing fine print allows them to use a huge amount of information about people. Fewer people realize how big a business healthcare data brokerage has become.

The sale of patient data goes beyond medical records. Companies like 23andMe also sell “customer” data, and health apps like fitness trackers usually include terms that allow them to sell data as well. Even mental health information is easily available for as little as $275 for 5,000 records.

[Figure: U.S. marketing spend on third-party audience data, 2021. Source: Statista]

Brief history of data brokerage regulation

This kind of data aggregation goes back decades. LBJ proposed a national database in the 1960s, which failed, and subsequent legislation mandated transparency but did not actually stop anyone from aggregating a lot of data. In 1990, one company tried to sell a CD-ROM with the personal information of 120 million Americans, which was stopped before it went to market. There’s still no national registry of data brokers, so these companies can operate mostly out of sight, which is how they like it.

Since then, 12 states have enacted consumer privacy laws. California was the first, and several other legislatures are considering bills that could become law in the near future. Most of the laws are modeled on California’s, several focus on children’s data, and they vary as to whether consumers can delete or correct their data. Indiana’s includes a provision requiring brokers that handle more than 100,000 consumer records to register with the state.

https://pro.bloomberglaw.com/insights/privacy/state-privacy-legislation-tracker

Notably, data protections are excluded if the data is deidentified, with California’s law defining deidentified as “information that cannot reasonably be used to infer information about or otherwise be linked to, a particular consumer”.

Physician data is also up for sale

The data aggregation doesn’t stop at patients, of course. Physicians produce an enormous amount of data because we are constantly writing and ordering things. Imagine how much data Amazon would have on you if, for every purchase, you wrote 500 words about what you bought and why (the patient note) and then immediately bought a bunch of related stuff (the drug and test orders). You should assume that this kind of information is being aggregated about you by data brokers in some form. The nice state of Vermont actually tried to protect doctors from this with a statute restricting the sale of physician prescribing patterns, but lost on First Amendment grounds in the 2011 Supreme Court case Sorrell v. IMS Health.

The rise of the giant data aggregators

IMS Health is the original aggregator of patient data, started with partners that included Arthur Sackler, whose heirs have been in the news frequently. IMS merged with Quintiles, a clinical research company, in 2016, and the combined company was renamed IQVIA in 2017. It has a market cap of $43B. With a B. In a particularly creepy example, I found an article from IQVIA called “Physician-Level Digital Data: What Are Physicians Reading About Most?” which purports to track our reading habits on an individual level. It boasts that their tool “follows HCPs on their self-navigated online journeys, which reveals important insights into the content they consume—some of it surprising.” Although I have some sense that my data is constantly being pulled into some analytic machine, the idea that algorithms are tracking me specifically because of my physician status bothers me.

Epic is a major data aggregator

Cosmos is Epic’s data aggregation platform, with “234 million patient records from over 10.4 billion encounters, representing patients in all 50 states.” The platform “includes patient-generated health data (PGHD), birth records, vitals, and social drivers like transportation and financial security assessments.”

Epic created Epic Research to do studies with this database, which is not publicly available. Epic Research doesn’t publish in peer-reviewed journals (and thus doesn’t need to make its datasets available to reviewers), and it’s not clear to me how they’re prioritizing the use of this enormous trove of valuable information. There’s something irksome about a private company holding the information of the vast majority of the citizenry and not doing more to make it available to researchers. Of course, Epic claims they’re focused on privacy, but even the former CMS Administrator said publicly, “The disingenuous efforts by certain private actors to use privacy—vital as it is—as a pretext for holding patient data hostage is an embarrassment to the industry.”

Who knew EHR contracting could be so interesting?

My main question is why Epic owns all this data in the first place. When people are asked who owns their data, I’m quite sure “the EHR software company” is not a likely answer. As far as I can tell, it comes down to contractual agreements between Epic and its health system partners, which have apparently signed away their patients’ information in the process of paying a ton of money for an electronic medical record. And some of the health systems would have to pay to get access to those records (records created with hours and hours of patient, physician, and staff time and energy) if they changed EHRs. This would be like Microsoft owning all the PowerPoint slides you had ever made, so that if you switched to Google Slides you’d have to pay Microsoft to use your own slides.

To make it even more confusing, one author points out that data ownership may vary in each of the following situations:

“Data created personally by the patient and not used at all by the doctor (something in the tethered PHR)
Data created personally by the patient but used by the doctor for care of the patient
Data mutually created and managed by the patient and doctor (transactional data like claims codes, messages between employees of the physician practice, etc.)
Data institutionally created by the doctor but available to the patient (e.g. the doctor’s private “transactional” notes)
Data mutually created by the patient, doctor, and insurance company (mostly transactional)
Data mutually created by others that patient didn’t directly interact with (covered by HIPAA BA)”

Is data really de-identified?

We’ve seen that de-identified data is treated differently than identifiable data. But modern AI algorithms make re-identification much more feasible. One study of re-identification of chest X-rays with AI models demonstrated the ability “to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55% [and can] reveal the same person even ten and more years after the initial scan.”

Even 30 years ago it wasn’t that hard to re-identify patients:
“Harvard University professor Latanya Sweeney used such methods when she was a graduate student at the Massachusetts Institute of Technology in 1997 to identify then Massachusetts governor William Weld in publicly available hospital records. All she had to do was compare the supposedly anonymous hospital data about state employees to voter registration rolls for the city of Cambridge, where she knew the governor lived. Soon she was able to zero in on certain records based on age and gender that could have only belonged to Weld and that detailed a recent visit he made to a hospital, including his diagnosis and the prescriptions he took home with him.”
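The linkage described in that passage can be sketched in a few lines of Python. This is a minimal illustration, not Sweeney's actual method or data: every record below is made up, and the field names (`zip`, `birth_year`, `sex`) and the helper `link_records` are assumptions chosen for the example. The idea is simply to join a "de-identified" table to a named public table on the attributes they share, and keep only the combinations that are unique in both.

```python
from collections import defaultdict

# Entirely made-up toy records. Field names are illustrative assumptions,
# not the schema of any real hospital or voter file.
hospital_records = [
    {"zip": "02138", "birth_year": 1945, "sex": "M", "diagnosis": "condition A"},
    {"zip": "02138", "birth_year": 1962, "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth_year": 1945, "sex": "M", "diagnosis": "condition C"},
    {"zip": "02139", "birth_year": 1945, "sex": "M", "diagnosis": "condition D"},
]
voter_roll = [
    {"name": "Voter One", "zip": "02138", "birth_year": 1945, "sex": "M"},
    {"name": "Voter Two", "zip": "02138", "birth_year": 1962, "sex": "F"},
    {"name": "Voter Three", "zip": "02139", "birth_year": 1945, "sex": "M"},
    {"name": "Voter Four", "zip": "02139", "birth_year": 1945, "sex": "M"},
]

# Attributes present in both tables that are not names but narrow people down.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def quasi_key(record):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(deidentified, identified):
    """Pair a name with a diagnosis when the quasi-identifier combination
    appears exactly once in each table (hypothetical helper)."""
    people_by_key = defaultdict(list)
    for person in identified:
        people_by_key[quasi_key(person)].append(person)

    records_by_key = defaultdict(list)
    for record in deidentified:
        records_by_key[quasi_key(record)].append(record)

    matches = []
    for key, records in records_by_key.items():
        people = people_by_key.get(key, [])
        # A unique combination on both sides is a confident re-identification.
        if len(records) == 1 and len(people) == 1:
            matches.append((people[0]["name"], records[0]["diagnosis"]))
    return matches


matches = link_records(hospital_records, voter_roll)
for name, diagnosis in matches:
    print(f"{name} -> {diagnosis}")
```

In this toy data, two of the four "anonymous" records are re-identified because their combination of attributes is unique; the other two share a combination and stay ambiguous. That is the intuition behind generalizing or suppressing such fields before release: the fewer unique combinations, the fewer confident matches.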

Why aren’t patients and physicians getting something in exchange for all the data they’re producing?

When we log in to Gmail or Instagram, we’re making a tacit deal with the company: I get this free service, and the company gets (some of) my data. But with healthcare data, that equation is missing a crucial piece: compensation for the patients who make up these databases. There’s a rising trade in consumers selling their data directly to companies. A third of B2C companies have paid surveys, for example, that are a straightforward exchange of cash for personal information. Why are patients not entitled to some kind of compensation for providing their digital bodies to corporations that profit from them? The patients who provide the most data are presumably those who have the most medical problems, and therefore the patients from whom the data brokers profit the most.

Towards Empowering Patients and Protecting Data

The current state of patient data ownership and governance calls for a multi-faceted approach to empower patients and protect their data. This involves advocating for transparency, revisiting and clarifying the terms of EHR system contracts, and fostering a legal and ethical framework that prioritizes patient consent and control over their health information.

There is a parallel here with cocaine's regulatory history, not just in the societal impact but also in the need for a nuanced approach to governance. Just as the early 20th century saw a societal shift in understanding cocaine's risks and benefits, leading to comprehensive regulation, today's digital age calls for a similar reevaluation of how we handle patient data.

The goal is not to stifle innovation or to put an end to data-driven advances in healthcare. On the contrary, the aim should be to foster an environment where innovation thrives but does so within a framework that respects patient autonomy and privacy. This includes clarifying ownership of medical records, ensuring patients have a say in how their data is used, and perhaps most importantly, educating the public about the value and vulnerabilities of their medical information.

Sarah McKean

Mar 5, 2024

I found your point about patients and physicians getting compensation for their data interesting. I also wonder if we are counterintuitively less likely to offer our data up through a paid survey because of the clear transactional nature of it; we become alerted to the fact that our data is being taken. When our data is taken from under our noses, we don't have to consciously make the decision to give it away.

G M

Mar 8, 2024

There's a LOT of money spent on "cleaning and repurposing" data for PHI. Imagine how much of this could be used to actually pay for healthcare, perhaps by incentivizing physicians and patients to share their data, instead of acting as though the work needed to collect it isn't equally if not more important. It would require redesigning the old ETL process, but given healthcare's money problems, it might be worth it.

Machine Learning for MDs by Sarah Gebauer, MD