AI in Physician Productivity and Performance
How we're measured and how other industries are using AI for employee monitoring
Physician Productivity
As a doctor, you are probably all too aware of how productive you are at any moment. You likely know how productive you are compared to your colleagues. You might even know exactly how much money your productivity is making you, and how much money you’re losing if you are less productive. I certainly know how much money I’m “losing” when I’m waiting for a case to start.
This productivity model started in the late 1980s, when researchers sought to standardize how doctors are paid. They then surveyed physicians, asking them to rate a long list of clinical tasks on four dimensions:
Time
Mental effort and judgment
Technical skill and physical effort
Psychological stress
Isn’t it incredible that someone actually thought to include Psychological Stress as one of the dimensions of productivity? That you are considered more productive when stressed? And that they assume that exerting more mental effort makes you more productive?
Along the way, being productive went from a concept about goals to a very concrete payment model. Being productive has turned into code for making money. Previously, these were related, but separate, concepts. A doctor could be productive by helping a patient even if it didn’t culminate with putting a note in a chart, for example.
Crucially, this billing doesn’t measure how well you did the task, just how hard it was and how long it took you.
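To make that concrete, the payment arithmetic behind this model can be sketched as follows. The structure (work, practice-expense, and malpractice RVUs, each geographically adjusted, multiplied by a dollar conversion factor) follows the Medicare physician fee schedule, but every numeric value below is illustrative, not an actual fee-schedule figure. Note that nothing in the inputs measures how well the task was done:

```python
# Sketch of the Medicare-style physician payment arithmetic.
# All RVU, GPCI, and conversion-factor values are illustrative.

def payment(work_rvu, pe_rvu, mp_rvu,
            work_gpci=1.0, pe_gpci=1.0, mp_gpci=1.0,
            conversion_factor=33.0):
    """Payment ($) = geographically adjusted total RVUs x conversion factor."""
    total_rvu = (work_rvu * work_gpci      # physician work (time, effort, skill, stress)
                 + pe_rvu * pe_gpci        # practice expense (overhead)
                 + mp_rvu * mp_gpci)       # malpractice insurance risk
    return total_rvu * conversion_factor

# A hypothetical mid-level office visit: every input reflects effort,
# overhead, and risk -- none of them reflects the quality of the care.
print(payment(work_rvu=1.3, pe_rvu=1.0, mp_rvu=0.1))
```

Seeing more patients, or doing harder tasks, raises the inputs and therefore the payment; doing the same task better changes nothing in this formula.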
Physician Performance
After this pay standardization based on a bizarre formula for productivity, the government realized that it was possible for a doctor to take a long time and think hard but still not actually do a good job. Then quality metrics, often with some kind of financial incentive, were introduced. These “pay for performance” (P4P) metrics have taken many forms over the years in countries as diverse as China, Tanzania, and Germany. The term P4P usually excludes clinical RVUs, and includes quality, patient satisfaction, and resource utilization.
The results of pay for performance are mixed, and the studies vary widely in the measures studied, the amount of money involved, and how often they give feedback. One large review estimated a 5% improvement in quality due to P4P plans, but studies at the VA and across the broader US system haven’t shown a good correlation between paying for performance and individual physician quality or improvement.
Several companies have tried to analyze and capture physician quality based on evidence-based metrics and billing data (disclosure: I’ve consulted for Garner Health) and I think they do a much better job than CMS metrics at identifying high-quality physicians.
Current state
If we were all salespeople, measuring performance and productivity would be much easier. A sale is a concrete outcome that’s aligned with the company’s goal of making money. In medicine, we have two primary ways to measure physician performance: RVUs and quality assessments.
RVUs: based on how many patients physicians see or procedures they do.
RVUs are roughly equivalent to salespeople being paid on commission; the more sales they make, the more money they make.
Similarly, the more patients doctors see, the more money they make.
Quality metrics: A patient visit (the “sale” equivalent) may not be the same quality from one physician to the next.
Since there’s not a straightforward way to measure the quality of care a physician provides, we’ve made some up.
Unlike most free market systems, the better products (higher quality care) don’t get paid more. CMS has made some moves to add or subtract money based on quality, but the amounts aren’t meaningful yet.
Gaming the system
All metrics are vulnerable to Goodhart’s law:
“When a measure becomes a target, it ceases to be a good measure”
In other words, people start to game the system. Data from healthcare supports this concept: the metric being measured improves while it’s tracked, and stops improving once measurement stops.
We all know colleagues who have gamed the RVU system, or who have stopped seeing more difficult patients to improve their quality metrics (and therefore their paychecks). Behavioral economists will tell you that’s completely predictable and not an indictment of their character.
Any useful approach to measuring physician performance would involve frequent conversations about what metrics to use, access to high-quality data, consistent physician feedback, and awareness of unintended consequences.
Productivity and Performance in Other Fields
Of course, tracking performance and productivity is not unique to healthcare. Many companies reward people who bring in money, such as salespeople and investment bankers, but struggle to measure and improve performance and productivity in fields that require a lot of teamwork and whose outcomes aren’t directly tied to the bottom line.
The field of productivity management has exploded as AI has increased measurement and analysis capabilities. Already, some of the productivity management practices sound like something out of a dystopian science fiction novel.
The Rise of Big Brother
In 2018, a Gartner survey showed 53% of large corporations were using some kind of employee tracking software. This trend accelerated during COVID, when managers couldn’t track employees in the office as easily. Microsoft rolled out an employee productivity score with 73 parameters, including how often employees had their cameras on during meetings and the number of hours they spent in Microsoft software.
Most of the tracking software relies on computers to track employee time and productivity; some log keystrokes, and some take photos using the company’s computer to make sure the employee is at their desk. The data is detailed enough that some employers can intervene if the software shows that an employee was not actively working on a document or email, for example.
Some big companies have gone even further with ambient monitoring; Walmart won a patent for a system that can listen for sounds like the “rustling of bags” to determine if employees are busy working. Call centers monitor not only the number of minutes each employee is on the phone; some now incorporate “sentiment analysis”, an AI-rated index of how happy a customer sounds. And the systems aren’t just listening passively; Amazon uses a wearable haptic device to nudge employees’ hands in the right direction.
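For a sense of what a sentiment index involves, here is a deliberately toy sketch. Real call-center systems use trained acoustic and language models, not keyword counting, and the word lists and function below are entirely hypothetical:

```python
# Toy sentiment index over a call transcript.
# Hypothetical sketch: production systems use trained models, not keyword lists.
POSITIVE = {"great", "thanks", "perfect", "happy"}
NEGATIVE = {"angry", "cancel", "terrible", "refund"}

def sentiment_index(transcript: str) -> float:
    """Return a score in [-1, 1]: net positive-word fraction of the transcript."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    return (pos - neg) / len(words)
```

Even this crude version illustrates the core move: a conversation is reduced to a single number that a manager can rank, threshold, and attach to pay.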
Summary
As with any technology, AI can be implemented oppressively or helpfully. I’m hopeful we can advocate for AI to help physicians mitigate the impact of a growing physician shortage and to improve the care we provide.
Over the next few weeks we’ll look at:
Models for measuring software engineer performance and productivity
The ways physician clinical performance is assessed
The future: best and worst case scenarios for using AI in physician performance and productivity monitoring and evaluation
If you’re a physician, join us at the ML for MDs Slack group, where we share resources and knowledge about the intersection of AI and healthcare.