Tesla is in the midst of conducting an unprecedented social experiment: testing drivers of its cars to see if they are safe enough operators to receive the company’s Full Self-Driving (FSD) Beta software update, which expands the car’s autonomous capabilities, most notably on city streets.
The company automatically evaluates drivers using a safety score composed of five factors, including forward collision warnings per thousand miles driven, aggressive turning, and forced Autopilot disengagements.
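The details of Tesla's scoring formula are its own, but the general idea — collapsing a handful of per-driver telemetry factors into a single number that gates access to a feature — can be sketched in a few lines. The factor names below track Tesla's published list; the weights, scaling, and functional form are entirely hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of a driver safety-score aggregation. Tesla's actual
# formula, weights, and thresholds are proprietary; nothing below reproduces
# them. This only illustrates the "five factors -> one score" pattern.

def safety_score(fcw_per_1k_miles: float,
                 hard_braking_pct: float,
                 aggressive_turning_pct: float,
                 unsafe_following_pct: float,
                 forced_disengagements: int) -> float:
    """Combine five driving factors into a 0-100 score (illustrative weights)."""
    penalty = (
        2.0 * fcw_per_1k_miles          # forward collision warnings / 1,000 mi
        + 1.5 * hard_braking_pct        # % of braking events that are hard
        + 1.5 * aggressive_turning_pct  # % of turns that are aggressive
        + 1.0 * unsafe_following_pct    # % of time following too closely
        + 5.0 * forced_disengagements   # forced Autopilot disengagement count
    )
    return max(0.0, 100.0 - penalty)

# A cautious driver's factors barely dent the score:
print(safety_score(0.5, 2.0, 1.0, 3.0, 0))  # -> 91.5
```

Whatever the real weights, the design question the article raises is the same: the score is only as fair as the factors it measures and the contexts it can observe.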
While the societal conversation around artificial intelligence tends to focus on machine abilities, Tesla’s experiment turns the spotlight onto the human: Is the driver responsible enough to be given the superpower?
As medical researchers, we realized this question may be at the heart of an exciting paradigm for making AI-assisted medicine a success, though it also poses additional questions: Are safety scores accurate and fair? Will human improvements be durable after the evaluation period once the incentive is earned? After all, interventions evaluated in the pristine setting of clinical studies often underwhelm when deployed in the real world, as shown in studies of drug adherence or weight loss maintenance.
As with self-driving cars, medical AI will not stop physicians who lack common sense from making out-of-context mistakes. If positioned between clear lane lines on the wrong side of the road, the car might drive itself without warning. Without oncoming traffic, the safety score may not even penalize the driver for such an egregious mistake.
In medicine, naively deployed machine learning models are no substitute for human attention and common sense. Such attention is necessary to understand how medical AI exploits context, which may include where data come from, when measurements are made, or the use of problematic labels like race, even when these labels appear to be hidden from human experts.
As opposed to being the sought-after expert, much of AI today is more like an eager and dutiful medical student who jumps on every decision an expert clinician makes and then predicts the next step the clinician would have taken anyway. Such behavior may be helpful as an explanatory or educational tool, but it means that context will always remain key.
AI is good at being ceaselessly vigilant, remembering everything it has seen, executing a harrowingly technical and often narrow task, and ruthlessly exploiting contextual information to improve performance. Given these properties, where in medicine — and for whom — should AI be expected to shine, and what might effective human-machine collaboration look like?
The experience with self-driving cars suggests that AI may improve the parts of medical behavior for which doctors are lax, tired, forgetful, or only intermittently attentive: things like ventilator pressure adjustment in the intensive care unit, individualized dosing of drugs, and anticipation of adverse drug reactions. The self-driving experience also suggests, perhaps counterintuitively at first, that the priority may be to equip only those physicians who excel at working safely in tandem with medical AI, which may not correspond to oft-cited measures of medical expertise. It is unwise to have AI do the part of the job that requires contextual awareness and common sense. AI is also not good at understanding human motivation or values. Instead, medical AI may provide a safety net only when physicians are doing their part in a human-machine partnership.
Tesla’s experiment also shows the power of human incentives, like receiving the FSD Beta software update, at least over the short term. The equivalent of the next full self-driving update for overburdened physicians may be AI automatically writing the clinical note after listening to a patient-physician encounter, or largely handling the billing adjudication process with an insurance company. Such benefits generate immediate short-term rewards rather than underspecified or illusory long-term promises.
In a dystopian path, a human performance-and-reward system in the hands of self-interested bureaucrats or governments could lead to clinicians being used or abused. It is disturbingly easy to imagine a scenario in which a physician sees a patient for longer than 10 minutes and an AI system effectively penalizes the physician by no longer helping communicate with the insurance company, or by reducing physician reimbursement. A physician “safety score” may incentivize extensive and unnecessary overtesting of patients with low prior probabilities of disease.
Now is the time to make sure that medical versions of the full self-driving performance-and-reward system are used for good and not to make physicians cogs in the machine, a horde of Charlie Chaplins in Modern Times. Doing so is essential to ensure that effective human-machine collaboration is good for patients, good for doctors, and good for medical economics.
Arjun K. Manrai is in the Computational Health Informatics Program at Boston Children’s Hospital and is an assistant professor of pediatrics and biomedical informatics at Harvard Medical School. Isaac S. Kohane is professor and founding chair of the Department of Biomedical Informatics at Harvard Medical School.