Last year, headlines describing research into artificial intelligence (AI) were eye-catching, to say the least.
The idea that AI-powered chatbots might be able to generate relevant answers to patient questions is not surprising at first glance. After all, ChatGPT can claim to have passed the Wharton MBA final exam, written a book in a few hours, and composed original music.
But are they more empathetic than doctors? Hmm. Before awarding final honors for quality and empathy to one side or the other, let's take a closer look.
What tasks is AI taking on in the medical field?
Already, the list of medical applications for AI is growing rapidly, including writing doctors' notes, suggesting diagnoses, helping to read X-rays and MRI scans, and monitoring real-time health data such as heart rate and oxygen levels.
But the idea that AI-generated answers might be more empathetic than a real doctor's struck me as both surprising and sad. Could even the most advanced machine outperform doctors in demonstrating this important, distinctly human virtue?
Can AI provide appropriate answers to patient questions?
That's an interesting question.
Imagine you call your doctor's office with a question about one of your medications. Later that day, a clinician on your health care team calls you back to discuss it.
Now imagine a different scenario: you send your question by email or text, and within minutes you receive an AI-generated answer. How does the quality of the medical answers compare in these two situations? And how do they compare in terms of empathy?
To answer these questions, researchers collected 195 questions and answers posted by anonymous users of an online social media site, where volunteer doctors had answered them. The same questions were later submitted to ChatGPT, and the chatbot's answers were collected.
A panel of three doctors or nurses then rated both sets of answers for quality and empathy. Panelists were asked "which answer was better?" on a five-point scale. The quality rating options were very poor, poor, acceptable, good, or very good. The empathy rating options were not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic.
What did the research find?
The results weren't even close: in nearly 80% of responses, ChatGPT was judged better than the doctors.
- Good or very good quality answers: ChatGPT received these ratings for 78% of its responses, while physicians received them for only 22%.
- Empathetic or very empathetic answers: ChatGPT scored 45%, while physicians scored just 4.6%.
Notably, answers were much shorter for the doctors (average 52 words) than for ChatGPT (average 211 words).
As I said, it wasn't even close. So were those eye-catching headlines appropriate after all?
Not so fast: key limitations of this AI research
This study was not designed to answer two important questions:
- Will AI responses provide accurate medical information and improve patient health while avoiding confusion and harm?
- Will patients accept the idea that a bot might answer the questions they ask their doctors?
And it had some significant limitations.
- Evaluating and comparing the answers: The raters applied untested, subjective criteria for quality and empathy. Importantly, they never assessed the actual accuracy of the answers. Nor were the answers assessed for fabricated information, a known problem with ChatGPT.
- The difference in answer length: A longer, more detailed answer may seem to reflect patience or concern, so the higher empathy ratings may have had more to do with word count than with true empathy.
- Incomplete blinding: To minimize bias, the raters weren't told whether an answer came from a physician or from ChatGPT, a standard research technique called blinding. But AI-generated communication doesn't always sound exactly human, and the AI's answers were far longer, so blinding probably failed for at least some responses.
Conclusion
Could doctors learn something about expressing empathy from AI-generated answers? Perhaps. Could AI work well as a collaborative tool, generating draft answers for doctors to review and edit? Indeed, some health systems are already using AI this way.
However, it seems premature to rely on AI answers to patient questions without solid evidence of their accuracy and without careful oversight by medical professionals. This study wasn't designed to provide either.
By the way, ChatGPT agrees with this. When I asked whether it answers medical questions better than doctors do, its answer was no.
More research will be needed to know when the AI genie is ready to answer patients' questions. We may not be there yet, but we're getting closer.
Want more information about the research?
Read the answers written by the doctors and the chatbot, including responses to a question about the risks of swallowing a toothpick.