Watch out for fake, AI-generated medical information 

Photo by Andrew Neel via Pexels

While artificial intelligence chatbots like ChatGPT can be valuable tools for journalists, it’s important not to take the information they provide at face value. That caution was reinforced by two studies presented at the American Society of Health-System Pharmacists’ December conference in Anaheim, Calif., both of which pointed to the chatbots’ continued limitations.

In one study, researchers and pharmacists with Iwate Medical University in Japan and Torrance Memorial Medical Center in California looked at 30 drugs, comparing information generated by ChatGPT to material published in Lexicomp, an evidence-based clinical reference. For each drug, they asked the question, “What are the most common side effects?” 

Responses generated by ChatGPT were accurate for just two of the drugs. For 26 of the drugs the information was inaccurate, and for the remaining two it was only partly accurate. While the AI tool could become more accurate over time because it is a learning system, the researchers said, patients with questions about drug information should continue to consult a pharmacist or physician rather than turn to ChatGPT as a resource.

In a second study, researchers with Long Island University College of Pharmacy in New York took 39 questions posed by pharmacists to a drug information service, researched the answers in the medical literature, then posed those same questions to ChatGPT. The AI program provided no response, or an inaccurate or incomplete one, to 74% of the questions. When asked to provide references, ChatGPT frequently fabricated them, supplying URLs that led to nonexistent studies, MedPage Today reported.

In response to one question in particular, ChatGPT indicated there was no drug interaction between the antiviral Paxlovid, used frequently to treat COVID-19, and Verelan, a blood pressure-lowering drug. However, these medications do have the potential to interact, and the combination could result in excessive lowering of blood pressure, the authors said.

In another concerning finding, when researchers asked the tool for help converting a muscle spasm medication from an injectable to an oral form, ChatGPT responded with an answer and cited guidance from two medical organizations, CNN reported. However, neither organization provides such guidance, and the calculation recommended by ChatGPT was off by a factor of 1,000. A clinician following that guidance could have given a patient a medication dose 1,000 times lower than required.

“ChatGPT may be a good starting point for medication information but should not be viewed as an authoritative source,” the authors said in a poster presentation. “References provided by ChatGPT should not be relied on without further verification.”

Fabricated data

These are not the only examples of health care experts finding serious inaccuracies in ChatGPT’s output. Eye surgeons in Italy published an article in JAMA Ophthalmology showing how they used GPT-4 (the large language model powering ChatGPT) with other tools to create a fake clinical trial data set. The physicians asked the tool to create a data set on people with keratoconus, an eye condition in which the cornea bulges outward into a cone shape, indicating that one type of surgical procedure to treat the condition was preferable to another.

Data generated by artificial intelligence included some 300 fake participants and showed that one procedure was better than another — a finding inconsistent with true clinical trials, which indicate outcomes from both procedures are similar for up to two years after the operations.

“Our aim was to highlight that, in a few minutes, you can create a data set that is not supported by real original data, and it is also opposite or in the other direction compared to the evidence that are available,” study co-author Giuseppe Giannaccare was quoted as saying in an article in Nature. Although the fabricated data contained flaws that could be detected on close inspection, at a quick glance they looked legitimate.

“Potential strategies to identify AI data fabrication may involve looking for peculiar statistical patterns in the data sets, similar to technology that detects AI-generated text by evaluating the likelihood of non-human patterns,” the authors wrote. “Publishers and editors might want to consider the findings of this study in the peer-review process, to ensure that AI advancements will enhance, not undermine, the integrity and value of scientific research.”

Lessons for journalists

So what does this mean for journalists? People who write about medical studies should continue to do their due diligence: read studies closely to identify any conflicts of interest, and seek comments from independent researchers who can assess a study’s validity and findings. Follow news reports like these and watch for guidance issued by medical societies or regulators. Ask researchers about the source of their study data and how they validated their results. And if you use ChatGPT for queries or background information on articles you are working on, verify that information against independent sources or peer-reviewed reference texts before publishing.

For more information about ChatGPT and its use in journalism, watch a recording of an AHCJ webinar from July 2023 with an expert from the Poynter Institute, or check out a tipsheet based on that program. 

Karen Blum

Karen Blum is AHCJ’s health beat leader for health IT. She’s a health and science journalist based in the Baltimore area and has written health IT stories for numerous trade publications.