AI-generated X-rays stump radiologists: What does it mean for patient safety?

Karen Blum

A doctor holding up an X-ray. In a new study, only 41% of radiologists spontaneously noticed that some of the X-rays they reviewed were AI-generated.

Photo by Anna Shvets via Pexels

You may have heard about a radiology study that garnered significant press attention last month, showing that even experienced radiologists struggle to distinguish between real and AI-generated X-ray images. 

In the study, published in the journal Radiology, only 41% of radiologists noticed that some of the images appeared off. Even after being told that some of the images were AI-generated, radiologists' average accuracy in distinguishing real from deepfake images was only 75%. A deepfake is a computer-generated image, video or audio recording that looks or sounds as if it is real.

The majority (76%) of radiologists participating in the study didn’t know that tools like ChatGPT could be used to produce realistic-looking X-ray images.

It’s an interesting study for journalists covering tech in health care because it highlights how realistic AI-generated images have become, underscores the need for tools and methods to help doctors identify errant images, and serves as a warning about the potential dangers to patient safety if such images go unchecked.

About the study 

Mickael Tordjman, M.D., a postdoctoral fellow at Mount Sinai Hospital in New York City, and colleagues asked 17 radiologists from six countries to assess X-ray images of different parts of the body. The radiologists ranged from residents still in training to those with up to 40 years of experience.

In the first phase, radiologists reviewed 154 X-rays of the chest, spine and extremities: 77 authentic images and 77 synthetic ones generated by ChatGPT. The radiologists were asked to assess the technical quality of each image, provide a diagnosis and note whether they saw anything unusual about the images. Seven of the 17 radiologists (41%) spontaneously reported the presence of AI-generated images.

Are you smarter than a radiologist?

Want to see examples of the images used in this study? Try this quiz from the study authors. 

In the second phase, radiologists were told that some of the images were AI-generated. They were asked to categorize each image in the dataset as real or AI-generated and to rate their confidence in each decision. The average accuracy in identifying real versus AI images was about 75%, with individual accuracy ranging from 59% to 92%. Radiologists' confidence did not differ between real and fake images.

In a third phase, radiologists reviewed 110 X-rays of the chest, 55 real and 55 generated by RoentGen, an AI tool for X-ray images. They were again asked to categorize each image as real or synthetic. The average reader accuracy was 70%. 

Researchers also tested four large language models’ ability to distinguish real from synthetic images, finding accuracy rates ranging from 57% to 85% depending on the tool and type of image. Ironically, even GPT-4o, the model used to generate the synthetic images, failed to reliably recognize its own outputs, the authors noted.

The potential hits to patient safety 

The study demonstrates that large language models like ChatGPT can be used to generate highly realistic medical images, or deepfakes, which can be helpful for training purposes but also raise concerns about potential misuse, the authors wrote. And while the study queried only radiologists, any physician who might encounter medical images should be aware of AI’s ability to produce them, Tordjman told STAT.

“There’s a lot of hacking of health systems going on in the past few years, where hackers try to steal patients’ data,” he said in the interview. “In the future, it’s very possible that they will try to inject fake medical data — which will be even worse than stealing, because then you will not be able to differentiate which part of the medical chart is real and which part is synthetic.”

Another concern is that someone could make a fake insurance claim using a fake radiograph, he said. 

The findings emphasize the need for clinician training and for dedicated tools to mitigate the risks of deepfake X-rays, Tordjman and colleagues wrote. One strategy is invisible watermarking, which embeds identity data within an image to help verify its authenticity: a label noting that an image was created by ChatGPT, for example, or, for real images, the name of the hospital or technician who performed the X-ray.
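
For readers who want a concrete picture of what "embedding identity data within an image" means, here is a minimal sketch in Python using least-significant-bit (LSB) embedding, with Pillow and NumPy as assumed dependencies. This illustrates the general idea only, not the method the researchers propose; production provenance watermarks use far more robust schemes designed to survive compression, cropping and re-encoding.

```python
# Toy illustration of an invisible watermark: hide a provenance string
# in the least-significant bits of a grayscale image's pixels.
# (Illustrative only; not a clinically robust watermarking scheme.)
import numpy as np
from PIL import Image


def embed_watermark(image_path: str, text: str, out_path: str) -> None:
    """Hide `text` in the lowest bit of the image's first pixels."""
    pixels = np.array(Image.open(image_path).convert("L"), dtype=np.uint8)
    data = text.encode("utf-8")
    # Length-prefix the payload so the reader knows how many bits to pull out.
    payload = len(data).to_bytes(4, "big") + data
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    flat = pixels.flatten()
    if bits.size > flat.size:
        raise ValueError("image too small for this payload")
    # Overwrite only the lowest bit, an imperceptible change per pixel.
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    # Save losslessly (e.g., PNG) so the hidden bits survive intact.
    Image.fromarray(flat.reshape(pixels.shape)).save(out_path)


def read_watermark(image_path: str) -> str:
    """Recover the hidden text from an LSB-watermarked image."""
    flat = np.array(Image.open(image_path).convert("L"), dtype=np.uint8).flatten()
    # First 32 bits encode the payload length in bytes.
    n_bytes = int.from_bytes(np.packbits(flat[:32] & 1).tobytes(), "big")
    bits = flat[32 : 32 + n_bytes * 8] & 1
    return np.packbits(bits).tobytes().decode("utf-8")


# Hypothetical usage: tag a real X-ray with its provenance before archiving.
# embed_watermark("chest_xray.png", "General Hospital | tech: J. Smith", "tagged.png")
# read_watermark("tagged.png")  # -> "General Hospital | tech: J. Smith"
```

The weakness of this simple approach is exactly why it is only a sketch: re-saving the image with lossy compression destroys the low-order bits, so real systems spread the signal across the image in ways that tolerate such transformations.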

Meanwhile, the researchers noted some clues that recurred in many AI-generated images, Forbes reported.

For example, the images can look too perfect, Tordjman told the magazine: “Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone.” Some companies are also working on solutions to detect synthetic medical images, Tordjman told STAT.

Karen Blum

Karen Blum is AHCJ’s health beat leader for AI and Patient Safety. She’s a health and science journalist based in the Baltimore area and has written health IT stories for numerous trade publications.