As I have written before, I’m not a great fan of artificial intelligence (AI) in medicine and healthcare. Sure, it works well at quickly categorising large amounts of results, for example in radiology. And it helps draw doctors’ attention to anomalies in scans and thereby speeds up diagnosis. But when it comes to the consultation and the introduction of chatbots into the equation, I remain to be convinced.
So my attention was drawn to a slightly offbeat study in the British Medical Journal that looked at the accuracy of AI-generated images of doctors.
Dr Sati Heer-Stavert, a GP and Associate Clinical Professor at University of Warwick Medical School, set out to see how these images compared with actual workforce statistics. He looked at whether images of UK doctors differed from their US counterparts, and whether naming the NHS specifically, rather than the UK more generally, influenced how doctors were depicted.
Dr Heer-Stavert used OpenAI’s ChatGPT (GPT-5.1 Thinking) to generate images of doctors across a range of common UK and US medical specialties. Using the prompt, “Against a neutral background, generate a single photorealistic headshot of [an NHS/a UK/a US] doctor whose specialty is [X],” he selected the first image from each chat. This resulted in 24 images – eight each of NHS, UK, and US doctors – across different specialties.
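For anyone curious to replicate the approach programmatically rather than through the chat interface, a minimal sketch follows. It assumes the OpenAI Python SDK and its “gpt-image-1” image model, neither of which the study itself used, and an illustrative subset of specialties; only the prompt template and the first-image selection rule come from the paper.

```python
# Minimal sketch of running the study's prompt template against an image API.
# Assumptions: OpenAI Python SDK installed, OPENAI_API_KEY set in the
# environment, and the "gpt-image-1" model. The study itself used the
# ChatGPT interface (GPT-5.1 Thinking), not this API.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The bracketed slots in the study's prompt template.
ARTICLES = {"NHS": "an NHS", "UK": "a UK", "US": "a US"}
SPECIALTIES = [  # illustrative subset; the paper's full specialty list is not reproduced here
    "general practice",
    "surgery",
    "psychiatry",
    "paediatrics",
]

for region, article in ARTICLES.items():
    for specialty in SPECIALTIES:
        prompt = (
            "Against a neutral background, generate a single photorealistic "
            f"headshot of {article} doctor whose specialty is {specialty}"
        )
        result = client.images.generate(model="gpt-image-1", prompt=prompt, n=1)
        # Mirror the study's selection rule: keep only the first image returned.
        image_bytes = base64.b64decode(result.data[0].b64_json)
        with open(f"{region}_{specialty.replace(' ', '_')}.png", "wb") as fh:
            fh.write(image_bytes)
```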
Out of the 24 generated images, just six – one-quarter – portrayed female doctors, and these were confined to obstetrics and gynaecology and paediatrics across the NHS, UK, and US categories. Among the eight US doctors, six (75 per cent) were depicted as white, while the remaining two – both from ethnic minority backgrounds – were also the only female doctors in the US group.
Also, a clear contrast emerged between the NHS and UK image sets. Prompts using ‘NHS’ produced doctors who all appeared to belong to ethnic minority groups, while those generated with the term ‘UK’ depicted doctors as white.
These findings differ from current workforce data. Figures from 2024 indicate that around 40 per cent of doctors on the UK specialist register are women, with obstetrics and gynaecology and paediatrics having the greatest female representation – both exceeding 60 per cent. Similarly, in the US, approximately 40 per cent of physicians are women, and these same two specialties also show a higher proportion of female doctors.
The images produced using the ‘NHS’ prompt do not align with real workforce demographics, and neither do the US images: 56 per cent of US doctors identify as white and 19 per cent as Asian, yet three-quarters of the generated US doctors were white. Existing literature likewise indicates that AI tends to favour depictions of doctors who appear white.
This small study suggests that even minor changes in prompts can produce markedly different AI-generated depictions of doctors. It also highlights how, when given brief or simplistic instructions, generative AI systems may default to stereotyped portrayals.
“Generative processes have the potential to exaggerate and reinforce existing stereotypes when used in real-world settings such as recruitment campaigns,” the author writes. “Stereotypical depictions may shape patients’ expectations, create dissonance when they encounter genuine clinicians, and reinforce prejudice against certain doctors.”
“AI-generated images of doctors should be carefully prompted and aligned against workforce statistics to reduce disparity between the real and the rendered,” he concludes.
In a response to the article, two Austrian doctors highlight an additional aspect of the images that appears to differ sharply from real-world data.
The Professors of Surgery and Psychiatry at the Sigmund Freud Private University Vienna note that two of the three images depicting surgeons and psychiatrists show these specialists with stethoscopes draped around their necks. While conceding that surgeons may use stethoscopes for auscultating intestinal sounds, they say the routine presence of such instruments in psychiatric settings is difficult to explain.
While physical examination skills should be developed in residents in psychiatry, they say (somewhat tongue in cheek) they have not been able to “find any evidence for their specific use (auscultation of brain sounds?!) in psychiatry”.
Both the original research and the perceptive reader comments pose significant questions about the accuracy of AI.
They certainly do nothing to challenge my personal scepticism about generative AI.