AI recognition of patient race in medical imaging: a modelling study

AI recognition of patient race in medical imaging: a modelling study (thelancet.com)

76 points by aunterste 4 years ago | 169 comments

The interpretation part hit home: "The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging."

Animats 4 years ago | |

"Predict self-reported race". Not race from DNA. (That's routinely available from 23andMe, and is considered an objective measurement.[1]) They should have collected both. Now they don't know what they've measured.

[1] https://www.nytimes.com/2021/02/16/opinion/23andme-ancestry-...

aulin 4 years ago | |

what's this enormous risk they're talking about? racial bias in x-ray reading? race can be a risk factor in plenty of diseases, why should we actively try to remove this information from medical images?

matthewdgreen 4 years ago | | |

"This issue creates an enormous risk for all model deployments in medical imaging: if an AI model relies on its ability to detect racial identity to make medical decisions, but in doing so produced race-specific errors, clinical radiologists (who do not typically have access to racial demographic information) would not be able to tell, thereby possibly leading to errors in health-care decision processes."

KaiserPro 4 years ago | | |

> racial bias in x-ray reading?

no, it implies there is a signal in the dataset that could be something other than clinical. This means that until they can pinpoint the cause, or the thing the AI is detecting, all the other things it predicts are suspect.

ie if the AI thinks the subject is west african, then it might be more inclined to diagnose something related to sickle cell.

Or a north-western European woman in her mid 60s vs a Japanese woman might get wildly different bone density readings for the same level of "blob" (most medical imaging is divining the meaning of blobs and smears).

fumblebee 4 years ago | | |

My first thought here is to relate this to the problem of early colour film, which was largely tested and validated with only light skin tones in mind. Once it was put out into the wild, folks with darker skin tones found the product to be total crap. Why? Because there was a glaring OOD (Out of Distribution) problem during testing.

Similarly, if the train/test sets used here for X-ray-based machine-learning diagnostics cover only specific races, then the performance might be worse for other races, given that there's a new discriminatory variable in play.

The obvious solution here is to reduce bias by ensuring race is part of the dataset used for training and testing. Which, due to PII laws in play, may actually be quite challenging! Fascinating tradeoff imo.

ibejoeb 4 years ago | | |

I don't get it either. It's accurate. It would be a problem if it got it wrong, which could, for example, underweight quantitative genetic data and adversely influence differential diagnosis.

Retric 4 years ago | | |

AI is driven by the training sets, but the goal is to find the underlying issues.

Suppose AI #1 got a higher score on the training data and AI #2 had a more accurate diagnosis. Obviously you want #2 but if there is bias in the training data based on race and the AI has access to race then eventually you overfit into #1.
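
A toy illustration of the point above, using entirely made-up records. The recorded labels under-report disease in group "B", so a model that uses group membership as a shortcut scores higher against those labels than a model that reads only the imaging signal, while doing worse against the truth. The record layout, the groups, and both `shortcut_model` and `signal_model` are hypothetical.

```python
# Each record: (group, imaging_signal, recorded_label).
# imaging_signal stands in for the true disease state (1 = diseased);
# recorded_label is what ended up in the training data.
# Assumption: 3 of the 4 diseased patients in group "B" were recorded as healthy.
records = (
    [("A", 1, 1)] * 4 + [("A", 0, 0)] * 4
    + [("B", 1, 1)] + [("B", 1, 0)] * 3 + [("B", 0, 0)] * 4
)

def shortcut_model(group, signal):
    """AI #1: uses group membership and always calls group B healthy."""
    return signal if group == "A" else 0

def signal_model(group, signal):
    """AI #2: ignores group and reads only the imaging signal."""
    return signal

def accuracy(model, target_index):
    """Fraction of records where the model matches the chosen target column."""
    hits = sum(model(r[0], r[1]) == r[target_index] for r in records)
    return hits / len(records)

for name, model in [("AI #1 (shortcut)", shortcut_model), ("AI #2 (signal)", signal_model)]:
    print(name, "vs recorded labels:", accuracy(model, 2), "| vs truth:", accuracy(model, 1))
```

In this toy setup the shortcut model wins on the recorded labels and loses on the truth, which is the overfitting direction described above.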

pdpi 4 years ago | | |

ML models are great tools, but they're way too much of a black box. What you have here is a model that's predicting something you think it shouldn't have been possible to predict, and you can't simply ask it where that prediction comes from. Absent an explanation for how the model is doing this, you have to consider the possibility that whatever is poisoning that prediction will also poison others.

dekhn 4 years ago | | |

yep, the case for "enormous risk" hasn't been well articulated. It's been repeated a lot, but of all the problems in medical care, this isn't one of the larger ones.

unsupp0rted 4 years ago | | |

What if it turns out that humans have identifiable biological differences among genetic sub-groups, ethnicities, etc? It would be anarchy in the social sciences.

sim7c00 4 years ago | | |

soon they will want to remove race indicators for photographs and TikTok videos. who knows, maybe it's racist to be of a race >.>

nerdponx 4 years ago | |

I suspect this is a "tank vs sky" problem. The article says that the bright areas of bone are not the most important for predicting race. What if it's some features of different hospitals and x-ray setups?

Also did they release their code and anonymized data? If not, it's impossible to tell if this is a bug.

If I got this result in my work, I would check it 10k times over because it defies belief. Even allowing subtle skeletal differences in different ethnic groups, the differences in this case are not in the bone and at least sometimes not visible to the human eye. Unless there is an undiscovered difference in radio-opacity across ethnicities, the result doesn't make sense.

nerdponx 4 years ago | | |

Replying to my own post because I can't edit it anymore.

Apparently this is a known and persistent effect across a variety of other medical images, tests, and scans. Not just for "race" but for ethnic groups in general, as well as biological sex. So this might actually just be an "AI hit piece" that otherwise confirms an unpalatable but persistent and strong effect in the literature. The causes seem to be badly understudied, in part because of the obvious need for delicacy and respect around such topics.

This result is tremendously implausible to me, but I am finding quite a few articles documenting similar phenomena across things like retina scans and brain MRIs.

MontyCarloHall 4 years ago |

Not too surprising that physical differences across ethnicities are literally more than skin deep. It wouldn’t be shocking that a model could identify one’s ethnicity based on, for example, a microscope image of their hair; why should bone be any different?

I’m more surprised that the distinguishing features haven’t been obvious to trained radiographers for decades. It would be cool to see a followup to this paper that identifies salient distinguishing features. Perhaps a GAN-like model could work—given the trained classifier network, train 1) a second network to generate images that when fed to the classifier, maximize the classification for a given ethnicity, and 2) a third network to discriminate real from fake X-Ray images (to avoid generating noise that happens to minimize the classifier’s loss function). I wonder if the generator would yield images with exaggerated features specific to a given ethnicity, or whether it would yield realistic but uninterpretable images.

eklitzke 4 years ago | |

I think it's more likely the case that (a) most radiographers aren't trained in medical school to look for distinguishing racial features (why would they be?) and (b) in most cases the radiologist knows or can easily guess the race of the patient anyway so there's no need to try to guess it from X-ray imaging data. There are a lot of anatomical features related to race that have been known since before radiology has been a field, it's just not pertinent to the job of most radiologists.

uberwindung 4 years ago |

“In this modelling study, we defined race as a social, political, and legal construct that relates to the interaction between external perceptions (ie, “how do others see me?”) and self-identification, and specifically make use of self-reported race of patients in all of our experiments.”

Garbage research.

axg11 4 years ago | |

Perfect example of citations-driven research. The authors aren’t motivated by a genuinely interesting scientific question (“are anatomical differences between genetically distinct groups of people visible in X-rays?”). Instead, the authors know that training a classifier to predict race will generate controversial headlines and tweets. All publicity, positive or negative, leads to more citations.

colinmhayes 4 years ago | | |

> genetically distinct groups of people

Is race a genetically distinct marker though? I guess if you limit the sample enough it is, but I've always thought of race as more of a continuous quality than a distinct one.

sudosysgen 4 years ago | |

This is the only reasonable possible way to do it. Races are fluid and ill-defined constructs, so self-identification is the best you can do for ground truth.

groby_b 4 years ago | |

And you are qualified to make that assessment because...?

dang 4 years ago |

The submitted title ("AI identifies race from xray, researchers don't know how") broke the site guidelines by editorializing. Submitters: please don't do that - it eventually causes your account to lose submission privileges.

From the guidelines (https://news.ycombinator.com/newsguidelines.html):

"Please use the original title, unless it is misleading or linkbait; don't editorialize."

gus_massa 4 years ago | |

It's the title of the Vice article about the same topic. https://www.vice.com/en/article/wx5ypb/ai-can-guess-your-rac... (It was posted last year.) (No idea why the OP used one title and another URL.) (The title of Vice is a bad title anyway.)

croes 4 years ago | | |

This or similar is the title on multiple sites.

https://www.boston.com/news/health/2022/05/18/scientists-cre...

https://nationalpost.com/health/health-and-wellness/ai-can-t...

https://www.sciencealert.com/ai-can-predict-people-s-race-fr...

https://www.iflscience.com/technology/ai-can-identify-race-f...

https://www.bostonglobe.com/2022/05/13/business/mit-harvard-...

Just a small collection

dang 4 years ago | | |

Good catch!

Imnimo 4 years ago |

The fact that the model seems to be able to make highly accurate predictions even on the images in Figure 2 (including HPF 50 and LPF 10) makes me skeptical. It feels much more probable that this is a sign of data leakage than that the underlying true signal is so strong that it persists even under these transformations.

https://arxiv.org/pdf/2011.06496.pdf

Compare the performance under high pass and low pass filters in this paper on CIFAR-10. Is it really the case that differentiating cats from airplanes is so much more fragile than predicting race from chest x-rays?

jl6 4 years ago |

> Models trained on low-pass filtered images maintained high performance even for highly degraded images. More strikingly, models that were trained on high-pass filtered images maintained performance well beyond the point that the degraded images contained no recognisable structures; to the human coauthors and radiologists it was not clear that the image was an x-ray at all.

What voodoo have they unearthed?

proto-n 4 years ago | |

I tend to not believe unbelievable results in machine learning. It's too easy to unintentionally cause some kind of information leakage. I haven't read the paper in detail though, so their experimental setup could be foolproof; this is not a critique of this paper specifically.
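
One common way such leakage creeps in is the same patient appearing in both the training and test sets. That is an assumption used for illustration here, not something the paper is accused of. A minimal sketch of a patient-level split follows; `patient_id`, the record layout, and `split_by_patient` are hypothetical names.

```python
import random

def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that no patient_id appears on both sides."""
    patient_ids = sorted({r["patient_id"] for r in records})
    random.Random(seed).shuffle(patient_ids)
    n_test = max(1, int(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test

# Ten hypothetical patients with three images each.
records = [{"patient_id": p, "image": f"img_{p}_{i}.png"}
           for p in range(10) for i in range(3)]
train, test = split_by_patient(records)

# No patient should contribute images to both sets.
overlap = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
print(len(train), len(test), overlap)
```

Splitting by image rather than by patient would let a model memorise patient-specific details and look better than it is.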

sidewndr46 4 years ago | | |

This reminds me of the ML research that could predict sex from an iris. It turns out they were using entire photos of eyes to do this. There are so many obvious cues to pick up on in that case, like eyeliner, eyelashes being uniform (or fake), trimmed eyebrows, general makeup on the skin, etc.

dragonwriter 4 years ago | | |

> I tend to not believe unbelievable results

That seems tautologically true.

JumpCrisscross 4 years ago | |

> What voodoo have they unearthed?

Curious for the take of a neuro-ophthalmologist. If they too are stumped, this may be a path to a deeper understanding of our visual system.

Simple transformations obviously discernible to us blind computer vision. (CAPTCHAs.) There may be analogs for human vision which don’t present in the natural world. Evidence of such artefacts would partially validate our current path for artificial intelligence, as it suggests the aforementioned failures of our primitive AIs have analogs in our own.

civilized 4 years ago | |

I think there's a significantly greater than zero chance that they simply botched their ML pipeline horribly and would get their 0.98 AUCs from completely blank images.
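
The blank-image check suggested above is cheap to run. A minimal sketch, assuming a model that maps an image to a single score: feed it the same blank input for every label and compute the AUC. `toy_model` is a hypothetical stand-in, not the paper's model.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def blank_input_auc(model, labels, blank):
    """Score the same blank input once per label and compute the AUC."""
    return auc(labels, [model(blank) for _ in labels])

def toy_model(image):
    """Hypothetical stand-in model: mean pixel intensity as the score."""
    return sum(image) / len(image)

labels = [1, 0, 1, 0, 1, 0]
print(blank_input_auc(toy_model, labels, blank=[0.0] * 16))
```

A model that only sees pixels gives identical scores on identical blank inputs, so the AUC sits at chance. Anything well above chance would mean the label is reaching the model by some route other than the image.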

6gvONxR4sf7o 4 years ago | |

I think it’s pretty straightforward. Imagine the Fourier transforms of some recognizable audio signals. Maybe a symphony and a traffic jam. They’ll look totally different, even to the naked eye. If you chop off the low frequency components, you can still probably tell which Fourier spectrum is which. But now do the same thing in time domain (high-pass filter the audio). It probably won’t be clear that you’re listening to a symphony anymore.
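
A small sketch of that argument, with two synthetic signals standing in for the symphony and the traffic jam. Each is a strong low-frequency tone plus a weaker high-frequency tone; the bin numbers and the cutoff are arbitrary assumptions. After the low bins are zeroed, the two spectra still differ in where their remaining energy sits.

```python
import cmath
import math

N = 64  # samples per signal

def tone_mix(low_bin, high_bin):
    """A strong low-frequency tone plus a weaker high-frequency tone."""
    return [math.sin(2 * math.pi * low_bin * n / N)
            + 0.5 * math.sin(2 * math.pi * high_bin * n / N) for n in range(N)]

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def high_pass(spectrum, cutoff):
    """Zero every bin whose frequency is below the cutoff (both mirrored halves)."""
    return [c if min(k, N - k) >= cutoff else 0 for k, c in enumerate(spectrum)]

def dominant_bin(spectrum):
    """Index of the strongest bin in the non-mirrored half of the spectrum."""
    return max(range(N // 2), key=lambda k: abs(spectrum[k]))

signal_a = tone_mix(low_bin=2, high_bin=20)  # stand-in for "symphony"
signal_b = tone_mix(low_bin=3, high_bin=27)  # stand-in for "traffic jam"

for name, x in [("a", signal_a), ("b", signal_b)]:
    spectrum = dft(x)
    print(name, dominant_bin(spectrum), dominant_bin(high_pass(spectrum, cutoff=10)))
```

A classifier working on features like these has no need for the low-frequency structure that makes the signal recognisable to a person.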

Der_Einzige 4 years ago | |

... Adversarial examples.

It's a whole field of research, and it's pretty trivial to generate them for most classes of ML models. It's actually quite difficult to create robust models that DON'T have this problem...
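
To make "pretty trivial to generate" concrete, here is a minimal sketch of the fast gradient sign method on a tiny logistic model. The weights, the input, and `eps` are made up for illustration; real image classifiers are attacked the same way, with the gradient obtained by backpropagation.

```python
import math

# Hypothetical logistic-regression "classifier" with fixed weights.
weights = [2.0, -1.0, 0.5]
bias = 0.0

def prob(x):
    """P(class 1 | x) under the logistic model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    """Hard class decision at the 0.5 threshold."""
    return 1 if prob(x) >= 0.5 else 0

def fgsm(x, true_label, eps):
    """Step each input by eps in the direction that increases the loss."""
    # For cross-entropy loss on a logistic model, dLoss/dx_i = (p - y) * w_i.
    p = prob(x)
    grad = [(p - true_label) * w for w in weights]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

x = [0.5, 0.2, 0.1]
x_adv = fgsm(x, true_label=1, eps=0.3)
print(predict(x), predict(x_adv))
```

Each input moves by at most `eps`, yet the predicted class flips.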

tomp 4 years ago |

If you’re interested in “hard to describe features that can be learned with enough experience”, look up chick sexing

https://en.wikipedia.org/wiki/Chick_sexing#Vent_sexing

Beltiras 4 years ago | |

Interesting field. You have to breed a couple of males to maintain the species. If you were to pick those from the mis-sexed group I suppose natural selection would reduce the classifying feature over time. I wonder if poultry farms pick a couple of the male-classified birds to maintain a stock of well identifiable males and kill all the mis-classified males.

civilized 4 years ago |

It would be nice to see more genuine, enthusiastic scientific curiosity to understand how the ML algorithms are doing this, rather than just abject terror and alarm.

SpicyLemonZest 4 years ago | |

It seems like the reason the researchers in this paper are concerned is precisely that they tried and failed to understand how the ML algorithms are doing this. If they’d discovered that white people have a subtly distinctive vertebra shape the model was detecting, it would have been much more of “oh, we discovered a neat fact”.

civilized 4 years ago | | |

I don't think they tried very hard at all. I see no meaningful use of modern explanation tools.

There are lots of known ways in which people of different races are different physiologically. Probably even more unknown ways.

There could also be differences in imaging technology used in different communities, as others have suggested. I'd be a bit surprised if something like that could create such a strong signal but it's on the table.

tejohnso 4 years ago |

"This issue creates an enormous risk for all model deployments in medical imaging: if an AI model relies on its ability to detect racial identity to make medical decisions, but in doing so produced race-specific errors, clinical radiologists would not be able to tell, thereby possibly leading to errors in health-care decision processes."

Why would a model rely on its ability to detect racial identity to make decisions?

What kind of errors are race-specific?

daniel-cussen 4 years ago |

It could actually be the skin, it's designed to block rays, it might also have a different x-ray opacity, and that can be judged from the whole picture in particular where there's several layers of melanin, or there's transitions from melanin to very little like on hands and feet. Eyelids too, if they're retracted. And at the perimeter, the profile, different angle for the ray.

And the intention is for melanin to block x-rays too, block all rays, not just UV but deeper. Well it has a spectrum, that cannot be denied. And if you're taking all the pixels in an image, there might be aggregate effects as I described. You get a few million pixels, let AI use every part of the buffalo of the information of the picture, and you can get skin color through x-rays.

The question is what this says about Africans with light-skin strictly because of albinism, ie lack of pigmentation, but otherwise totally African.

mensetmanusman 4 years ago |

What does this mean in terms of race being a social construct/concept?

hellohowareu 4 years ago |

Simply go to google image and search: "skeletal racial differences".

subspecies are found across species-- they happen based on geographic dispersion and geographic isolation, which humans underwent for tens and hundreds of thousands of years.

Welcome to the sciences of anatomy, anthropology, and forensics.

other differences:

- slow twitch vs fast twitch muscle

- teeth shape

- shapes and colors of various parts

- genetic susceptibility to & advantages against specific diseases

Just like Darwin's finches of the Galápagos, humans faced geographic dispersion resulting in genetic, diet (e.g. hunter-gatherer vs farmer & malnutrition), and geographical (e.g. altitude) differences which over the course of millennia affect anatomical differences. We can see this effect across all biota: bacteria, plants, animals, and yes, humans.

help keep politics out of science.

bb123 4 years ago |

One idea is that there is some difference in the x-rays themselves that could potentially be explained by racial disparities in access to (and quality of) healthcare. Maybe white people tend to visit hospitals with newer, better equipment or better trained radiographers and the model is picking up on differences in the exposures from that.
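
One way to probe this hypothesis is a site-only baseline: how well can the group be guessed from the acquisition site alone, with no image at all? The sketch below assumes `(site, self_reported_group)` pairs are available; the data and `site_only_accuracy` are hypothetical.

```python
from collections import Counter, defaultdict

def site_only_accuracy(records):
    """Accuracy of guessing each patient's group from the acquisition site alone."""
    by_site = defaultdict(Counter)
    for site, group in records:
        by_site[site][group] += 1
    # Best possible site-only rule: always guess the most common group at that site.
    correct = sum(counts.most_common(1)[0][1] for counts in by_site.values())
    return correct / len(records)

# Hypothetical (site, self_reported_group) pairs.
records = (
    [("hospital_1", "X")] * 6 + [("hospital_1", "Y")] * 4
    + [("hospital_2", "X")] * 3 + [("hospital_2", "Y")] * 7
)
print(site_only_accuracy(records))
```

If the image model does far better than this baseline, site and equipment differences alone are unlikely to be the whole explanation.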

krona 4 years ago | |

> We also showed that the ability of deep models to predict race was generalised across different clinical environments, medical imaging modalities, and patient populations, suggesting that these models do not rely on local idiosyncratic differences in how imaging studies are conducted for patients with different racial identities.

MontyCarloHall 4 years ago | |

They mostly accounted for this:

>Race prediction performance was also robust across models trained on single equipment and single hospital location on the chest x-ray and mammogram datasets

Sure, it’s possible that bias due to the radiographer is the culprit, but this seems unlikely.

Beltiras 4 years ago | |

That's an interesting confounding variable. I think it's disproven by the fact that the AUC is too high given your hypothesis.

redox99 4 years ago | |

These results seem too accurate to be explained only by a correlation to the medical equipment used.

mathieubordere 4 years ago |

I mean, if color of skin, form of eyes and other visible, "mechanical" characteristics can be different it's not that big of a leap to observe that certain non-visible characteristics can differ too between humans.

samatman 4 years ago |

Physiologies are created by genetics, and differences in ancestry are the basis for self-identified race.

Ordinary computer vision can also identify race fairly accurately, the high pass filter thing is merely pointing out that ML classifiers don't work like human retinas.

It's astonishing how many epicycles HN comments are trying to introduce into a finding that anyone would have predicted. Research which confirms predictable things is valuable of course, but no apple carts have been upset.

bitcurious 4 years ago |

I would guess a causal chain through environmental factors, given how much archeologists are able to tell about prehistoric humans’ lives based on bone samples.

Bone density, micro fractures and deviations in shape. The Mongols famously had bowed legs from spending a majority of their waking lives on horseback.

oaktrout 4 years ago |

I recall seeing a paper in the early 2010s with an algorithm that could discriminate between white and Asian based on head MRI images. I'm having trouble finding it now, but this finding to me is not too surprising.

ppqqrr 4 years ago |

So there are material differences that support certain prejudices; big surprise, it turns out human societies have been (and still are) working very hard for thousands of years to craft those differences - isolating, separating, enslaving, oppressing, exiling their scapegoat “others”. The question is not whether the differences are real, but whether we can prevent AI from being used to perpetuate those differences. TBH, we don’t stand a chance; we live in a society where most people cannot even wrap their heads around why it shouldn’t perpetuate those differences.

kerblang 4 years ago |

> Importantly, if used, such models would lead to more patients who are Black and female being *incorrectly* identified as healthy

I think this is the point a lot of people are missing; they think, "So what if 'black' correlates to unhealthy and the model notices? It's just seeing the truth!"

However, I'm still wondering how this incorrectness works; can anyone explain?

Edit: Clue: The AI is predicting self-reported race, and the authors indicated that self-reported race correlates poorly to actual genetic differences.
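
One way to read "incorrectly identified as healthy" is as a false-negative rate that differs by group. A minimal sketch of that measurement on made-up data; the record layout and the numbers are assumptions and do not come from the paper.

```python
from collections import defaultdict

def false_negative_rate_by_group(records):
    """For each group, the share of truly diseased patients the model called healthy."""
    diseased = defaultdict(int)
    missed = defaultdict(int)
    for group, truth, prediction in records:
        if truth == 1:
            diseased[group] += 1
            missed[group] += (prediction == 0)
    return {g: missed[g] / diseased[g] for g in diseased}

# Hypothetical (group, truly_diseased, model_prediction) triples.
records = (
    [("A", 1, 1)] * 9 + [("A", 1, 0)] * 1 + [("A", 0, 0)] * 10
    + [("B", 1, 1)] * 6 + [("B", 1, 0)] * 4 + [("B", 0, 0)] * 10
)
print(false_negative_rate_by_group(records))
```

Overall accuracy can look fine while one group's diseased patients are missed several times as often, which is why the rate has to be broken out per group.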

KaiserPro 4 years ago | |

My guess is that they are using an American dataset, which I suspect encodes socioeconomic data into the samples, i.e. rich people have access to better diagnostics, get seen earlier and are treated sooner. Conversely, poorer patients present later and with more obvious symptoms. The type of system used to take the images would also be strongly correlated.

ars 4 years ago |

If this is true I suspect a human could be trained the same way.

I read once that a radiologist can't always explain what they see in an image that leads them to one diagnosis or another, they say that after seeing many of them they just know.

So I suspect the same could be done for race. This would be a super interesting thing to try with some college students - pay them to train for a few days on images and see how they do.

omgJustTest 4 years ago |

Given the complexity of datasets, and what is known about the quality of medical scanners, is it possible that underserved communities (ie higher noise scanners) serve a specific community that is heavily skewed in race distributions?

cdot2 4 years ago | |

"our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images"

It doesn't seem like noise in the images is a factor

orangepurple 4 years ago | | |

Imaging artifacts may persist despite corruption

HWR_14 4 years ago |

A lot of people are proposing simple reasons why this could be the case. They did so last year when the study that inspired this got published.

Maybe this needs to be updated from physicists: https://xkcd.com/793/

wittycardio 4 years ago |

I don't trust medical journals or experimental AI research to be particularly scientific so I'll just throw this into the meaningless bin for now.