[1] https://www.nytimes.com/2021/02/16/opinion/23andme-ancestry-...
no, it implies there is a signal in the dataset that could be something other than clinical. This means that until they can pinpoint the cause, or the thing the AI is detecting, all the other things it predicts are suspect.
ie if the AI thinks the subject is west african, then it might be more inclined to diagnose something related to sickle cell.
Or a north-western European woman in her mid-60s vs a Japanese woman might get wildly different bone density readings for the same level of "blob" (most medical imaging is divining the meaning of blobs and smears).
Similarly, if the train/test sets used here for X-ray based ML diagnostics include only specific races, then the performance might be worse for other races, given that there's a new discriminatory variable in play.
The obvious solution here is to reduce bias by ensuring race is part of the dataset used for training and testing. Which, due to PII laws in play, may actually be quite challenging! Fascinating tradeoff imo.
Suppose AI #1 got a higher score on the training data and AI #2 had a more accurate diagnosis. Obviously you want #2 but if there is bias in the training data based on race and the AI has access to race then eventually you overfit into #1.
Also did they release their code and anonymized data? If not, it's impossible to tell if this is a bug.
If I got this result in my work, I would check it 10k times over because it defies belief. Even allowing subtle skeletal differences in different ethnic groups, the differences in this case are not in the bone and at least sometimes not visible to the human eye. Unless there is an undiscovered difference in radio-opacity across ethnicities, the result doesn't make sense.
Apparently this is a known and persistent effect across a variety of other medical images, tests, and scans. Not just for a "race" but for ethnic groups in general, as well as biological sex. So this might actually just be an "AI hit piece" that otherwise confirms an unpalatable but persistent and strong effect in the literature. The causes seem to be badly understudied, in part because of the obvious need for delicacy and respect around such topics.
This result is tremendously implausible to me, but I am finding quite a few articles documenting similar phenomena across things like retina scans and brain MRIs.
I’m more surprised that the distinguishing features haven’t been obvious to trained radiographers for decades. It would be cool to see a followup to this paper that identifies salient distinguishing features. Perhaps a GAN-like model could work—given the trained classifier network, train 1) a second network to generate images that when fed to the classifier, maximize the classification for a given ethnicity, and 2) a third network to discriminate real from fake X-Ray images (to avoid generating noise that happens to minimize the classifier’s loss function). I wonder if the generator would yield images with exaggerated features specific to a given ethnicity, or whether it would yield realistic but uninterpretable images.
Garbage research.
Is race a genetically distinct marker though? I guess if you limit the sample enough it is, but I've always thought of race as more of a continuous quality than a distinct one.
From the guidelines (https://news.ycombinator.com/newsguidelines.html):
"Please use the original title, unless it is misleading or linkbait; don't editorialize."
https://www.boston.com/news/health/2022/05/18/scientists-cre...
https://nationalpost.com/health/health-and-wellness/ai-can-t...
https://www.sciencealert.com/ai-can-predict-people-s-race-fr...
https://www.iflscience.com/technology/ai-can-identify-race-f...
https://www.bostonglobe.com/2022/05/13/business/mit-harvard-...
Just a small collection
https://arxiv.org/pdf/2011.06496.pdf
Compare the performance under high pass and low pass filters in this paper on CIFAR-10. Is it really the case that differentiating cats from airplanes is so much more fragile than predicting race from chest x-rays?
What voodoo have they unearthed?
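For anyone unfamiliar with the filtering being discussed, here is a minimal sketch in plain Python. A low-pass filter keeps the smooth, blurry part of an image; a high-pass filter keeps what is left over (edges and fine texture). The 3x3 box blur and the function names are my own illustrative assumptions, not the filters used in either paper.

```python
def low_pass(img):
    """3x3 box blur with edge clamping: keeps slowly varying structure."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index at the border
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index at the border
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def high_pass(img):
    """Original minus its blur: keeps only edges and fine texture."""
    blurred = low_pass(img)
    return [[p - b for p, b in zip(row, brow)] for row, brow in zip(img, blurred)]


# Toy inputs (hypothetical): a uniform patch and a vertical step edge.
flat = [[5.0] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
```

On the uniform patch the high-pass output is all zeros, while on the step edge it is non-zero only in the two columns next to the step. That is the sense in which a high-pass filtered X-ray looks like noise to a person but can still carry structure a classifier can use.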
That seems tautologically true.
Curious for the take of a neuro-ophthalmologist. If they too are stumped, this may be a path to a deeper understanding of our visual system.
Simple transformations obviously discernible to us blind computer vision. (CAPTCHAs.) There may be analogs for human vision which don’t present in the natural world. Evidence of such artefacts would partially validate our current path for artificial intelligence, as it suggests the aforementioned failures of our primitive AIs have analogs in our own.
It's a whole field of research, and it's pretty trivial to generate them for most classes of ML models. It's actually quite difficult to create robust models that DON'T have this problem...
There are lots of known ways in which people of different races are different physiologically. Probably even more unknown ways.
There could also be differences in imaging technology used in different communities, as others have suggested. I'd be a bit surprised if something like that could create such a strong signal but it's on the table.
Why would a model rely on its ability to detect racial identity to make decisions?
What kind of errors are race-specific?
And the intention is for melanin to block x-rays too, block all rays, not just UV but deeper. Well it has a spectrum, that cannot be denied. And if you're taking all the pixels in an image, there might be aggregate effects as I described. You get a few million pixels, let AI use every part of the buffalo of the information of the picture, and you can get skin color through x-rays.
The question is what this says about Africans with light-skin strictly because of albinism, ie lack of pigmentation, but otherwise totally African.
Subspecies are found across species. They arise from geographic dispersion and geographic isolation, which humans underwent for tens and hundreds of thousands of years.
Welcome to the sciences of anatomy, anthropology, and forensics.
other differences:
- slow twitch vs fast twitch muscle
- teeth shape
- shapes and colors of various parts
- genetic susceptibility to & advantages against specific diseases
Just like Darwin's finches of the Galápagos, humans faced geographic dispersion resulting in genetic, dietary (e.g. hunter-gatherer vs farmer & malnutrition), and geographical (e.g. altitude) differences which over the course of millennia produce anatomical differences. We can see this effect across all biota: bacteria, plants, animals, and yes, humans.
help keep politics out of science.
>Race prediction performance was also robust across models trained on single equipment and single hospital location on the chest x-ray and mammogram datasets
Sure, it’s possible that bias due to the radiographer is the culprit, but this seems unlikely.
Ordinary computer vision can also identify race fairly accurately, the high pass filter thing is merely pointing out that ML classifiers don't work like human retinas.
It's astonishing how many epicycles HN comments are trying to introduce into a finding that anyone would have predicted. Research which confirms predictable things is valuable of course, but no apple carts have been upset.
Bone density, micro fractures and deviations in shape. The Mongols famously had bowed legs from spending a majority of their waking lives on horseback.
I think this is the point a lot of people are missing; they think, "So what if 'black' correlates to unhealthy and the model notices? It's just seeing the truth!"
However, I'm still wondering how this incorrectness works; can anyone explain?
Edit: Clue: The AI is predicting self-reported race, and the authors indicated that self-reported race correlates poorly to actual genetic differences.
I read once that a radiologist can't always explain what they see in an image that leads them to one diagnosis or another, they say that after seeing many of them they just know.
So I suspect the same could be done for race. This would be a super interesting thing to try with some college students - pay them to train for a few days on images and see how they do.
It doesn't seem like noise in the images is a factor
Maybe this needs to be updated from physicists: https://xkcd.com/793/
If the AI is also implicitly learning to detect race from the images, it's going to learn an association that people of race X usually have tumors and people of race Y usually do not.
The problem here is that the people training the model and the clinical radiologists interpreting data from the model may not realize that race was a confounding factor in training, so they'll be unaware that the model may make racial inferences in the real world data.
If people of race X really do have a higher incidence rate for a specific type of cancer than race Y, maybe this is OK. But if the issue is that there was bias in the training/validation data set that was unknown to the people building the model, and in the real world people of race X and race Y have exactly the same incidence rate for this type of cancer, then this is going to be a problem because it's likely to introduce race-specific errors.
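To make the failure mode above concrete, here is a toy sketch with entirely made-up counts. The training set over-represents positives for group X (say, because X was sampled from a referral clinic), while the hypothetical real world has identical incidence in both groups. The "model" is just a per-group prior, standing in for a network that has implicitly learned group membership; none of this comes from the paper.

```python
from collections import Counter

# Hypothetical training set of (group, has_tumor) pairs, biased by sampling.
train = [("X", 1)] * 80 + [("X", 0)] * 20 + [("Y", 1)] * 20 + [("Y", 0)] * 80

# Hypothetical real-world population: the same 50% incidence in both groups.
world = [("X", 1)] * 50 + [("X", 0)] * 50 + [("Y", 1)] * 50 + [("Y", 0)] * 50


def fit_group_prior(rows):
    """Estimate P(tumor | group) from raw counts."""
    pos, tot = Counter(), Counter()
    for group, label in rows:
        tot[group] += 1
        pos[group] += label
    return {g: pos[g] / tot[g] for g in tot}


def error_counts(prior, rows):
    """False positives and false negatives per group when predicting from the prior alone."""
    fp, fn = Counter(), Counter()
    for group, label in rows:
        pred = 1 if prior[group] >= 0.5 else 0
        if pred == 1 and label == 0:
            fp[group] += 1
        if pred == 0 and label == 1:
            fn[group] += 1
    return dict(fp), dict(fn)


prior = fit_group_prior(train)
fp, fn = error_counts(prior, world)
```

With these numbers every false positive lands on group X and every false negative lands on group Y, even though the two groups are identical in the deployment population. A real model would lean on the group signal less crudely, but the direction of the error is the same.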
See e.g. https://www.ucsf.edu/news/2021/09/421466/new-kidney-function...
Frankly, even a freshly arrived alien from Mars or Titan could easily tell Icelanders, Mongols and Xhosa apart, without knowing anything about our culture. The fact that there has been a lot of interbreeding/admixture since the Age of Sail began, does not mean that there aren't meaningful biological differences between the original groups, which still obviously exist.
An analogy: much like the existence of twilight does not render the concept of night and day a 'social construct' either. We attach certain social meanings to those natural phenomena, and a 'working day' can easily stretch into 'astronomical night' (all too often!), but that does not mean that 'night' and 'day' do not exist outside of our cultural reference framework.
There is a social concept of 'race' which corresponds to the 'working day' concept in this analogy, e.g. 'BIPOC', claiming Asians as 'white adjacent' or classifying North Africans or Jews as 'white', even though they may not necessarily look white. But this is almost certainly not what the AI identified. This social concept of race would confuse a Martian alien unless he started to study the social and racial history of the U.S., and possibly even afterwards. It definitely confuses me, a random observer from Central Europe.
The social definition is used because that's a most scientifically meaningful and useful definition that avoids many of the issues with biological race realism.
Nations are uncontroversially recognized as social constructs. However, I'm certain that AI could also distinguish images taken outdoors in Mexico from those taken in Finland. Additionally I, a US citizen, cannot simply declare that I am now a citizen of France and expect to get a French passport.
However it also means that what a nation is, is not set in stone for eternity. It means that different people can debate the precise definition of a particular nation. It means that Czechoslovakia can become the Czech Republic and Slovakia. It means that not everyone agrees whether Transnistria is an independent nation. It means that the EU can decide that a German citizen can have the same passport as a French citizen.
As a more controversial example, this is also the case when people talk about gender being a "social construct". It doesn't mean that we can simply pretend the ideas "men" and "women" don't exist (as people both declare and fear). But it does mean there is some flexibility in these terms and we as a society can choose how we want these ideas to evolve.
Society is a complex and powerful part of our reality, arguably more impactful on us from day to day than most of physics (after all we did survive for hundreds of thousands of years without even understanding the basics of physics). Therefore something being classified as a "social construct" doesn't mean it "isn't real". Even more important is that individuals cannot choose how social constructs evolve. I cannot, for example, declare that since taxes are a social construct, I'm not paying them anymore. We can however, as a society, change what these constructs are and how they are interpreted.
Race picks specific and arbitrary differences. For example, Hispanic is a different race in US society, but black and white, based on skin color, are as well; Indians and East Asians are also one "race".
Ethnicities are not social constructs but race is. The AI finds ethnic differences and correlates them with self-perceived social/racial classification.
"Race", as the evil social construct it is, takes ethnic differences and interprets them to mean that some ethnicities are different races of humans than others: not just different ancestors, but differently created or evolved, despite all the evidence and every major religion saying all humans are one species (Homo sapiens) with a common Homo sapiens ancestor.
I thought all this was obvious but the social climate recently is very weird.
From national geographic: “Race” is usually associated with biology and linked with physical characteristics such as skin color or hair texture. “Ethnicity” is linked with cultural expression and identification.
https://www.healthit.gov/isa/taxonomy/term/741/uscdi-v2
https://www.healthit.gov/isa/taxonomy/term/746/uscdi-v2
(I'm not claiming that this is an optimal approach, just pointing out how it works in most software today.)
Let the social "culture war" rage on. The only war I see going on in the west (U.S. mostly) is a _lack_ of culture.
In fact it can be medically harmful to think this way.
They discourage using race as a source of any physiological signal. They do allow using genetics, but the relevant situations are the many many ones where genetic testing isn't possible or doesn't yet provide useful signal.
Unaccountable institutions get captured very easily, and the race cult that's swept through our educated class has been a very powerful one.
[1] https://www.ama-assn.org/press-center/press-releases/new-ama...
An interesting question in the U.S. is "who is considered white?" There was a Supreme Court case in which someone who was literally from the Caucasus was ruled not white. This is why it's sociological, not scientific.
https://www.sceneonradio.org/episode-40-citizen-thind-seeing...
To give a contrived example: if I say people with ring fingers over 3 inches long are Longfings and people with ring fingers 3 inches or less are Shortfings, and then our society treats people differently based on being Longfing or Shortfing, this is a social construct that is causing problems for people based on a contrived criterion that has no real meaning. The same is true of race.
This is a very contrived way to say that people share characteristics with other people. The real question is why people don't say that I belong to the six-foot tall bad-knees race.
£10 says that it's not that. Anatomy is extraordinarily hard, and AI isn't that good, yet. Sure, different races have different layouts, but often that's only really obvious post mortem (i.e. when you can yank out the bones and look at them; there are of course corner cases where high-res CAT/MRI scans can pull out decent skeletal imagery in 3D). There are other cases, but that should be easy to account for.
If I had to bet, and I knew where the data was coming from, I'd say it's probably picking up on the style of imaging, rather than anything anatomical. Not all x-rays have bones in, and not all bones differ reliably enough to detect race.
> keep politics out of science.
Yes, precisely, which is why the experiment needs to be reproduced, and theories tested through experimentation. This is important because unless we work out where this trait is coming from, we cannot be sure the diagnosis is correct. For example those with sickle cells have a higher risk of bone damage[1] which could indicate they are x-rayed more. This could warp the dataset, causing false positives for sickle cell style bone damage.
[1]https://www.hopkinsmedicine.org/health/conditions-and-diseas...
This was my guess as well. I've spent a lot of time around radiology and AI (I used to work at a company specializing in it) and we read a lot of the failure cases as well. There was one example where the model picked up on the hospital, and one hospital was for higher risk patients- so it learned to assign all patients from that hospital to the disease category simply because they were at that hospital.
There are a ton of cases like this out there, especially when using public datasets (which in the medical field tend to be very unbalanced datasets due to the difficulties of building a HIPAA compliant public dataset).
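One cheap sanity check for this kind of shortcut is to see how well you can predict the label from the site alone, without looking at any pixels. A rough sketch, with hypothetical site names and counts (the helper names are mine, not from any library):

```python
from collections import Counter, defaultdict


def majority_by_site(rows):
    """Map each site to the most common label seen there in training."""
    counts = defaultdict(Counter)
    for site, label in rows:
        counts[site][label] += 1
    return {site: c.most_common(1)[0][0] for site, c in counts.items()}


def site_only_accuracy(train_rows, test_rows):
    """Accuracy of a 'model' that only knows which site produced the scan."""
    rule = majority_by_site(train_rows)
    hits = sum(1 for site, label in test_rows if rule.get(site) == label)
    return hits / len(test_rows)


# Hypothetical data: site A is a high-risk referral centre, site B is not.
train_rows = ([("A", "disease")] * 90 + [("A", "healthy")] * 10
              + [("B", "disease")] * 10 + [("B", "healthy")] * 90)
test_rows = ([("A", "disease")] * 45 + [("A", "healthy")] * 5
             + [("B", "disease")] * 5 + [("B", "healthy")] * 45)

acc = site_only_accuracy(train_rows, test_rows)
```

If the site-only baseline lands close to the image model's accuracy (here it reaches 90% on the toy data), there is a good chance the image model is reading the site off scanner artifacts or burned-in markers rather than the pathology.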
Certainly possible! They do control for hospital and machine …
>Race prediction performance was also robust across models trained on single equipment and single hospital location on the chest x-ray and mammogram datasets
… but it’s also possible that different chest x-rays were being used for different diagnostic purposes and thus have a different imaging style, which a) may correlate with ethnicity and b) does not appear to be explicitly controlled for.
>"We found that deep learning models effectively predicted patient race even when the bone density information was removed for both MXR (AUC value for Black patients: 0·960 [CI 0·958–0·963]) and CXP (AUC value for Black patients: 0·945 [CI 0·94–0·949]) datasets. The average pixel thresholds for different tissues did not produce any usable signal to detect race (AUC 0·5). These findings suggest that race information was not localised within the brightest pixels within the image (eg, in the bone)."
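For reading the numbers in that quote: AUC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, so 0.5 is coin-flipping and 1.0 is perfect separation. A minimal pairwise sketch with made-up scores (this is the textbook definition, not the paper's evaluation code):

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
separating = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # hypothetical scores that separate the classes
constant = [0.5] * 6                         # hypothetical scores carrying no information
```

`auc(separating, labels)` is 1.0 and `auc(constant, labels)` is 0.5, which is what "did not produce any usable signal" means for the pixel-threshold baseline, versus roughly 0.95 for the deep models.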
Our tools are so precise you can tell which parent a set of cousins had with DNA tests, this doesn't make them a different species/sub-species or race from each other, even if one group has red hair and the other has black.
It's the pointless lumping together of people who are genetically distinct and drawing arbitrary, unscientific lines that's the issue.
Presumably the same experiments that can detect Asian vs Black vs White could also detect the entirely made-up 'races' of AsianOrBlack, AsianOrWhite and WhiteOrBlack, since those are logically equivalent.
So are the races I made up a moment ago real things? No. But a computer can predict which category I'd assign, doesn't that make them real and important racial classifications? No it means my made up classifications map to other real genetic concepts at a lower level, like red hair.
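The "logically equivalent" point can be shown mechanically: any classifier for three labels gives you a classifier for a merged, made-up label for free, and it can only get more accurate, because confusions inside the merged group stop counting as errors. A sketch with placeholder labels A, B, C and hypothetical predictions:

```python
def merge(label, group):
    """Collapse every label in `group` into one made-up category name."""
    return "+".join(sorted(group)) if label in group else label


def accuracy(preds, truths):
    """Share of predictions that match the true label."""
    return sum(p == t for p, t in zip(preds, truths)) / len(truths)


truths = ["A", "B", "C", "A", "B", "C"]
preds = ["A", "B", "C", "B", "B", "A"]  # hypothetical classifier output
group = {"A", "B"}                      # the arbitrary lumping

merged_truths = [merge(t, group) for t in truths]
merged_preds = [merge(p, group) for p in preds]

original_acc = accuracy(preds, truths)
merged_acc = accuracy(merged_preds, merged_truths)
```

Here accuracy goes from 4/6 to 5/6 after merging. Being predictable by a model therefore says nothing about whether a category boundary is a natural one; it only says the category is a function of something the model can see.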
Which came as a surprise to the ophthalmologists, because they aren't aware of any significant differences between male and female retinas.
[0] https://www.researchgate.net/publication/351558516_Predictin...
A human doctor is also a black box, in meat form.
The deceptively evil part of the concept of race is that it does not simply differentiate biological features; it goes on to impose a fork at the root of the ancestral tree, where people of that race share the same origin and the same differences. In reality biological differences are a result of what a culture considers attractive multiplied by mutations that help people adapt to different environments (e.g. skin color being a result of adaptation to sunlight and vitamin D levels instead of being a feature that shows ancestral forks in creation or evolution).
It is simply inaccurate to label people by race but it is useful to impose social evils. But biological differences due to mating and cultural choices are very real and can be examined at a granular level that takes the actual factors for the differences into account instead of the lazy+evil correlation that is the concept of race.
Ethnicity is not what culture you identify with. You don't become ethnically african american because you like african american culture and grew in a specific neighborhood. It is the marriage of culture and ancestry.
Similar to prejudice and stereotyping or the worst of lies there is some truth in its reasoning but the untrue part, the lazy part allows people to commit evil and be unjust. A reason to harm others with minimal conscious discomfort.
What does it mean for shortfings to be dramatically taller? Are you saying that shortfings must transmit height along with finger length; some sort of race invariance? Or are you saying that most shortfings you meet are also tall?
If a black person is as pale as a white person, they're still considered black (and may share many other characteristics that many black people have). If some of your shortfings have long fingers, does the distinction still make sense as a scientific category?
> the longfings are complaining that they're overrepresented in jumpball?
Is admission to jumpball determined by finger measuring, or through social factors?
You can draw an analogy to colours in the rainbow: a rainbow is a spectrum but we can still draw lines that demarcate colours. Colour definitions are fuzzy at the edges but this doesn't mean coarse colour labels are not distinct.
Now, the real issue is that the whole race grouping is extremely murky, and nowhere near specific enough as used in common speech. White, Black, Asian etc. are way too wide to be very useful. Even inside what we could understand as rather homogeneous groups there is a lot of difference between areas.
(Hint : what you want is a classifier on ethnicity, and those aren't trivial either)
Feel free to cite sources.
I'm not here to tell you what to do. Use race then. I offered up why I think this article is generating interest: race is a loaded word, and if it weren't used, the article would be passed over.
> The real question is why people don't say that I belong to the six-foot tall bad-knees race
This is an article about ML accurately predicting self-identified race. This is not even on the spectrum of real questions.
A better discussion is around sickle cell anaemia[0] which is exclusively carried by people of African or Afro-Caribbean descent.
As prometheus76 says, perhaps you will one of these days be able to mentally resolve the inherent contradiction in the above sentence.
If your prior belief points strongly in one direction, it is completely rational to require strong weight of evidence in order to update it to point to the other direction.
And yes, it's a completely reasonable prior belief for a person who is not already versed in medical imaging literature.
I often find that people who study this literature have bad attitudes like yours. You should be grateful that there are people out there who value intellectual honesty enough to acknowledge when a result is a result and to change their beliefs. Instead I get two different people showing up to insult me.
That just sounds like poor feature selection/engineering. Garbage in, garbage out.
This is a cutting edge subfield of ML, so it's understandable that one paper in a medical journal isn't going to be on that cutting edge, but I think they should at least acknowledge that their investigations barely scratched the surface.
Consider an "AI" that rates the probability of recidivism for prisoners nearing their parole date. That score would then be presented to the parole board, and taken into consideration in determining whether or not to grant parole. If this AI were accidentally/incidentally accurately determining the race of the prisoner, then the output score would take that into account as well. Black men have a recidivism rate significantly higher than other groups[1]. The reasons for the above aside - it's a complex topic, and outside the scope of this analogy - this is extremely undesirable behavior for a process that is intended to remove human biases.
You might then ask, how does this relate to medical imaging? Medical decisions are regularly made based on the expected lifespan of the individual. It makes little sense to aggressively treat leukemia in a patient who is currently undergoing unrelated failure of multiple organs. Similarly it would likely make sense for a healthy 30-year-old to undergo a joint replacement and associated physical therapy, because that person can reasonably be expected to live for an additional 40 years while the same treatment wouldn't make sense for a 70-year-old with long-term chronic issues. This concept is commonly represented as "QALY" - "quality-adjusted life years".
Life expectancy can vary significantly based on race[2].
An AI that evaluates medical imagery that considers QALY in providing a care recommendation may result in a positive indicator for a white hispanic woman and a negative indicator for a black non-hispanic man, with all else being equal and with race as the only differentiator.
In short - it's not necessarily a bad thing for a model to be able to predict the race of the input imagery. The problem is that we don't know why it can do so. Unless we know that, we can't trust that the output is actually measuring what we intend it to be measuring.
1: https://prisoninsight.com/recidivism-the-ultimate-guide/ 2: https://www.cdc.gov/nchs/products/databriefs/db244.htm
If, in your hypothetical recidivism case, an AI "accurately" determined that a pattern of higher recidivism-related features was correlated to race, and was able to determine "accurately" that the specific subset of recidivism-related features predicted race, why would it be wrong to make parole decisions using those recidivism-related features?
edit: imagine I was a teacher who systematically scored people with certain physical characteristics 10% lower than people who didn't have them. Let's say, for example, that I was a stand-up comedy teacher that wasn't amused by women.
If I used an AI trained on that data to choose future admissions (assuming plentiful applicants), I would end up with an all-male class. If this happened throughout the industry (especially noting that the all-male enrollment that I have would supply the teachers of the future), stand-up comedy would simply become a thing that women were seen as not having the aptitude to do, although nobody explicitly ever meant to sabotage women, just to direct them into something that they would have a better chance to succeed in.
> efforts to control [model race-prediction] when it is undesirable will be challenging and demand further study
I mean, sure, there are tons of ways for garbage data to sneak into ML models -- though these guys tried pretty hard to control for that -- but if the model actually determined that "race" is a meaningful feature, then that might be because it is, and science should be concerned with what is, not with what we wish were.
https://dicom.innolitics.com/ciods/procedure-log/patient/001...
If interested, searching for "dicom conformance" should yield lots of docs that probably contain specific values for those things.
Because in the US some people have a hard time understanding that all races and genders deserve to be treated equally as humans with the same access to goods and services. Further, that there are disparities in care based on race/ethnicity[1][2] and gender[3][4] because of that racism/sexism present in the systems. This then leads to requiring that race/ethnicity and gender data be scrubbed sometimes to keep people from impacting outcomes based on their own biases.
[1] https://www.americanbar.org/groups/crsj/publications/human_r...
[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1924616/
[3] https://www.americashealthrankings.org/learn/reports/2019-se...
Then you are not pretending very well. When I lived in the US I was shocked at how often it was an issue. It permeates nearly every aspect of US culture.
The icing on that cake: A government-run interactive map so you can lookup which races live in which neighborhoods. Some versions allow you to zoom in to see little dots representing clusters of black or white residents. https://www.census.gov/library/visualizations/2021/geo/demog...
https://www.healthit.gov/isa/uscdi-data-class/patient-demogr...
The question I was posing is different, though, because this was discussing an AI system that looked at the underlying [in this case, recidivism] data which had race and race-adjacent information removed, and the AI has effectively rediscovered the concept of "race" by connecting it to some set of attributes of the actual [in this case, recidivism-predicting] features. If the AI were to determine such a link, that doesn't make its results biased, it just makes them uncomfortable. It's not clear to me that in such a case that would mean that we should remove those [recidivism-predicting] features from the dataset just because they ended up being correlated to race.
One common issue is a lot of these kinds of tags rely on optional human input and are inconsistently applied. As opposed to say, modality specific parameters produced by a machine, which are consistent.
DICOM is a great example of design by committee, with the +'ve and -'ves that implies.
This study appears to have done a good job controlling for known biases that could have been proxies for race, but it is presumably possible that they missed something and tainted the data
For example, not having race data on resumes is generally productive, because that categorization can't provide a meaningful input to the decision associated with an individual person. Even if it were to be the case that there was some correlation between race and skill at whatever job you're interviewing for[1], the size of the effect is almost certainly small, and in the meanwhile you've also controlled for any bias in the person doing the reviewing.
If you're having a machine look at a dataset, and the machine determines that race or ethnicity is a material factor in determining some attribute in that dataset, you're not doing anybody any good by denying that fact and destroying the result.
[1]Let's ignore for the purposes of this discussion, fields (like certain sports) where extreme competition combines with a position heavily dependent upon racially-linked physical characteristics. Though even in this case, there is still a (different, weaker) argument for suppressing race data in "resumes" (yes, I know, ballplayers don't submit resumes to their local NBA franchise)
If the outcome that you're trying to predict is also affected by perceptions of race, you've built a gossip feedback loop.
Now look just above that latter table, showing distances between East Asians and Europeans. The distances are far greater--more than 10x.
The precision with which we can identify and track ancestry, often based on small fractions of DNA (Y-chromosome in particular wrt Ashkenazi Jews, not mtDNA as one might think) doesn't imply the degree of genetic distinctiveness.
I think the trickiness is in providing the machine unbiased data to begin with so that it doesn't learn incorrect associations between features like race. The most egregious examples I'm aware of are the machine learning systems used to suggest criminal sentencing, but, apropos of this topic, I believe there are cases where it may produce erroneous associations in something like skin cancer risk.