Facebook apology as AI labels black men 'primates'

Facebook apology as AI labels black men 'primates' (bbc.com)

169 points by lindenstark 4 years ago | 252 comments

TOMDM 4 years ago |

Previous discussion from a couple days ago.

https://news.ycombinator.com/item?id=28415582

lindenstark 4 years ago | |

my mistake, missed that.

TOMDM 4 years ago | | |

Eh, no harm, 200+ comments and no one complaining.

People obviously still see value in discussing it

simonw 4 years ago |

This happened to both Google Photos and Flickr too. Which makes it an inexcusable mistake to make in 2021 - how are you not testing for this?

Google Photos in 2015: https://www.wired.com/story/when-it-comes-to-gorillas-google...

Flickr in 2015: https://www.independent.co.uk/life-style/gadgets-and-tech/ne...

IncRnd 4 years ago | |

The reason these companies don't fix these systems is because they don't know how. It is easier to remove certain outputs or retire the whole system. There is no line of code they can tweak.

nolaspring 4 years ago | | |

Enrich the dataset it's trained on enough that the model is correct before you release it to prod.

wpietri 4 years ago | |

This reminds me of a favorite tweet from 2013: "Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there" -- https://twitter.com/alliebland/status/402990270402543616

Facebook, like a lot of tech companies, has long had problems with diversity in engineering. Here's an article from April that discusses specific incidents and the broader background: https://www.washingtonpost.com/technology/2021/04/06/faceboo...

oreilles 4 years ago | | |

This isn't a problem with diversity. Everybody knows how to pronounce Malcolm X. And it's not like just because a Google engineer was black that he was like "oh, let's try and see if Malcolm X is pronounced correctly because he's black and I'm black too". This only happens in white people's brains.

throwcommonsns 4 years ago | | |

This is a contrarian take that may get me downvoted and unfairly labeled, but I encourage critical thinking instead:

I've struggled with people telling me that these FAANG companies have "diversity problems," as a person of color myself. A majority of software engineers are female and male immigrants from East Asia and South Asia. These population centers are some of the most diverse regions of the world. The engineers who have been hired by preparing for and passing these companies' selective merit based coding tests had to overcome adverse conditions in their home countries as well, including extreme poverty, starvation, and totalitarian regimes.

Why do they not count toward diversity, to some white and white-adjacent critics? What message are we sending to people who are ethnic minorities from certain groups who earned their spots through merit and have also been targeted in recent newsworthy attacks, just as others have, when we make these kinds of accusations? What does a non problematic ethnic composition look like? What are these companies doing right toward some minority groups and wrong towards others?

IncRnd 4 years ago | | |

Your comment implies black engineers will check that Malcolm X Boulevard is pronounced correctly. That's awfully specious.

root_axis 4 years ago | | |

As others noted, just because someone is black doesn't mean that they would have caught this. The whole point of ML is to adapt to what is effectively an unbounded set of inputs. Pretty much by definition, there will be cases where even a team of 100% black people will train a model that, given the right input, will fail in ways that particularly affect black people.

sinyug 4 years ago | | |

> Facebook, like a lot of tech companies, has long had problems with diversity in engineering.

If that is the case, why is it that Google voice nav routinely butchers the names of places and roads in India in spite of having thousands of Indian engineers on staff?

Could we blame the intractability of the problem, or just plain old incompetence, before we blame every single problem in the world on racism and lack of 'diversity'?

zozbot234 4 years ago | | |

> Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there

Silly Google TTS, the proper pronunciation is obviously "Malcolm the Tenth" there.

dd444fgdfg 4 years ago | | |

Google Maps is made in Australia, and the diversity there is different.

gumboshoes 4 years ago | |

Google Photos solved the problem by simply returning no results for words like gorilla, monkey, primate, etc.

jcims 4 years ago | | |

I was just thinking about that. Unfortunately it just makes the bias harder to detect.

Once you search for these:

https://www.google.com/search?q=human+female+face&tbm=isch

https://www.google.com/search?q=human+male+face&tbm=isch

You can see that 'human face' has a bit of post-hoc tuning.

https://www.google.com/search?q=human+face&tbm=isch

silisili 4 years ago | | |

So disappointing. I was legitimately looking for a monkey pic I took years ago, to no avail, because it isn't searchable. One of the richest companies in the world would rather remove a capability than solve hard problems. But hey, at least we all get ads.

bpodgursky 4 years ago | |

You can't just "test" a neural network like that. For all you know they tested a thousand pictures of chimpanzees and gorillas against the network, but for some reason the NN decided to classify the photo differently because the subject was standing in front of the wrong kind of tree or wearing a funny-colored hat.

There's no super reliable way to prevent this (with current tech) other than forbidding that output entirely.
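
For what "forbidding that output entirely" could look like in practice, here is a minimal sketch, assuming the classifier hands back (label, score) pairs. The names `BLOCKED_LABELS` and `filter_labels` are hypothetical and not anything Google or Facebook is known to use. The label list is just the examples mentioned elsewhere in this thread.

```python
# Hypothetical blocklist; the labels are the examples named in this thread.
BLOCKED_LABELS = frozenset({"gorilla", "monkey", "primate"})


def filter_labels(predictions, blocked=BLOCKED_LABELS):
    """Drop every (label, score) pair whose label is on the blocklist.

    The comparison is case-insensitive; the model itself is untouched,
    so the underlying misclassification still happens, it is only hidden.
    """
    return [(label, score) for label, score in predictions
            if label.lower() not in blocked]


# Made-up classifier output for a single image.
preds = [("Primate", 0.61), ("forest", 0.58), ("person", 0.40)]
print(filter_labels(preds))
```

This is a post-hoc filter, which is why it also removes legitimate uses of those labels, such as searching for an actual monkey photo.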

silisili 4 years ago | |

Is it inexcusable that if I search 'Japan' to look for pics from my trip to Japan, it shows me pictures containing any Asian person at all? If I search Japan today, I get mostly pics of my non-Japanese wife. But I guess we don't complain enough for anyone to care.

https://i.ibb.co/Mf6rVdf/Screenshot-20210907-002516-Photos.j...

Nobody who has traveled at all would mistake my wife and child for Japanese. And doing so is especially insidious considering the Bataan death march.

godelski 4 years ago | |

> Which makes it an inexcusable mistake to make in 2021 - how are you not testing for this?

They probably are, but not well enough. These things can be surprisingly hard to detect. Post hoc it is easy to see the bias, but it isn't so easy before you deploy the models.

If we take racial connotations out of it then we could say that the algorithm is doing quite well because it got the larger hierarchical class correct, primate. The algorithm doesn't know the racial connotations, it just knows the data and what metric you were seeking. BUT considering the racial and historical context this is NOT an acceptable answer (not even close).

I've made a few comments in the past about bias and how many machine learning people are deploying models without understanding them. This is what happens when you don't try to understand statistics and particularly long tail distributions. gumboshoes mentioned that Google just removed the primate type labels. That's a solution, but honestly not a great one (technically speaking). But this solution is far easier than technically fixing the problem (I'd wager that putting a strong loss penalty for misclassifying a black person as an ape is not enough). If you follow the links from jcims then you might notice that a lot of those faces are white. Would it be all that surprising if Google trained from the FFHQ (Flickr) Dataset[0], a dataset known to have a strong bias towards white faces? We actually saw that when Pulse[1] turned Obama white (do note that if you didn't know the left picture was a black person and who they were that this is a decent (key word) representation). So it is pretty likely that _some_ problems could simply be fixed by better datasets (this was part of the LeCun controversy last year).

Though datasets aren't the only problems here. ML can algorithmically highlight bias in datasets. Often research papers are metric hacking, or going for the highest accuracy that they can get[2]. This leaderboardism undermines some of the usage and often there's a disconnect between researchers and those in production. With large and complex datasets we might be targeting leaderboard scores until we have a sufficient accuracy on that dataset before we start focusing on bias on that dataset (or more often we, sadly, just move to a more complex dataset and start the whole process over again). There aren't many people working on the biased aspects of ML systems (both in data bias and algorithmic bias), but as more people are putting these tools into production we're running into walls. Many of these people are not thinking about how these models are trained or the bias that they contain. They go to the leaderboard and pick the best pre-trained model and hit go, maybe tuning on their dataset. Tuning doesn't eliminate the bias in the pre-training (it can actually amplify it!). ~~Money~~Scale is NOT all you need, as GAMF often tries to sell. (or some try to sell augmentation as all you need)

These problems won't be solved without significant research into both data and algorithmic bias. They won't be solved until those in production also understand these principles and robust testing methods are created to find these biases. Until people understand that a good ImageNet (or even JFT-300M) score doesn't mean your model will generalize well to real world data (though there is a correlation).

So with that in mind, I'll make a prediction that rather than seeing fewer cases of these mistakes, we're going to see more (I'd actually argue that there's a lot of this currently happening that you just don't see). The AI hype isn't dying down and more people are entering that don't want to learn the math. "Throw a neural net at it" is not and never will be the answer. Anyone saying that is selling snake oil.

I don't want people to think I'm anti-ML. In fact I'm an ML researcher. But there's a hard reality we need to face in our field. We've made a lot of progress in the last decade that is very exciting, but we've got a long way to go as well. We can't just have everyone focusing on leaderboard scores and expect to solve our problems.

[0] https://github.com/NVlabs/ffhq-dataset

[1] https://twitter.com/Chicken3gg/status/1274314622447820801

[2] https://twitter.com/emilymbender/status/1434874728682901507
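
For readers wondering what a "strong loss penalty" for one specific confusion might look like, here is a rough sketch of one possible form: ordinary negative log-likelihood plus an extra term that charges for probability mass placed on a costly (true, predicted) pair. The function name `penalized_nll`, the `cost` table, and the numbers are all assumptions for illustration. Nothing here is known to be what any of these companies use, and as the comment above wagers, a penalty like this is probably not sufficient on its own.

```python
import math


def penalized_nll(probs, true_label, cost):
    """Negative log-likelihood plus an expected-cost penalty.

    probs: dict mapping label -> predicted probability (sums to 1).
    cost:  dict mapping (true_label, predicted_label) -> extra weight;
           pairs that are missing cost nothing extra.
    """
    nll = -math.log(probs[true_label])
    penalty = sum(cost.get((true_label, label), 0.0) * p
                  for label, p in probs.items()
                  if label != true_label)
    return nll + penalty


# Made-up prediction for an image whose true label is "person".
probs = {"person": 0.5, "primate": 0.3, "tree": 0.2}
# Hypothetical cost table: this one confusion is weighted heavily.
cost = {("person", "primate"): 10.0}

plain = penalized_nll(probs, "person", {})
weighted = penalized_nll(probs, "person", cost)
print(plain, weighted)
```

The penalty only shapes training on examples that are in the dataset, which is one reason it cannot make up for a long-tail case the dataset never covers.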

trhway 4 years ago | |

>how are you not testing for this?

I wonder how testing for that looks and sounds in a corporate environment. It may well be an area similar to patents: you pretend that you never heard of it, never discussed it, and God forbid any mention in corporate email/chat/etc. or clicking on a link from inside a corporate network...

jcims 4 years ago | |

Why are you so sure they aren't testing for it? Bias finds a way.

jcims 4 years ago | | |

Curious if anyone on HN has built a testing framework to catch this kind of issue.
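
Not a framework, but the core check could be sketched roughly like this, assuming you have an evaluation set where each image carries a human-annotated slice name and the set of labels the model predicted for it. The names `forbidden_label_rate` and `check_slices` are made up for this example, and so is the data.

```python
from collections import defaultdict


def forbidden_label_rate(records, forbidden):
    """Per-slice fraction of images that received any forbidden label.

    records:   iterable of (slice_name, predicted_labels) pairs.
    forbidden: set of labels that must never appear on these images.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for slice_name, labels in records:
        totals[slice_name] += 1
        if forbidden & set(labels):
            hits[slice_name] += 1
    return {name: hits[name] / totals[name] for name in totals}


def check_slices(records, forbidden, max_rate=0.0):
    """Return the slices whose rate exceeds max_rate (empty dict = pass)."""
    rates = forbidden_label_rate(records, forbidden)
    return {name: rate for name, rate in rates.items() if rate > max_rate}


# Made-up evaluation records: every image here shows a person.
records = [
    ("slice_a", {"person"}),
    ("slice_a", {"person", "outdoor"}),
    ("slice_b", {"person"}),
    ("slice_b", {"primate"}),
]
print(check_slices(records, forbidden={"gorilla", "monkey", "primate"}))
```

A pass only says the forbidden labels did not show up on the images you evaluated. As others in the thread point out, it says nothing about the inputs you never tried.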

silisili 4 years ago |

I've been trying to avoid controversy lately, but hey, here's one to downvote.

Have we considered that AI and ML as a general brain replacement is a failed idea? That we humans feel we are so smart we can recreate or exceed millions of years of evolution of the human brain?

I'd never call AI a waste, it's not. But getting it to do human things just may be.

Even a child can tell the difference between a human of any color and an ape. How many billions have been spent trying, and failing, to exceed the bar of the thoughts of a human child?

scotty79 4 years ago |

Is that a result of a skewed training set or are people really hard to tell apart from gorillas if there are no obvious tells like large difference in brightness of different areas of the face?

ALittleLight 4 years ago |

The video features white and black men. It seems like concluding the algorithm is calling black men primates is the same kind of error people are accusing the algorithm/Facebook of. i.e. The reason you think it's racist is because you assume it's talking about black people specifically suggesting you think the word is more apt to describe black people.

Primates and humans are similar labels. This was almost certainly not intentional. Video classifiers are going to make mistakes - sometimes crude or offensive ones. I don't get outrage over labeling errors like this. Facebook should fix the issue - but they shouldn't apologize. It only encourages grievance seekers.

6gvONxR4sf7o 4 years ago | |

We’re assuming it because that’s exactly what has happened with other products in the past. It’s an issue the field has struggled with, so it seems likely.

ALittleLight 4 years ago | | |

Maybe I'm not aware of what you're referring to, but I don't think so. I think, like this incident, companies apologize for stuff like this because they lack the courage to say the truth, which is that it's an unfortunate labeling error but not a big deal. Instead, they judge it to be more politic to beg forgiveness. Of course, the people who get offended by labeling errors are only encouraged by apologies and use them as evidence of wrongdoing.

dadoge 4 years ago | |

Intentional or not, the outcome is all that matters.

In every aspect of your life

aufhebung 4 years ago | | |

I don't know if this is necessarily true. We have separate charges for murder and manslaughter for example.

systemvoltage 4 years ago | | |

Speak for yourself. Intent is one of the key factors in crime investigations. Even in 'every aspect of life', intent plays a critical role in greasing the society's abrasion. It helps us understand each other better. Did you accidentally bump me or did you try to push me out of the bus?

crooked-v 4 years ago | |

> The reason you think it's racist is because you assume it's talking about black people specifically suggesting you think the word is more apt to describe black people.

No, I think it's racist because racists have a long history of calling black people primates, and because an automated system doesn't get to escape scrutiny and critique just because someone didn't specifically put in a line of code that emulates the actions of racists.

firefoxd 4 years ago |

This happens because there are no black people of consequence in the ML pipeline. In my previous company, every time we built a new model, a bunch of us would test it. Being the only black person in the company, I often found some very odd things and we would correct them before shipping.

I understand that FB operates at a much bigger scale, but that's all the more reason to have a much more diverse set of eyes to test their models before they go live.

If you want to avoid this, hire more black people, seriously.

dshpala 4 years ago | |

"Hire more black people" - isn't that what FAANG has desperately been trying to do for some time now?

I guess first step might be to "hire more black QA people".

SamReidHughes 4 years ago | |

They already get hired, when they meet the hiring bar.

joebob42 4 years ago | |

It's not obvious to me what black people would have done to fix this specific problem. Would they have said "oh we should make sure to test the algorithm on blurry images of people in a forest and make sure it doesn't get confused"?

firefoxd 4 years ago | | |

"I uploaded my picture and it says I'm a monkey"

"Oh, maybe we should look into that"

yanlezeiler 4 years ago |

I worked for another computer vision company, Clarifai, that had the same issue. One of the employees noticed it and we retrained the model before it became public.

MBCook 4 years ago | |

This is what amazes me. Given this exact thing has happened in the past and resulted in public humiliation of the companies involved, how did they not notice this? Why didn’t they check for it?

avalys 4 years ago | | |

What’s being reported is that there is a single video which is mislabeled. For all we know, they did test for this, and believed there was no issue.

AI models are deterministic in a purely technical sense, but practically speaking, they are non-deterministic black boxes. It’s not as if you can write a unit test which generates all possible videos of black people and makes sure it never outputs “gorilla”.

MattGaiser 4 years ago | | |

Checking would require checking every possible video against the classifier.

root_axis 4 years ago |

I think the negative reaction is reasonable. Clearly, if a human did this it would be a problem, so why should it be acceptable for an automated system to do the same thing? The fact that it is unintentional doesn't negate the fact that it's an embarrassing mistake.

On the other hand, imagine a world where these labels were applied by a massive team of humans instead of a deep learning algorithm. At Facebook's scale, would the photos end up with more or less racist labels on average over time? My guess is that the model does a better job, but this is just another example of why we should be wary about trusting ML systems with important work.

jessaustin 4 years ago | |

> Clearly, if a human did this it would be a problem, so why should it be acceptable for an automated system to do the same thing?

One worries that the corporate overlords are preparing the legal system to give manufacturers of self-driving cars complete impunity. "Sorry your child is dead; the car did it so there's no one to sue or convict."

ipaddr 4 years ago | |

That raises the question: is it embarrassing, or an expected mistake to be learned from? Many things are mislabeled and many things are labelled properly, but we never say AI must feel pride at a good labeling job, so why would we give emotions to an emotionless system?

root_axis 4 years ago | | |

> is it embarrassing, or an expected mistake to be learned from?

I would say it's both. It's embarrassing for Facebook because it looks racist even though it really isn't. The system might be emotionless but the people who interact with it aren't, and we don't expect them to be.

varelse 4 years ago |

AI is not the problem here. AI just notices stuff. It's the lack of even amateur hour emotional intelligence in the product managers who deploy systems like this IMO.

Cycl0ps 4 years ago |

I don't like these stories. They always trend towards the most inflammatory arguments, those being inherent bias and unconscious racism put upon our technology. Real issues in those topics aside, are any articles like this doing anything but feeding flames and generating ad revenue?

Instead, I want to talk about pareidolia. Humans are social creatures. We have evolved to identify others of our kind and read their expressions. This was important to us, as we evolved alongside gorilla analogues as well, and the few of us that couldn't discern one face from another didn't usually last long.

I think we're trying to place too much of a human expectation onto these machines. I think that human features and primate features are strikingly similar, and it's our specialized brains that let us so easily discern. Yes, with enough data and training we could have more accurate models, but we can't cry foul every time an algorithm doesn't behave like a human does.

Reference: https://www.reddit.com/r/Pareidolia/

dd444fgdfg 4 years ago |

Humans are primates. The AI is correct. Does it classify white men and Asians as primates too? If not, that's a bug.

OneEyedRobot 4 years ago |

I wonder if AIs are good at distinguishing individual gorillas, etc. I'd never really thought about the problem of classification being harder (perhaps) than identification if you see what I mean.

istillwritecode 4 years ago |

Google: hold my beer

Paraesthetic 4 years ago |

Ooof, that's uncomfortable.

jimjimjim 4 years ago |

Nothing important in the world should RELY on an AI/NN/ML.

SilverRed 4 years ago | |

This feature was not of any importance. It just asks the viewer if they want to see more or less of a certain kind of video.

jimjimjim 4 years ago | | |

Yes, and now imagine if the feature was important, like sorting job applications, or medical diagnosis, or (dramatic pause) driving. Lots of organizations are looking at ways of completely removing humans from their decision making processes. 99% certainty rates would be fantastic unless you are in the 1% false positive/negative group.

sonicggg 4 years ago |

Wow, what else? Did it also label them as "Homo Sapiens"?

q-rews 4 years ago |

I feel that 0-failure-rate expectations from technology will keep us from progressing as a species.

Facebook disabled Thai-to-English translation back in April because it translated the queen as “slut” and it’s been disabled since.

Maybe we should learn to accept non-fatal errors from applications instead of forcing things to stop entirely.

I find it ridiculous that my Photos app suggests I change monkey to “lemur” while I have plenty of photos of monkeys and zero of lemurs.

smoldesu 4 years ago |

Who takes the fall when an AI screws up?

If you shine enough light on it, apparently the brand does. If a human were to do this, the company would immediately fire the employee and cut all ties with them. But as the article points out, 'fixing' an AI mistake isn't really a fix at all:

> [Google] said it was "appalled and genuinely sorry", though its fix, Wired reported in 2018, was simply to censor photo searches and tags for the word "gorilla".

exporectomy 4 years ago | |

If a human did it with intent to offend, they would. But if a black person genuinely looked like a primate to human eyes, that would be pretty shitty to fire the poor worker who had no way of knowing. Here, the AI isn't trying to offend so maybe there should be no consequences and people should stop demanding severe punishment for minor accidental insults.

userbinator 4 years ago |

The AI is very honest and innocent, it doesn't know what political correctness is. I've heard stories of parents whose kids would also mislabel a black human as a gorilla.

exporectomy 4 years ago | |

People need to learn to cope with the difference between innocent mistakes and expressions of genuine feelings of contempt/etc.

crooked-v 4 years ago | | |

An 'innocent mistake' performed by a megacorporation with bajillions of dollars, of exactly the same kind of error that has publicly appeared in the actions of other megacorporations with bajillions of dollars... at that point it's not a 'mistake', it's just the people in charge not caring.

userbinator 4 years ago | | |

Unfortunately, this is the sort of article that just causes all the SJWs to come crawling out and destroy any attempts at logical reasoning.

tartoran 4 years ago | |

AI doesn’t build itself in a vacuum. The very people who train it are likely to be biased in one direction or another and if left unchecked their masterpiece will be biased as well. Now, if these algos were staying in a lab or something it wouldn’t be a problem but as soon as they hit the real world they should be held up to some standards, don’t you think?

kodah 4 years ago |

I don't really think the world needs AI right now. One can argue, as people are here, that the AI is making an innocent mistake and that calling an AI or ML (or its improper training, however that works) "racist" is overblown rhetoric, but I think all of that eschews the actual issue. The problem is that AI and ML are primarily used for decision making, like in recommendation engines. These little gadgets that provide recommendations may be fairly low-stakes, but are theoretically proofs of concept for future applications like policing, fighting terrorism, or human trafficking. If you get it wrong there, the consequences are devastating. If people don't raise the flag about how wildly wrong the AI is now, then there will inevitably be a false confidence to use it for the aforementioned applications (and there are plenty of examples of how this has already happened).

throwthere 4 years ago |

Maybe the algo or the training set or something else was racist, maybe it wasn't. But if you code something that labels people with slurs, you've messed something up. Like, you need to be 99.999999% sure you're not throwing out slurs or your whole project is failing spectacularly. And then you have to apologize to the 0.000001%, which is still probably a few dozen people if half the planet uses your site. How do you get there? I don't know. I guess it'd help if you could be 99.999999% sure you weren't looking at a human face before using another label. Like, bias towards humans in a big big way. Heck, the pre-test probability that your algo is looking at a person is probably much higher than the one from your training set if you're Facebook. Or maybe you drop primates from your training set. I guess in that case you'll misidentify some primates as people, which is kind of the flipside of the same problem technically but oh so much more acceptable.
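
A back-of-the-envelope sketch of both ideas, taking the 99.999999% figure at face value. Everything named here is an assumption for illustration: `choose_label`, `expected_mislabels`, the idea that the model exposes a separate `p_human` estimate, and the figure of roughly 3.9 billion for "half the planet".

```python
def choose_label(scores, p_human, sensitive, max_p_human=1e-8):
    """Return the top-scoring label, skipping sensitive ones unless the
    estimated probability of a human being present is at most max_p_human.

    scores:    dict mapping label -> score.
    sensitive: set of labels that must not be applied to people.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for label, _ in ranked:
        if label in sensitive and p_human > max_p_human:
            continue  # not sure enough that no human is in the picture
        return label
    return None  # every candidate label was withheld


def expected_mislabels(population, confidence):
    """Expected number of people affected at a given per-person confidence."""
    return population * (1.0 - confidence)


# Made-up scores: the sensitive label wins on raw score but is withheld.
scores = {"primate": 0.55, "person": 0.45}
print(choose_label(scores, p_human=0.45, sensitive={"primate"}))

# Assumed population of about 3.9 billion, i.e. roughly half the planet.
print(round(expected_mislabels(3.9e9, 0.99999999)))
```

The asymmetry is the point: the rule accepts calling some primates people in exchange for almost never doing the reverse.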

overgard 4 years ago | |

This isn't the kind of slur that you can just run a dictionary search for. There are totally valid contexts to tag a gorilla in a picture if it contains gorillas. I'm sure there are other words it could also mistakenly classify with that might be insulting by accident but aren't slurs (maybe tagging an athlete as a statue, for instance, like "that quarterback is a statue in the pocket"). This tech isn't perfect so you either need a human editor or you have to learn to live with mistakes. IMO the fact that this was unintentional and an AI mistake makes me think the outrage is more performative than genuine.

throwthere 4 years ago | | |

Oh boy. People who know basic ML think "Oh, it was unintentional, just a basic misclassification, it happens." Guess what? You're still calling people slurs on your website, even if you did it accidentally.