Making a racist AI without really trying (2017) (blog.conceptnet.io)
> As a hybrid approach, you could produce a large number of inferred sentiments for words, and have a human annotator patiently look through them, making a list of exceptions whose sentiment should be set to 0. The downside of this is that it’s extra work; the upside is that you take the time to actually see what your data is doing. And that’s something that I think should happen more often in machine learning anyway.
Couldn't agree more. Annotating ML data for quality control seems essential both for making it work, and building human trust.
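A minimal sketch of the hybrid approach quoted above, assuming the inferred sentiments live in a plain dictionary. The words, the scores, and the `apply_overrides` helper are all made up for illustration and are not from the article:

```python
# Hypothetical inferred sentiment scores; the numbers are invented for illustration.
inferred = {"awesome": 2.1, "terrible": -2.4, "mexican": -0.6, "italian": 0.4}

# Words a human annotator flagged as exceptions after reading through the list.
neutral_overrides = {"mexican", "italian"}


def apply_overrides(scores, overrides):
    """Return a copy of `scores` with every overridden word forced to 0.0."""
    return {word: (0.0 if word in overrides else score)
            for word, score in scores.items()}


reviewed = apply_overrides(inferred, neutral_overrides)
```

The original scores stay untouched, so the annotator's list can be revised and reapplied later.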
Making this assumption is fine in some cases (for example if you don't have training data for your domain), but if you build a classifier based on this assumption why don't you just use an off-the-shelf sentiment lexicon? Do you really need to assign a sentiment to every noun known to mankind? I doubt that this improves the classification results regardless of the bias problem.
> Do you really need to assign a sentiment to every noun known to mankind?
No, but it seems like a simple (and seemingly innocuous) mistake that many programmers can and will make.
The big conclusion here after all that code buildup does not logically follow. All it shows is that one new word embedding, trained by completely different people for different purposes with different methods on different data using much fancier semantic structures, outperforms (by a small and likely non-statistically-significant degree) an older word embedding (which is not even the best such word embedding from its batch, apparently, given the choice to not use 840B). It is entirely possible that the new word embedding, trained the same minus the anti-bias tweaks, would have had still superior results.
I agree that there is a real statistical pattern in the training data: names associated with certain ethnicities are more likely to appear close to words with negative sentiment. I just don't think this necessarily means that the news is racist. I think more analysis is needed to see where this pattern comes from.
However, if it is true that the news is biased and racist in a quantifiable way, that would be a bigger problem than biased word vectors. I would genuinely be interested in seeing that type of analysis.
They haven't shown that there is no tradeoff, either in general or in this case.
Um, that's because we know it comes with trade-offs once you have the optimal algorithm. See for instance https://arxiv.org/pdf/1610.02413.pdf. If your best-performing algorithm is "racist" (for some definition of "racist"), you are mathematically forced to make trade-offs if you want to eliminate that "racism".
Of course, defining "racism" itself gets extremely tricky because many definitions of racism are mutually contradictory (https://arxiv.org/pdf/1609.05807.pdf).
Not all patterns are equally useful. By removing the unhelpful ones we might make fewer mistakes (for example, giving negative sentiment to a Mexican restaurant review) and free up capacity in the word vectors to store more useful patterns. I would expect that baking other real-world assumptions, unrelated to bias, into your word vectors could also be helpful.
There are two ways to look at this:
1. Racism makes the algorithm good so we should make the algorithm less racist (at a cost to its performance) or decide we want to allow systematic racism.
2. The metric for how good the algorithm is (ie training data) encourages it to be racist and therefore correcting the bias in the algorithm may decrease its performance on the training data but may not affect its performance in the real world, or decrease its performance in the “performance + meets legal requirements” metric.
Italian restaurant is good.
Chinese restaurant is good.
Chinese government is bad.
Mexican restaurant is good.
Mexican drug dealers are bad.
Mexican illegal immigrants are bad.
And hence the word vector works as expected and the sentiment result follows.
Update:
To confirm my suspicion, I tried out an online demo to check distance between words in a trained word embedding model using word2vec:
http://bionlp-www.utu.fi/wv_demo/
Here is an example output I got with the Finnish 4B model (probably a bad choice since it is not English):
italian, bad: 0.18492977
chinese, bad: 0.5144626
mexican, bad: 0.3288326
Same pairs with Google News model:
italian, bad: 0.09307841
chinese, bad: 0.19638279
mexican, bad: 0.16298543
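For anyone wanting to reproduce this kind of number offline: I'm assuming the demo reports cosine similarity between the two word vectors (the demo page would need to confirm that). A minimal sketch with made-up 3-d vectors, not real word2vec output:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
toy = {
    "bad": [1.0, 0.0, 0.0],
    "word_a": [0.5, 0.5, 0.0],
    "word_b": [0.0, 1.0, 0.0],
}

sim_a = cosine_similarity(toy["word_a"], toy["bad"])  # partly aligned with "bad"
sim_b = cosine_similarity(toy["word_b"], toy["bad"])  # orthogonal to "bad"
```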
It seems to me that if you wanted to root out sentiment bias in this type of algorithm, then you would need to adjust your baseline word embeddings dataset until you have sentiment scores for the words "Italian", "British", "Chinese", "Mexican", "African", etc that are roughly equal, without changing the sentiment scores for all other words. That being said, I have no idea how you'd approach such a task...
I don't think you could ever get equal sentiment scores for "black" and "white" without biasing the dataset in such a manner that it would be rendered invalid for other scenarios (e.g., giving a "dark black alley" a higher sentiment than it would otherwise have). "Black" and "white" is a more difficult situation because the words have different meanings outside of race/ethnicity.
Also, regarding black/white etc., there might legitimately be words which have so many different meanings (whether race-related or not) that you should just exclude them from sentiment analysis. "Right" can mean like "human rights", "right thing to do", or "not left". Probably plenty of other words like that. You might do better to have a list of 100-200 words that are just excluded because of issues like that.
I haven't studied word embeddings past the pop-sci level but wouldn't such words form multiple clusters in the embedding space? I would have thought it would be relatively easy to get different 'words' for 'right (entitlement)', 'right (direction)', etc?
Edit: Nibling post answers this question.
I would argue that it’s pragmatically associated with bad things (e.g., being mugged, overcrowded areas) but it’s not intrinsically bad (or good) itself.
Is that not what's meant by sentiment?
So it's not giving us objective decisions, but a mirror. Not so bad either.
To explain what happened here: They trained a classifier to predict word sentiment based on a sentiment lexicon. The lexicon would mostly contain words such as adjectives (like awesome, great, ...). They use this to generalize to all words using word vectors.
The way word vectors work is that words that frequently occur together are going to be closer in vector space. So what they have essentially shown is that in common crawl and google news names of people with certain ethnicities are more likely to occur near words with negative sentiment.
However, the sentiment analysis approach they are using amplifies the problem in the worst possible way. They are asking their machine learning model to generalize from training data with emotional words to people's names.
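To make the generalization step concrete, here is a toy sketch. It is not the article's code: the 2-d vectors, the four-word lexicon, and the placeholder words `name_a` and `name_b` are invented, and the classifier is a hand-rolled logistic regression. The point is only that a word missing from the lexicon still gets a score, purely from where its vector sits.

```python
import math

# Hypothetical 2-d "embeddings"; real word vectors have hundreds of dimensions.
vectors = {
    "great": [0.9, 0.1], "awesome": [0.8, 0.2],
    "awful": [0.1, 0.9], "terrible": [0.2, 0.8],
    "name_a": [0.7, 0.3],  # not in the lexicon
    "name_b": [0.3, 0.7],  # not in the lexicon
}
# Sentiment lexicon: 1 = positive, 0 = negative.
lexicon = {"great": 1, "awesome": 1, "awful": 0, "terrible": 0}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(lexicon, vectors, epochs=2000, lr=0.5):
    """Fit logistic regression on the lexicon words with plain SGD."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for word, label in lexicon.items():
            x = vectors[word]
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def positive_probability(word, w, b):
    """Predicted probability that `word` is positive, lexicon member or not."""
    x = vectors[word]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


w, b = train(lexicon, vectors)
p_a = positive_probability("name_a", w, b)  # sits nearer the positive words
p_b = positive_probability("name_b", w, b)  # sits nearer the negative words
```

Neither name was ever labelled, yet `name_a` comes out positive and `name_b` negative, which is the amplification described above.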
For any such analysis there are a great many confounds, both blatant and subtle. Finding racism everywhere could be because overt racism is everywhere, or it could be confirmation bias. It could even be both! That's the tricky thing about confirmation bias—one never knows when one is experiencing it, at least not at the time.
Reminds me of how Google Photos couldn't differentiate between a black person & a monkey, so they've excluded that term from search altogether.
While the endeavour itself is good, fixes are sometimes hilariously bad or biased (untrue)
Recently there was an article about recognition of bullshit: https://news.ycombinator.com/item?id=17764348
To me the article brought great insight - I realized that humans do not just pattern match. They also seek understanding, which I would define as an ability to give a representative example.
It is possible to give somebody a set described by arbitrarily complex conditions while the set itself is empty. Take any satisfiability problem (SAT) with no solution - this is a set of conditions on variables, yet there is no global solution to these.
So if you were a Chinese room and I trained you on SAT problems by pure pattern matching, you would be willing to give solutions to unsolvable instances. It is only when you actually understand the meaning behind the conditions that you can recognize that these arbitrarily complex inputs are in fact just empty sets.
So perhaps that's the flaw with our algorithms. There is no notion of "I understand the input". Perhaps that is understandable, because understanding (per the above) might as well be NP-hard.
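A tiny illustration of "conditions that describe an empty set", as a brute-force check. The clause encoding (positive integer for a variable, negative for its negation) is just a common convention I'm assuming here, and brute force only works for toy sizes:

```python
from itertools import product


def satisfying_assignments(clauses, num_vars):
    """Try every assignment. Literal k means variable |k|, negated if k < 0."""
    solutions = []
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            solutions.append(bits)
    return solutions


# (x1 or x2) and (not x1 or x2) and (x1 or not x2) and (not x1 or not x2)
unsat = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
# Dropping the last two clauses leaves a satisfiable formula (x2 must be true).
sat = [[1, 2], [-1, 2]]

empty_set = satisfying_assignments(unsat, 2)
non_empty = satisfying_assignments(sat, 2)
```

The first formula looks like a perfectly ordinary set of conditions, but the set it describes has no members.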
So at some point you need to decide when to pattern-match and take the result for granted, and when to dig further to understand why the pattern matched the way it did, and whether it's relevant. But that is itself a choice, and it's also going to be biased (for example, towards people you personally know, and against random strangers).
It is possible that the brain has specialized circuits to solve small instances of SAT, and it just gives up on large enough instances. I am sure you know the feeling you get when you understand something: it's very much like the pieces of a puzzle suddenly fitting together perfectly.
> Note that the accuracy of sentiment prediction went up when we switched to ConceptNet Numberbatch.
> Some people expect that fighting algorithmic racism is going to come with some sort of trade-off. There’s no trade-off here. You can have data that’s better and less racist. You can have data that’s better because it’s less racist. There was never anything “accurate” about the overt racism that word2vec and GloVe learned.
I wonder if this could be extended to individual names that have strong connotations with people because of the fame of some particular person, like "Barack", "Hillary", "Donald", "Vladimir", or "Adolf", or if removing that sort of bias is just too much to expect from a sentiment analysis algorithm.
Suppose sentiment for the name Tyrel was better than for Adolf. Would that indicate anti-white bias? Suppose the name Osama has really poor sentiment. What fudge factor do you add there to correct for possible anti-Muslim bias? Suppose Little Richard and Elton John don't have equal sentiment. Is the lower one because Little Richard is black, or because Elton John is gay?
What we have been seeing lately is an effort to take unmeasurable bias that is simply assumed to exist and to be unjust, and replace it with real bias, encoded in our laws and practices, or in this case, in actual code.
It takes two to tango: the average audience's behavior isn't blameless for the impact of its response. Also, how an AI decides whether to interpret an ambiguous response as desirable or not is really interesting.
Systems that are perfectly objective about recognizing patterns will definitely be biased in their predictions, that's the whole point.
The learning algorithms are crude and dumb. They will simply fit to any data you provide them (you choose how many Mexican restaurant food reviews you train your sentiment classifier on). Then they count how many times the words "mexican" and "man" and "mexican man" appear with a positive or negative label in the training set, and objectively try to give the best probability for that.
Current sentiment analyzers are not AI: no common sense, no understanding, no reasoning. We are just rushing to replace looking a job candidate in the eyes with running some 1960's logistic regression over their cover letter. Let's hope for their sake they did not manage a Mexican restaurant.
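The counting described above, as a toy sketch. The five reviews and their labels are invented, and real classifiers do more than raw counts, but the failure mode is the same:

```python
from collections import Counter

# Hypothetical labelled reviews (1 = positive, 0 = negative), made up for illustration.
train_set = [
    ("great italian food", 1),
    ("lovely italian place", 1),
    ("terrible mexican food", 0),
    ("awful mexican service", 0),
    ("great mexican tacos", 1),
]

positive, total = Counter(), Counter()
for text, label in train_set:
    for word in set(text.split()):
        total[word] += 1         # reviews containing the word
        positive[word] += label  # of those, how many were positive


def positive_rate(word):
    """Fraction of training reviews containing `word` that were labelled positive."""
    return positive[word] / total[word]


mexican_rate = positive_rate("mexican")  # 1 positive review out of 3
italian_rate = positive_rate("italian")  # 2 positive reviews out of 2
```

Nothing here knows what a restaurant is. The word "mexican" ends up with a lower score only because of which reviews happened to be in the training set.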
http://costena.com/famous.html
On May 3rd, 1997 La Costeña of Mountain View, California created the world's largest burrito. The burrito weighed in at 4,456.3 pounds and was measured at 3,578 feet long. It was created at Rengstorff Park in Mountain View.
Technically that is what happened, but it paints an incorrect picture in people's minds. Out of the billions of images that Google Photos had auto-tagged, it tagged one picture of two black people as "gorillas".[1] This was probably the first time this had ever happened. (If it had happened before, it surely would have been spread far and wide by social media & the press.)
So Google's classifier was inaccurate 0.0000001% of the time, but the PR was so bad that Google "fixed" the issue by blacklisting certain tags (monkey, gorilla, etc). If you take photos of monkeys, you'll have to tag them yourself.
I'm sure Google could do better, but the standard required to avoid a PR disaster is impossible to meet. If the classifier isn't perfect forever, they're guaranteed to draw outrage.
1. https://twitter.com/jackyalcine/status/615329515909156865
That is a very big leap. Social media might be widespread, but almost everything in the world goes unremarked upon. Think of all the news stories that turn up an old tweet or Facebook post that, if anyone had paid attention at the time, would have stopped events from progressing.
https://arxiv.org/abs/1802.05365 (state of the art)
"On two occasions I have been asked, — 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
Similarly, my guess is that alley is rarely found in a positive context, but the actual sentiment comes from elsewhere in the utterance.
It depends on two things. The first is how you're defining racism. If the algorithm is predicting that 10% of white people and 30% of black people will do X, because that is what actually happens, some people will still call that racism but there is no possible way to change it without reducing accuracy.
If the algorithm is predicting that 8% of white people and 35% of black people will do X even though the actual numbers are 10% and 30%, then the algorithm has a racial bias and it is possible to both reduce racism and increase accuracy. But it's also still possible to do the opposite.
One way to get the algorithm to predict closer to 10% and 30% is to get better data, e.g. take into account more factors that represent the actual cause of the disparity and just happen to correlate with race, so factoring them out reduces the bias and improves accuracy in general.
The other way is to anchor a pivot on race and push on it until you get the results you want, which will significantly harm accuracy in various subtle and not so subtle ways all over the spectrum because what you're really doing is fudging the numbers.
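The distinction between the two cases above can be put in numbers. A minimal sketch using the figures from this comment (10%/30% actual, 8%/35% predicted); the group labels and helper names are placeholders of mine:

```python
# Rates taken from the example above; group labels are placeholders.
actual = {"group_a": 0.10, "group_b": 0.30}
predicted = {"group_a": 0.08, "group_b": 0.35}


def calibration_error(predicted, actual):
    """Predicted rate minus observed rate, per group."""
    return {group: predicted[group] - actual[group] for group in actual}


def disparity(rates):
    """Gap between the two groups' rates."""
    return rates["group_b"] - rates["group_a"]


errors = calibration_error(predicted, actual)
# How much the model exaggerates the real gap between groups.
exaggeration = disparity(predicted) - disparity(actual)
```

A perfectly calibrated model has zero error for both groups but still reports the 20-point gap. The biased one here under-predicts one group by 2 points, over-predicts the other by 5, and so widens the gap by 7 points. Only that excess can be removed without costing accuracy.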
What is actually happening? Does it tell you whether they are doing X precisely because they are black or white? The racist part might not be the numbers per se, but the conclusion that the color of their skin has anything to do with their respective choices.
edit: spelling
I remember a while back Google got flak because the image search for "scientist" was almost entirely famous African American scientists. That's now changed and shows stock images of (mostly white) people in lab coats.
"Three black teenagers" shows mostly groups of mugshots.
The word "Brazilian" shows hot, almost nude women. "German" shows the flag. "Portuguese" shows maps, flags, and a lot of normal looking people. "Hispanic" all pictures are normal looking people.
For instance search for 'green circle' - okay you get mostly green circles. Now search for 'green circle with red line' and the results are completely nonsensical. The huge leap forward in search engines was being able to avoid returning hardcore porn when searching for Abraham Lincoln. But in spite of tens of thousands of engineers, hundreds of billions of dollars in revenue, and all sorts of fancy declarations of ultra sophisticated AI solving every problem under the sun, we really haven't moved that far beyond that early milestone.
Sometimes data might be 'racist' (i.e. human written corpus text)... but sometimes data is just data.
Are facts racist?
It would seem the world is rather diverse, i.e. 'people are different', and as we are different, AI is going to pick up on that. That's the whole point.
Now, there are some bad cases, like this one, where positive/negative inferences are taken the wrong way. Or actual systematic racism shows up in bad ways, e.g. maybe some groups are more likely to be monitored than others, and therefore show up more frequently in bad terms.
If we nobly insist that the models describe the world as we wish it were and ought to be, then we won't be describing the data accurately. Maybe that trade-off is worthwhile if it somehow reforms human attitudes along lines we find more agreeable?
Then again, my personal social bubble leans heavily liberal and hard left. And I think that has a lot more to do with it than with how much people understand ML. When you explain this sort of thing to people who have no idea about ML, in very simple terms ("we give the robot the text that humans wrote, so that it can pick up the patterns" etc), they see why it does that very quickly, as well - if their politics makes them aware of bias in general.
Stuff like this puts the lie to that, though.
Disclaimer: I don't know enough about the data or the algorithm to determine this mathematically, but I think it's worth pointing out. It would have been nice to see some statistical analysis instead of just assuming the charts speak for themselves.
It seems to me like the people writing about how AI is bad because it picks up biases from data are wishing the ML would learn the world as it ought to be. But that's wrong, and that would make such algorithms not useful. ML is meant to learn the world as it is. Which is, as you wrote, neither fair nor a controlled experiment.