Automating exact matches for known bad content is easy enough. But automating anything beyond exact matches becomes impossible, because nuance and context can make a huge difference in the meaning of a word or sentence. Automation can get it wrong, and when it does, users want the ability to force a human to review it and make a decision, but bad actors will abuse and overwhelm that system.
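To make the "exact matches are easy" part concrete, here's a rough sketch of a hash-based blocklist. Everything in it (the `fingerprint` and `is_known_bad` helpers, the sample spam string, the normalization rule) is made up for illustration and isn't how any particular platform does it. It just shows why the easy part stops being easy as soon as the text changes by one character.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical blocklist: fingerprints of posts already judged to be bad.
KNOWN_BAD = {fingerprint("Buy cheap pills at example.invalid")}


def is_known_bad(text: str) -> bool:
    """Exact-match check: True only if the normalized text is already on the blocklist."""
    return fingerprint(text) in KNOWN_BAD


# Same post with different spacing and casing is still caught.
print(is_known_bad("buy   cheap pills at EXAMPLE.invalid"))  # True

# One substituted character and the exact match misses it entirely.
print(is_known_bad("Buy che4p pills at example.invalid"))  # False
```

Catching the second case means fuzzy matching or some kind of classifier, and that's exactly where nuance, context, and false positives come in.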
Not impossible. A few years ago, one report found that Facebook would need to double the number of moderators, expand fact-checking, and take a few other actions. Facebook won't do it because they'd have to divert a small portion of their $39 billion in yearly net income toward that goal.
https://static1.squarespace.com/static/5b6df958f8370af3217d4...
- Moderation is a 'cost center', which is MBAspeak for "thing that doesn't provide immediate returns disproportionate to investment". For context, so is engineering (us). So instead of paying a reasonable amount to hire moderators, Facebook and other platforms will spend as little as possible and barely do anything. This mentality tends to set in early, during the growth phase when users are being added way faster than you can afford to add moderators, but it remains even after sustainable revenues have been discovered and you have plenty of money to hire people with.
- Certain types of against-the-rules posts provide a benefit to the platform hosting them. Copyright infringement is an obvious example, but that has liability associated with it, so platforms will at least pretend to care. More subtle would be things like outrage bait and political misinformation. You can hook people for life with that shit. Why would you pay money to hire people to punish your best posters?
That last one dovetails with certain calls for "free speech" online. The thing is, while all the content people want removed is harmful to users, some of it is actually beneficial to the platform. Any institutional support for freedom of speech by social media companies is motivated not by a high-minded support for liberal values, but by the fact that it's an excuse to cut moderation budgets and publish more lurid garbage.
When attacks can be automated, but moderation can't be, it is impossible to win against bad actors no matter how much you fund the moderation team.
For example, there are certain words which are highly offensive in some regions, but completely fine in others. Some pictures and imagery are banned or seen as offensive in some places but not others, and there are other such differences.
There's also often a lot of subculture-specific terminology that might come across as offensive to those outside said culture if they don't know the context or situation. We've seen that with certain tech-related terms in the past.
Political misinformation and outrage bait are similarly not always simple to classify. Is content from a satire site that too many people take as real 'misleading'? I know there have been problems with things like The Onion, the Babylon Bee, Hard Drive, etc., where people took their stories at face value. It also often falls into 'things the government/those in power don't want others questioning', especially when it comes to certain political events and controversies.
Let's not even get into talking about conflicts of interest and scenarios where business value and fairness clash, like when certain political figures say things that would probably get any other account suspended. What if the US president, UK prime minister, president of the EU or some other country's leader promotes information that's abusive, misleading or dangerous? Twitter had to deal with that problem back in 2016, and just ended up pretending he didn't exist until his term in office was over.
And then there's just context. Is an insulting message aimed at someone an attack by a rival or bully or troll? Or banter from one of their friends, like how many friend groups and families can make friendly jokes at each other's expense? In smaller communities this isn't much of a problem since the people there know each other well enough to tell the difference between personal attacks and jokes, but is that going to be the case with a moderator here?
So moderation gets kinda tricky due to all the context needed to know whether a message is abusive or not. A well-moderated large platform probably needs people in a variety of locations, from a variety of backgrounds, with some way of getting a group consensus if any staff member is unsure.
Of course the other incentives you point out don't help much either. Google and co want to automate everything, so the idea of using humans to tell the difference between quality and spam/abuse is never even considered by them. Some things like outrage bait are definitely supported by the platforms for their addictiveness, and then we get situations where the owners themselves are horribly biased/trolls/whatever and are happy to allow abuse so long as it's from 'their team'.