How To Empower Artificial Intelligence To Take On Racist Trolls
Social media is overwhelmed with toxic trolls and humans are failing to keep them at bay. It’s time for AI to help them.
Many working in a mathematics-heavy field have a similar vice. We want to quantify everything, especially if the quantification process is going to be an extremely complicated and imperfect one. In fact, the level of difficulty is the main draw because it forces us to think about what makes up the very thing we’re trying to quantify and how we can objectively define and measure it in the real world. And when it comes to quantifying the bigotry that’s exploding on social media, this isn’t an abstract problem for the curious. As social networks have become a global phenomenon with billions of users, human moderation is failing to scale with the explosion of content and, on top of the human toll, that failure is creating major business and public relations problems for the companies that built them.
Just ask Twitter. After years of hemorrhaging cash, it’s been looking for a buyer interested in monetizing its users for its own devices and willing to absorb the losses for a flood of new sales. But despite some interest and a few bids, the deals went nowhere for one simple reason: Twitter’s troll problem. And as the problem spread to Facebook and the comment sections of news sites and blogs, Google tried using its artificial intelligence know-how to help flag bigotry. But when used against actual hate, its system came up short on many counts since it has to rely on keywords and the sequences in which they are used to know how toxic they are. It’s the fundamental principle by which neural networks used for such problems are built, and they’re rather limited.
For example, let’s say someone posts a comment that says “all black people are thugs” which is obviously racist as hell. Google’s neural net learned by analyzing phrases containing slurs like this and their intended targets again and again until it sank in that the keywords “black,” “people,” and “thug” put in close verbal and logical proximity to each other are, say, 90% toxic. So far, the system works, but let’s set the complexity bar higher. Let’s consider another hypothetical post that says “black people should just play basketball” which definitely has a racist connotation, but doesn’t have slurs and obvious negatives for the system to react to.
It sees nothing wrong in a combination of “black,” “people,” and “basketball,” yet the quote is obviously saying that black people should just be athletes, implying other careers to be off limits, and not just any athletes, but in a sport designated for them. It’s a solid 90% or higher on the toxicity scale, but the algorithm sees little to be suspicious about other than the word “just” and flags it as 60% toxic at the very most. Simply looking at sequences of words and their logical distances from each other in the phrase has some problems as a reliable method for a bigotry detector. But how exactly do we remedy these glaring shortcomings?
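To make the blind spot concrete, here is a deliberately crude sketch in Python. It is not Google’s model, which is a trained neural net; the word list, the weights, and the `keyword_toxicity` function are all made up for illustration. It only shows how a scorer that reacts to known words has nothing to react to in the second comment.

```python
import re

# Hypothetical weights a keyword-driven scorer might have learned.
# These values are invented for illustration, not taken from any real model.
TOXIC_WEIGHTS = {"thug": 0.9, "thugs": 0.9}


def keyword_toxicity(comment: str) -> float:
    """Return the highest weight of any known toxic keyword, or 0.0 if none."""
    words = re.findall(r"[a-z']+", comment.lower())
    return max((TOXIC_WEIGHTS.get(word, 0.0) for word in words), default=0.0)


overt = keyword_toxicity("all black people are thugs")
coded = keyword_toxicity("black people should just play basketball")
print(overt, coded)  # the coded comment contains no listed keyword
```

The overt comment trips the scorer because it contains a listed word. The coded one sails through untouched, even though a human reader would rate the two as about equally toxic.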
The Problem With Dog Whistles
To try and answer that, we need to step way, way back and first talk about bigotry not as an algorithm, but as a social entity. Who exactly are bigots and what makes them tick? Not by the dictionary definition one would expect to find in a heavily padded college essay, but by the practical, real world manifestations that quickly make them stand out. They don’t just use slurs, or bash liberal or egalitarian ideas by calling them something vile or comparing them to some horrible disease, which means the bigots in question will quickly catch on to how they’re being filtered out and switch to more subtle or confusing terms, maybe even treating it like a game.
Just note how Google’s algorithm goes astray when given quotes light on invective but heavy on the bigoted subtext and what’s known in journalist circles as dog whistles. Sarcasm adds another problem. How could you know on the basis of one comment that the person isn’t just mocking a bigot by pretending to be them, or conversely, mocking those calling out his bigoted statements? Well, the obvious answer is that we need context every time we evaluate a comment because two of the core features of bigotry are sincerity and a self-defensive attitude. Simply put, bigots say bigoted things because they truly believe them, and they hate being called bigots for it.
Only sociopaths and psychopaths are perfectly fine with seeing themselves as evil. Ordinary people don’t think of themselves as villains or want others to consider them as such. Even when they say and do terrible things we will use as cautionary tales in the future, they approach it from the standpoint that they’re either standing up for what they know to be right, or just doing their jobs. Even when confronted with irrefutable evidence of their bigotry, sexism, or evil deeds, they’d go as far as to say that they were driven to it because they were criticized so much, as if “I only started using ethnic slurs and calling for mass deportations because you called me a racist” is a legitimate defense.
It’s a phenomenon explored in Hannah Arendt’s famous Holocaust treatise Eichmann in Jerusalem: A Report on the Banality of Evil, which argues that what we think of as evil on national and global scales can’t be explained by greed, jealousy, or even religious fundamentalism, but by a climate in which everyone is a cog in a machine, the stated goal of which is some nebulous “greatness.” No, this is not to draw a direct parallel between Trumpism and Nazism because they have fundamentally opposite goals. The latter was based around ethnic cleansing and global domination; the former is based on isolationism and seems fine with cultural homogeneity and forced assimilation. But those who were taken in by Trumpism really don’t want to be reminded that this is still bigotry.
In fact, the common message given to tech businessman Sam Altman on his interview tour of Trump’s America was that they detest being called bigots, bad people, or xenophobes, and warn that they will cling closer to Trump if they keep being labeled as such. I have no doubt that they don’t think they are bigoted or xenophobic, but it’s hard to take their word for it when it gets followed by a stream of invective about immigrants destroying culture, bringing crime and disease with them, and describing minorities as getting fortunes in government handouts while “real Americans” like them are just tossed by the wayside by “un-American” politicians.
Social Scores And Shadow Ban Solutions
It’s the classic rule that any statement beginning with “I’m not racist, but” will almost always end up being bigoted because the conjunction pretty much demands something not exactly open-minded to be said in order for what will be said to make sense. This is very likely how the aforementioned algorithm knows to start to raise its toxicity score for the argument: it detects a pattern that raises a red flag that something very, very negative is about to make its appearance because it has seen enough examples in its training set.
And this is ultimately what a successful bigotry-flagging AI needs: patterns and context. Instead of just looking at what was said, it needs to know who said it. Does this person frequently trip the bigot sensor, pushing it into the 55% to 65% range and above? Does this person escalate when called out by others, tripping the sensor even more? What is this person’s social score as determined by feedback from other users in their replies and votes and likes?
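As a rough sketch of what tracking the “who” might look like, the Python below keeps a per-user history of toxicity scores and answers the first two questions above. The `UserHistory` class, its method names, and the 0.55 threshold (the low end of the range just mentioned) are assumptions for illustration, and the toxicity scores are presumed to come from some upstream classifier.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed cutoff: the low end of the 55% to 65% range discussed above.
FLAG_THRESHOLD = 0.55


@dataclass
class UserHistory:
    # Each entry is (toxicity score, True if posted in reply to a call-out).
    comments: List[Tuple[float, bool]] = field(default_factory=list)

    def add(self, score: float, after_callout: bool = False) -> None:
        self.comments.append((score, after_callout))

    def flag_rate(self) -> float:
        """Share of this user's comments scoring at or above the threshold."""
        if not self.comments:
            return 0.0
        flagged = sum(1 for score, _ in self.comments if score >= FLAG_THRESHOLD)
        return flagged / len(self.comments)

    def escalates_when_called_out(self) -> bool:
        """True if replies to call-outs score higher on average than the rest."""
        after = [s for s, called_out in self.comments if called_out]
        other = [s for s, called_out in self.comments if not called_out]
        if not after or not other:
            return False  # not enough history to compare
        return sum(after) / len(after) > sum(other) / len(other)


history = UserHistory()
history.add(0.60)
history.add(0.20)
history.add(0.90, after_callout=True)
print(history.flag_rate(), history.escalates_when_called_out())
```

Comparing averages is the simplest possible escalation test. A real system would likely want a minimum sample size and some weighting toward recent comments.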
Yes, the social score can be brigaded, but there are tell-tale signs which can be used to disqualify likes and votes: large numbers of people from sites known for certain biases coming in to engage a certain way, correlations between some of these sites posting and a rush of users heavily skewing one way, and floods of comments that trigger the sensor. These are well understood problems that can be managed already. We should also track where on the web the users are coming from. Are they coming from sites favorited and frequented by bigots to post stuff that trips the sensor? That’s also a potential red flag.
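One simple way to blunt a brigade, sketched below in Python, is to ignore votes from visitors who arrived via a site on a watch list. The `Vote` type, the `social_score` function, and the example domains are hypothetical, and a real system would also look at timing and volume rather than the referrer alone.

```python
from typing import Iterable, NamedTuple, Set


class Vote(NamedTuple):
    value: int     # +1 for a like or upvote, -1 for a downvote
    referrer: str  # site the voter arrived from, "" if unknown


def social_score(votes: Iterable[Vote], suspect_referrers: Set[str]) -> int:
    """Sum the votes, skipping any cast by visitors from suspect sites."""
    return sum(v.value for v in votes if v.referrer not in suspect_referrers)


votes = [
    Vote(+1, "brigade.example"),
    Vote(+1, "brigade.example"),
    Vote(-1, ""),
    Vote(+1, "news.example"),
]
print(social_score(votes, {"brigade.example"}))  # brigade votes are ignored
print(social_score(votes, set()))                # nothing filtered
```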
A flow that tracks where the user came from, their reputation, their pattern of comments, and how they handle feedback won’t be a perfect system, but it’s not supposed to be. It will give users the benefit of the doubt, then crack down when they show their true colors. In the end, we should end up with a user with a track record and a social score reflective of it, and if that score is very problematic, the best practice would be to shadow ban this person.
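Put together, the decision step could look something like the Python sketch below. The function name, the action labels, and every threshold in it are assumptions chosen for illustration, not recommended values. The benefit of the doubt is modeled as a minimum amount of history before any action is taken.

```python
def moderation_action(
    flag_rate: float,
    social_score: int,
    n_comments: int,
    min_history: int = 10,   # assumed: comments needed before acting
    ban_rate: float = 0.5,   # assumed: share of flagged comments that is "very problematic"
) -> str:
    """Map a user's track record to 'allow', 'review', or 'shadow_ban'."""
    if n_comments < min_history:
        return "allow"  # benefit of the doubt: too little history to judge
    if flag_rate >= ban_rate and social_score < 0:
        return "shadow_ban"  # both signals agree the record is bad
    if flag_rate >= ban_rate or social_score < 0:
        return "review"  # one signal only: escalate rather than punish
    return "allow"


print(moderation_action(flag_rate=0.8, social_score=-12, n_comments=3))
print(moderation_action(flag_rate=0.8, social_score=-12, n_comments=40))
```

Requiring both signals to agree before a shadow ban is one way to keep a single noisy input, such as a brigaded social score, from sinking a user on its own.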
You will also be able to model the telltale signs of a verbal drive-by over time to flag it before anyone sees it and take appropriate automated action. Again, it would be impossible to build a perfect anti-abuse system, but a flow of data moderated by several purpose-built neural nets will definitely give you a leg up on toxic users. And certainly, for some users it will almost be a kind of perverse challenge to see how far they can push the system and become a commenter with the lowest reputation or the highest offense score. But for a number of others, it could actually be an important piece of feedback.
These bigots may have thought of themselves as sober skeptics who worry more about facts than feelings, but by immersing themselves in Trumpist bubbles they were led to embrace bigotry through distorted, misleading data and outright lies. They still think of themselves as upstanding people without a hateful bone in their bodies. But a computer which can show them when what they said tripped a bigot sensor, how often, and the severity and degree of their rants might show them that no, they’re not the nice people they thought.
And being able to transparently present this feedback may be just as key to good anti-troll AI as monitoring sources of traffic, the actual content, users’ histories, and learning how to flag dog whistles from those histories and the input of other users and administrators. We don’t want something that can flag abuse while we have no idea how it works. We want something that shows us an audit trail to inform users and the programmers what happened, and that uses the same process we use to identify bigots: over time, in context, giving time and opportunity for the hood to slip and reveal what’s beneath.
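An audit trail does not need to be elaborate. As a minimal sketch in Python, with the `AuditEntry` fields and both helper functions invented for illustration, each moderation decision could be stored with the signals behind it and rendered either as a log record for programmers or as a plain sentence for the user.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AuditEntry:
    comment_id: str
    toxicity: float     # score from the classifier, 0.0 to 1.0
    signals: List[str]  # human-readable reasons the comment was flagged
    action: str         # what the system did, e.g. "mute"


def to_log(entry: AuditEntry) -> str:
    """Serialize the entry as JSON for programmers and administrators."""
    return json.dumps(asdict(entry), sort_keys=True)


def explain(entry: AuditEntry) -> str:
    """Render the entry as a sentence that can be shown to the user."""
    reasons = "; ".join(entry.signals) or "no signals recorded"
    return (
        f"Comment {entry.comment_id} scored {entry.toxicity:.0%} "
        f"({reasons}); action: {entry.action}"
    )


entry = AuditEntry("c1", 0.72, ["escalated after call-out"], "mute")
print(explain(entry))
print(to_log(entry))
```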
Then we can mute and quarantine users who leave toxic or bigoted comments, and tell them what we find so objectionable and why. It’s true there’s no law against hate speech or racism, but social media is not a government-run enterprise which must respect their First Amendment rights and cannot do anything about their speech so as not to violate the law. Trolls can, and do, build their own networks where they can exist in an anything-goes-I-live-to-offend environment, and their disappointment that they can’t harass “normies” does not have to be our problem.
Social Media Needs To Take Out The Trash
Social media was created and is maintained by private companies that don’t have to give bigots a major platform, and its users are fed up with trolls who sincerely believe not only that their opinions are only offensive to “libtards, cucks, and kikes,” but also that any disagreement and consequences for their actions and words violate their right to free speech. Since they don’t, we can finally do something about the popular refrain that the comment section is where a misanthrope goes to reaffirm his hatred of humanity, and reason along with civil discourse go to die a horrible death by a thousand insults.
Google’s new Perspective algorithm is a good start, but it’s just one piece of the puzzle we can’t solve with the data points from a single comment, even with the most well trained recurrent neural networks. Ultimately, we need to teach computers to follow a conversation and form an informed opinion of a person’s character, something that can’t be done by a single neural net heavily reliant on parsing language. Understanding how to do it may be one of the most important technical issues we tackle, or else we lose the web to armies of trolls, bots, and people really into goose-stepping to a strongman’s tune.
Again, yes, the AI won’t be perfect. There will be false positives, and sarcasm will be flagged as racism while actual racism gets the occasional free pass. If humans have trouble telling the two apart sometimes, a computer is bound to make mistakes too. But we’re not aiming for perfect. We’re aiming for parity with human moderators who understand what bigotry is and what it sounds like, and who don’t make exceptions or follow arbitrary, absolutist rules like the ones Facebook set up for its moderation team. We won’t need to flag everyone who said an objectionable thing in public; we just need to catch enough of the absolute worst offenders to start making a dent in their advance.
There’s also a debate to be had for each network about how to handle the system’s output. Should they digitally corral bigots into their own corners and then tag them as toxic, like Reddit and some gaming communities? Should they be shadow banned or booted entirely? How should they handle repeat offenders who were able to figure out how to game the system’s latest iteration? None of these are technical questions. They’re philosophical debates each social media company will have to have on its own. But they’ll need to have them. And soon.
This article originally appeared on [ weird things ] on 03.01.2017 and has been expanded and slightly updated.