How do you tackle hate speech one emoji at a time?

Wednesday 8th Dec 2021, 12.30pm

Online communication channels are popular, to say the least. For instance, there are 187 million active Twitter users per day alone. Sadly, these open channels of communication also open up the potential for harm, through online hate speech. The problem is so large that we require AI to help detect it. But what about when it comes to emoji? According to some reports, ten billion emoji are sent in messages around the world every day. When the same emoji can have vastly different meanings depending on the context, how can we use AI to detect their use as hate speech, and make the internet a safer place? We speak to Oxford AI researcher Hannah Rose Kirk to find out.

Read Transcript

Emily Elias: There are 187 million active Twitter users every day. That’s a lot of tweets. And monitoring those posts for hate speech is getting more complicated, as emojis become a nuanced part of our daily online language.

On this episode of the Oxford Sparks Big Questions podcast, we are asking, how do you tackle hate speech, one emoji at a time?

Hello, I’m Emily Elias, and this is the show where we seek out the brightest minds at the University of Oxford, and we ask them the big questions.

For this one, we are talking to a researcher who struggles to explain her work to her grandparents.

Hannah Rose Kirk: Hey, I’m Hannah Rose Kirk, and I’m a researcher at the University of Oxford and the Alan Turing Institute. My current research mission is to try to use artificial intelligence to detect and really just understand online harms, in order to make the internet a safer space for everyone.

Emily: And one of the places you are particularly looking at is hate speech. How is hate speech currently monitored on social media?

Hannah: So, the internet and social media platforms really opened up these very free-flowing and wide-reaching channels of communication. Which of course have lots of benefits for lots of people.

And in the early days of platforms like Twitter and Facebook, the key vision was to create these free spaces of enhanced communication and enriched discussion. And these key principles are still evident in platforms’ terms of service today.

But it really is a case of, with great power comes great responsibility. So, platforms have become the gatekeepers of public discussion. And whether we like it or not, they have this role in moderating these interactions at scale. And hate speech, abuse, harassment and cyber-bullying are just some of these online harms which platforms now must deal with on a day-to-day basis.

Emily: We obviously are not living in some sort of perfect world, where there isn’t hate speech on the internet. So, how is it actually appearing in online conversations?

Hannah: Well, I think the thing is that the growth of online communication kind of has this dark side. When you get these open channels of communication to people outside your direct circle, you kind of open up the scope of potential harm. And I think scale here is really crucial, right?

So, with the velocity and the volume of content nowadays, it’s no longer feasible to burden humans with viewing each piece of content and making a moderation decision on a case-by-case basis. And that’s why, for hate speech, all the major platforms have really turned to these automated solutions, which depend on some form of artificial intelligence.

And really, often people criticise the use of automated content moderation algorithms in the first place, and say we shouldn’t abdicate such important decisions to a machine. I really hear those concerns, but I think it’s picking the wrong argument.

So, to deal with harmful content nowadays at scale, we need automation. And I think then the better question to ask is, how can we make our artificial intelligence as good and robust as possible?

This is a really hard question to answer, because the algorithms developed by Twitter or Facebook are not opened up to researchers for audit, or to the public for scrutiny. So we can’t really know how good their solutions are. And that’s why the inner workings of these AI models are often called a ‘black box system’.

Because we can’t shine a flashlight on how decisions are made for each piece of content anymore, and nor can we know how they handle complex communications, containing irony or sarcasm, or really how they handle other modes of communication, as the internet evolves. So, memes, GIFs or emoji.

Emily: Yes, emojis. How are we actually seeing emojis being used in hate speech?

Hannah: We started researching emoji-based hate back in January. But hate speech or abuse expressed in emoji really came to the forefront of public attention over the summer, following the Euros 2020 Final.

And I don’t know how many listeners are football fans, but basically what happened is three members of England’s team, who missed their penalties in the final, were really subjected to this torrent of emoji-ed abuse. So, monkey emoji, banana emoji, watermelon emoji, were just some of the racist content reported by users on Instagram and Twitter.

But the real challenge here is understanding context. So, the same emoji can have wildly different implications, depending on how it’s used in a sentence. If you use the monkey in a tweet talking about your favourite animal at the zoo, then that’s perfectly legitimate content, there’s no problem with that.

But then if you use the same monkey emoji to racially abuse England’s football players, then that use of the emoji comes with this whole history of racial slurs, discrimination and lived oppression.

So, that’s where the challenge really lies, is that there is really no such thing as a hateful emoji, in and of itself, it’s not that much of a black and white distinction. It’s really how the emoji is used in a sentence.

Emily: So, how do you then teach a machine all the nuances that come with our social context?

Hannah: Crucially, what our algorithm has to do is draw this line between harmful and legitimate content. And in the field, that’s what we call the algorithm’s decision boundary. But no algorithm comes with this readymade decision boundary for the task at hand. Kind of in the same way that no child is born fully understanding the line between morally-right and morally-wrong actions.

So, you can think of the researchers or the humans as the teacher here, and the algorithms as the student. We need to show our algorithm multiple examples of harmful and non-harmful uses of emoji, so it can learn how to process future examples without the handholding.

And we actually already have AI models which handle purely text conversations pretty well. So, there are these large-scale AI language models, that have been pre-trained, pretty much, on the entire internet. So, they have a good grasp of human language and language patterns.

But emoji are a slightly more thorny problem. And that’s because humans process emojis visually. So, we see them as little pictures embedded in text. But AI language models encode them as Unicode characters. So, these models can’t ‘see’ what the picture represents.

This means we have to teach the model how to encode an understanding of what an emoji represents in different sentences, and how those choices condition the likelihood of hatefulness.
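To make that point about Unicode concrete, here is a minimal Python sketch (an editorial illustration, not code from the research) of what a language model actually receives when an emoji appears in text:

```python
# A language model doesn't "see" a little picture; it sees Unicode code
# points, exactly like any other character in the text.
tweet = "My favourite animal at the zoo 🐒"

# Humans see a monkey picture; the model sees code point U+1F412.
monkey = tweet[-1]
print(hex(ord(monkey)))          # 0x1f412

# In UTF-8, that single "picture" is four bytes of raw data.
print(monkey.encode("utf-8"))    # b'\xf0\x9f\x90\x92'

# Some emoji are sequences of several code points joined together, which
# makes them even harder for a model to interpret. The Pride flag is one:
flag = "\U0001F3F3\uFE0F\u200D\U0001F308"  # 🏳️‍🌈
print(len(flag))                 # 4 code points, one visible symbol
```

Nothing in those code points tells the model that U+1F412 depicts a monkey, which is why that association, and its harmful uses, has to be learned from training data.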

Emily: Okay, so this sounds really, really complicated. How easy was this for you guys to work out and figure out a code of how to discern hatemojis or hateful emojis?

Hannah: So, I guess the first starting point is actually to research, how are people using emojis to construct hate online. So, we did some empirical research on Twitter, and came up with a number of different ways that emojis were used for hateful purposes.

So, for example, the definition of hate speech usually encompasses some negative intent, opinion or threat towards a society-vulnerable group. And that could be on the basis of an identity attribute, like gender, sexual orientation, disability, religion or race.

And in some examples of emoji-based hate we saw, people would substitute this identity with an emoji, so using a Pride flag to represent the LGBTQA community. And in other examples, we saw people substituting a slur or a negative description into an emoji. And this was kind of the Euros 2020 example, using that monkey emoji.

We also saw some examples of people appending emoji to the end of a sentence. So, saying something like, I don’t know, “My new manager is a woman,” with a heart emoji displays a very positive emotion towards that statement, that’s great. But if you say, “My new manager is a woman,” and you add on an angry face emoji or a vomit emoji, suddenly we have very negative emotion in that statement. And that’s the type of example that starts to cross this line.

Emily: So, you guys were able to come up with an algorithm that was able to scan for these types of uses of emoji?

Hannah: So, we came up with this set of test examples, which we wanted to use as a checklist, for evaluating existing algorithms’ performance, and then work out how our newly-trained algorithms performed better. So, really assessing the current landscape was the starting point.

And we found, rather shockingly, that existing content moderation algorithms, both commercial and academic solutions, really failed quite dismally when we tested them on these test cases of emoji-based hate, the types of examples that I explained before.

And Google’s content moderation solution, which is called Perspective API, is used by a number of different internet services, to filter discussions and comment forums. So, I think it’s used by the New York Times, for example. And it had very severe vulnerabilities to even simple examples of emoji abuse, like swapping the race or gender of a targeted individual from a word to an emoji.

I think it’s really key, for me at least, to note that these woeful failings are really concerning. Because emoji are not rare on the internet. So, some reports suggest 10 billion emoji are sent in messages around the world every day. But finding this big crack in model defences, and then doing nothing about it, doesn’t really do anything to help protect against this form of harm in the wild.

So, that’s where we came in and wanted to train better models, I guess to be better teachers.

Emily: And fixing this crack, was it relatively easy to do, in computer science land?

Hannah: Yes. In the field, the method we used would be called data-centric AI. So, traditionally, the way that AI models are developed focuses quite heavily on the model architecture. So, that’s how the internal workings and the internal layers of these big AI models function.

But really, a model is only as good as the data it learns from. There is this wisdom in the field – garbage in, garbage out. If we have garbage data, we have a garbage model.

So, my work at the Alan Turing Institute and the Uni of Oxford really focuses on the most fundamental ingredient to any AI system, which is the training data. And to curate this training data, we used a method of AI development called human-and-model-in-the-loop learning.

Now, I’ll unpick what that means for you, because it can be quite a mouthful. So, what we mean by human-and-model-in-the-loop training is that it’s a collaborative and dynamic process of multiple interactions between a team of human annotators and our current AI model.

So, the annotators take the current model and they try to trick it, with examples of hateful and non-hateful content. In the AI community, this is called adversarial learning. Because we intentionally try to break our model in the research and development stage, so we can learn from its mistakes, before we deploy it in the real world.

So, by multiple rounds of building a model, breaking it and then fixing it, we get a more robust and intelligent end product. And moving into the future, this human-centric way of training AI is really important when we use AI for high-stakes tasks, or for more nuanced social computing tasks, such as hate speech or misinformation.
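As a rough illustration of that build-break-fix loop, here is a toy Python sketch. Everything in it is invented for illustration, not taken from the actual research: the "model" is a trivial keyword matcher, and the hand-picked adversarial rounds stand in for human annotators probing the current model.

```python
# Toy sketch of human-and-model-in-the-loop training. The "model" is a
# trivial keyword matcher; the adversarial rounds stand in for human
# annotators trying to trick the current model.

def train(examples):
    """Learn tokens seen in hateful examples but never in harmless ones."""
    hateful, harmless = set(), set()
    for text, label in examples:
        (hateful if label == "hateful" else harmless).update(text.split())
    return hateful - harmless

def predict(model, text):
    return "hateful" if any(tok in model for tok in text.split()) else "ok"

# Seed data the first model is built from.
data = [("you are awful", "hateful"), ("lovely weather today", "ok")]

# Each round, annotators probe the current model: first with emoji-only
# abuse (like the Euros 2020 example), then with a perfectly legitimate
# use of the same emoji.
adversarial_rounds = [
    [("🐒 🍌", "hateful")],
    [("my favourite animal at the zoo 🐒", "ok")],
]

for round_examples in adversarial_rounds:
    model = train(data)                                   # build
    fooled = [(t, l) for t, l in round_examples
              if predict(model, t) != l]                  # break
    data.extend(fooled)                                   # fix: keep mistakes

model = train(data)
print(predict(model, "🐒 🍌"))                             # "hateful"
print(predict(model, "my favourite animal at the zoo 🐒"))  # "ok"
```

Each round tightens the model's decision boundary: the first adds the emoji-abuse pattern it missed, and the second teaches it that the same emoji in a zoo sentence is legitimate, which is exactly the context-dependence discussed above.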

We don’t really have a perfect line between one label and another. And keeping humans in the loop is then a critical consideration, to make sure we are training these models in a robust and sustainable way.

Emily: So, you guys were able to make this hate speech detector, this hate speech moderator, using emojis as the data point. I mean, if you can do this, why aren’t we seeing the big tech companies do this?

Hannah: I think it was exciting, but also confusing, that it was so easy for us to train a better model. So, I think I see the work as like a double-edged sword. Because on one hand it was shocking how bad existing models were, at handling even simple statements of abuse written in emoji, and how many examples of emoji abuse were still on the Twitter platform, so hadn’t been taken down.

Yet on the other hand, it was a relatively easy problem to fix. So, I think from just around 2,000 carefully-created training examples, our model made these huge gains in performance, of up to 70% on some types of emoji abuse.

And that was the confusing part. Because then it begs the question of, why are big tech giants not doing this? And this really comes back to the crux of the problem, which is that big tech algorithms are a black box. We can’t test them, we can’t really know what they’re working on.

But what we can learn from is this breadcrumb trail of content that’s still on the platform. So, we know that emoji-based hate is a problem that these algorithms are not handling particularly well.

Emily: So, do you think that they are working on it, or the appetite isn’t there to attack this specific type of hate speech, using emojis?

Hannah: Yes, I think you’ve hit the nail on the head there. It’s key to note that emojis are just one form of online harm. So, people are super creative when it comes to expressing online hate. And that’s why content moderation has commonly been called a cat-and-mouse game.

Because you always have these perpetrators trying to find cracks in the current system, to trick its capabilities and evade detection. So, this isn’t an easy job for big tech, I don’t want to downplay how difficult it is to try and find solutions that are agile, flexible and adaptable, to these constantly emerging and evolving forms of hate.

New languages, new domains, GIFs, memes, emoji – it’s never going to stop evolving. So, I think what we really need to focus on is making sure that the algorithms of tech companies are really evolving, to keep step with how online hate is changing.

And this is what we need to get good at, reacting quickly. And our method really provided a tangible way to speed up those reactions, and make sure we’re focusing our efforts in the most efficient places. And bearing this data efficiency in mind is something that we really hope that tech companies are working on.

Emily: This podcast was brought to you by Oxford Sparks from the University of Oxford, with music by John Lyons, and a special thanks to Hannah Kirk.

If you want to find us on the internet, we are @OxfordSparks. And you can also go to our website.

I’m Emily Elias. Bye for now.


Transcribed by UK Transcription.