Linguistic data analysis of 3 billion Reddit comments shows the alt-right is getting stronger


Tim Squirrell in Quartz:

The alt-right isn’t one group. They don’t have one coherent identity. Rather, they’re a loose collection of people from disparate backgrounds who would never normally interact: bored teenagers, gamers, men’s rights activists, conspiracy theorists and, yes, white nationalists and neo-Nazis. But thanks to the internet, they’re beginning to form a cohesive group identity. And I have the data to prove it.

The_Donald is a Reddit community with over 450,000 subscribers. It’s the breeding ground for the alt-right, and the fermenting vat in which this identity is being formed. According to data analysis by FiveThirtyEight, it’s US president Donald Trump’s “most rabid online following,” and Reddit itself now claims it is the fourth most visited site in the US, behind only Facebook, Google, and YouTube.

As part of the Alt-Right Open Intelligence Initiative at the University of Amsterdam, I’ve been working to understand the language of the alt-right and what it can tell us about its members. Working with the UK Home Office’s Extremism Analysis Unit, I used Google’s BigQuery tool, which lets you trawl through massive datasets in seconds, to interrogate a collection of every Reddit comment ever made—all 3 billion of them.

Focusing on The_Donald, I used a script that shows which words are most likely to occur in the same comment. Combining this with a tool that measures the overlap in commenters between different parts of Reddit, I found that the alt-right isn’t just one voice: It’s made up of distinct constituencies that hold different opinions and ways of expressing them, identifiable by the language they use and the other communities they post in.
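The article doesn't publish the script or the overlap tool, so what follows is only a minimal Python sketch of the two ideas. It assumes comments are available as plain strings and each community as a list of usernames. The function names, the crude regex tokenizer, and the choice of Jaccard similarity as the overlap measure are all hypothetical illustrations, not Squirrell's actual method.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters/apostrophes.
    # Returns a set, so each word counts at most once per comment.
    return set(re.findall(r"[a-z']+", text.lower()))


def cooccurrence_rates(comments: list[str], target: str) -> dict[str, float]:
    """For every word, the fraction of comments containing `target`
    that also contain that word (an estimate of P(word | target))."""
    counts: Counter[str] = Counter()
    n_with_target = 0
    for text in comments:
        words = tokenize(text)
        if target in words:
            n_with_target += 1
            counts.update(words - {target})
    if n_with_target == 0:
        return {}
    return {word: c / n_with_target for word, c in counts.items()}


def commenter_overlap(authors_a: list[str], authors_b: list[str]) -> float:
    """Jaccard similarity of the sets of users posting in two communities
    (an assumed overlap measure; 0.0 when both are empty)."""
    a, b = set(authors_a), set(authors_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    comments = ["the game is fun", "the game ended", "fun day"]
    rates = cooccurrence_rates(comments, "game")
    print(sorted(rates.items(), key=lambda kv: -kv[1]))
    print(commenter_overlap(["a", "b", "c"], ["b", "c", "d"]))
```

Ranking words by these rates surfaces a term's typical companions, and a high commenter overlap between two communities hints at a shared constituency. A study at the scale of 3 billion comments would push both computations into a query engine such as BigQuery rather than looping in Python.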

In other words, there’s a taxonomy of trolls. So who are they, and what language do they use?

More here.