
Researchers develop strategy for identifying Twitter trolls in 50 tweets

Scientists claim to be able to identify Twitter trolls in 50 tweets using algorithms that can detect ‘distinctive’ patterns of word repetition.  

The algorithms identify linguistic patterns in tweets in order to distinguish deceptive ‘troll’ messages – which aim to achieve a specific purpose while also masking that purpose – from those posted by ‘normal’ Twitter users.  

Surprisingly, trolls start repeating words and word pairs later than normal users, due to their attempts to influence many different accounts at once. 

While previous research has investigated distinguishing characteristics of troll tweets such as timing, hashtags, and geographical location, few studies have examined linguistic features of the tweets themselves.  

‘Though troll writing is usually thought of as being permeated with recurrent messages, its most characteristic trait is an anomalous distribution of repeated words and word pairs,’ said study author Sergei Monakhov at Friedrich Schiller University in Germany. 

‘Using the ratio of their proportions as a quantitative measure, one needs as few as 50 tweets for identifying internet troll accounts.’
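The article does not say exactly how the two proportions are computed. As a minimal sketch, assuming a word or adjacent word pair counts as ‘repeated’ when it occurs more than once in the sample, and that the measure is the share of repeated word pairs divided by the share of repeated words (both assumptions, not necessarily Monakhov’s definitions), the ratio could be computed like this. The function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase the text and keep runs of letters,
    # digits and apostrophes as word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def repeat_proportions(tweets):
    """Return (share of repeated word types, share of repeated word-pair types).

    Assumption: a type is 'repeated' if it occurs more than once across the
    whole sample of tweets; each share is taken over all distinct types.
    Word pairs are adjacent tokens within a single tweet.
    """
    words = Counter()
    pairs = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    word_share = sum(1 for c in words.values() if c > 1) / len(words) if words else 0.0
    pair_share = sum(1 for c in pairs.values() if c > 1) / len(pairs) if pairs else 0.0
    return word_share, pair_share


def repetition_ratio(tweets):
    # Hypothetical measure: repeated-pair share divided by repeated-word share.
    word_share, pair_share = repeat_proportions(tweets)
    return pair_share / word_share if word_share else 0.0
```

For example, `repetition_ratio(["vote for change now", "vote for change today"])` gives 0.5 / 0.6, or roughly 0.83. How such a score would actually separate trolls from genuine accounts depends on the study’s own definitions and calibration, which this sketch does not reproduce.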

Troll internet messages aim to influence particular people, spread fake news or sow discord, while also masking what they’re doing.  

In one high-profile example, in February 2018 the US government indicted 13 Russian nationals for interfering with the 2016 US presidential election through social media.

These individuals were accused of using false American personas to operate social media pages and groups designed to attract American audiences and sow discord by disparaging Democratic candidate Hillary Clinton.

The repercussions were far-reaching: the scandal fuelled calls for Trump’s impeachment, the US sanctions it prompted damaged the Russian economy, and relations between the two nations were badly strained.

‘Taking into account the global scale of this scandal and its ever-widening ramifications for society, one can only wonder why the phenomenon of troll writing has not received to date any substantial scientific attention,’ Monakhov says in his research paper. 

For his study, Monakhov used a collection of tweets connected to the Internet Research Agency, which he describes as a Russian ‘troll factory’.  

Also known as the ‘Trolls from Olgino’, the Internet Research Agency is a Saint Petersburg-based company engaged in online influence operations on behalf of Russian business and political interests.  

Monakhov combined this dataset with a sample of ‘genuine’, non-troll tweets from the accredited accounts of US members of Congress. 

Trolls have a limited number of messages to convey, but must do so multiple times and with enough diversity of wording and topics to fool readers. 

Monakhov showed that these troll-specific restrictions result in distinctive patterns of repeated words and word pairs that differ from those seen in genuine, non-troll tweets. 

‘Understandably, they are afraid of being repetitive because that will inevitably lead to their identification as trolls,’ Monakhov told MailOnline. 

‘Hence, their only possible strategy is to keep diluting their target message with ever-changing filler words – and that’s exactly why we can so easily detect them.’ 

On that basis, with every new tweet from a genuine author, the probability of using a word that has already been used should increase, while for a troll account, the probability of using a word that has already been used should continuously decrease or stay constant.  
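That prediction can be illustrated with a short sketch. It assumes a simple word tokenizer and measures, for each new tweet, the fraction of its words that already appeared in earlier tweets, then fits a least-squares slope to that sequence. The helper names `reuse_curve` and `trend` are hypothetical, and this is not presented as the test used in the paper.

```python
import re


def tokenize(text):
    # Simplifying assumption: lowercase words made of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def reuse_curve(tweets):
    """For each tweet after the first, the fraction of its tokens seen in earlier tweets."""
    seen = set()
    curve = []
    for i, tweet in enumerate(tweets):
        tokens = tokenize(tweet)
        if i > 0 and tokens:
            # Share of this tweet's tokens that were already used before it.
            curve.append(sum(t in seen for t in tokens) / len(tokens))
        seen.update(tokens)
    return curve


def trend(values):
    """Least-squares slope of values against their position (0.0 if fewer than 2 points)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

Under the prediction above, a genuine account would tend to show a positive slope, while a troll account’s slope would stay near zero or fall below it. For instance, `reuse_curve(["a b c d", "a b e f", "a b c e"])` returns `[0.5, 1.0]`, whose slope is 0.5, whereas three tweets with no shared words give a flat curve of zeros.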

Though troll writing is usually thought of as being permeated with recurrent messages, he showed that its defining characteristic is an irregular distribution of repeated words and word pairs.

Trolls also use formal jargon, which isn’t seen as often in general digital communications between normal people. 

Then, Monakhov tested an algorithm that uses these distinctive patterns to distinguish between genuine tweets and troll tweets. 

He found that the algorithm required as few as 50 tweets for accurate identification of trolls versus congresspeople. 

The algorithm also correctly distinguished troll tweets from tweets by US President Donald Trump, which Twitter has labelled provocative and ‘potentially misleading’, but which are not crafted to hide his purpose. 

Trump therefore cannot be called a troll in Monakhov’s sense of the word, even though he ‘undoubtedly repeats himself over and over again’. 

This new strategy for identifying troll tweets could help inform efforts to combat hybrid warfare while preserving freedom of speech. 

Further research will be needed to determine whether it can accurately distinguish troll tweets from other types of messages that are not posted by public figures.


The study has been published in PLOS ONE. 
