A Quick Guide to Identify Twitterbots Using AI


by Shashank Gupta, December 16th, 2017


In one of our previous blog posts (https://blog.karna.ai/using-ai-to-combat-the-menace-of-fake-accounts-on-social-media-8af96bc71842), we discussed how to identify ‘Fake Accounts’ or ‘Potential Spammers’ on Twitter. Filtering out such accounts is important for getting the most reliable and accurate insights. Many firms and individuals have taken this further, using Twitterbots to automate content delivery and speed it up. One study (https://arxiv.org/pdf/1703.03107.pdf) estimated that active bots could make up as much as 15% of all Twitter users.



Initially, Twitterbots were made to reduce human effort. Take the Netflix Bot, for example: it tweets whenever a new show or movie is added to Netflix.

Netflix Bot in action.

There are some extraordinary ones as well. For example, someone has created a clever online version of Big Ben that marks the passing of every hour, as shown in the tweet below. Now that humanity spends more and more of its time online, it is only a matter of time before our monuments have an online presence too :).


"BONG BONG BONG BONG BONG BONG BONG BONG BONG BONG BONG" (Big Ben, @big_ben_clock, May 15, 2017: https://twitter.com/big_ben_clock/status/864058030408822784?ref_src=twsrc%5Etfw)

However, there is also a large herd of Twitterbots that post malicious and spam content on the platform in bulk. You can probably find some in your own followers list. According to Wikipedia, bots also played a role in the 2016 US Presidential Election.

Role of Twitterbots in the 2016 US Presidential Election


A subset of Twitter bots programmed to complete social tasks played an important role in the United States 2016 Presidential Election. Researchers estimated that pro-Trump bots generated four tweets for every pro-Clinton automated account and out-tweeted pro-Clinton bots 7:1 on relevant hashtags during the final debate. Deceiving Twitter bots fooled candidates and campaign staffers into retweeting misappropriated quotes and accounts affiliated with incendiary ideals. (Source: Wikipedia)

Twitterbots and spammers try to cloud other users' views by constantly promoting fake news and opinions. Since no human effort is required, bots can tweet about a topic tirelessly and help make it trend. For a political analyst, market researcher or anyone else doing in-depth analysis with social media data, it is important to identify and filter out these bots to get genuine, unbiased opinions.

The Hypothesis

The idea behind our AI-driven approach to identifying bots on social media rests on this hypothesis: “Tweets made by bots are related to a very narrow topic/context, while humans’ tweets are much more diverse”.

How did we do it?

To apply this hypothesis and identify bots automatically, we crawled the latest tweets posted by a large and diverse sample of Twitter accounts. For each account, we converted the text of every tweet into a vector and measured how similar the tweets were by computing the average distance between these vectors.

If a handle tweets about the same topic and theme, its tweets (individual data points) will lie close together in the vector space because of their semantic similarity. These closely packed, similar tweets form a cluster. We can quantify the similarity by calculating the cosine distance between any two data points.
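For reference, the cosine distance between two tweet vectors u and v is one minus their cosine similarity. The notation below is ours, not the original post's:

```latex
d(\mathbf{u}, \mathbf{v}) = 1 - \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
```

For non-negative vectors such as word counts, d ranges from 0 (vectors pointing in the same direction) to 1 (no shared terms); for general embeddings with negative components it can reach 2.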

A representation of clusters

The table below shows the results of the analysis. Here, Mean Distance is the average of all the cosine distances between the individual data points: the lower the Mean Distance, the more similar the tweets. The table bears this out. The aforementioned Big Ben bot has the lowest Mean Distance among the chosen accounts, as its posts contain only the word ‘BONG’.
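Written out, and assuming the average is taken over all unordered pairs of an account's n ≥ 2 tweet vectors t_1, …, t_n (our formalization of the description above, with d the cosine distance between two vectors), the Mean Distance is:

```latex
\bar{d} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d(\mathbf{t}_i, \mathbf{t}_j),
\qquad d(\mathbf{u}, \mathbf{v}) = 1 - \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
```

Under a simple bag-of-words representation, every Big Ben tweet is a vector along the single ‘BONG’ axis, so every pairwise distance, and therefore the Mean Distance, would be 0.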

Mean Distance Table

We also chose a few spammer accounts to highlight the difference between a bot and a spammer. Spammers post about multiple topics from time to time, whereas bots generally post about a specific theme (we did a similar analysis to detect spammer accounts; you can check it out here). Thus, the spammers' Mean Distance is far greater than that of the bots. Notice that the Mean Distance of TOIIndiaNews (a leading Indian news publisher) is close to that of the bots. Such handles generally follow a standardized structure when posting news, which gives them a relatively low Mean Distance.
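The per-account computation described in this section can be sketched in Python. The post does not say which text-vectorization model was used, so this sketch assumes a plain bag-of-words term-count vector as a stand-in; the helper names (`tweet_to_vector`, `cosine_distance`, `mean_distance`) and the sample tweets are hypothetical, not part of any ParallelDots API or the original data.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tweet_to_vector(text):
    """Bag-of-words term counts (an assumed stand-in for the unspecified vectorizer)."""
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    return Counter(tokens)


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only u's terms need visiting.
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat empty tweets as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance(tweets):
    """Average cosine distance over all unordered pairs of an account's tweets."""
    vectors = [tweet_to_vector(t) for t in tweets]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two tweets")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical sample accounts: one mimics the Big Ben bot, one is varied.
bot_like = ["BONG BONG BONG", "BONG BONG BONG BONG", "BONG"]
human_like = [
    "Loved the match last night",
    "New recipe for dinner tonight",
    "Reading a great book on history",
]
print(mean_distance(bot_like))    # 0.0
print(mean_distance(human_like))  # 1.0
```

Bag-of-words is used here only to keep the sketch dependency-free; a real embedding model would also place tweets with related meanings but different wording close together, which word counts cannot capture.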

Impacts of bots on the real world

Below are a few cases where Twitterbots were influential, and why it is important to identify them.

  • The number of followers on social media is considered a popularity metric for celebrities. But is it really? As mentioned earlier, around 15% of Twitter users might be bots, so follower count is not a reliable measure of popularity. During the 2012 US Presidential Election, it was reported that 29.9% of Barack Obama’s followers might be bots or fake accounts, while the figure for Mitt Romney was around 21.9%. The number of followers remaining after bots and spammers are removed can serve as a better popularity metric.
  • Twitterbots are said to have influenced voters' opinions by tweeting and retweeting large volumes of pro-Trump content during the 2016 US Presidential Election. As mentioned earlier, pro-Trump bots generated four tweets for every pro-Clinton automated account and out-tweeted pro-Clinton bots 7:1 on relevant hashtags during the final debate. Some of the content shared by these bots was fake and misleading, so it is important to identify these bots clearly and capture the views and opinions of real people only.
  • The recently concluded French Presidential Election also saw bot involvement. Just before the election, a massive 9 GB of classified campaign documents related to Emmanuel Macron were posted online. Twitterbots kept posting about them and helped keep the topic trending for hours before the election. However, the leak seems to have had little effect on the outcome, as Macron won comfortably (which we predicted correctly using AI).
  • Suppose a brand hires a marketing agency for a publicity campaign. To judge the campaign's efficacy, it is important to understand whether its virality came from a push by spammers or bots. If it did, the campaign might actually hurt the brand, which would be misled by an inflated follower count. These bots are not real customers, so the brand loses on both ends.
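The adjusted popularity metric from the first point above is simple arithmetic. Here is a minimal sketch, assuming you already have either a set of follower handles flagged as bots or spammers (for example, by a mean-distance test like the one described earlier) or only an estimated bot share; the function names, handles, and the 1,000,000-follower account are hypothetical.

```python
def genuine_follower_count(followers, flagged):
    """Followers left after removing handles flagged as bots or spammers."""
    return sum(1 for handle in followers if handle not in flagged)


def estimated_genuine_followers(total_followers, bot_share):
    """Expected genuine followers when only an estimated bot share (0 to 1) is known."""
    if not 0.0 <= bot_share <= 1.0:
        raise ValueError("bot_share must be between 0 and 1")
    return round(total_followers * (1.0 - bot_share))


# Hypothetical follower list with two flagged bot handles.
followers = ["@alice", "@bob", "@spam_bot_1", "@carol", "@spam_bot_2"]
flagged = {"@spam_bot_1", "@spam_bot_2"}
print(genuine_follower_count(followers, flagged))     # 3

# Hypothetical account with 1,000,000 followers and the 29.9% bot share
# reported for Barack Obama's followers in 2012.
print(estimated_genuine_followers(1_000_000, 0.299))  # 701000
```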

These are a few notable cases where bots have influenced the views of their audience. Though originally meant to play a more useful role on social media, bots are now mostly deployed to spam Twitter. Social media platforms are constantly being improved to fight this menace. Like any other technology, bots can help you in many ways if used ethically: in customer support, marketing, and general business development. Interesting times lie ahead as we enter an era of machine intelligence. It is up to intelligent AI algorithms to help us phase out spam, bots, and fake content from social media platforms.

The above study was carried out by Karna AI, the market research division of ParallelDots Inc. ParallelDots AI APIs are a Deep Learning-powered web service by ParallelDots Inc. that can comprehend huge amounts of unstructured text and visual content to empower your products. You can check out some of our text analysis APIs and reach out to us by filling out this form here, or write to us at [email protected]