Uncovering Gender Bias within Journalist-Politician Interaction in Indian Twitter: Data Collection

Too Long; Didn't Read

In this paper, researchers analyze gender bias in Indian political discourse on Twitter, highlighting the need for gender diversity in social media.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.


(1) Brisha Jain, Independent researcher India and [email protected];

(2) Mainack Mondal, IIT Kharagpur India and [email protected].


In this section, we describe how we collected our data from Twitter. Specifically, we collected the interactions between Indian politicians and journalists on Twitter, with accounts sampled by popularity and gender. We begin with how we created the list of Indian journalists and politicians for our study.

3.1. Identifying Twitter accounts of Indian politicians and journalists

Identifying Twitter accounts of individual Indian politicians: We leveraged a dataset of Indian politicians from previous research by Pal et al. [20]. This dataset contains the names and handles of Indian Twitter accounts involved in politics (labelled as politicians). However, we noted that it includes accounts of political organizations (e.g., BJP for Andaman and Nicobar Islands) as well as individuals. We therefore first cleaned the dataset by cross-matching its names with names from MyNeta [3], an open data repository platform run by the Association for Democratic Reforms (ADR) to bring transparency to Indian elections. For each political account in Pal et al.’s dataset, we searched the MyNeta platform for the account’s name. If the search found no politician with that name, we discarded the account from our analysis, as it probably does not belong to an individual. At the end of this procedure, we were left with 4,484 Twitter accounts of politicians.
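The cross-matching step can be illustrated with a minimal sketch. It assumes the MyNeta names are already available as a local list (the paper instead searched the MyNeta platform directly), and the `normalize` and `keep_individuals` helpers, the exact-match rule, and the sample records are hypothetical, not the authors' implementation.

```python
import re

def normalize(name: str) -> str:
    """Lowercase a name, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())

def keep_individuals(accounts, myneta_names):
    """Keep only accounts whose normalized name appears among the MyNeta names.

    Exact matching after normalization is an assumption made for this sketch;
    a real search may tolerate spelling variants.
    """
    known = {normalize(n) for n in myneta_names}
    return [a for a in accounts if normalize(a["name"]) in known]

# Hypothetical records, for illustration only.
accounts = [
    {"handle": "@example_politician", "name": "A. Example"},
    {"handle": "@example_party_unit", "name": "Example Party Andaman"},
]
myneta_names = ["A Example"]

individuals = keep_individuals(accounts, myneta_names)
```

Here the organizational account is dropped because no candidate with that name exists in the reference list, mirroring the discard rule described above.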

Identifying Twitter accounts of individual Indian political journalists: Next, we focused on the Twitter accounts marked as individual journalists (separate from accounts of media houses) in a dataset of Twitter influencers released by Pal et al.’s previous research [3]. There were 4,099 such accounts. However, we again faced a challenge: how can we identify the political journalists? Specifically, we noted that this list contains several journalists who are not associated with political reporting and instead focus on areas such as entertainment and sports. Thus, we set out to identify political journalists, which we define as journalist accounts that directly mentioned politicians’ accounts in a non-trivial tweet (i.e., after discounting tweets with only emojis, URLs, or birthday greetings). To that end, we collected all the tweets posted by these 4,099 accounts between Jan 2020 and Dec 2022 using an open-source tool called crape. We then discounted tweets containing only emojis, URLs, or greetings and checked whether any of the remaining tweets mentioned an individual Indian politician’s Twitter account (collected as described above). Finally, we included 3,214 journalist accounts (78.4%) in our dataset as political journalists.
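A minimal sketch of this filter is shown below. The paper does not specify how trivial tweets were detected, so the greeting pattern, the three-word threshold, and all function names here are assumptions chosen only to illustrate the idea.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
# Hypothetical greeting phrases; the paper does not list the ones it used.
GREETING_RE = re.compile(r"\b(happy birthday|birthday wishes|greetings)\b", re.I)

def is_trivial(text: str, min_words: int = 3) -> bool:
    """Treat a tweet as trivial if little text remains once URLs, mentions,
    and greeting phrases are removed. Emojis are not word characters, so
    they are ignored by the \\w+ match below."""
    stripped = URL_RE.sub(" ", text)
    stripped = MENTION_RE.sub(" ", stripped)
    stripped = GREETING_RE.sub(" ", stripped)
    return len(re.findall(r"\w+", stripped)) < min_words

def mentioned_handles(text: str) -> set:
    """Return the lowercased handles mentioned in a tweet."""
    return {h.lower() for h in MENTION_RE.findall(text)}

def is_political_journalist(tweets, politician_handles) -> bool:
    """True if at least one non-trivial tweet mentions a known politician."""
    return any(
        not is_trivial(t) and bool(mentioned_handles(t) & politician_handles)
        for t in tweets
    )

# Hypothetical handles and tweets, for illustration only.
politicians = {"pol_one"}
greeting_only = ["Happy birthday @pol_one 🎉", "https://example.com"]
policy_tweet = ["@pol_one announced a new rail budget today"]
```

With these toy inputs, `is_political_journalist(greeting_only, politicians)` is false and `is_political_journalist(policy_tweet, politicians)` is true. A word-count threshold is crude: a long birthday message would pass it, so a real filter would need a more careful rule.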

Verifying the accuracy of Twitter accounts: Finally, we manually verified whether our filtering approach identified the correct Twitter accounts of Indian politicians and political journalists. We randomly sampled forty politician accounts and twenty journalist accounts. An author then visited each Twitter account and read its first 20 tweets to confirm that the account indeed belonged to an Indian politician (or political journalist). In 92.5% of the random sample, our filtering approach correctly identified Twitter accounts of Indian politicians (or political journalists).

3.2. Inferring gender of Indian politicians and political journalists

Next, we inferred the gender of the Twitter accounts of Indian politicians (or political journalists) identified in the previous section. For this purpose, we used a service called Genderize [25]. This service maps names to genders and is customized to Indian names, and previous studies have reported high accuracy for its gender inference [19]. Once we had inferred the gender of all accounts, we focused this study on the most popular (by number of followers) politician and journalist accounts. Specifically, we sorted the politician accounts by follower count and identified the top 50 accounts each for male politicians and female politicians (as identified by Genderize). We further manually verified the accuracy of the inferred gender for these 100 Twitter accounts. We similarly identified the 100 most popular journalist accounts (50 male and 50 female).
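The selection of the most popular accounts per gender can be sketched as follows. The gender lookup is passed in as a plain function so that the example runs offline; in the paper the labels came from Genderize, whereas the `toy_gender` table, the account records, and the `top_k_by_gender` helper below are hypothetical.

```python
def top_k_by_gender(accounts, infer_gender, k=50):
    """Rank accounts by follower count and keep the top k for each gender.

    `infer_gender` is any callable mapping a name to "male", "female", or
    None; accounts whose gender cannot be inferred are skipped.
    """
    ranked = sorted(accounts, key=lambda a: a["followers"], reverse=True)
    top = {"male": [], "female": []}
    for account in ranked:
        gender = infer_gender(account["name"])
        if gender in top and len(top[gender]) < k:
            top[gender].append(account)
    return top

# Toy lookup standing in for a name-to-gender service (assumption).
toy_gender = {"name_a": "female", "name_b": "male", "name_c": "female"}

accounts = [
    {"handle": "@a", "name": "name_a", "followers": 1200},
    {"handle": "@b", "name": "name_b", "followers": 5000},
    {"handle": "@c", "name": "name_c", "followers": 9000},
    {"handle": "@d", "name": "name_d", "followers": 9999},  # gender unknown
]

top = top_k_by_gender(accounts, toy_gender.get, k=1)
```

With `k=1`, the sketch keeps `@c` as the most-followed female account and `@b` as the most-followed male account, and skips `@d` because no gender could be inferred. The manual verification step described above would then be applied to the selected accounts.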

3.3. Collecting journalist-politician Twitter interaction data

Finally, to answer our research questions, we collected interaction data between the Indian politicians’ and political journalists’ accounts. Specifically, we collected all the tweets posted by the 100 popular political journalist accounts and retained only the tweets that mentioned any of the 100 popular Indian politicians in our dataset. We then divided the collected tweets into the following four categories: Male journalist’s tweets mentioning Male Politicians (MJ-MP), Female journalist’s tweets mentioning Male Politicians (FJ-MP), Male journalist’s tweets mentioning Female Politicians (MJ-FP), and Female journalist’s tweets mentioning Female Politicians (FJ-FP). In total, we collected 21,188 unique tweets. Note that a single tweet can mention multiple accounts.
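The four-way categorization can be sketched as below. The handles, gender tables, and tweet records are hypothetical, and counting a tweet once in every category it touches is an assumption based on the note that a single tweet can mention multiple accounts.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")

def categorize(tweet, journalist_gender, politician_gender):
    """Return the set of categories (e.g. "FJ-MP") that a tweet falls into.

    Genders are encoded as "M" or "F". Mentions of accounts outside the
    politician table are ignored, so a tweet may map to an empty set.
    """
    j = journalist_gender[tweet["author"].lower()]
    categories = set()
    for handle in MENTION_RE.findall(tweet["text"]):
        p = politician_gender.get(handle.lower())
        if p is not None:
            categories.add(f"{j}J-{p}P")
    return categories

# Hypothetical gender tables and tweets, for illustration only.
journalist_gender = {"journo_a": "F", "journo_b": "M"}
politician_gender = {"pol_x": "M", "pol_y": "F"}
tweets = [
    {"author": "journo_a", "text": "@pol_x and @pol_y debated the bill"},
    {"author": "journo_b", "text": "@pol_y on farm policy"},
    {"author": "journo_a", "text": "no politician mentioned here"},
]

counts = Counter()
for tweet in tweets:
    counts.update(categorize(tweet, journalist_gender, politician_gender))
```

In this toy run the first tweet contributes to both FJ-MP and FJ-FP, the second to MJ-FP, and the third to no category, so it would not be retained.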

Table 1: The number of tweets posted by Indian journalists mentioning politicians. Female politicians received relatively fewer mentions.

We note that almost all of the hundred journalists, across genders, mentioned our chosen popular politician accounts in their tweets. Table 1 presents the number of tweets across our four categories. Notably, the accounts of female politicians received considerably fewer mentions from both male and female Indian journalists. Furthermore, Table 2 presents tweet excerpts from each of the four categories. These examples demonstrate that many of the tweets in our dataset, across the different categories, relate to policy decisions and general governance. Next, we analyze this interaction data to identify potential gender bias in journalist-politician interactions on Indian Twitter.