Meet Sarah Masud, Google PhD Fellow and Final Year Doctoral Student at IIIT-Delhi

by Sarah Masud (@themessier), September 26th, 2024

Too Long; Didn't Read

Sarah Masud is a final year doctoral student at the Laboratory for Computational Social Systems based out of IIIT-Delhi, India.

The HackerNoon editorial team has launched this interview series with women in tech to celebrate their achievements and share their struggles. We need more women in technology, and by sharing stories, we can encourage many girls to follow their dreams. Share your story today!

Tell us about yourself!

I am a final year doctoral student at the Laboratory for Computational Social Systems based out of IIIT-Delhi, India. By employing computational tools, mainly applied Natural Language Processing (NLP) and Network Science, I investigate the contextual signals that can help improve the detection of hate speech on the web. I have been fortunate to receive financial aid via the Google PhD Fellowship and the Prime Minister Doctoral Fellowship (in partnership with Wipro AI) to support my studies. Now, towards the end of my PhD, I am expanding my horizon to investigate other forms of harmful content on the web. Lately, I have been exploring the themes of fact-checking, fake news, propaganda analysis, and news narrative understanding. The last of these is the topic of my upcoming postdoc work at CopeNLU at the University of Copenhagen.


As part of my academic and professional journey, I have been fortunate to work full time at Red Hat, complete my undergraduate degree in CSE at Jamia Millia Islamia (JMI), be a visiting research student at TUM, and volunteer with organizations such as AnitaB and the Journal of Open Source Software (JOSS).

Why did you choose this field in the first place?

The idea of pursuing computer science or being a PhD student was not something I had planned for. Early exposure to science and technology both at home and school, as well as my curiosity to understand how things work, led me to develop a love of science and engineering. However, once I started my undergraduate studies, I struggled to find exactly what I wanted to apply my CSE knowledge towards. I have failed at more undertakings than I have succeeded in. I chose to continue my work and studies in computer science because of its interdisciplinary applications, especially once I discovered that I enjoyed NLP as a subfield while doing my undergraduate thesis on topic modelling. My interest in applied NLP was further solidified when I got an opportunity to work in the area of computational social systems (CSS) as a part of my PhD.

What was the biggest setback/failure that you faced, and how did you manage it?

A PhD in itself is full of setbacks and failures, with success sprinkled here and there. One incident I especially remember, as it lasted almost two years, was my work on developing a dataset and model to detect hate speech in code-mixed Hinglish. It took us one year to curate and annotate the dataset from Twitter. Given the Indian context and language knowledge needed to detect hateful connotations correctly, we could not employ existing crowdsourcing platforms like Amazon Mechanical Turk. We spent endless hours performing annotations ourselves to produce an annotation guideline and then sought Indian annotators to help perform large-scale annotations. This process was iterative, slow, and mentally taxing for my colleagues and me. However, our woes did not end there, as the standard practice of finetuning a language model (LM) for hate speech detection did not produce effective results. It took us multiple rounds of trial and error to realise that instead of focusing on building a more "complex and fancy system", we needed to improve the input signals. It then took another round of unsupervised auxiliary data curation and modelling to augment the standard finetuning pipeline and yield improvements when detecting hatefulness in the Indic context. The two years that this work took from start to finish were full of uncertainty; coupled with the Covid lockdown, they made me feel as if my PhD journey would come to an abrupt end. What got us out of the slump was revisiting the research around contextual information, which was not a common practice in NLP modelling for hate speech but was closer to how content moderators make decisions in the real world.

What's your biggest achievement that you're really proud of?

One project that I am really proud of is a work we recently published on arXiv (it is still under review for publication). The project explores the use of GenAI tools like ChatGPT in understanding journalistic narratives through the 5W1H (What, Why, When, Where, Who and How) and uncovering neutrality, or the lack thereof, in independent fact-checking. The research not only reiterates the need to look beyond the USA-centric left-right political bias but also contributes to early interdisciplinary work on combining human-AI efforts to study large-scale corpora and quantify subtle reporting/narrative biases that go unnoticed. The project idea came to me when I was randomly scrolling through an independent news portal, which is known to be "right-aligned", and saw that they had a very active fact-checking project. It led to framing the question of how the explicit or implicit political leaning of independent fact-checking organizations might lead to deviation from neutrality in fact-checking. I was very happy when what I hypothesised from initial manual observations was successfully quantified by our experiments. The bigger question this leaves for the research community is how we quantify the impact of different versions of "truth" and the subtle biases that cause deviation from a neutral point of view.

What tech are you most excited/passionate about right now and why?

While working with a lot of Hindi and Hinglish datasets from Twitter for analysing hate speech, I kept coming across gaps in the performance of NLP models when working with non-English text and non-Western cultural references. In this regard, I am excited about exploring both modelling paradigms and evaluation benchmarks to improve the adaptability of current NLP systems to non-Western connotations. Of course, by developing more culturally tuned systems, we will be able to understand toxic and stereotyped references better. I think a safer deployment of NLP-based models and interfaces is only possible if, at the granular level, we actively work on improving their adaptability to specific cultures and linguistic patterns.


Another area within CSS that I am keen on exploring is narrative building. I recently read "The Language of the Third Reich" by Victor Klemperer and was amazed by how many of the techniques and patterns of narrative building applied by today's media can be traced back to World War II and prior eras. Given that the human psyche remains more or less the same, no matter the language or medium of storytelling, I am keen on applying NLP to help establish long-range narrative understanding.

What tech are you most worried about right now and why?

It is not technology per se, but I am worried about AI being presented as either an impeccable panacea that will fix all of humanity's problems or an existential threat that will wipe out the human race. Inflating the abilities of AI beyond what is possible with current paradigms is leading to the investment of resources in the wrong direction. It takes the conversation away from the current threats of deploying these systems without proper safety checks in place and ignores the biases they propagate. A person of colour being wrongly convicted of a crime due to biased AI modelling is far more probable, and already very much a reality, compared with AGI wiping out humanity.

In your opinion, why do we see a huge gender gap in the tech industry, and how can we reduce it?

Working in tech, whether in industry or academia, is a very consuming process. On the one hand, technical skills and the accompanying soft skills are something anyone can acquire via various training channels (formal or informal). On the other hand, the on-paper equality of the tech space is marred by the different social expectations placed on one gender specifically. Unless our social systems become more accommodating of working women and provide adequate moral, financial and infrastructural support, women in tech, or in the workforce in general, will suffer.

Do you have any advice for aspiring girls who want to join the field?

For girls who aspire to work in tech, I would say that they should not feel intimidated; the tech world is full of possibilities and opportunities. There is room for different projects and perspectives to thrive, so keep participating in as many projects as possible and develop your sense of what works and what does not.

What are your hobbies and interests outside of tech?

Outside of tech, I really enjoy visiting museums and historical sites. Whenever I find time on weekends, I participate in guided tours in my city. I also dabble in writing poetry sometimes; I like to refer to myself as a seasonal poet :P However, my favourite hobby has always been reading. From books to blogs to research papers, the act of reading is very relaxing to me.