Stylometry Reveals Clues About the Authorship Behind QAnon

by Ethnology Technology, December 7th, 2024

Too Long; Didn't Read

Authorship attribution techniques, also known as stylometry (Juola, 2008), are used to identify who might hide behind the pseudonym Q. Stylometry tries to spot the idiosyncratic properties of someone’s language, called idiolect (Coulthard, 2004). The attribution of the Q drops relies on the use of function words, punctuation, and patterns of “parts-of-speech” (e.g. ‘Proper noun / Article’, ‘Article / common noun / verb’) (Björklund and Zechner, 2017).

Authors:

(1) Florian Cafiero (ORCID 0000-0002-1951-6942), Sciences Po, Medialab;

(2) Jean-Baptiste Camps (ORCID 0000-0003-0385-7037), École nationale des chartes, Université Paris Sciences & Lettres.

Abstract and Introduction

Why work on QAnon? Specificities and social impact

Who is Q? The theories put to test

Authorship attribution

Results

Discussion

Corpus constitution

Quotes of authors outside of the corpus have been

Definition of two subcorpus: dealing with generic difference and an imbalanced dataset

The genre of “Q drops”: a methodological challenge

Detecting style changes: rolling stylometry

Ethical statement, Acknowledgements, and References

Authorship attribution

To help understand who might hide behind the pseudonym Q, we rely here on authorship attribution techniques, also known as stylometry (Juola, 2008; Cafiero and Camps, 2022), a term coined at the end of the 19th century to designate the measure of stylistic affinity between texts (Lutoslawski, 1898). Stylometry tries to spot the idiosyncratic properties of someone’s language, called idiolect (Coulthard, 2004).


To identify the linguistic signature of a person, regardless of the topic of the texts they produce, stylometry focuses on deep linguistic properties that are less subject to conscious manipulation or to variation in context. Linguistic features such as function words (“and”, “or”, “upon” etc.) (Kestemont, 2014; Segarra et al., 2015), punctuation (Jin and Jiang, 2012), and patterns of “parts-of-speech” (e.g. ‘Proper noun / Verb / Article’, ‘Article / common noun / verb’) (Björklund and Zechner, 2017) are, for instance, used in specific combinations by individuals, and are a reliable clue as to who is speaking or writing.
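As a rough illustration of the kind of features described above, the following Python sketch computes relative frequencies of a few function words and punctuation marks, then compares two texts. It is not the authors' pipeline: the word list, the punctuation list, the function names, and the choice of Manhattan distance are all assumptions made for the example, and real studies rely on much longer feature lists and more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny feature lists (assumptions for illustration).
FUNCTION_WORDS = ["and", "or", "upon", "the", "of", "to", "in"]
PUNCTUATION = [",", ".", ";", "!", "?"]


def style_profile(text):
    """Relative frequency of each function word (per word token)
    and of each punctuation mark (per character)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty text
    profile = {w: counts[w] / n_tokens for w in FUNCTION_WORDS}
    n_chars = max(len(text), 1)
    for mark in PUNCTUATION:
        profile[mark] = text.count(mark) / n_chars
    return profile


def manhattan_distance(profile_a, profile_b):
    """Sum of absolute differences between two profiles built on the same features."""
    return sum(abs(profile_a[k] - profile_b[k]) for k in profile_a)


if __name__ == "__main__":
    a = style_profile("The cat and the dog sat upon the mat.")
    b = style_profile("Trust the plan! Who is in control? Think.")
    print(manhattan_distance(a, b))
```

A smaller distance means the two texts use these features in more similar proportions. On its own this is only a weak signal, which is why such profiles are normally built from many features and long samples.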


Studies in psycholinguistics have shown that function words and grammatical markers are processed differently by the brain than lexical content words, and that they not only reveal individual usage but also correlate with socio-cultural categories such as gender, age group or native language (Argamon et al., 2009; Pennebaker, 2013). Since a famous study on the authors of the Federalist Papers (Mosteller and Wallace, 1963), a series of essays advocating the adoption of the future U.S. Constitution, many applications have been made to solve literary or historical controversies, such as Caesar’s contribution to the Commentaries on his wars (Kestemont et al., 2016), or Shakespeare’s (Plecháč, 2020) and Molière’s (Cafiero and Camps, 2019, 2021) disputed authorships. Stylometry was also used to reveal who may have written Elena Ferrante’s novels (Eder, 2018; Rybicki, 2018; Mikros, 2017) or to recognize J.K. Rowling’s style behind the pseudonym Robert Galbraith (Juola, 2015).


Applications to forensic cases have grown in the past decades. They are increasingly used in U.S. courts (Chaski, 2005), among others, and have been applied to a wide range of cases, from immigration disputes (Juola, 2012) to murder investigations (Cafiero and Camps, 2022).


Previous research in this field suggested that the Q drops could have more than one author. A stylometric analysis, based on factor analysis of character 3-grams, suggested that two authors probably wrote these texts, one after the other (Orphanalytics, 2020). Analyses of the distribution of the number of characters and words in each Q drop, as well as of the use of special characters, also suggested that two hands could have written the Q drops (Aliapoulios et al., 2021), with posts written under one of the 10 tripcodes used by Q exhibiting different properties from the rest. Examinations of the pictures posted by Q, however, show that they were mostly posted from the same time zone in Asia, and that the original pictures were taken with the same camera throughout the period (Fox, pseudo). This is interpreted as a sign that the Q drops would have a single author.
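The surface features mentioned in the paragraph above (character 3-grams, and the number of characters and words per post) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names are hypothetical, the sample string is invented, and it covers neither the factor analysis step nor the exact preprocessing used in the cited studies.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (3-grams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_freqs(counts):
    """Turn raw n-gram counts into relative frequencies."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def length_stats(post):
    """Number of characters and whitespace-separated words in one post."""
    return {"chars": len(post), "words": len(post.split())}


if __name__ == "__main__":
    sample = "Trust the plan"  # invented sample text, not a quoted Q drop
    print(char_ngrams(sample).most_common(3))
    print(length_stats(sample))
```

Computed post by post, such frequency vectors and length statistics form the table on which a dimensionality-reduction or clustering method can then be run to look for shifts between groups of posts.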


This paper is available on arXiv under the CC BY 4.0 DEED license.