Authors:
(1) Florian Cafiero (ORCID 0000-0002-1951-6942), Sciences Po, Medialab;
(2) Jean-Baptiste Camps (ORCID 0000-0003-0385-7037), Ecole nationale des chartes, Universite Paris, Sciences & Lettres.

Table of Links

Abstract and Introduction
Why work on QAnon? Specificities and social impact
Who is Q? The theories put to test
Authorship attribution
Results
Discussion
Corpus constitution
Quotes of authors outside of the corpus have been
Definition of two subcorpus: dealing with generic difference and an imbalanced dataset
The genre of “Q drops”: a methodological challenge
Detecting style changes: rolling stylometry
Ethical statement, Acknowledgements, and References

Results

Profiles capturing unconscious features of style, such as grammatical morphemes, were built from two corpora of texts signed by each putative author: a large corpus covering all 13 candidates, though with little relevant material for some of them, and a smaller, more controlled corpus. The profiles were trained with supervised machine learning, with a general attributive performance of over 97% (Materials and methods). They show that, for most of the slices, the highest decision function is by far that of Ron W. (Fig. 1).

The most significant deviation from this concerns the first period of the Q drops, before the switch to 8chan. In this period, the larger corpus analysis gives Paul F. as, by far, the top candidate. A period follows in which the Paul F. and Ron W. signals compete, until the Ron W. signal finally takes over, after a second break that closely matches a tweet described by Paul F. himself as the last authentic Q drop, which reads:

There will be no further posts on this board under this ID.
This will verify the trip is safeguarded and in our control.
This will verify this board is compromised.
God bless each and every one of you.
Fight, fight, fight!
Q

The dominance of Paul F. in the first period is not seen at all in the smaller corpus analysis. Secondarily, there are very localised spikes in the Christina U. and Michael F. signals, especially in the more recent period of the Q drops.
The rest of the candidates lag far behind.

The results of the two rolling analyses, and any differences between them, have to be contextualised by investigating the features that received the strongest coefficients in the different SVM classifiers (Fig. 2). For some candidates, like Ron W., the features seem mostly idiolectal, like the 3-grams ‘nyb’ and ‘ybo’ (in ‘anybody’) or the relative avoidance of ‘ th’ and ‘his’, and they remain stable across both analyses. This is also the case, for instance, for Donald T., whose most distinctive feature is ‘fak’, part of his very idiolectal ‘FAKE’, while others are more content-related (‘mpg’ is even due to the regularity with which he mentioned ‘BrianKempGA’ in the training material), a consequence of the choice of character 3-grams as features. For authors like Christina U., the features are very content- and news-related, like the 3-grams extracted from ‘Israel(i)’, ‘blm’, ‘psy’ (psychologists, psychiatrists, ...), etc. In the case of Michael F., the features seem very dependent on the small quantity of available training material and on its grandiloquent and religious nature, with features such as ‘god’ (‘God’), ‘hty’ (‘almighty’), ‘lib’ (‘liberty’).

Finally, and more importantly, the variation of these features between analyses gives very good insight into the differing results concerning Paul F. In the small corpus, due to the exclusion of his book, his most distinctive features all come from curse words and racist insults (‘ fu’, ‘fuc’, ‘uck’, ‘shi’, ‘hit’, ‘ ni’, ‘nig’, ‘igg’, ‘gge’, etc.). In the larger corpus, with the book included, they instead reveal more neutral idiolectal (and grammatical) features, with pronouns, auxiliaries and determiners (‘he ’, ‘had’, ‘was’, ‘the’, etc.). These elements suggest that the larger corpus analysis is more reliable than the smaller one where Paul F. is concerned (especially in a cross-genre setup).
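To make the feature discussion concrete, here is a minimal sketch of how character 3-grams arise from words such as ‘anybody’ or ‘BrianKempGA’, and of how a linear decision value could be computed on rolling slices of text. It is not the authors' pipeline: the lowercasing is inferred from examples such as ‘fak’ for ‘FAKE’, the function names are hypothetical, and the weights are made-up toy values standing in for trained SVM coefficients (positive for favoured 3-grams, negative for avoided ones).

```python
from collections import Counter


def char_trigrams(text):
    """Overlapping character 3-grams of the lowercased text, spaces kept."""
    text = text.lower()  # assumption: case is folded before extraction
    return [text[i:i + 3] for i in range(len(text) - 2)]


def trigram_profile(text):
    """Map each 3-gram to its relative frequency in the text."""
    counts = Counter(char_trigrams(text))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()} if total else {}


def rolling_slices(words, size, step):
    """Yield overlapping windows of `size` words, advancing by `step` words."""
    last_start = max(len(words) - size, 0)
    for start in range(0, last_start + 1, step):
        yield words[start:start + size]


def linear_score(profile, weights, bias=0.0):
    """Linear decision value: bias plus weighted sum of 3-gram frequencies."""
    return bias + sum(w * profile.get(gram, 0.0) for gram, w in weights.items())


# Made-up weights for illustration only, not coefficients from the paper.
TOY_WEIGHTS = {"nyb": 1.0, "ybo": 1.0, " th": -1.0, "his": -1.0}

if __name__ == "__main__":
    print(char_trigrams("anybody"))  # ['any', 'nyb', 'ybo', 'bod', 'ody']
    print("mpg" in char_trigrams("BrianKempGA"))  # True
    words = "anybody can see this is not what anybody expected".split()
    for window in rolling_slices(words, size=4, step=2):
        text = " ".join(window)
        print(text, linear_score(trigram_profile(text), TOY_WEIGHTS))
```

In a real setup, one such score would be computed per candidate and per slice with trained coefficients, and the candidate with the highest value would be the top attribution for that slice. The sketch also shows why content can leak into the features: any frequent name or topical word contributes its own 3-grams.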
This paper is available on arxiv under CC BY 4.0 DEED license.


Machine Learning and Linguistic Profiles Sheds Light on Q's Possible Authors
