Addressing Quotations and Copy-Paste in QAnon Authorship Attribution

by Ethnology, December 7th, 2024

Too Long; Didn't Read

This section of a study on QAnon authorship attribution explains how quotations and copy-pasted passages were removed from the corpus. Quotations from outside sources were excluded by close reading. Reuse between candidates was detected with a Locality-Sensitive Hashing algorithm from the TextReuse package, which compared sentences through the Jaccard similarity of their skip bi-grams. Pairs scoring 0.5 or more were reviewed by a human expert.

Authors:

(1) Florian Cafiero (ORCID 0000-0002-1951-6942), Sciences Po, Medialab;

(2) Jean-Baptiste Camps (ORCID 0000-0003-0385-7037), École nationale des chartes, Université Paris Sciences & Lettres.

Abstract and Introduction

Why work on QAnon? Specificities and social impact

Who is Q? The theories put to test

Authorship attribution

Results

Discussion

Corpus constitution

Dealing with quotations and copy/paste

Definition of two subcorpus: dealing with generic difference and an imbalanced dataset

The genre of “Q drops”: a methodological challenge

Detecting style changes: rolling stylometry

Ethical statement, Acknowledgements, and References

Dealing with quotations and copy/paste

Quotations from authors outside the corpus were removed as far as possible through close reading, in particular quotations from Q, Wikipedia, the Stanford Encyclopedia of Philosophy, Abraham Lincoln, the Intelligence Resource Program (irp-fas), Steve Scully’s biography, etc.


Direct quotations (with or without quotation marks) and copy/paste between the writings of the different candidates can also occur. Many of them quote Donald, Eric or Melania T., as does Q. There are also a number of quotations from Q by the others (Paul F., for instance). This could lead to small biases in the constitution of idiolectal profiles. To avoid this, we systematically searched for quotations between the candidates themselves. Because direct pairwise comparison is computationally too costly for a corpus of this size, we used a Locality-Sensitive Hashing (LSH) algorithm, as implemented in the open source TextReuse package (Mullen, 2020). The corpus was tokenised into sentences, and broken word bi-grams (with a skip of 1, that is, allowing any one word to be inserted between the two words of the bi-gram) were counted. For all pairs of sentences, a Jaccard similarity score was computed. Let A and B be two samples considered as sets of bi-grams; the Jaccard similarity is computed as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

All pairs of sentences with a Jaccard similarity score greater than or equal to 0.5 (i.e., at least half of their bi-grams in common) were examined by a human expert, and the quotations were removed.
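
The detection step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which relied on the TextReuse package; it only mirrors the same ideas (skip bi-grams, MinHash-based LSH to propose candidate pairs, exact Jaccard scoring against the 0.5 threshold). The function names, the tokenisation regex, the number of hash functions (64) and the number of bands (32) are all assumptions chosen for illustration, as is the choice to count adjacent bi-grams alongside those with one skipped word.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations


def skip_bigrams(sentence, max_skip=1):
    """Set of word bi-grams with up to `max_skip` words between the two members.

    Assumption: adjacent bi-grams (skip of 0) are included as well.
    """
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    grams = set()
    for i, first in enumerate(words):
        for j in range(i + 1, min(i + 2 + max_skip, len(words))):
            grams.add((first, words[j]))
    return grams


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def minhash(grams, num_hashes=64):
    """MinHash signature: for each seed, the smallest hash over all bi-grams."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}|{g[0]} {g[1]}".encode(),
                                digest_size=8).digest(), "big")
            for g in grams))
    return tuple(signature)


def candidate_pairs(signatures, bands=32):
    """LSH banding: sentences sharing one full band of their signature are paired."""
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(key)
    pairs = set()
    for keys in buckets.values():
        pairs.update(combinations(sorted(keys), 2))
    return pairs


def flag_reuse(sentences, threshold=0.5):
    """Return (id_a, id_b, score) for candidate pairs whose Jaccard >= threshold.

    `sentences` maps a sentence identifier to its text. Sentences too short
    to yield any bi-gram are ignored.
    """
    grams = {k: skip_bigrams(s) for k, s in sentences.items()}
    grams = {k: g for k, g in grams.items() if g}
    signatures = {k: minhash(g) for k, g in grams.items()}
    flagged = []
    for a, b in sorted(candidate_pairs(signatures)):
        score = jaccard(grams[a], grams[b])
        if score >= threshold:
            flagged.append((a, b, score))
    return flagged
```

The banding step only proposes candidates, so that the exact Jaccard score is computed for a small subset of pairs rather than for every pair in the corpus. Pairs returned by `flag_reuse` would then go to a human reader, as in the procedure described above. Near-duplicates close to the threshold may be missed by the banding step, depending on the number of bands and rows chosen.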


Even for J = 1, we were sometimes confronted with false positives. Dan S. and Melania T. each use the sentence “we are all in this together” once, without directly citing each other. We thus left this passage in both their texts. Though rarely used, the sentence “the American people are not stupid” nevertheless appears in different texts. It was kept in the texts studied, as were other simple sentences (“thank you for your service”, etc.).


Other situations were trickier to address. For instance, Dan S. uses the sentence “the best is yet to come” once. It is used five times by Q, himself quoting former President Donald Trump. This sentence could be used by anyone without directly quoting Q or Donald Trump. Yet, as its use by Dan S. starts with “As the President says...”, we considered it a direct quotation and deleted it from Dan S.’s text. We did not, however, delete it from Q’s own writing, as it is never used there as an explicit quotation: the sentence could be used in another context, the person(s) writing the Q drops containing this sentence could be trying to impersonate Donald Trump, etc. In any of these cases, it would be legitimate to leave the information in. The same goes for expressions such as “the world is watching” or “make America great again”, used by Donald Trump, but also by Q and some of the potential candidates here.


This paper is available on arXiv under a CC BY 4.0 DEED license.