
Challenges in Building a QAnon Authorship Corpus

by Ethnology Technology, December 7th, 2024


Authors:

(1) Florian Cafiero (ORCID 0000-0002-1951-6942), Sciences Po, Medialab;

(2) Jean-Baptiste Camps (ORCID 0000-0003-0385-7037), Ecole nationale des chartes, Universite Paris, Sciences & Lettres.

Abstract and Introduction

Why work on QAnon? Specificities and social impact

Who is Q? The theories put to test

Authorship attribution

Results

Discussion

Corpus constitution

Quotes of authors outside of the corpus have been

Definition of two subcorpus: dealing with generic difference and an imbalanced dataset

The genre of “Q drops”: a methodological challenge

Detecting style changes: rolling stylometry

Ethical statement, Acknowledgements, and References

Corpus constitution

Because of their contested contents, many accounts have been deleted by the social media platforms. Other accounts were simply deleted by the users themselves. This raised many challenges for the constitution of the corpus, which had to rely on data collected before the deletion, or on web archives that are difficult to search. Moreover, the sources do not all cover the same time span and were created at different dates, so they potentially discuss different news. This is a source of heterogeneity in the training material that complicates the attribution procedure.


We list here the sources we used for each candidate:


Roger S: We collected Roger S.’s posts on Gab (https://gab.com/RogerJStoneJr), from June 14th to August 2nd, 2021.


Michael F: Michael F. wrote a series of 10 articles for the Western Journal, from June 29, 2020 to July 31, 2021 (https://www.westernjournal.com/author/mflynn/). He also wrote a letter asking for support for Roger S.’s wife, published on FrankReport (https://frankreport.com/2021/06/11/bannedby-twitter-gen-michael-flynn-is-published-on-frankreport-concerning-roger-stone/).


His article for Fox News about ISIS has also been analyzed (https://web.archive.org/web/20161215042531/http://www.foxnews.com/opinion/2016/11/02/gen-michaelflynn-after-mosul-is-liberated-isis-could-attack-usnext.html).


Finally, we collected posts on his Twitter account from October 19th, 2016 to September 18th, 2017.


Paul F: We collected Paul F.’s personal writings on his website (https://paulfurber.net/).


We retrieved archives of his Twitter account through Thread Reader (https://threadreaderapp.com/thread/1158540523008905216.html).


We also captured archives from the CBTS boards on 8chan, where he wrote as “The Board Owner”.


A few posts he left on Discord were transcribed from pictures found online.


Finally, Paul F. wrote a book, Q: Inside The Greatest Intelligence Drop In History, which is included in our large corpus.


Jim W: Archives of Jim W.’s Twitter account were found on archive.today (https://archive.is/https://twitter.com/xerxeswatkins), which preserves 9 screenshots (22 Dec 2014, 6 Feb 2016, 15 Mar 2016, 30 Mar 2016, 4 Apr 2016, 5 May 2016, 28 May 2016, 25 Mar 2017, 8 Apr 2017) taken prior to the account suspension. A large number of these tweets only cited article titles from his own media outlet, The Goldwater. As these titles were not necessarily (and probably not) written by him, we chose to exclude them.


A short text was written by Jim W. on 5ch about a service problem on 8ch that he blames on a government attack (https://fox.5ch.net/test/read.cgi/poverty/1418027836/826).


Finally, posts from his Parler account were added to the corpus.


Figure 3: Dimensions 1 and 2 from a correspondence analysis of the 1000-word samples from the RonW and PaulF texts; the Q drops from the 4chan period (purple) and from the 8chan period before (yellow) and after (orange) the ‘board compromised’ post have been projected as supplementary individuals.
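
The exclusion of tweets that merely repeat article titles from The Goldwater, described above for Jim W., could be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure the authors actually used: the helper names `normalize` and `drop_title_only_tweets`, the list of known titles, and the sample tweets are all hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def drop_title_only_tweets(tweets, known_titles):
    """Keep only tweets whose text is more than a known article title plus a link."""
    titles = {normalize(title) for title in known_titles}
    return [tweet for tweet in tweets if normalize(tweet) not in titles]


# Hypothetical data, for illustration only (not taken from the corpus).
known_titles = ["Example Headline About Something"]
tweets = [
    "Example headline about something! https://example.com/a",
    "I wrote this one myself.",
]
kept = drop_title_only_tweets(tweets, known_titles)
print(kept)
```

Exact matching after normalization is deliberately conservative: a tweet that adds a personal comment to a title is kept, since the comment may carry the candidate's own style.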


Ron W: A sample of 3130 tweets by Ron W. was collected through the Twitter API.


We also collected his posts on Telegram from November 30th, 2021 to December 20th, 2021.


Coleman R: Under the pseudonym PamphletAnon, Coleman R. wrote a vast number of posts on 8chan, especially on the board “the Storm” (https://8ch.net/thestorm/catalog.html), of which he was the owner. We collected his posts on the Wayback Machine, which seems to provide a complete archive of the board.


A few messages he posted on Discord (Q Central, 2017) have been archived by DDOS (https://ddosecrets.com/wiki/Distributed_Denial_of_Secrets), at this address: https://whispers.ddosecrets.com/discord/user/376607495470448643. 347 messages by PamphletAnon on Discord (Patriots’ Soapbox, 2018) are also available at: https://discordleaks.unicornriot.ninja/discord/user/41936.


We also found a short text on Reddit, where Coleman R. announces a future talk on Infowars with Rob Dew.


On September 11, 2020, PamphletAnon wrote a text on “Patriots’ Soapbox”, the media outlet he curates with Christina U. (https://patriotssoapbox.com/opinion/memories-of9-11-surreal-and-terrifying/).


Courtney T: Collaborating with Ron and Jim W., Courtney T. publicly announced that she knew the truth about the Q drops, and that it would be highly disappointing to their audience (Huback, 2021). We collected her posts on Twitter, published under her account IWillRedPillYou and archived here (http://web.archive.org/web/20180113162029if_/https://twitter.com/IWillRedPillYou).
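
Since several sources in this section are only available as Wayback Machine captures, the capture date can be read directly from the archive URL, which helps when dating material. The sketch below is a minimal illustration, not part of the authors' pipeline: the function name `parse_wayback_url` is hypothetical, and the example URL is the capture cited above, written in the usual Wayback form with an `if_` modifier (an assumption about the intended address).

```python
import re
from datetime import datetime

# Wayback capture URLs look like:
#   http(s)://web.archive.org/web/<14-digit timestamp>[<modifier>]/<original URL>
# where the optional modifier is two letters plus an underscore (e.g. "if_").
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{14})([a-z]{2}_)?/(.+)$"
)


def parse_wayback_url(url: str):
    """Return (capture datetime, original URL) for a Wayback Machine capture URL."""
    match = WAYBACK_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback Machine capture URL: {url}")
    stamp, _modifier, original = match.groups()
    # Wayback timestamps are expressed in UTC; the datetime returned here is naive.
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


captured_at, original = parse_wayback_url(
    "http://web.archive.org/web/20180113162029if_/https://twitter.com/IWillRedPillYou"
)
print(captured_at.isoformat(), original)
```

Note that the timestamp gives the date of the capture, not the date on which the archived posts were written.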


Tracy D: Under the pseudonym Tracy Beanz, Tracy D. published a large number of tweets on an account that is now suspended, but of which archive.today retains 92 captures (https://archive.vn/lDRyR), from December 2016 to January 2021.


She also published a long post on Steemit explaining her position in quarrels over Q-related publications (https://steemit.com/drama/@tracybeanz/shestood-in-the-storm).


Christina U: Christina U. wrote a number of articles on PatriotsSoapbox (https://patriotssoapbox.com/), of which we collected the 5 most recent as of July 7th, 2021.


We also found an online exchange she had with Homeland Security on MuckRock on the 7th and 22nd of July 2021 (https://www.muckrock.com/foi/united-statesof-america-10/patriots-soapbox-department-ofhomeland-security-115196/#file-956999). Archives of her Gab profile were also analyzed.


Donald T: Tweets by the former president of the United States were collected during the month of December 2020. We removed the tweets that the site https://factba.se/ suspects were written by his staff.


Melania T: We collected a sample of tweets by the former FLOTUS through the Twitter API, from January 21st, 2017 to January 19th, 2021.


Eric T: We collected a sample of 3000 tweets by the son of the former POTUS through the Twitter API, from September 10th, 2016 to June 24th, 2021.


Dan S: We collected a sample of tweets by the former POTUS’ deputy chief of staff through the Twitter API, from October 25th, 2018 to January 20th, 2021.
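
The heterogeneity in time spans noted at the start of this section can be made concrete by comparing the collection windows stated above. The following is a minimal sketch, not the authors' code: the dictionary only includes the sources for which this section gives explicit start and end dates, the labels and the helper names `span_days` and `overlap_days` are hypothetical, and the Roger S. window is assumed to fall entirely within 2021.

```python
from datetime import date

# Collection windows as stated in this section: (first day, last day).
windows = {
    "Roger S (Gab)": (date(2021, 6, 14), date(2021, 8, 2)),  # year of start assumed
    "Michael F (Western Journal)": (date(2020, 6, 29), date(2021, 7, 31)),
    "Michael F (Twitter)": (date(2016, 10, 19), date(2017, 9, 18)),
    "Ron W (Telegram)": (date(2021, 11, 30), date(2021, 12, 20)),
    "Melania T (Twitter)": (date(2017, 1, 21), date(2021, 1, 19)),
    "Eric T (Twitter)": (date(2016, 9, 10), date(2021, 6, 24)),
    "Dan S (Twitter)": (date(2018, 10, 25), date(2021, 1, 20)),
}


def span_days(window):
    """Length of a window in days, counting both endpoints."""
    start, end = window
    return (end - start).days + 1


def overlap_days(a, b):
    """Number of days shared by two windows (0 if they are disjoint)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, (end - start).days + 1)


for name, window in windows.items():
    print(f"{name}: {span_days(window)} days")

print(overlap_days(windows["Roger S (Gab)"], windows["Ron W (Telegram)"]))
```

Pairs of sources with no overlap at all, such as the two in the last line, cannot have been commenting on the same news, which is one way topical differences can leak into a stylometric comparison.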


This paper is available on arXiv under a CC BY 4.0 DEED license.