Finding the fucks on Twitter and other observations of the profane
It’s summer in our nation’s capital. The humidity and the political climate are unbearably oppressive, and the denizens of D.C. do not give a fuck — at least, not literally. How do we know? I, along with fellow data scientist Rebecca Meseroll, collected over 10.7 million tweets from the contiguous 48 states and found out that ‘fuck’ appears in approximately 21 out of every 1000 tweets. In other words, slightly over 2% of all American tweets contain at least one variant of the word ‘fuck.’ Our analysis reveals a dearth of fucks in the District relative to the rest of the nation; the local fuck frequency in D.C. is a scant 11.7 per 1000 tweets. Language in other locales is not so chaste, however. Wyomingites, Californians, and Nevadans liberally peppered their tweets with profanity, exceeding 25 fuck-containing tweets per 1000 — more than twice their D.C. counterparts.
State  Total tweets  Fuck-containing tweets per 1000
MT            9,976  10.4
AR           36,957  11.2
DC           94,142  11.7
NE           42,636  13.6
MO          108,180  13.7
LA          216,023  23.4
AZ          173,604  24.8
NV          127,481  25.9
CA        1,377,434  26.7
WY            5,357  27.6
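The per-1000 figures are simple ratios, and a short sketch makes the arithmetic explicit. The function names below are illustrative and not taken from the authors' code; the raw matching counts were not published, so `implied_count` only backs out an approximate count from the rounded rate.

```python
def rate_per_thousand(matching: int, total: int) -> float:
    """Return the number of matching tweets per 1000 tweets."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1000 * matching / total


def implied_count(rate: float, total: int) -> int:
    """Back out an approximate matching-tweet count from a published rate."""
    return round(rate * total / 1000)


# (state, total tweets, rate per 1000) copied from the table above.
TABLE_ROWS = [("DC", 94_142, 11.7), ("CA", 1_377_434, 26.7), ("WY", 5_357, 27.6)]

if __name__ == "__main__":
    for state, total, rate in TABLE_ROWS:
        print(state, implied_count(rate, total))
```

Read this way, the rates for the smallest states rest on only a hundred or so matching tweets, so they are likely noisier than the figures for California or D.C.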
Why fuck usage varies regionally remains a mystery. Based on the timing of data acquisition, we initially conjectured that some inscrutable draft decisions by local NBA teams were the culprit behind elevated fuck levels in California. As sampling continued, however, the trends persisted beyond the scope of any individual event. We also speculated that the oversized presence of the porn industry might somehow be increasing usage in California and Nevada. This was swiftly dismissed when manual curation of 2000 tweets containing the word ‘fuck’ revealed that very few instances of the epithet (~2%) refer to a literal sex act and precisely zero refer to porn. Our data are consistent with a 2013 Slate analysis of swearing on Facebook, which found ‘fuck’ to be the most popular curse word in the West but only the second-most popular in the rest of America, so this may simply be a case of regional variation in word usage, like ‘soda’ vs. ‘pop.’
What else can Twitter tell us about how its users employ vulgar vocabulary? We examined how ‘fuck’ stacks up against other profane words in overall usage and found that it takes second place to ‘shit’ by a small margin. ‘Fuck’ and ‘shit’ are by far the most popular curse words on Twitter; both are used more than twice as frequently as the next most common epithet, ‘bitch.’
[Table: curse word vs. % of curse total; table contents not preserved]
So far, we’ve only considered tweets with (at least) one curse word. What happens when Twitter users aim to maximize the impact of their strong language by combining epithets for added effect? We determined the frequency of co-occurrence of major curse words (see list above) and found that, in general, ‘fuck’ is the most likely companion to almost any other obscenity, even the extremely mild ‘darn.’ The motives underlying such an unexpected and unorthodox combination are difficult to discern, but life is a rich tapestry. The exceptions to this rule are ‘crap’ and ‘cock.’ For tweets with more than one curse word, the presence of one ‘cock’ in a tweet is a strong indicator of a second.
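A co-occurrence count of this kind can be sketched in a few lines. This is a minimal illustration, not the authors' code: the word list contains only the curse words named in this article, matching is on exact lowercase tokens (so inflected forms such as ‘fucking’ are ignored), and the sample tweets in the usage block are made up.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative list: only the curse words named in the article.
CURSE_WORDS = {"fuck", "shit", "bitch", "darn", "crap", "cock"}
TOKEN_RE = re.compile(r"[a-z]+")


def curse_tokens(text):
    """Return the curse words in a tweet, in order, keeping repeats."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t in CURSE_WORDS]


def cooccurrence_counts(tweets):
    """Count unordered pairs of curse words appearing in the same tweet.

    A word repeated within one tweet yields a self-pair such as
    ('cock', 'cock').
    """
    counts = Counter()
    for text in tweets:
        for a, b in combinations(curse_tokens(text), 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def most_common_companion(counts, word):
    """Return the curse word most often paired with `word`, or None."""
    companions = Counter()
    for (a, b), n in counts.items():
        if a == word:
            companions[b] += n
        elif b == word:
            companions[a] += n
    return companions.most_common(1)[0][0] if companions else None


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    sample = ["well darn, fuck this", "shit fuck", "cock cock", "darn"]
    pairs = cooccurrence_counts(sample)
    print(most_common_companion(pairs, "darn"))
```

Keeping repeats in `curse_tokens` is what allows a word to be its own companion, which is the pattern described for ‘cock.’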
The popularity of ‘fuck’, both on its own and in conjunction with other profanity, is evident. But are we really capturing the entire extent of its usage by only looking for its standard spelling? For the most part, the answer is yes; however, the less common orthographic variations on ‘fuck’ are interesting to consider. Some (most commonly ‘f*ck’) appear to be self-censorship, while others include expanded strings of u’s or k’s (e.g., ‘fuckkkk’ and ‘fuuuuck’), possibly for added emphasis. Notably, there is a sweet spot for letter expansions: for both k and u, the magic number is four repeats of the letter.
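One way to pick up these spelling variants is a single regular expression with named groups for the u-run and the k-run. The pattern below is an assumption made for illustration; it covers only the three variant types mentioned here (standard, ‘f*ck’, and repeated u or k) and is not the authors' matching rule.

```python
import re
from collections import Counter

# 'f', then one or more 'u' or a single '*', then 'c', then one or more 'k'.
VARIANT_RE = re.compile(r"f(?P<u>u+|\*)c(?P<k>k+)", re.IGNORECASE)


def classify(match):
    """Label a match as 'censored', 'elongated', or 'standard'."""
    u, k = match.group("u"), match.group("k")
    if u == "*":
        return "censored"
    if len(u) > 1 or len(k) > 1:
        return "elongated"
    return "standard"


def variant_counts(texts):
    """Return (kinds, u_runs, k_runs) as Counters.

    u_runs and k_runs map run length to frequency, counting only runs
    longer than one letter.
    """
    kinds, u_runs, k_runs = Counter(), Counter(), Counter()
    for text in texts:
        for m in VARIANT_RE.finditer(text):
            kinds[classify(m)] += 1
            u, k = m.group("u"), m.group("k")
            if u != "*" and len(u) > 1:
                u_runs[len(u)] += 1
            if len(k) > 1:
                k_runs[len(k)] += 1
    return kinds, u_runs, k_runs
```

The peak of `u_runs` and `k_runs` over a real corpus is what would show the sweet spot at four letters.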
Given that ‘fuck’ can express a broad spectrum of emotion, it is natural to examine our corpus to ask about the feelings of the tweets’ authors. Is there such a thing as a happy ‘fuck’? We inferred the emotional intent of each tweet using a sentiment analysis tool that was specifically tuned to social media. Sentiment analysis, although an imperfect science, aims to capture the writer’s attitude and emotional intent. In the case of the f-word, we found that while the average usage was (unsurprisingly) negative, a quarter of the tweets were strongly positive. To illustrate this emotional mélange, we selected nine representative examples at various intervals, with inferred sentiment shown in parentheses. For comparative purposes (and because this is the internet) all of the selected tweets pertain to cats.
- I hate cats.. just evil little fuckers (-0.91)
- I just want to go to fucking sleep these stupid ass cats are fighting right outside my window (-0.85)
- I just got a cat fucking drunk and he’s abusive (-0.78)
- i want a cat now who the fuck am i (-0.49)
- Let your cat be a fucking cat. (0.0)
- honestly scaring cats is fucking hilarious (0.48)
- aye bruh how bout you show yo cats some love too mufucka (0.64)
- Cats are just so fucking perfect and I love them and want them all (0.88)
- I FUCKING LOVE MY CATS SO MUCH LOOK AT THIS BEAUTIFUL GUY I SWEAR WHAT A SMART LOYAL LOVING ANIMAL GIFTED TO ME (0.96)
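The scores in parentheses run from -1 (most negative) to +1 (most positive). As a rough sketch of how such scores can be bucketed, the code below applies the cutoffs conventionally suggested for VADER compound scores (at or above 0.05 is positive, at or below -0.05 is negative) to the nine examples above. The cutoff the authors used for ‘strongly positive’ is not stated, so these thresholds are an assumption. With the `vaderSentiment` package, a score of this kind would typically come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`; the sketch skips that step to stay dependency-free.

```python
from collections import Counter

# Inferred sentiment of the nine cat tweets listed above.
EXAMPLE_SCORES = [-0.91, -0.85, -0.78, -0.49, 0.0, 0.48, 0.64, 0.88, 0.96]


def label(compound, threshold=0.05):
    """Bucket a compound score; the threshold is an assumed convention."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_counts(scores, threshold=0.05):
    """Count how many scores fall into each bucket."""
    return Counter(label(s, threshold) for s in scores)


if __name__ == "__main__":
    print(label_counts(EXAMPLE_SCORES))
```

Because the nine tweets were hand-picked to span the scale, their split says nothing about the corpus as a whole; the same function applied to every tweet's score would.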
What meaning should we ascribe to Twitter profanity? Possibly none: these trends could be as ephemeral as the anger over the latest sporting event. On the other hand, America’s head of state has plenty of fucks to give, and his former communications director invoked self-fellatio in an expletive-filled rant against his colleagues (incidentally, whether the consequences of this tirade further dampen the crudeness of D.C.’s tweets is potential fodder for future analysis). We contend that the language we use matters, and investigations into emotionally tinged epithets, exclamations, and other foul interjections may reveal insights into our nation’s sometimes vulgar mind.
Data source and methods
Tweets were acquired at a rate of ~60,000 tweets per hour by opening a firehose to all tweets within the contiguous United States reporting a location. Data were constrained using a bounding box around the contiguous United States, and tweets from users in Mexico and Canada were discarded. Because of this technical limitation, we were unable to collect data from Hawaii and Alaska. Data were collected for two weeks, yielding a total of ~10⁷ tweets, which were filtered to remove matches in mentions (@’s) and links. Compiled subsets of the data and source code are available on GitHub. Sentiment analysis was performed using Valence Aware Dictionary and sEntiment Reasoner (VADER).
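The two filtering steps described above can be sketched as follows. This is a hedged illustration, not the published source code: the bounding-box coordinates are rough values chosen for the example, the regular expressions are simple stand-ins for mention and link detection, and all function names are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")

# Rough box around the contiguous United States, in degrees.
# These values are an assumption, not the authors' coordinates.
WEST, SOUTH, EAST, NORTH = -125.0, 24.0, -66.5, 49.5


def in_bounding_box(lon, lat):
    """True if a point falls inside the assumed bounding box."""
    return WEST <= lon <= EAST and SOUTH <= lat <= NORTH


def strip_mentions_and_links(text):
    """Blank out links and @mentions so they cannot produce matches."""
    return MENTION_RE.sub(" ", URL_RE.sub(" ", text))


def contains_word(text, word="fuck"):
    """Case-insensitive substring match on the cleaned tweet text."""
    return word in strip_mentions_and_links(text).lower()
```

A rectangle this size necessarily overlaps parts of Canada and Mexico, which is why a second pass to discard those tweets is needed, and it excludes Hawaii and Alaska entirely.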
TH conceived of the study, acquired and analyzed the data, and wrote/revised this article. RM suggested additional analyses, analyzed the data, and wrote/revised this article.
Travis tweets about machine learning and other highly irrelevant projects at @metasemantic. Rebecca mostly lurks at @robotwarning. Prior to this story being posted, neither of them has given a single fuck on Twitter.