
Which programming languages have the happiest (and angriest) commenters?

by Sara Robinson, December 20th, 2016
It’s officially winter, so what could be better than drinking hot chocolate while querying the new Stack Overflow dataset in BigQuery? It has every Stack Overflow question, answer, comment, and more — which means endless possibilities of data crunching. Inspired by Felipe Hoffa’s post on how response time varies by tag, I wanted to look at the comments table (53 million rows!).

The happiest Stack Overflow tags :)

To measure happy comments I looked at comments with “thank you”, “thanks”, “awesome” or “:)” in the body. I limited the analysis to tags with more than 500,000 comments. Here’s the query:

#standardSQL
SELECT
  tag,
  ROUND((COUNT(case when comment_text like '%thanks%' or comment_text like '%:)%' or comment_text like '%thank you%' or comment_text like '%awesome%' then 1 end) / COUNT(*)) * 100, 2) as percent_happy,
  COUNT(*) total_comments
FROM (
  SELECT
    LOWER(a.text) as comment_text,
    SPLIT(b.tags, '|') as tags
  FROM `bigquery-public-data.stackoverflow.comments` a
  JOIN `bigquery-public-data.stackoverflow.posts_questions` b
  ON a.post_id = b.id
  UNION ALL
  SELECT
    LOWER(b.text) as comment_text,
    SPLIT(c.tags, '|') as tags
  FROM `bigquery-public-data.stackoverflow.posts_answers` a
  JOIN (SELECT post_id, text FROM `bigquery-public-data.stackoverflow.comments`) b
  ON a.id = b.post_id
  JOIN `bigquery-public-data.stackoverflow.posts_questions` c
  ON c.id = a.parent_id
), UNNEST(tags) tag
GROUP BY 1
HAVING total_comments > 500000
ORDER BY percent_happy DESC
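
If you want to follow the aggregation logic without running BigQuery, here is a minimal Python sketch of the same idea: lowercase each comment, split its pipe-delimited tags, count matches per tag, and keep only tags above a comment threshold. The function name `percent_matching_by_tag`, the `min_comments` parameter, and the sample rows are all made up for illustration; this is not the code behind the results in this post.

```python
from collections import Counter

# Search terms taken from the query above.
HAPPY_TERMS = ("thanks", ":)", "thank you", "awesome")


def percent_matching_by_tag(comments, terms, min_comments=0):
    """Return [(tag, percent, total)] sorted by percent, highest first.

    comments: iterable of (text, tags) pairs, where tags is a
    pipe-delimited string such as "python|regex".
    """
    totals = Counter()
    matches = Counter()
    for text, tags in comments:
        lowered = text.lower()  # same role as LOWER() in the query
        is_match = any(term in lowered for term in terms)
        for tag in tags.split("|"):  # same role as SPLIT + UNNEST
            totals[tag] += 1
            if is_match:
                matches[tag] += 1
    rows = [
        (tag, round(matches[tag] / total * 100, 2), total)
        for tag, total in totals.items()
        if total > min_comments  # same role as the HAVING clause
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample rows, purely for illustration.
sample = [
    ("Thanks, that fixed it :)", "r|ggplot2"),
    ("This does not compile", "c++"),
    ("Awesome answer", "r"),
    ("Why would you do that?", "c++|r"),
]

for tag, percent, total in percent_matching_by_tag(sample, HAPPY_TERMS):
    print(tag, percent, total)
```

Passing a different tuple of terms gives the same breakdown for any other set of keywords.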

Here’s the result in BigQuery:

And the chart:

R, Ruby, HTML / CSS, and iOS are the communities with the happiest commenters according to this list. People who ask questions about XML and regular expressions also seem particularly thankful for help. If you’re curious, here are the 15 highest-scoring happy comments that were short enough to fit in a screenshot (and their associated tags):

But because people sometimes get angry on the internet, you’re probably wondering…

The angriest Stack Overflow tags :(

For angry comments, I counted those with “wrong”, “horrible”, “stupid”, or “:(” in the body. The SQL is the same as above with the search terms swapped out. Here’s the result:

And the chart:

Clearly the angriest comments are those related to C derivatives. Many programming concepts also wound up here: multithreading, arrays, algorithms, and strings. And here are the highest-scoring angry comments:

This analysis is not perfect, as the comment “that one’s so stupid it underflows and becomes awesome” appears in both lists. That’s where a machine learning tool like the Natural Language API would come in handy.
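
To see why plain keyword matching puts that comment in both lists, here is a small Python sketch. It applies the two sets of search terms from this post to a single comment; the helper name `matched_labels` is hypothetical and only mirrors the substring checks, not the actual BigQuery queries.

```python
# Search terms from the happy and angry analyses in this post.
HAPPY_TERMS = ("thanks", ":)", "thank you", "awesome")
ANGRY_TERMS = ("wrong", "horrible", "stupid", ":(")


def matched_labels(comment):
    """Return every label whose search terms appear in the comment."""
    lowered = comment.lower()
    labels = []
    if any(term in lowered for term in HAPPY_TERMS):
        labels.append("happy")
    if any(term in lowered for term in ANGRY_TERMS):
        labels.append("angry")
    return labels


comment = "that one's so stupid it underflows and becomes awesome"
print(matched_labels(comment))  # ['happy', 'angry']
```

Substring checks only see individual words, so a comment that uses "stupid" as a compliment still counts as angry.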

Between the two lists there were only a few tag overlaps. The most excitable tags (I’m interpreting tags that showed up in both the happy and angry lists as ‘excitable’) are ios, iphone, objective-c, and regex. And while the internet may seem like a dark place sometimes, there appear to be roughly six happy comments for every angry one.

What’s next?

Dive into the Stack Overflow dataset, or check out some of these awesome posts to get inspired:

If you have comments or ideas for future analysis, find me on Twitter @SRobTweets.