Data Visualization — Experiment 1 —

@sinewaver (Abhiram)

Hours of max FB Usage —

I originally intended to do something else but ended up doing this instead: figuring out at what hours of the day I’ve been most active on Facebook, outliers included.

How?

Step 1: Downloaded all of my Facebook data.
This can be done through the Facebook APIs or, less painfully, by requesting a copy of your Facebook data as a zip file from your Settings panel.

Step 2: Hosted the downloaded archive on a local server and scraped the timeline section of this local web page with a simple Python script using Scrapy, extracting only the timestamps of the posts and storing them in a CSV.

Step 3: Parsed this data (a modest dataset of 8500+ entries) and extracted the time of day from each timestamp, taking care to distinguish AM from PM, then stored the result in another CSV.

Step 4: Segregated this data into hourly ranges (e.g. 1am-2am, 2am-3am) for all 24 hours, obtained the total count of entries per hour, and stored this in a final CSV. Surprisingly, MS Excel was very helpful in this venture.

Step 5: The CSV from Step 4 serves as input to the D3.js bubble chart script (courtesy of Mike Bostock’s bubble chart template, with some tweaks).
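For anyone who wants to reproduce Steps 2 to 4 without Excel, here is a minimal Python sketch. It is not the original script: it swaps Scrapy for the standard library's `html.parser` so it runs with no extra installs. The `<div class="meta">` markup, the timestamp wording in `SAMPLE_HTML`, and the `id,value` column names for the final CSV are all assumptions made for illustration, so adjust them to whatever your own archive and chart script actually use.

```python
import csv
import re
import sys
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sample of the timeline markup; the real archive may differ.
SAMPLE_HTML = """
<div class="meta">Monday, January 2, 2017 at 3:45pm</div>
<div class="comment">not a timestamp</div>
<div class="meta">Tuesday, January 3, 2017 at 12:05am</div>
<div class="meta">Tuesday, January 3, 2017 at 3:10pm</div>
"""

# Matches a 12-hour clock time such as "3:45pm" or "12:05 AM".
TIME_RE = re.compile(r"(\d{1,2}):(\d{2})\s*([ap]m)", re.IGNORECASE)


class MetaTextParser(HTMLParser):
    """Collect the text inside every <div class="meta"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_meta = False
        self.timestamps = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "meta") in attrs:
            self._in_meta = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_meta = False

    def handle_data(self, data):
        if self._in_meta and data.strip():
            self.timestamps.append(data.strip())


def hour_24(timestamp):
    """Return the hour (0-23) of a 12-hour timestamp, or None if no time is found."""
    match = TIME_RE.search(timestamp)
    if match is None:
        return None
    hour = int(match.group(1)) % 12  # maps 12 to 0, so 12am becomes hour 0
    if match.group(3).lower() == "pm":
        hour += 12  # 12pm becomes 12, 3pm becomes 15
    return hour


def hourly_counts(timestamps):
    """Count entries per hour, returning a list of 24 totals."""
    counts = Counter(h for h in map(hour_24, timestamps) if h is not None)
    return [counts.get(h, 0) for h in range(24)]


def write_hourly_csv(counts, out):
    """Write one row per hourly range; the column names are an assumption."""
    writer = csv.writer(out)
    writer.writerow(["id", "value"])
    for hour, count in enumerate(counts):
        label = f"{hour:02d}:00-{(hour + 1) % 24:02d}:00"
        writer.writerow([label, count])


if __name__ == "__main__":
    parser = MetaTextParser()
    parser.feed(SAMPLE_HTML)
    write_hourly_csv(hourly_counts(parser.timestamps), sys.stdout)
```

The `% 12` trick is what handles the AM/PM edge cases: 12am lands in the midnight bucket and 12pm in the noon bucket, which is easy to get wrong when converting by hand.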

The result was this gloriously satisfying and insightful bubble chart.

What does this show?

The colors carry no meaning, but the size of each bubble indicates the number of posts made in that hour: the largest bubbles mark the most active hours and the smallest the least active. The inference is that I have wasted a tremendous amount of time on Facebook during almost every hour of the day over the 9-year period this data covers, save for the 2am to 6am window. This also illustrates a seemingly poor sleep pattern.
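One thing worth keeping in mind when reading the bubbles: D3's pack layout, which Bostock's template is built on, sizes leaf circles so that their area tracks the value by default, which means the radius grows with the square root of the count. The sketch below shows that relationship with made-up counts. The numbers are hypothetical and not taken from my data, and it assumes the template's default sizing has not been tweaked.

```python
import math


def relative_radii(counts):
    """Radii scaled so the largest bubble has radius 1.0 and area is proportional to count."""
    largest = max(counts)
    if largest == 0:
        return [0.0 for _ in counts]
    return [math.sqrt(c / largest) for c in counts]


# Hypothetical hourly counts, for illustration only.
example = [400, 100, 25]
print(relative_radii(example))  # [1.0, 0.5, 0.25]
```

So an hour with a quarter of the posts gets a bubble half as wide, not a quarter as wide. Small hours look bigger than a naive reading of the diameters would suggest.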

What haven’t I taken into account?

A lot of people post on my Facebook wall on my birthday. This number, which hovers around the 100–150 mark every year, is scattered across all hours of the day (roughly 10 posts every hour), so I feel this outlier can be safely neglected.
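The reasoning here is that adding the same number of posts to every hourly bucket does not change which hours come out busiest. A tiny sketch with hypothetical counts (not my real data) makes the point. It only holds to the extent that the birthday posts really are spread evenly, which is an assumption.

```python
def rank_hours(counts):
    """Return bucket indices ordered from busiest to quietest."""
    return sorted(range(len(counts)), key=lambda h: counts[h], reverse=True)


def add_uniform(counts, per_hour):
    """Add the same number of extra posts to every bucket."""
    return [c + per_hour for c in counts]


# Hypothetical counts for four hourly buckets.
base = [120, 15, 300, 80]
with_birthdays = add_uniform(base, 10)

# The ordering of the buckets is unchanged by the uniform offset.
print(rank_hours(base) == rank_hours(with_birthdays))  # True
```

The relative bubble sizes do shift a little, since a flat offset matters more for quiet hours than for busy ones, but the overall picture stays the same.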

What’s next?

I’m hoping to glean some more insights (useful ones, hopefully) from this data and will post the results as a follow-up to this blog.

Feel free to try this out for yourself if you’re interested in insights you could probably arrive at intuitively without going through this procedure, or if you just want to wade into the basics of data visualization and feel like you’ve started something rudimentary along these lines for your own satisfaction, like I did.

Reference material —

https://www.facebook.com/help/302796099745838
https://doc.scrapy.org/en/latest/topics/shell.html
https://bl.ocks.org/mbostock/4063269

Source code and other intermediate data —

Closer look at image — https://cdn.rawgit.com/abhiii5459/fb_usage/e9bd311b/bubblechart.html