“Come with me if you wanna not die” — Wyldstyle to Emmet in ‘Lego Movie’
Governments and other public institutions have always collected data about their populations, land, weather and more, for purposes such as administration, governance, planning and strategy. Until a few decades ago, most of the general population could access this type of data only through fairly primitive formats like paper and, in rare cases, something like an Excel table. The uses of this data are varied, ranging from social science research and high school projects to for-profit services and strategy making (like planning real estate projects). The Open Data Handbook defines open data as ‘the idea that certain data should be freely available for everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control’. In a sense, government data like census information, land use studies and budgets are all forms of Open Data. As you can see, Open Data has always been there. So what makes the idea so groundbreaking now?
There is very little correlation between the availability of information and the ability to access it. Advances in computing power, in line with Moore’s law, and the proliferation of the internet are key pieces of this puzzle. Companies like Intel and Microsoft were pioneers of the computing revolution. Bill Gates’s vision of a computer on every desk has been truly inspiring and has largely been a catalyst for the Open Data revolution. Together with the advent and spread of the internet, it has made access to information far, far easier.
Those of you from the pre-internet era will know the pain of trying to get even the smallest bit of information when applying for an official document like a passport or a voter ID.
Now imagine a world with no internet, where very little communication is possible between a big organization like the government and you. The government has released large quantities of data and the geek in you wants to play around with it. You scour multiple government offices, meet innumerable officials and somehow, after a few months of hard work, you get your hands on the data. They hand you a book or a physical file with everything typed out or, worse, handwritten. You are tired by now, but you aren’t a quitter. You sift through the documents manually, draw a number of graphs and charts by hand, do statistical calculations and discover revolutionary insights that will change the world. But how do you tell people about your discoveries? You decide to write a book, hold public meetings and hope the idea spreads. It has been more than a year; most sane people would have given up by now. You aren’t sane, right?
Now imagine what happens when we include the internet in the above scenario. The government is forward thinking and has released large quantities of data on its website. The geek in you wants to play around with this data. You go to the government website and download the data to your computer. Better yet, the government was smart enough to release this data in a machine-readable format, which means it exists as values and numbers in a tabular CSV or Excel file, or in newer structured formats like JSON, GeoJSON, etc. You write a program for the computer to sift through this data. The computer generates a number of charts and graphs and does the statistical calculations within minutes. You discover revolutionary insights about the data in a matter of days. Aren’t you a fucking genius? The world needs to know about this. You use Twitter, Facebook and email to spread the idea, you write a blog post, you organize events on social media, you Skype with influencers. It’s been a few weeks now; you are a pseudo celebrity already.
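To make the ‘write a program to sift through this data’ step a little more concrete, here is a minimal sketch in Python. It assumes the data arrives as a CSV file; the sample text, the column names and the `summarize` helper are all invented for illustration and do not come from any real government dataset.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for a downloaded government CSV file.
# The districts, column names and figures are made up for illustration.
SAMPLE_CSV = """district,population,schools
North,120000,40
South,95000,25
East,143000,52
West,88000,21
"""


def summarize(csv_text, column):
    """Return basic statistics for one numeric column of a CSV document."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize(SAMPLE_CSV, "population")
print(summary)
```

With a real download you would read the text from the file instead of the inline sample, but the idea stays the same: a few lines of code replace weeks of adding up columns by hand.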
The nature of innovation is such that development often comes from unlikely places — Open data handbook
Wait! This might not happen at all and maybe nobody discovers your genius. That’s not the point here. Although the task was more or less the same in both scenarios, you could see a dramatic reduction in the time required to perform it once we put the internet in the equation. When you provide the population with a medium to easily access and use data, you give them an incentive to work on this data. It is a message that says that our public institutions trust us. Once the data is there and readily accessible, innovation and discovery turn into a ‘monkeys on typewriters’ problem. The ‘monkeys on typewriters’ idea, better known as the infinite monkey theorem, is a thought experiment about randomness that Nassim Nicholas Taleb discusses in his book ‘Fooled by Randomness’. It goes like this: if you have one billion to the power of one billion monkeys (which means a whole lotta monkeys), each one given a typewriter, and you make the monkeys bash the keyboard randomly, there is a high probability that one of them writes the ‘Iliad’ or the ‘Mahabharata’ purely by accident.
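The arithmetic behind the monkeys can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Taleb’s own formulation: every monkey gets exactly one attempt, every key is equally likely, and the keyboard has 26 letters. If one monkey succeeds with probability p, the chance that at least one of n monkeys succeeds is 1 - (1 - p)^n. The function name and parameters are hypothetical.

```python
import math


def chance_of_at_least_one_hit(num_monkeys, text_length, alphabet_size=26):
    """Probability that at least one monkey types a given text in one try.

    Assumes each monkey types text_length keys uniformly at random and
    independently of the others (an illustrative simplification).
    """
    # Chance that a single monkey gets every character right.
    p_single = alphabet_size ** -text_length
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p does not round away.
    return -math.expm1(num_monkeys * math.log1p(-p_single))


# A five-letter word: a million monkeys versus a billion monkeys.
few = chance_of_at_least_one_hit(10**6, 5)
many = chance_of_at_least_one_hit(10**9, 5)
print(few, many)
```

The odds for any single monkey never change; only the number of monkeys does. That is the whole trick: a large enough crowd makes an unlikely result close to certain.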
I believe this theory holds true for the Open Data scenario as well, more or less at a similar magnitude. If you give a large population easy access to rich datasets, then by sheer weight of numbers someone somewhere is bound to find something interesting, be it a scientist, an activist, a researcher at a university or even a high school student.
This post is the third part in a series. Here are links to the previous posts:
Business, business, numbers, numbers!