Don’t Be Misinformed On How Big Data Is Making A Social Impact | Conversation with Johns Hopkins on Hackable Projects
I had the pleasure of meeting Kevin Quach, Biostatistician at Johns Hopkins Bloomberg School of Public Health, at the Forbes 30 Under 30 Summit. Not only did we have amazing conference speakers to hear and learn from, but we were part of some amazing conversations. I wanted to write a more detailed post about one of the topics we talked about: BIG DATA.
As an organizer of civic hackathons, I know that this term is thrown around frequently and I’ve witnessed the blank stares in various conversations as people talk about it.
So let’s break it down.
I know that you’ve just Googled it so here’s the definition again: “extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.”
But I learn through examples and asking my questions, so let’s turn to Kevin for a more holistic answer to help all of us understand.
Kevin: First things first, the opinions expressed herein are my own and do not necessarily reflect the views of The Johns Hopkins University.
Sounds great to us! So first, what the hell is “Big Data”? I know there are various definitions…
Kevin: Thanks for inviting me to contribute. It’s my pleasure to offer a window into my thoughts on this topic.
‘Big Data’ is a buzzword coined in recent years to describe, for the public, the revolution in computing power that lets us not only gather and store large data sets but also sift through this data and get insights in a timely manner. Data collection methods have also become more granular, so we can have information on many features of a single observation.
Kevin: Historically, we have been able to do one or the other, but never together efficiently. This brings us to the next point that we can also implement this data analysis at a low cost using frameworks such as Hadoop and Spark. Furthermore, all of this is scalable depending on the size of what you need. Small project? Rent out a single server. Big project? Just rent out more.
There is also the question of what type of data we have. We typically think of data as structured, i.e., what you would see in an Excel table. But we also have a lot of data coming in unstructured forms, such as text data from Twitter or sound data from heart rhythms.
Kevin: For example, Amazon Alexa can understand your message because of extensive training of the words and syntax used in the English language.
We can also detect abnormal heart rhythms by using convolutional neural networks (a deep learning technique) to analyze heart sounds!
So, in summary, we can now collect data, analyze it, and implement all of this in a cost-efficient manner to answer your questions more quickly. Analysis of large data sets becomes increasingly important in the for-profit and non-profit worlds, as today’s average individual is fully engaged with mobile devices that can store and transmit user information, adding to the growing universe of data.
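For readers who want to see the "just rent out more servers" idea in miniature, here is a rough sketch of the split, count, and combine pattern that frameworks like Hadoop and Spark are built around. It is plain Python running on one machine, not the Hadoop or Spark API; the function names, the `n_workers` parameter, and the sample messages are all made up for illustration.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(items, n_chunks):
    """Deal the items into n_chunks groups, one per hypothetical worker."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def map_chunk(lines):
    """Map step: count the words in one chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(lines, n_workers=1):
    chunks = split_into_chunks(lines, n_workers)
    # On a real cluster each chunk would be processed on a different machine.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(merge_counts, partials, Counter())


# Made-up sample messages, standing in for unstructured text data.
messages = [
    "big data is big",
    "data helps cities",
    "cities collect data",
]

one_server = word_count(messages, n_workers=1)
three_servers = word_count(messages, n_workers=3)
print(one_server.most_common(2))
```

The point of the sketch is that the answer does not depend on how many workers share the job, which is what makes it possible to scale up by adding machines.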
What are 3 new cool things happening by using Big Data to make a positive impact in society?
Kevin: Collectively, we can do a lot with the increasing availability of data. Case studies have shown that data-driven innovations can have a positive social impact by:
- Improving patient care
- Better environmental protection
- Increasing citizen awareness for improving public inefficiencies
Improving patient care
Kevin: The Individualized Health Initiative aims to work with scientists, engineers, and physicians to integrate big data from different large databases including clinical information, genetic and molecular patient data, and imaging to change the way we deliver health care.
Using comparative data from thousands of patients with similar health profiles, physicians can clearly understand the risks of treatment options and the likely prognosis, and provide the appropriate treatments or preventative measures. It is a truly interdisciplinary, collaborative effort, with inHealth participants from the Johns Hopkins schools of Medicine, Public Health, Engineering, Arts and Sciences, and Nursing.
Better environmental protection
Kevin: The United States was struck by many extreme weather events in the past year. Hurricane Maria’s impact on Puerto Rico was devastating and prompted residents to seek shelter elsewhere.
Using real-time, anonymous data gathered by Google location services, the MIT Media Lab and Google, Inc. found that there was a 20% increase in movement out of Puerto Rico the day before Hurricane Maria hit, specifically to Orlando, Miami, New York, and Atlanta. Puerto Ricans have slowly been moving away with more people leaving than arriving.
Kevin: Furthermore, people in Puerto Rico have been less mobile since the hurricane because of the damage to local infrastructure and services. The authors explain that this was an opportunity to study human behavior following a humanitarian crisis.
Knowledge from this work can be used to better understand how to allocate resources in future events.
Increasing citizen awareness for improving public inefficiencies
Kevin: Many cities across the United States are using data sources collected on services used by homeless individuals to track their movement and behavior. The goal: to identify key components for creating better housing solutions.
In New York City, hundreds of people work daily with homeless individuals and collect data on their interactions through the StreetSmart app. This has improved tracking of thousands of homeless individuals and communication between outreach workers to better direct assistance.
The Chicago Alliance has a large, anonymized, individual-level data set covering the last 15 years of every person who received help through the Homeless Management Information System. A team at the University of Chicago has applied their data science skills to visualize homeless movement and predict housing stability.
Any fun ideas on how big data will make a social impact? I know that’s a loaded question.
Big Data Isn’t Perfect
Kevin: Big data is revolutionary, but there are pitfalls. We often confuse correlation with causation, and we should examine findings with a healthy dose of skepticism and be critical of biases in how the data was collected and analyzed. Often, many assumptions are made without appropriate consideration of their validity.
One of the more notable examples was the forecasting of the 2016 U.S. presidential election. Using historical polls and recent polling data, most major forecasters (FiveThirtyEight, New York Times, Princeton Election Consortium) put Mrs. Clinton’s chances of winning at 70 to 99%, yet she lost.
Another example is Google Flu Trends, which tracked flu outbreaks based on flu-related search terms and could estimate flu prevalence ahead of the CDC’s data. However, its estimates missed the peak during the 2013 season, and the initiative was quickly removed.
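To see how easily correlation can be mistaken for causation, here is a small simulation. Everything in it is invented for illustration: the variable names, the coefficients, and the idea that daily temperature drives both ice cream sales and sunburn cases. It is a toy sketch of a hidden common cause, not an analysis of real data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n_days = 5000

# Hypothetical hidden driver: daily temperature.
temperature = [rng.gauss(25, 5) for _ in range(n_days)]
# Both outcomes depend on temperature, but not on each other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [1.5 * t + rng.gauss(0, 5) for t in temperature]

raw_corr = pearson(ice_cream_sales, sunburn_cases)

# Hold temperature roughly fixed by keeping only days in a narrow band.
band = [i for i, t in enumerate(temperature) if 24.5 <= t <= 25.5]
band_corr = pearson(
    [ice_cream_sales[i] for i in band],
    [sunburn_cases[i] for i in band],
)

print(f"all days: {raw_corr:.2f}, similar-temperature days: {band_corr:.2f}")
```

Across all days the two series move together strongly, which could tempt someone to conclude that ice cream causes sunburn. Among days with nearly the same temperature, the relationship mostly disappears, because the shared cause was doing all the work.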
Now What? Combine Data Sets
Kevin: I think this is a great question, and it can lead you to a fun data science project! One great thing about how cities are evolving is that many now publish data sets about your city online.
1. Check out current open data portals:
Familiarize yourself with ‘ggplot’ or ‘leaflet’ to plot data points on a map.
For example, you can combine US Census Data with Zillow’s Economics Data and learn about the demographics of the housing market in your city!
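As a rough sketch of what "combining data sets" looks like in practice, the snippet below joins two tiny tables on a shared key and computes a home-price-to-income ratio. The region labels, column names, and every number are placeholders invented for this example; they are not real Census or Zillow figures or file formats.

```python
import csv
import io

# Made-up stand-ins for a demographics file and a housing file.
CENSUS_CSV = """region,population,median_income
A,12000,54000
B,8000,72000
C,15000,41000
"""

HOUSING_CSV = """region,median_home_value
A,270000
B,432000
D,199000
"""


def read_by_key(text, key):
    """Parse CSV text into a dict of rows, indexed by the given column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def inner_join(left, right):
    """Keep only the keys present in both tables and merge their columns."""
    return {key: {**left[key], **right[key]} for key in left.keys() & right.keys()}


census = read_by_key(CENSUS_CSV, "region")
housing = read_by_key(HOUSING_CSV, "region")
combined = inner_join(census, housing)

# A simple affordability measure: home value relative to income.
price_to_income = {
    region: int(row["median_home_value"]) / int(row["median_income"])
    for region, row in combined.items()
}

for region in sorted(price_to_income):
    print(region, round(price_to_income[region], 1))
```

Regions C and D drop out because each appears in only one table. With real data, checking which records fail to match is often the first place to look for gaps or inconsistent identifiers.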
Thank you so much, Kevin!
So there you have it. I hope that we were able to shed some light on this term and, beyond that, inspire more folks to jump on board with the potential we can harness with big data. What projects will you bring for us all to hack in 2018?
Shameless plug: ATX Hack for Change, Austin’s Annual Civic Hackathon is going into year 6 and Project Submission is NOW OPEN. Let’s hack!