
Chicago Crime Mapping: Magic of Data Science and Python

by Uddeshya Singh, September 8th, 2018

“When a man is denied the right to live the life he believes in, he has no choice but to become an outlaw.” ― Nelson Mandela


Predictions, forecasts and loss scores. Sound too mainstream, don’t they? In an era of growing interest in Machine Learning and its algorithms, we largely ignore some important duties of a data scientist, and one of those is Data Exploration.

We modern data scientists are so naive that we forget the beauty of visualizations and the quality they stand for. Today, allow me to present an Exploratory Data Analysis of the Kaggle dataset Crimes in Chicago.

The Crimes in Chicago Dataset

I will be using the code and visualizations from my Kernel, which you can find here: Chicago Crime Mapping

Chicago Crime Mapping — At the time of editing

So, before starting the analysis, let me brief you about the dataset. According to its description:

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department’s CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or [email protected].

Essentially, this dataset contains the type of crime, its location, the sub-category of the crime, the type of vicinity, and whether or not an arrest was made.
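As a rough illustration of how such a file could be loaded with pandas, here is a minimal sketch. It reads a two-row toy CSV instead of the real Kaggle file; the column names and the date format are assumptions based on how the dataset is described here, and the rows are invented purely for the example.

```python
import io

import pandas as pd

# Toy stand-in for the Kaggle CSV (assumption: the real file has many more
# columns and rows; these two rows are made up for illustration only).
csv_text = """ID,Date,Primary Type,Description,Location Description,Arrest,Latitude,Longitude
1,05/03/2016 11:40:00 PM,THEFT,$500 AND UNDER,STREET,False,41.8951,-87.6243
2,05/04/2016 09:15:00 AM,BATTERY,SIMPLE,APARTMENT,True,41.7500,-87.6000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumed date layout: month/day/year with a 12-hour clock.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p")

print(df.dtypes)
```

With the real file, the `io.StringIO(...)` argument would be replaced by the path to the downloaded CSV.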

Checking if data contains Null Values or not

The very first step may be to check whether the dataset contains any null values, and I used a heatmap to do so.

Viridis Heatmap

Looking at the heatmap, we can safely conclude that only a few values are missing, so we can simply drop the affected rows.
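A minimal sketch of this null check, under the assumption that the data sits in a pandas DataFrame called `df`. The toy frame and the helper name `plot_null_heatmap` are hypothetical; the heatmap call is one common seaborn idiom and not necessarily the exact call used in the kernel.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the crime data (values invented for illustration).
df = pd.DataFrame({
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT"],
    "Location Description": ["STREET", None, "APARTMENT", "STREET"],
    "Latitude": [41.89, 41.75, np.nan, 41.80],
})

# Boolean mask of missing cells, and the share of missing values per column.
null_mask = df.isnull()
null_share = null_mask.mean()
print(null_share)


def plot_null_heatmap(mask):
    """Hypothetical helper: draw the null mask as a viridis heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    plt.figure(figsize=(10, 6))
    sns.heatmap(mask, cbar=False, yticklabels=False, cmap="viridis")
    plt.show()


# Drop every row that has at least one missing value.
df_clean = df.dropna()
print(len(df), "->", len(df_clean))
```

Calling `plot_null_heatmap(null_mask)` draws the picture; checking `null_share` first is a cheap way to confirm that dropping rows will not throw away much data.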

I was curious about how many crimes were reported in these five years, and this is what I saw:

STREET                             325084
RESIDENCE                          223854
APARTMENT                          179444
SIDEWALK                           158478
OTHER                               53474
PARKING LOT/GARAGE(NON.RESID.)      40907
ALLEY                               31239
RESIDENTIAL YARD (FRONT/BACK)       30209
SMALL RETAIL STORE                  28209
SCHOOL, PUBLIC, BUILDING            25474
Name: Location Description, dtype: int64

Pretty high, for a span of 5 years.

Location Description and its semantics

You may be wondering where crimes happen most. Is it the dirty streets, notorious residences or unguarded parking lots? We can check for ourselves using this snippet:

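import matplotlib.pyplot as plt
import seaborn as sns

# Bar chart of the 10 most frequent location descriptions in df
plt.figure(figsize=(15, 10))
sns.countplot(y='Location Description', data=df, order=df['Location Description'].value_counts().iloc[:10].index)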

Location Semantics

Apparently the streets are the least safe of all, with residences and apartments following close behind.

Mapping the amount of Crimes

Let’s have a closer look at the unique locations where the crimes have taken place and use Folium to map them. You can use this snippet to recreate my map.



import folium

# Dark base map centred on downtown Chicago
chicago_map_crime = folium.Map(location=[41.895140898, -87.624255632], zoom_start=13, tiles="CartoDB dark_matter")

# CR_index holds one row per location, with LocationCoord and ValueCount columns
for i in range(500):
    lat = CR_index['LocationCoord'].iloc[i][0]
    long = CR_index['LocationCoord'].iloc[i][1]
    radius = CR_index['ValueCount'].iloc[i] / 45

    # Orange for locations with more than 1000 incidents, teal otherwise
    if CR_index['ValueCount'].iloc[i] > 1000:
        color = "#FF4500"
    else:
        color = "#008080"

    popup_text = """Latitude : {}<br>
                Longitude : {}<br>
                Criminal Incidents : {}<br>"""
    popup_text = popup_text.format(lat, long, CR_index['ValueCount'].iloc[i])
    folium.CircleMarker(location=[lat, long], popup=popup_text, radius=radius, color=color, fill=True).add_to(chicago_map_crime)
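The map snippet reads from a `CR_index` frame that is not defined in this article. A minimal sketch of how such a frame might be built is shown below, assuming the data has `Latitude` and `Longitude` columns. Only the names `LocationCoord` and `ValueCount` come from the snippet; the toy coordinates and everything else are assumptions.

```python
import pandas as pd

# Toy coordinates standing in for the real data (invented for illustration).
df = pd.DataFrame({
    "Latitude": [41.89, 41.89, 41.75, 41.89, 41.75, 41.80],
    "Longitude": [-87.62, -87.62, -87.60, -87.62, -87.60, -87.70],
})

# Count incidents per unique (Latitude, Longitude) pair, most frequent first.
counts = (
    df.groupby(["Latitude", "Longitude"])
    .size()
    .sort_values(ascending=False)
    .reset_index(name="ValueCount")
)

# Pack each coordinate pair into one tuple, matching how the map loop indexes it.
counts["LocationCoord"] = list(zip(counts["Latitude"], counts["Longitude"]))
CR_index = counts[["LocationCoord", "ValueCount"]]

print(CR_index)
```

Sorting by count first means that a loop over the first 500 rows would plot the 500 busiest locations.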

Map of crimes

Here, an orange circle means that more than 1000 crimes have taken place at that location, while a teal circle marks a location at or below that count. Clicking on a circle shows the coordinates and the number of crimes recorded at that (Latitude, Longitude).
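The colouring and sizing rule can be pulled into a small function, which makes the threshold easy to change. This is only a sketch: the name `marker_style` and its parameters are hypothetical, while the colour codes, the 1000 threshold and the division by 45 are taken from the map snippet.

```python
def marker_style(count, threshold=1000, scale=45):
    """Return the circle colour and radius for a location with `count` incidents."""
    # Orange-red above the threshold, teal at or below it.
    color = "#FF4500" if count > threshold else "#008080"
    return {"color": color, "radius": count / scale}


print(marker_style(1200))
print(marker_style(900))
```

The same helper would cover a map with a different cut-off, for example `marker_style(count, threshold=30)`.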

An example of details

A closer look at the thefts

I have a special interest in thefts and public peace disruptions, but let’s leave the latter for later. For now, let’s focus on the types of thefts that took place around Chicago in these five years.

Types of thefts in Chicago from 2012 to 2017

Well, $500 thefts are pretty dominant for now, no?
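A breakdown like this can be produced by filtering on the crime type and counting the sub-categories. The sketch below assumes the columns are called `Primary Type` and `Description`; the rows are toy values invented for illustration and say nothing about the real counts.

```python
import pandas as pd

# Toy rows (invented): six thefts and one unrelated crime.
df = pd.DataFrame({
    "Primary Type": ["THEFT", "THEFT", "THEFT", "THEFT", "THEFT", "THEFT", "BATTERY"],
    "Description": [
        "$500 AND UNDER", "$500 AND UNDER", "$500 AND UNDER",
        "OVER $500", "OVER $500", "RETAIL THEFT", "SIMPLE",
    ],
})

# Keep only thefts, then count each sub-category, most frequent first.
df_theft = df[df["Primary Type"] == "THEFT"]
theft_types = df_theft["Description"].value_counts()

print(theft_types)
```

The resulting series can be passed straight to a bar plot, with its index as the category labels.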

If that’s not enough, let’s look at how these thefts are distributed across the months. Have a look at this graph and allow me to explain the sudden plunge in the statistics.

Thefts Per Month

Well, no Superman or Batman arrived in the city in August to uphold justice. It was just an algorithmic slip that produced a NaN value for August, which I replaced with 0 (because I am lazy).

Here is the code, if you don’t believe me:

theft_in_months = pd.DataFrame({"thefts": df_theft['Month'].value_counts(), "month": df_theft["Month"].value_counts().index}, index=range(12))

theft_in_months.fillna(0, inplace=True)
theft_in_months = theft_in_months.sort_values(['month'], ascending=[1])

theft_in_months.head()
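One plausible reading of the NaN, offered as an assumption and not something the kernel confirms: if `Month` holds the integers 1 to 12, then `index=range(12)` asks pandas for labels 0 to 11, so the counts are realigned to labels that do not include 12 and include a 0 that has no count. The sketch below, on toy data, builds the monthly table from the counts themselves so that each count stays attached to its own month.

```python
import pandas as pd

# Toy theft records with a month number per row (invented for illustration).
df_theft = pd.DataFrame({"Month": [1, 1, 2, 8, 8, 8, 12]})

theft_in_months = (
    df_theft["Month"]
    .value_counts()
    # Make sure all 12 months appear, in order; months with no thefts get 0.
    .reindex(range(1, 13), fill_value=0)
    .rename_axis("month")
    .reset_index(name="thefts")
)

print(theft_in_months)
```

Here the only zeros are months that truly have no records in the toy data, so no `fillna` is needed.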

Annual Crime Statistics: Using literally all the data at once.

You can try resampling the dataset by date, and you will find that it covers exactly 1854 days. Want to know how many crimes were committed on each single day? Have a look at this graph.

Thefts on a daily basis

As you may have noticed, the yearly crime statistics follow a general trend.

The noticeable trend is a rise at the start of the year that peaks around the midpoint, somewhere in June or July. After that comes an equally sharp drop, back to roughly the number of crimes the year started with!
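A minimal sketch of the resampling step, assuming the data has a datetime column called `Date`. The toy dates are invented, and averaging the daily counts per calendar month is just one simple way to look for the mid-year peak, not necessarily how the graph was produced.

```python
import pandas as pd

# Toy incident timestamps (invented): a quiet January and a busier July.
dates = pd.to_datetime([
    "2015-01-01", "2015-01-01", "2015-01-03",
    "2015-07-01", "2015-07-01", "2015-07-01",
    "2015-07-02", "2015-07-02",
])
df = pd.DataFrame({"Date": dates, "ID": range(len(dates))})

# One row per calendar day, counting the incidents that fall on it.
daily_counts = df.set_index("Date").resample("D").size()
print(len(daily_counts), "days covered")

# Average daily count for each month of the year, to expose the seasonal shape.
monthly_mean = daily_counts.groupby(daily_counts.index.month).mean()
print(monthly_mean)
```

`len(daily_counts)` gives the number of days between the first and last record, which is how a figure like the day count of the full dataset can be checked.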

Public Peace Violations

I promise that this is the last area of research in this article.

Anyway, if we focus on the types of Public Peace Violations and their counts, it is easy to see that Reckless Conduct leads this category and (thankfully) not Bomb and Arson threats.

Types of Public Peace Violations

While we are at it, let’s have a look at the Peace Disruption incidents around Chicago. In this map, the orange circles mark locations where peace disruptions exceeded 30 in these five years, which makes them somewhat sensitive spots to tread on.

Peace Disruption Locations

Conclusion

As you may have already judged, this is not a coding tutorial but a potential project starter. You can use this EDA in your notebooks keeping in mind the Apache 2.0 License and make your prediction models out of these ideas.

A few ideas from my side:

  1. A season-based predictive model that predicts how many crimes are going to happen on a particular day.
  2. A prediction model that judges the sensitivity of an area or vicinity (like Lincolnwood) and predicts when the next crime will take place.

Or any other idea which may strike your mind.

Until next time, peace out.

Uddeshya Singh