US Fatal Police Shooting Analysis and Prediction: Fatal police shooting rate

Authors:

(1) Yuan Wang, University of Rochester (e-mail: [email protected]);

(2) Yangxin Fan, University of Rochester (e-mail: [email protected]).

Table of Links

Abstract and Introduction

Related work

Methodology

Media reporting analysis

WP fatal police shooting dataset insight

Fatal police shooting rate and victims race prediction

Conclusion and References

6. Fatal police shooting rate and victims race prediction

In this part, we use the insights drawn from the WP data and the multi-attribute correlation analysis to build predictive models. We constructed a series of regression models to predict fatal police shooting rates at the state level and a series of classification models to predict the race of fatal police shooting victims.

6.1. Fatal police shooting rate prediction on state level

Based on the correlation analysis above, we chose the violent crime rate, land area, gun ownership rate, and state joined year as predictors, because they have the highest correlation coefficients with the fatal police shooting rate. To obtain more data points, we treated each state in each year from 2015 to 2019 as a separate observation.


In the Weka machine learning software, we tried all available models and kept the three best-performing ones based on ten-fold cross-validation performance. The best is KStar [1], which achieved a cross-validation relative absolute error of 28.04% and explained 88.53% of the variance, followed by K-Nearest-Neighbor regression and Random Forest. All three models performed much better than the baseline linear regression model (see Table 2).
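The evaluation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the Weka pipeline used in the paper: the data are synthetic (250 rows standing in for 50 states over 5 years), the hand-written k-nearest-neighbor regressor is a stand-in for the models compared in Table 2, and all function names are hypothetical. Relative absolute error is computed here against a predictor that always outputs the mean of the actual values; Weka's exact computation may differ in detail.

```python
import random

def relative_absolute_error(actual, predicted):
    """Total absolute error of the model divided by the total absolute
    error of always predicting the mean of `actual`, as a percentage."""
    mean_actual = sum(actual) / len(actual)
    model_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    baseline_err = sum(abs(mean_actual - a) for a in actual)
    return 100.0 * model_err / baseline_err

def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training rows closest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)

def cross_val_predict(X, y, n_folds=10, k=3, seed=0):
    """Return one out-of-fold prediction per row."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    preds = [None] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    return preds

# Synthetic stand-in data (assumption): four features already scaled to
# [0, 1], playing the role of violent crime rate, land area, gun ownership
# rate, and state joined year. The target formula is made up.
rng = random.Random(42)
X = [[rng.random() for _ in range(4)] for _ in range(250)]
y = [2.0 * r[0] + r[1] + 0.5 * r[2] + rng.gauss(0, 0.1) for r in X]

preds = cross_val_predict(X, y)
rae = relative_absolute_error(y, preds)
print(f"10-fold CV relative absolute error: {rae:.2f}%")
```

A relative absolute error below 100% means the model beats the mean-only predictor; the 28.04% reported for KStar should be read on that scale.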


Table 2. Ten-fold cross validation results


Figure 19 displays the cross-validation prediction error of each data point in the KStar model (each data point represents the fatal police shooting rate of one state in one year). The X-axis is the real fatal police shooting rate, and the Y-axis is the predicted rate. A larger cross indicates a larger prediction error.


Figure 19. Predicted fatal police shooting rate vs. Real fatal police shooting rate


The prediction model suggests that the causes of fatal police shootings are complex. The rate is related to the state joined year, state land area, gun ownership rate, and violent crime rate, so the problem should be understood from multiple dimensions.

6.2. Predict victims’ race in fatal police shooting

This prediction is intended to test whether there is racial discrimination in fatal police shootings. The null hypothesis is that the model cannot predict the victim’s race (no racial discrimination). The alternative hypothesis is that the model can predict the victim’s race (racial discrimination). We used WP data from 01/01/2015 to 02/12/2020 and excluded records missing race information, leaving 4518 records in total. Since “age” is the only numeric variable, we applied the chi-square test to select predictors among the remaining, categorical variables.

6.2.1 Chi-square testing

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where $\chi^2$ is the chi-squared statistic, $O_i$ is the observed value, and $E_i$ is the expected value.


Table 3. The chi square contingency table for body camera


Table 4. Chi-square testing for categorical variables


After applying chi-square testing to the categorical variables above, we find that threat level, signs of mental illness, armed, flee, body camera, and gender are not independent of race at the 0.05 significance level (see Table 4). On the other hand, manner of death and is_geocoding_exact are independent of race at the 0.05 significance level. For city and state, the degrees of freedom (DF) are too large to apply chi-square testing. Finally, we chose armed, age, gender, signs of mental illness, threat level, flee, and body camera as predictors, and city and state as backup predictors, for the racial classification model.
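The test of independence used above can be illustrated with a small worked example. The sketch below computes the chi-square statistic and degrees of freedom for a contingency table; the 2x2 counts are invented for illustration and are not the values from Table 3, and the function name is hypothetical. The same function works for tables with more rows or columns, as needed when race has more than two categories.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts (assumption): rows are two race groups,
# columns are body camera on / body camera off.
observed = [
    [30, 70],
    [20, 80],
]
stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.4f}, degrees of freedom = {dof}")

# For 1 degree of freedom the 0.05 critical value is 3.841, so a
# statistic above it would reject independence at that level.
CRITICAL_VALUE_DF1 = 3.841
reject_independence = dof == 1 and stat > CRITICAL_VALUE_DF1
```

For these made-up counts every expected cell is 25 or 75, the statistic is 8/3 (about 2.667), and independence is not rejected at the 0.05 level.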

6.2.2 Classification model

Using the Weka machine learning software and a Python AutoML package, we tried all available models and chose the three best-performing ones based on stratified five-fold cross-validation performance (see Table 5).


Table 5. Stratified cross validation results


We find that adding the city and state attributes boosts model performance. Gradient Boosting Machine (GBM) [4] performs best, with 0.589 precision and 0.611 recall, only slightly better than predicting every victim to be white (about 50% precision and recall). GBM also indicates the importance of the attributes we selected for prediction: city, state, armed, and age play essential roles in racial prediction (see Figure 20). We fail to reject the null hypothesis, since even the best-performing model cannot predict victims’ race well; this test therefore finds no evidence of racial discrimination in the fatal police shootings recorded in the WP data.
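To make the comparison against the "predict the most common class" baseline concrete, the following sketch shows one way to build stratified folds and score such a baseline. It is an illustration only: the labels and class proportions are invented, the function names are hypothetical, and it reports plain accuracy rather than the precision and recall figures in Table 5.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, n_folds=5, seed=0):
    """Split row indices into folds that keep each class's share
    roughly equal across folds."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members of this class out to the folds in turn.
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds

def majority_baseline_accuracy(labels, folds):
    """Cross-validated accuracy of always predicting the class that is
    most common in the training portion."""
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [lab for i, lab in enumerate(labels) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]
        correct += sum(1 for i in fold if labels[i] == majority)
    return correct / len(labels)

# Hypothetical labels (assumption): 50% "W", 30% "B", 20% "H".
labels = ["W"] * 50 + ["B"] * 30 + ["H"] * 20
folds = stratified_folds(labels)
baseline = majority_baseline_accuracy(labels, folds)
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

With these invented proportions every fold holds 10 "W", 6 "B", and 4 "H" rows, and the baseline is right exactly half the time. A classifier has to clear this bar by a meaningful margin before its predictions say much about the attributes it uses.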


This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

