The Problem with Predictive Policing

by Eric Crowley

The term ‘predictive policing’ conjures a wide range of emotions. For police, a powerful new tool like predictive analytics offers hope that they will be able to take more preventative measures in their fight against crime. For groups who feel targeted by the police, this kind of preventative policing can instead instill a fear that they will be forced to endure further unjust persecution, harassment and targeting. Both the hope and the fear generated by law enforcement’s adoption of predictive policing techniques are valid. Predictive analytics can provide crucial insights to decision makers, allowing them to execute their roles to the best of their abilities, so it makes sense that law enforcement would want to take advantage of the technology. The problem is that for many civilians predictive policing conjures flashbacks to the 2002 film Minority Report, which depicts a future where people are arrested and sentenced before they ever commit a crime. These differences in opinion can make it difficult to decide how, or even whether, law enforcement should be using predictive analytics. So, what should determine the validity of predictive policing as a tool for law enforcement? The answer is the data.

Data, specifically its quality, is the key to good predictions. To build a predictive model, you need a training data set. This is normally historical data containing a pattern or trend that precedes a predictable event. For example, if you are trying to predict machine failure, you will need a training data set that contains all the information relevant to that machine for a period leading up to, during, and possibly after a failure. While the time frame and data content will vary by use case, the underlying requirement that the training data be valid remains. The same applies to your recurring data. Once you have a model trained, you will need to feed it new data on a recurring basis to generate new predictions. If your input data is incorrect, the predictions created by your model will also be incorrect.
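This garbage-in, garbage-out dynamic can be sketched with a toy example. The district names and incident counts below are invented for illustration, and the "model" is deliberately naive: it just ranks areas by historical incident counts. The point is that any skew in how the historical data was collected flows straight through into the prediction.

```python
from collections import Counter

# Hypothetical incident log: each record is the district where an incident
# was *recorded* -- not necessarily where crime actually occurred. If
# District A is patrolled more heavily, it generates more records.
historical_incidents = (
    ["District A"] * 80   # heavily patrolled: many incidents recorded
    + ["District B"] * 20  # lightly patrolled: few incidents recorded
)

def predict_hotspots(incident_log, top_n=1):
    """Naive 'predictive' model: rank areas by historical incident counts."""
    counts = Counter(incident_log)
    return [area for area, _ in counts.most_common(top_n)]

# The model simply echoes the skew in its training data: more patrols in
# District A -> more recorded incidents -> District A predicted as the
# hotspot -> still more patrols. Biased input yields biased output.
print(predict_hotspots(historical_incidents))  # ['District A']
```

Real predictive policing systems are far more sophisticated than a frequency count, but the feedback loop is the same: the model can only learn from what was recorded, not from what actually happened.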

Having high quality data is vitally important for predictive policing solutions, so how exactly do law enforcement agencies acquire their data? According to the National Institute of Justice (https://www.nij.gov/topics/crime/pages/ucr-nibrs.aspx), the two major sources of crime statistics in the United States are the Uniform Crime Reports and the National Incident-Based Reporting System. The Uniform Crime Reports system (UCR) is managed by the FBI and has been tracking data on seven major crimes (murder, robbery, rape, aggravated assault, burglary, theft and vehicle theft) since the 1930s. The National Incident-Based Reporting System (NIBRS) contains data generated by 5,271 different law enforcement agencies throughout the country. Over 5,000 law enforcement agencies and a national FBI database may sound like a comprehensive data set, but unfortunately it is not. According to the Bureau of Justice Statistics, the NIBRS database covers only 20% of all US law enforcement agencies, which represent only 16% of the population (https://www.bjs.gov/content/nibrsstatus.cfm). The UCR does a better job of representing the country, with 47 states participating in the program, but only 25 of those states have mandates that require law enforcement to report their data to the program, so this dataset may also be very incomplete (https://www.bjs.gov/content/nibrsstatus.cfm).

The lack of a comprehensive national data set is a major reason that predictive policing should not be deployed at the national level, but what about our local law enforcement? After all, some states do an excellent job of recording crime data; in Iowa, for example, 248 law enforcement agencies representing 100% of the population are required to report to both the UCR and the NIBRS. Should these states be able to implement analytics tool sets at the local level? The short answer is probably not. While increasing the amount of data that agencies collect and report is certainly a step in the right direction, there is still one major flaw with this data: the data collection tool.

When building a dataset within a manufacturing environment, for example, the main tools used for collecting the raw data are usually electronic sensors or measuring tools. These data collection tools are machines; they do not have any personal motivations or beliefs. All a sensor knows is ones and zeros, and this makes them very good at reporting real, reliable information. The problem with police data is that the data collection tool is not a sensor, it’s a person, and people are flawed. Race, gender, and social class can influence an officer’s decision to interact with a person, even if the officer does not consciously realize it. This allows bias or personal motivations to influence the data collection process, which produces data that is not a true representation of what is going on.

Before going further, it’s important to say that a large percentage of law enforcement officers are good people just trying to keep their community safe; there are some bad apples, but they are the exception rather than the rule. However, the data does appear to indicate that bias exists within our law enforcement agencies. We see an example of this in New York City. According to the United States Census Bureau, in 2017 white individuals made up 43.1% of the city’s population, while black and Latino individuals made up 53.4% (https://www.census.gov/quickfacts/fact/table/newyorkcitynewyork/PST045217). If we then look at the New York City Police Department’s year-end report for 2017, we see that 89.5% of individuals who were subjected to stop and frisk by the NYPD were black or Latino, while whites made up only 8.6% of individuals stopped (https://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/year-end-2017-enforcement-report.pdf). It’s important to note that race is not the only factor at play in these stop and frisk numbers. It would be incorrect to say that police stopped individuals based solely on their race, but it would also be incorrect to completely rule out race as a factor in these stops. The stop and frisk numbers from New York City are not remotely representative of the nation, but they are important in the context of police data collection. What this data shows us is that there is a possibility that racial bias can affect an officer’s judgement, which in turn means that racial bias can also affect the data.
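The scale of this disparity can be made concrete with a quick calculation using only the percentages quoted above. The "representation ratio" framing here is my own, not the report's: it simply divides each group's share of stops by its share of the population, so a ratio of 1.0 would mean stops mirror the population exactly.

```python
# Shares from the article's NYC figures (2017):
# NYPD year-end enforcement report (stops) and US Census Bureau (population).
stops = {"black_or_latino": 0.895, "white": 0.086}
population = {"black_or_latino": 0.534, "white": 0.431}

# Representation ratio: share of stops / share of population, per group.
ratios = {group: stops[group] / population[group] for group in stops}
# black_or_latino: 0.895 / 0.534 ~= 1.68 (over-represented in stops)
# white:           0.086 / 0.431 ~= 0.20 (under-represented in stops)

# Relative disparity between the two groups' representation ratios.
disparity = ratios["black_or_latino"] / ratios["white"]
print(round(disparity, 1))  # roughly 8.4
```

In other words, relative to their population shares, black and Latino individuals appear in the stop data at roughly eight times the rate of white individuals. That gap is exactly the kind of skew that would be baked into any model trained on this data.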

The problem with predictive policing is not the technology, it’s the data. Accurate data comes from accurate sensors, and currently the best sensor we have for collecting this kind of data is the police officer. These brave men and women are heroes, but they are not machines. They are humans, and humans make mistakes. What this leaves us with is an error-prone data set that represents only a small percentage of the population. We are seeing the technology used to record data improve (police body cameras, drones, etc.). Concurrently, 22 states are developing or testing computer systems that will allow them to submit their local law enforcement data to the UCR national database. These advances in data collection and distribution will dramatically improve the capabilities of predictive policing systems. Over time, this will lead to predictive toolsets becoming essential to law enforcement activities; until then, we should all keep a close eye on how these tools are used with the data we have.