Using Statistics to Understand California Police Stops

Farzan Ahmad Hashmi
4 min read · Aug 9, 2021

More than 20 million Americans are stopped each year for traffic violations, making this one of the most common ways in which the public interacts with the police. Until recently, there was no comprehensive national repository detailing these encounters. That changed when Stanford University, in collaboration with Big Local News, filed hundreds of public records requests to compile a dataset of over 200 million traffic stops conducted in dozens of cities and states across the country, the largest such effort to date.

The Dataset:

13,031,552 recorded police stops

Goal for my project:

  • To use some of my programming skills to explore the dataset and gain useful insights

Using Time Series Graphs and Bar Graphs to observe trends with respect to time:

  • Using the stop_date column, I made the following time series graph to see how the number of police stops changes over time.
We can notice some periodic fluctuations.
  • Zooming in some more:
  • And some more:
There seems to be a relationship between the day of the week and the number of police stops.
Many more stops are made during the week than on the weekends, which makes sense.
The trend holds up for the rest of the dataset.
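The weekday pattern can also be checked without a graph by counting stops per day of the week. Here is a minimal sketch using only the Python standard library. The sample dates are made up for illustration, and reading stop_date as ISO-formatted strings (YYYY-MM-DD) is an assumption about how the column is stored.

```python
from collections import Counter
from datetime import date

# Hypothetical sample of stop_date values; the real column has millions of rows.
stop_dates = [
    "2021-03-01",  # Monday
    "2021-03-01",  # Monday
    "2021-03-02",  # Tuesday
    "2021-03-03",  # Wednesday
    "2021-03-05",  # Friday
    "2021-03-06",  # Saturday
    "2021-03-07",  # Sunday
]

def stops_per_day(dates):
    """Count stops for each calendar date (the raw time series)."""
    return Counter(dates)

def stops_per_weekday(dates):
    """Count stops for each day of the week, Monday first."""
    names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter(date.fromisoformat(d).weekday() for d in dates)
    return {name: counts.get(i, 0) for i, name in enumerate(names)}

daily = stops_per_day(stop_dates)
weekly = stops_per_weekday(stop_dates)
print(weekly)
```

On the full dataset, a bar graph of the per-weekday counts would show the weekday versus weekend gap directly.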

That’s cool, but now let’s get into something more interesting:

Search and Hit Rates

In order to use the data to try to find any racial disparities in police traffic stops, we use two components:

  • Search rate: The percentage of stopped people who end up getting searched. We can separate the results by race.
  • Hit rate: The proportion of searches that actually turn up contraband.

So if one race has a much higher search rate than the others but only a comparable hit rate, that may hint at a disparity in how search decisions are made.
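To make the two measures concrete, here is a small worked example in Python. The counts are hypothetical, chosen only to show the pattern described above, and the function and group names are my own for illustration.

```python
def search_rate(num_stops, num_searches):
    """Share of stops that led to a search."""
    return num_searches / num_stops

def hit_rate(num_searches, num_hits):
    """Share of searches that found contraband."""
    return num_hits / num_searches

# Hypothetical counts for two groups (not taken from the dataset).
groups = {
    "group_a": {"stops": 1000, "searches": 50, "hits": 15},
    "group_b": {"stops": 1000, "searches": 150, "hits": 45},
}

for name, c in groups.items():
    sr = search_rate(c["stops"], c["searches"])
    hr = hit_rate(c["searches"], c["hits"])
    print(f"{name}: search rate {sr:.1%}, hit rate {hr:.1%}")
```

In this made-up case, group_b is searched three times as often as group_a, yet searches of both groups find contraband equally often.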

Disclaimer: The benchmark test (comparing search rates) and the outcome test (comparing hit rates) are not rigorous tests of discrimination. In their paper, the Stanford researchers themselves note that if the likelihood of possessing contraband is distributed differently across populations, the benchmark and outcome tests can give misleading results.

  • After some programming and manipulation of the original table, I created this new table listing, for each county, the search and hit rates per race.
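One way such a table could be built is sketched below with the Python standard library. This is not necessarily how the table above was produced. The field names (county_name, subject_race, search_conducted, contraband_found) are assumptions about the dataset's columns, and the rows are made up.

```python
from collections import defaultdict

# Assumed field names and made-up rows, for illustration only.
FIELDS = ("county_name", "subject_race", "search_conducted", "contraband_found")
raw_rows = [
    ("County A", "white", False, False),
    ("County A", "white", True, True),
    ("County A", "black", True, False),
    ("County A", "black", True, True),
    ("County B", "white", False, False),
    ("County B", "hispanic", False, False),
]
stops = [dict(zip(FIELDS, r)) for r in raw_rows]

def rates_by_county_and_race(rows):
    """Return {county: {race: {"search_rate": ..., "hit_rate": ...}}}."""
    counts = defaultdict(lambda: {"stops": 0, "searches": 0, "hits": 0})
    for row in rows:
        c = counts[(row["county_name"], row["subject_race"])]
        c["stops"] += 1
        if row["search_conducted"]:
            c["searches"] += 1
            # A hit only counts when a search actually took place.
            if row["contraband_found"]:
                c["hits"] += 1

    table = {}
    for (county, race), c in counts.items():
        table.setdefault(county, {})[race] = {
            "search_rate": c["searches"] / c["stops"],
            # The hit rate is undefined when nobody was searched.
            "hit_rate": c["hits"] / c["searches"] if c["searches"] else None,
        }
    return table

table = rates_by_county_and_race(stops)
for county, by_race in sorted(table.items()):
    for race, rates in sorted(by_race.items()):
        print(county, race, rates)
```

Counties where a group was never searched get a hit rate of None rather than zero, so they can be left out of any later hit rate comparison.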

Now, to examine racial disparities, we will create a graph with a 45-degree line. Each point on the graph represents a county, placed according to its white search rate and its minority search rate. In theory, if a county searched white and minority people at the same rate, that county would lie on the 45-degree line.
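The same comparison can be done numerically: a county on the 45-degree line has a minority-to-white search rate ratio of exactly 1, and counties far above the line have a ratio well above 1. Below is a minimal sketch. The county names and rates are hypothetical, and the function name is my own.

```python
# Hypothetical per-county search rates (fractions of stops), not real figures.
county_rates = {
    "County A": {"white": 0.04, "black": 0.05},
    "County B": {"white": 0.03, "black": 0.03},
    "County C": {"white": 0.02, "black": 0.08},
}

def points_and_ratios(rates, minority="black", baseline="white"):
    """Return (county, x, y, ratio) tuples, largest ratio first.

    x is the baseline (white) search rate, y is the minority search rate,
    and ratio is y / x. A ratio of 1 means the point lies on the line y = x.
    """
    out = []
    for county, r in rates.items():
        x, y = r[baseline], r[minority]
        if x > 0:  # the ratio is undefined when the baseline rate is zero
            out.append((county, x, y, y / x))
    return sorted(out, key=lambda p: p[3], reverse=True)

ranked = points_and_ratios(county_rates)
for county, x, y, ratio in ranked:
    print(f"{county}: white {x:.1%}, black {y:.1%}, ratio {ratio:.2f}")
```

Sorting by the ratio puts the counties that sit furthest above the line at the top, which is a quick way to identify an outlier by name.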

Based on these graphs, we can see that although most of the counties lie near the line, some don’t. In particular, there is one county that searches Hispanic and black people at a much higher rate than white people.

That county turned out to be Trinity County. Now, if we plot the hit rates in the same fashion, we get:

The red dot represents Trinity County. It is evident that although black and Hispanic people are searched at a much higher rate than white people, they are not being caught with contraband at a higher rate. I tried doing some more research on Trinity County, but the most I found was that a few months ago a Trinity County deputy was placed on administrative leave after posting George Floyd memes.
