---
title: "COMPAS Prediction for Common Drug-Related Charges"
author: "Debbie Olorunisola"
format:
  cmc-article-pdf:
    fontsize: 12pt
    margin-left: 1.5in
    margin-right: 1.5in
    margin-bottom: 1.2in
    margin-top: 1.2in
    keep-tex: true
fig-pos: 'H'
tbl-pos: 'tbp'
bibliography: bibliography.bib
abstract: |
  A 2018 paper shows that COMPAS, a commonly used recidivism prediction tool, is more likely to incorrectly predict that a Black defendant will recidivate than a white one. Further, the paper uncovered that the algorithm does not have significantly better predictive power than a model with two predictors: prior convictions and age. This paper seeks to replicate those findings and examine whether this bias is also evident among drug-related offenses. Using two regression approaches, I find that this negative predictive bias toward Black defendants is still evident for charges relating to cocaine, controlled substances, and marijuana, but that drug-related offenses do not contribute significantly to the predictive power (or error) of the algorithm. This paper further emphasizes the inability of predictive software to determine recidivism risk unbiasedly, that is, irrespective of race.
---
# Introduction
As a method of crime prevention and mitigation, the criminal justice system has turned to predictive measures to survey and monitor places and people deemed "prone" to crime. This has led companies like equivant (formerly Northpointe) to release recidivism prediction software meant to serve as an instrument for assessing a person's potential to commit another crime upon release from prison.
While this sounds sensible in theory, COMPAS and other predictive software are biased in practice, especially against specific racial and ethnic minorities in the United States, even without directly using race as a variable in the algorithm. A 2018 paper found that, although the tool boasts 137 predictors, it is not significantly better at predicting recidivism than the average person [@dressel2018]. Using data from the Broward County, Florida jail (an institution that uses COMPAS as a tool in the sentencing and parole process), the researchers found that the algorithm's predictive accuracy and its performance on classification metrics like sensitivity and specificity could be well-approximated by much simpler logistic regression models with far fewer predictors. In fact, they were able to reduce the model to two predictors: the subject's age and their number of prior arrests. @tbl-main replicates the results in columns 1 and 2 of Table 2 on page 4 in @dressel2018.
| |
|:---------------------------------|
| ![](tables/tbl_bootstrapped.png) |
: Logistic Regression Accuracy for 2 and 7-Predictor Models {#tbl-main}
Without using race, how were they able to reconstruct this algorithmic bias?
Bernard Harcourt, a legal theorist who specializes in criminal surveillance and penalty, traces the history of criminal justice as it relates to predicting and mitigating the risk of repeated offense in a working paper [@harcourt2010]. While racial bias in policing is already a documented issue in literature analyzing post-Reconstruction America (see: @hinton2018), his main contribution in this paper is detailing more explicit cases of how U.S. police departments in the 1920s used factors like race, nationality, and skin color to predict which people and areas were likely to commit crime and repeatedly offend (p. 5-6). He uses legal documents to illustrate a temporal trend toward simpler prediction methods from the 1920s to the 1970s, concluding that police departments were using "prior criminal history as a proxy for future dangerousness" long before Dressel and Farid's 2018 paper. He argues that risk prevention must focus on the front end of the prison-industrial complex: the "continuously increasing racial disproportionality in the prison population" (p. 9). Without directly addressing racial disparities in surveillance and access to particular preventative resources, the data this software uses will continue to form a positive feedback loop.
While the specific variables COMPAS uses are unknown to the public, actuarial research into insurance shows how "other variables such as where someone lives, their income, their age at the first application for a loan, or a host of other social data have become so closely correlated with race that these social statistics act as a de facto proxy for race," thus embedding and perpetuating racial bias (@wiggins2020, p. 91). The latest release of the COMPAS best practices manual reveals that, in addition to considering prior arrest history and family history, the algorithm includes substance abuse, financial problems, social environment, and residential instability (@compasuserguide, p. 9) when deciding someone's risk potential. Since many of these variables overlap with the ones mentioned in @wiggins2020, this has concerning implications for the algorithm's bias against Black defendants. Further, it corroborates Harcourt's logic that economic and social intervention might be a better method of mitigating risk than lengthening a sentence or denying parole. Preventing exposure to these risks is a form of risk mitigation in and of itself.
In the 2019 update to their manual, equivant focuses on defending the accuracy of their software, potentially in response to @dressel2018. However, whether or not their accuracy is "on par" by the AUC-ROC metric is not the core problem. Rather, as scholars of actuarial ethics and fairness argue, algorithmic transparency and attempts to alleviate systemic bias are the real issues with black-box software like COMPAS. To illustrate how, despite claims of being "accurate enough," COMPAS embeds racial disparity in the variables it considers, I will focus on understanding how COMPAS ranks substance abuse, using data from Broward County, Florida from 2013 to 2014. According to the best practices guide, someone with a substance abuse problem is ranked on a scale from Unlikely (deciles 1-2) to Probable (3-4) to Highly Probable (5-10) (@compasuserguide, p. 9). Although I cannot separate the overall decile score from the substance abuse decile, I will use it as a reference point to see how substance type and race affect the decile and, thus, COMPAS' outcome[^1]. Further, while I do not have direct information about substance abuse, I will use "drug-related offenses" as a proxy for substance use, as Broward County does not disambiguate "possession" charges from usage.
[^1]: While COMPAS does not directly consider race in its algorithm, in order to illustrate how substance use and drug-related charges interact with race, I will be using race in the formulas as a means of consolidating all of the other variables that COMPAS *does* use, as well as a way to show potential associations between race and charge type that are reflective of policing patterns in the United States.
# Data and Methods
| |
|:---------------------------------|
| ![](tables/tbl_1.png) |
: Descriptive Statistics for People Charged with Drug-Related Offenses {#tbl-1}
@tbl-1 shows that most of the drug-related charges in Broward County were for possession, allowing us to use the data as a close proxy for drug usage. I keep observations for other drug-related offenses in this paper in order to keep the sample size large.
For the purposes of this paper, I focus exclusively on Black and white defendants. After labelling the data for drug-related offenses and by drug type, I looked at the relationship between race and drug type.[^2]
[^2]: Once labelled, I found that 17.8% of the cases in the dataset were for drug-related offenses. Subsetting for race and substance type left me with 984 observations or 13.6% of the observations in the dataset.
We see that the most prominent cases involve cannabis, cocaine, and controlled substances, with Black defendants making up most of the former two, while the latter is nearly 1:1. Using this information, I focused my analysis on these cases.
![Barplot showing the relationship between race and decile score](figures/race_and_decile.png){#fig-2 width="75%"}
If the COMPAS algorithm were race-blind, then when comparing drug-related offenses, we would expect similar distributions of white and Black defendants across each decile bin. However, @fig-2 shows a descending trend in the number of white defendants as decile score increases and an ascending trend in the number of Black defendants. Notably, this trend differs from the overall trend among Black defendants: when we do not subset for drug charges relating to cocaine, cannabis, and controlled substances, we see a more uniform distribution in the decile scores for Black defendants (the trend for white defendants is still descending).
Visualization (see appendix) also shows that COMPAS may weigh cocaine use as a higher risk than cannabis or controlled substance usage. If this proves to be the case, then it might be one of the factors serving as a proxy for race in the algorithm, as our first figure shows that Black defendants are more likely to be charged in relation to these particular substances.
# Results
| |
|:--------------------------------|
| ![](tables/tbl_interaction.png) |
: Logistic Regression Table for the Interactions between Using Common Drugs and Race {#tbl-3}
In my analysis, I wanted to see how drug-related offenses affected recidivism and how drug type affected decile score. First, I looked at how race and drug offense were related using multiple logistic regression with interaction terms. I compared the COMPAS-predicted recidivism for drug-related offenses to all other charges (excluding the less common drug offenses) using the equation: $$ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_{\text{Black}} + \beta_2 x_{\text{drug-related}} + \beta_3 x_{\text{Black}} x_{\text{drug-related}}$$
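The dummy and interaction variables in this equation can be sketched as a design-matrix construction. This is a minimal illustration; the field names (`race`, `drug_related`) are hypothetical placeholders, not the actual columns of the Broward County dataset:

```python
# Minimal sketch of the design matrix implied by the interaction model.
# Field names here are hypothetical, not the real Broward County columns.
defendants = [
    {"race": "African-American", "drug_related": True},
    {"race": "Caucasian",        "drug_related": True},
    {"race": "African-American", "drug_related": False},
]

def design_row(d):
    x_black = 1 if d["race"] == "African-American" else 0
    x_drug = 1 if d["drug_related"] else 0
    # [intercept, race dummy, drug dummy, interaction x_Black * x_drug]
    return [1, x_black, x_drug, x_black * x_drug]

rows = [design_row(d) for d in defendants]
print(rows)  # [[1, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 0]]
```

The interaction column is simply the product of the two dummies, so $\beta_3$ is only active for Black defendants with a drug-related charge.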
@tbl-3 corroborates Dressel and Farid's conclusion that COMPAS is likely to underpredict the odds of a white person recidivating, with significance[^3]. We see that, for a white person who has not been charged with a drug-related offense, the odds of repeat offense are predicted to be 0.472, whereas the actual odds are 0.629. Interestingly, we can also observe the effect of drug use on prediction for white people. Given drug use, equivant's software predicts a much larger increase in the odds of recidivating for white people than for Black people. In fact, COMPAS overpredicts the odds of a white person reoffending if they committed a drug-related charge (odds = 0.889 vs. 0.753).
However, this difference in magnitude is not indicative of an inverted relationship compared to the @dressel2018 paper. Whereas white defendants are presumed to always have low odds of recidivating, Black defendants, by default, have a higher chance of recidivating than not, according to COMPAS. Therefore, it is possible that the effect of a drug charge is only negligible for Black people because the effect of race was already so high; the COMPAS model overpenalizes a Black defendant who has not committed a drug-related offense at a rate 2.27 times higher than reality. The model then resolves its high penalty for Black people with a drug-related charge: COMPAS predicts the odds of them returning to prison to be 1.31, comparable to the actual rate of 1.30.
[^3]: None of the coefficients' confidence intervals contain 0, meaning their effect on predicting whether a defendant reoffends is non-negligible.
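The group-level odds above follow directly from the model: each dummy shifts the log-odds, and exponentiating recovers the odds. A minimal sketch, with the white-defendant coefficients backed out of the predicted odds reported in the text (the Black-defendant terms would enter the same way through $\beta_1$ and $\beta_3$); these are illustrative values, not re-fitted estimates:

```python
import math

# beta_0 and beta_2 are backed out from the predicted odds reported in the
# text (0.472 for white/no drug charge, 0.889 for white/drug charge).
beta_0 = math.log(0.472)           # baseline log-odds: white, no drug charge
beta_2 = math.log(0.889) - beta_0  # log-odds shift from a drug-related charge

odds_white_no_drug = math.exp(beta_0)
odds_white_drug = math.exp(beta_0 + beta_2)

print(round(odds_white_no_drug, 3))  # 0.472
print(round(odds_white_drug, 3))     # 0.889
```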
| |
|:---------------------------|
| ![](tables/tbl_linmod.png) |
: Table of Coefficients for a Linear Model including the Interactions between Using Common Drugs and Race {#tbl-2}
Next, I looked at how race and drug type interacted to see if there was a bias toward rating Black defendants as "highly probable" to reoffend (deciles 5 through 10) compared to white defendants. It is important to look at what threshold defendants sit at, on average, as it may help clarify how much drug-related offenses contribute to COMPAS' predictions.
For this model, I looked at the same subset of data, grouping all non-drug-related offenses as "None". The model I used for race and drug interaction is similar to the multiple logistic regression model: $$ y = \beta_0 + \beta_1 x_{\text{Black}} + \beta_2 x_{\text{drug-related}} + \beta_3 x_{\text{Black}} x_{\text{drug-related}}$$ except here, the outcome $y$ is the COMPAS decile prediction for a single defendant. The "Drug Interaction with Decile" model simply does not contain race as a variable.
The simple and multiple linear regressions output significant predictors at the $\alpha = 0.05$ level. The only exception is the interaction between using a controlled substance or cannabis and being Black, which shows a weak (due to a high p-value) negative relationship between the use of these drugs and decile score. The results suggest a relationship between drug-related offenses, race, and COMPAS decile similar to the ones seen before: Black defendants are predicted to be more likely to recidivate, and drug-related charges are positively correlated with decile score. In particular, we see that the effect of being Black on decile score, whether charged with a drug-related offense or not, is higher on average than the effect of a white person being charged with an offense related to any of the three most-used drugs.
However, the conclusions drawn from this model should be taken with caution. The low $R^2$ values (0.019 and 0.099) for these linear models indicate that drug-related charges are not helpful predictors of variation in decile score. Further, for a scale of 1 to 10 with narrow thresholds of risk categorization, root mean-squared errors of 2.83 and 2.71 for the simple and multiple linear regression models, respectively, are quite high. Therefore, while this model does indicate some concerning prediction disparities, it may not be the most reliable for understanding how COMPAS utilizes substance abuse as a predictor.
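For reference, the $R^2$ and RMSE diagnostics discussed above are computed as follows; the decile scores here are made-up toy values for illustration, not the Broward sample:

```python
import math

# Toy observed decile scores and model predictions (hypothetical values).
y = [3, 7, 5, 9, 2, 6, 4, 8]      # observed decile scores
y_hat = [4, 6, 5, 7, 3, 6, 5, 7]  # model's predicted decile scores

n = len(y)
mean_y = sum(y) / n
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares

r_squared = 1 - ss_res / ss_tot  # share of decile variance explained
rmse = math.sqrt(ss_res / n)     # typical prediction error, in decile units

print(round(r_squared, 3))  # 0.786
print(round(rmse, 3))       # 1.061
```

An RMSE of about 2.8 on this computation would mean predictions typically miss by almost three deciles, enough to cross an entire risk category (e.g., Probable spans only deciles 3-4).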
# Discussion and Conclusion
There is a dissimilar distribution of white and Black defendants along the COMPAS decile risk score, which is also seen when examining how the algorithm assesses drug-related charges[^4]. According to the logistic regression, COMPAS seems to overestimate the odds of a Black person reoffending and underestimate the odds of any white person reoffending. Given the small effect of drug-related charges on the odds for Black defendants *and* the small explanatory power of drug offenses, one might draw two conclusions:
1. The effect of having a drug-related charge is negligible compared to other factors that COMPAS uses.
2. The hidden variables that COMPAS uses are over-penalizing Black defendants.
The latter conclusion is not unique to this paper and has already been well-demonstrated using both qualitative and quantitative research methods. However, seeing that drug-related offenses do not make up the bulk of this disparity in sentencing might lead future researchers to consider the other actuarial methods of COMPAS, in order to expose what the black-box algorithm is actually doing and, hopefully, identify more beneficial *preventative* measures, rather than further embedding these systemic differences into these tools.
[^4]: Which includes dealing, purchasing, trafficking, manufacturing, and possession.
# Limitations and Future Research
I was very limited by the lack of socioeconomic status and geographical information in the data. A more thorough attempt at analyzing this data and the effects of race proxies might look at how COMPAS predicts risk by zip code or region and what offenses it rates higher within each of those regions. One might be able to reconstruct neighborhood racial make-up from each of these regions and discuss the predictive power of these variables. From other research into actuarialization, this seems like a promising approach to discussing the trends in the data, although it might even be better suited for a place with more thoroughly-documented racialization within its neighborhoods (like New York, which, according to equivant, also uses COMPAS).
Another study might also benefit from more observations of prior users of other drugs, like heroin, meth, and oxycodone. It would be worthwhile to see how users of these substances compare to other hard-drug users and how COMPAS may or may not be incidentally constructing proxies for race through their inclusion. In keeping these factors out due to low observation counts, my models may have unintentionally embedded bias in their predictions.
Finally, according to the Florida Department of Corrections, 11.5% of people charged with a drug-related offense will recidivate within 2 years, with those convicted of possession reoffending at a rate of 19.1% and those involved in trafficking reoffending at a rate of 10.6% [@flrecividism2022]. I labelled my data by interaction type (for drug-related offenses) but did not make use of it in this paper. It would also be worthwhile to look at how *interaction type* affects the charge, especially given the Florida Department of Corrections report.
::: {#refs}
:::
{{< pagebreak >}}
# Appendix {.appendix}
![Barplot showing the relationship between race and drug type](figures/race_and_drug_barplot.png){width="75%"}
![COMPAS Decile Score for the substances of focus for this paper.](figures/drug_use_decile.png){width="75%"}