The Flaws of Policies Requiring Human Oversight of Government Algorithms

tags
Digisprudence, Automation Bias

Notes

enable governments to use algorithms—but only if a human has some form of oversight or control over the final decision.

NOTER_PAGE: (2 0.1794380587484036 . 0.2768313458262351)

“human in the loop”

NOTER_PAGE: (2 0.2790549169859515 . 0.18824531516183987)

Despite the emphasis that legislators have placed on human oversight as a mechanism to mitigate the risks of government algorithms, the functional quality of these policies has not been thoroughly interrogated.

NOTER_PAGE: (2 0.6698595146871009 . 0.1030664395229983)

rarely reference empirical evidence demonstrating that human oversight actually advances those values.

NOTER_PAGE: (2 0.7445721583652618 . 0.3049403747870528)

“no ‘objective’ solution” regarding the appropriate balance between rules and discretion

NOTER_PAGE: (3 0.1500638569604087 . 0.6431005110732538)

2. Discretion, algorithms, and decision-making in government

NOTER_PAGE: (3 0.4827586206896552 . 0.08943781942078365)

Many of the most consequential and controversial government uses of algorithms take place in street-level bureaucracies

NOTER_PAGE: (3 0.5523627075351214 . 0.09114139693356048)

3. Survey of human oversight policies

NOTER_PAGE: (4 0.1909323116219668 . 0.5178875638841567)

notably high error rates.

NOTER_PAGE: (4 0.20561941251596424 . 0.32282793867120957)

biased against women, minorities, and low-income individuals

NOTER_PAGE: (4 0.2330779054916986 . 0.11073253833049404)

automation and algorithms significantly reduce expertise and discretion in street-level bureaucracies and administrative agencies

NOTER_PAGE: (4 0.4125159642401022 . 0.4250425894378194)

3.1. Restricting “solely” automated decisions

NOTER_PAGE: (5 0.13665389527458494 . 0.09114139693356048)

3.2. Emphasizing human discretion

NOTER_PAGE: (5 0.19604086845466157 . 0.520442930153322)

human decision-makers must be able to disagree with the algorithm’s recommendations.

NOTER_PAGE: (6 0.5504469987228608 . 0.6959114139693356)

3.3. Requiring “meaningful” human input

NOTER_PAGE: (6 0.644955300127714 . 0.08177172061328791)

human overseers must understand how the algorithm operates

NOTER_PAGE: (6 0.6679438058748404 . 0.5911413969335605)

human decision-makers must not depend on algorithms and should instead thoroughly consider all of the information relevant to a given decision.

NOTER_PAGE: (6 0.8378033205619413 . 0.5817717206132879)

4. Two flaws with human oversight policies

NOTER_PAGE: (7 0.26756066411238827 . 0.08347529812606473)

4.1.2. Human discretion does not improve outcomes

NOTER_PAGE: (7 0.3588761174968072 . 0.5212947189097104)

people cannot provide the envisioned protections against algorithmic errors, biases, and inflexibility.

NOTER_PAGE: (7 0.3869731800766284 . 0.10221465076660988)

human oversight is unlikely to provide protections against the harms of algorithmic decision-making.

NOTER_PAGE: (7 0.4099616858237548 . 0.5468483816013628)

4.1. Flaw 1: Human oversight policies are not supported by empirical evidence

NOTER_PAGE: (7 0.5172413793103449 . 0.08432708688245315)

automating certain parts of human tasks can make the remaining parts more difficult and cause human skills to deteriorate

NOTER_PAGE: (7 0.6411238825031929 . 0.6124361158432708)

automated systems may simply lead to different types of errors rather than reducing overall errors as intended

NOTER_PAGE: (7 0.6711366538952747 . 0.7819420783645656)

4.1.1. Restrictions on “solely” automated decisions provide superficial protection

NOTER_PAGE: (7 0.710727969348659 . 0.08262350936967632)

Automation can also create a diminished sense of control, responsibility, and moral agency among human operators

NOTER_PAGE: (7 0.7164750957854407 . 0.5187393526405452)

Public sector algorithms typically already operate with human involvement

NOTER_PAGE: (7 0.7720306513409962 . 0.24361158432708688)

Broadly speaking, people are bad at judging the quality of algorithmic outputs and determining whether and how to override those outputs.

NOTER_PAGE: (7 0.7873563218390806 . 0.5221465076660988)

People struggle to evaluate the accuracy of algorithmic predictions

NOTER_PAGE: (7 0.8199233716475096 . 0.651618398637138)

the narrow scope of “solely” automated decisions creates flimsy and easily avoidable protections.

NOTER_PAGE: (7 0.8582375478927203 . 0.16609880749574105)

even though algorithmic advice can improve the accuracy of human predictions, people’s judgments about when and how to diverge from algorithmic recommendations are typically incorrect

NOTER_PAGE: (8 0.11813537675606642 . 0.1899488926746167)

Police have been shown to follow incorrect advice from algorithms, even when tasked with overseeing an algorithm and under no mandate to follow its advice.

NOTER_PAGE: (8 0.3205619412515964 . 0.11669505962521295)

evidence suggests that algorithmic explanations and transparency do not actually improve human oversight.

NOTER_PAGE: (8 0.42273307790549175 . 0.5110732538330494)

explanations do not improve people’s ability to make use of algorithmic predictions

NOTER_PAGE: (8 0.45083014048531295 . 0.626916524701874)

explanations can have the harmful effect of prompting people to place greater trust in algorithmic recommendations even when those recommendations are incorrect

NOTER_PAGE: (8 0.47765006385696046 . 0.6286201022146508)

explanations have no basis in the algorithm’s actual functioning

NOTER_PAGE: (8 0.541507024265645 . 0.596252129471891)

Algorithmic transparency similarly reduces people’s ability to detect and correct model errors

NOTER_PAGE: (8 0.5549169859514688 . 0.5442930153321976)

judges often make more punitive decisions regarding Black defendants than white defendants who have the same risk score, causing the introduction of risk assessments to exacerbate racial disparities in pretrial detention

NOTER_PAGE: (8 0.5836526181353768 . 0.08432708688245315)

explanations and transparency appear to hinder—rather than improve—people’s ability to identify algorithmic mistakes and make effective use of algorithmic recommendations.

NOTER_PAGE: (8 0.5983397190293742 . 0.6396933560477002)

human discretion can enable people to inject new forms of inconsistency and bias into decisions.

NOTER_PAGE: (8 0.6526181353767561 . 0.2563884156729131)

4.1.3. Even “meaningful” human oversight does not improve outcomes

NOTER_PAGE: (8 0.710727969348659 . 0.07921635434412266)

automation bias persists even after training and explicit instructions to verify an automated system

NOTER_PAGE: (8 0.7228607918263091 . 0.5161839863713799)

risk assessments increase the weight that judges, law students, and laypeople place on risk relative to other considerations

NOTER_PAGE: (8 0.7982120051085568 . 0.5442930153321976)

do not provide a standard for determining whether any particular form of human oversight is meaningful.

NOTER_PAGE: (8 0.842911877394636 . 0.08347529812606473)

people typically defer to automated tools and increase their attention to the factors emphasized by algorithms.

NOTER_PAGE: (8 0.9169859514687101 . 0.7896081771720613)

4.2. Flaw 2: Human oversight policies legitimize flawed and unaccountable algorithms in government

NOTER_PAGE: (9 0.09450830140485314 . 0.08858603066439523)

for human oversight to be meaningful, decision-makers must routinely disagree with the automated system

NOTER_PAGE: (9 0.1347381864623244 . 0.616695059625213)

human overrides cannot actually remedy the concerns that motivate overrides.

NOTER_PAGE: (9 0.31928480204342274 . 0.7385008517887564)

4.2.1. The assumption of effective human oversight provides a false sense of security in adopting algorithms

NOTER_PAGE: (9 0.35312899106002554 . 0.08943781942078365)

humans tend to override algorithms in detrimental rather than beneficial ways

NOTER_PAGE: (9 0.4968071519795658 . 0.7614991482112436)

these policies merely provide cover for fundamental concerns about the use of algorithms in government decision-making.

NOTER_PAGE: (9 0.5434227330779056 . 0.25468483816013626)

people cannot reliably balance an algorithm’s advice with other factors,

NOTER_PAGE: (9 0.6424010217113666 . 0.6056218057921635)

deflect criticism of these tools but fail to mitigate the underlying concerns.

NOTER_PAGE: (9 0.7241379310344828 . 0.5187393526405452)

4.2.2. Relying on human oversight diminishes responsibility and accountability for institutional decision-makers

NOTER_PAGE: (10 0.33014048531289913 . 0.5161839863713799)

discretion is unlikely to be an effective remedy, as judges often use their discretion to override risk assessments in punitive and racially biased ways,

NOTER_PAGE: (10 0.3767560664112388 . 0.151618398637138)

Human oversight policies position frontline human operators as the scapegoats for algorithmic harms,

NOTER_PAGE: (10 0.40676883780332057 . 0.5817717206132879)

even if human oversight were a reliable form of quality control, human oversight policies would still have the harmful effect of diminishing the accountability of agency leaders and vendors

NOTER_PAGE: (10 0.47956577266922096 . 0.8696763202725725)

if risk assessments can be used only to inform outcomes that would have been reached independently, then there is no reason to use such tools at all.

NOTER_PAGE: (10 0.6685823754789273 . 0.1848381601362862)

judges and other people often defer to automated advice and change their decision-making processes due to algorithms, yet do not recognize that these behaviors are occurring

NOTER_PAGE: (10 0.7707535121328225 . 0.2282793867120954)

5.1. The upper bound of human oversight

NOTER_PAGE: (11 0.08748403575989784 . 0.5212947189097104)

algorithmic accuracy does not always lead to the optimal outcomes

NOTER_PAGE: (11 0.3097062579821201 . 0.5545144804088586)

decision support tools instead of specific recommendations

NOTER_PAGE: (11 0.38122605363984674 . 0.8356047700170358)

adding greater structure to human-algorithm collaborations

NOTER_PAGE: (11 0.409323116219668 . 0.5894378194207837)

although the human operators were the most proximate to the wrongful arrest, the police chief and vendors are more substantively responsible for the incident. They are the ones who should be held accountable.

NOTER_PAGE: (11 0.5402298850574713 . 0.151618398637138)

asking people to oversee automated systems creates “an impossible task”

NOTER_PAGE: (11 0.6839080459770115 . 0.7308347529812607)

5. From human oversight to institutional oversight

NOTER_PAGE: (11 0.765006385696041 . 0.08517887563884156)

Evaluating the quality of an algorithmic prediction is more difficult than simply making a prediction on one’s own.

NOTER_PAGE: (11 0.872286079182631 . 0.6022146507666098)

institutional oversight approach to governing public sector algorithms.

NOTER_PAGE: (12 0.1475095785440613 . 0.7359454855195912)

promote greater rigor and democratic participation in government decisions about whether and how to use algorithms.

NOTER_PAGE: (12 0.1781609195402299 . 0.7529812606473595)

5.2.1. Stage 1: Agency justification and evaluation

NOTER_PAGE: (12 0.4425287356321839 . 0.5127768313458262)

agencies must demonstrate that it is appropriate to use an algorithm at all.

NOTER_PAGE: (12 0.5312899106002554 . 0.6848381601362862)

Question 1: Is it appropriate to incorporate the algorithm into decision-making?

NOTER_PAGE: (12 0.5919540229885057 . 0.5698466780238501)

consider “red lines” that mark unacceptable uses of algorithms.

NOTER_PAGE: (12 0.640485312899106 . 0.8901192504258943)

applications such as facial recognition and predictive policing violate fundamental notions of justice and human rights

NOTER_PAGE: (12 0.710727969348659 . 0.7367972742759795)

5.2. Institutional approach for overseeing government algorithms

NOTER_PAGE: (12 0.8295019157088123 . 0.08177172061328791)

the more that a decision requires individualized human discretion, the less appropriate it is for algorithms to play a role in decision-making.

NOTER_PAGE: (13 0.5370370370370371 . 0.08943781942078365)

extent to which the algorithm in question is trustworthy,

NOTER_PAGE: (13 0.723499361430396 . 0.3321976149914821)

algorithm must be rigorously evaluated for the task at hand.

NOTER_PAGE: (13 0.9233716475095786 . 0.282793867120954)

Persistent monitoring is particularly important in light of evidence that judicial uses of algorithms can shift over time

NOTER_PAGE: (14 0.21839080459770116 . 0.7155025553662692)

practitioner responses to algorithms depend on localized details of institutional implementation

NOTER_PAGE: (14 0.2694763729246488 . 0.510221465076661)

monitor whether the algorithm distorts or erodes the moral agency of decision-makers.

NOTER_PAGE: (14 0.3818646232439336 . 0.6567291311754685)

Question 2: How should the algorithm be integrated with human decision-making?

NOTER_PAGE: (14 0.41187739463601536 . 0.13884156729131175)

5.2.2. Stage 2: Democratic review and approval

NOTER_PAGE: (14 0.45785440613026823 . 0.5110732538330494)

must be subject to review and approval by the public or a democratically accountable body.

NOTER_PAGE: (14 0.48914431673052367 . 0.7223168654173765)

there must be evidence suggesting that people can oversee the algorithm and that incorporating the algorithm into decision-making will improve outcomes.

NOTER_PAGE: (14 0.5312899106002554 . 0.075809199318569)

conduct experimental evaluations of human-algorithm collaborations before implementing an algorithm in practice.

NOTER_PAGE: (14 0.6155810983397191 . 0.0809199318568995)
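A minimal sketch (my own, not from the paper) of what such a pre-deployment experiment could look like: randomly assign cases to human-only, algorithm-only, and human-plus-algorithm arms, then compare decision accuracy across arms before approving deployment. All function names, the assignment scheme, and the accuracy metric are illustrative assumptions.

#+begin_src python
# Illustrative sketch only: compare decision accuracy when cases are randomly
# assigned to human-only, algorithm-only, and human-plus-algorithm conditions.
import random
from statistics import mean

def accuracy(decisions, outcomes):
    """Share of decisions that match the later-observed outcome."""
    return mean(d == o for d, o in zip(decisions, outcomes))

def run_experiment(cases, outcomes, human_decide, algo_decide, assisted_decide, seed=0):
    """Randomly assign each case to one arm and report per-arm accuracy."""
    rng = random.Random(seed)
    arms = {"human_only": ([], []), "algorithm_only": ([], []), "human_plus_algorithm": ([], [])}
    for case, outcome in zip(cases, outcomes):
        arm = rng.choice(list(arms))
        if arm == "human_only":
            decision = human_decide(case)
        elif arm == "algorithm_only":
            decision = algo_decide(case)
        else:  # the human reviews the algorithm's recommendation
            decision = assisted_decide(case, algo_decide(case))
        arms[arm][0].append(decision)
        arms[arm][1].append(outcome)
    return {arm: accuracy(ds, os_) for arm, (ds, os_) in arms.items()}

# Toy stand-ins: under the paper's standard, deployment would be justified only
# if the human-plus-algorithm arm outperforms the human-only baseline.
if __name__ == "__main__":
    cases = list(range(200))
    outcomes = [c % 2 for c in cases]
    results = run_experiment(
        cases, outcomes,
        human_decide=lambda c: random.randint(0, 1),
        algo_decide=lambda c: c % 2,
        assisted_decide=lambda c, rec: rec,
    )
    print(results)
#+end_src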

evaluate whether any proposed forms of human oversight are actually effective.

NOTER_PAGE: (15 0.09323116219667944 . 0.5212947189097104)

assumption should be that human oversight is likely to be ineffective, unless proven otherwise.

NOTER_PAGE: (15 0.12260536398467434 . 0.696763202725724)

provide affirmative evidence that this mechanism actually improves outcomes

NOTER_PAGE: (15 0.17560664112388252 . 0.5212947189097104)

Notice-and-comment should be extended to cover agency uses of algorithms,

NOTER_PAGE: (15 0.23690932311621968 . 0.20357751277683134)

5.3. Benefits of institutional oversight approach

NOTER_PAGE: (15 0.30715197956577267 . 0.08943781942078365)

policymakers must place greater scrutiny on whether an algorithm is even appropriate to use in a given context.

NOTER_PAGE: (15 0.35185185185185186 . 0.5630323679727428)

6. Conclusion

NOTER_PAGE: (16 0.3103448275862069 . 0.06814310051107325)