The reanimation of pseudoscience in machine learning and its ethical repercussions

tags: Machine Learning, Construct validity

Notes

INTRODUCTION

NOTER_PAGE: (1 0.7252155172413793 . 0.0874737578726382)

claimed the ability to predict unobservable latent character traits, including homosexuality, political ideology, and criminality, from photographs

NOTER_PAGE: (1 0.8125 . 0.6074177746675997)

discrepancies between the presumed epistemic operation of these tools and their in-practice ability to achieve those aims.

NOTER_PAGE: (1 0.8922413793103449 . 0.2855143456962911)

claim to have trained ML classifiers to predict personality, behavioral, or identity characteristics from image, text, voice, or other biometric data.

NOTER_PAGE: (2 0.19073275862068964 . 0.5654303708887334)

confused inferential bases of these studies are responsible for their ethically problematic nature.

NOTER_PAGE: (2 0.3011853448275862 . 0.09937018894331699)

Inferring sexual orientation

NOTER_PAGE: (2 0.5651939655172413 . 0.5206438068579425)

PHYSIOGNOMY RESURRECTED

NOTER_PAGE: (2 0.8356681034482758 . 0.10146955913226031)

support for a particular theory of the genesis of same-sex attraction. The proposed hypothesis is the prenatal hormone theory

NOTER_PAGE: (3 0.15948275862068964 . 0.3198040587823653)

Lie detection

NOTER_PAGE: (3 0.25808189655172414 . 0.5059482155353394)

Personality psychology

NOTER_PAGE: (3 0.3238146551724138 . 0.08817354793561931)

deep learning is invoked as a means to obtain objectivity beyond human judgment; however, the training dataset was self-labeled by human raters.

NOTER_PAGE: (3 0.4418103448275862 . 0.2015395381385584)

predictive accuracy of the model was taken to substantiate the hypothesis

NOTER_PAGE: (3 0.46012931034482757 . 0.7760671798460461)

Criminality detection

NOTER_PAGE: (3 0.5296336206896551 . 0.5073477956613016)

A HISTORY OF MISBEGOTTEN SCIENCE

NOTER_PAGE: (3 0.6993534482758621 . 0.5052484254723583)

“Abnormality” classification

NOTER_PAGE: (3 0.740301724137931 . 0.09447165850244926)

What makes observational data suffice for some scientific inferences, while, in others, it appears woefully inadequate?

NOTER_PAGE: (4 0.18911637931034483 . 0.6417074877536738)

what confounds inference to unobservable latent variables in particular natural systems. These are complex systems, which exhibit extreme sensitivity to initial conditions. This is contrasted to the relative ease of inferring latent variables in more simplistic causal dynamics (e.g., the purview of celestial mechanics). The dynamics of our solar system are not chaotic; the weather and human behavior are (although on different orders of magnitude).

NOTER_PAGE: (4 0.7634698275862069 . 0.8159552134359691)
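A toy illustration of the contrast drawn above, not taken from the paper: the logistic map x(n+1) = r * x(n) * (1 - x(n)) is a standard textbook system that is stable for some parameter values and chaotic for others. The sketch assumes r = 2.5 (converges to a fixed point) and r = 4.0 (chaotic), and measures how far two trajectories that start 1e-10 apart drift within 60 steps. The parameter values, starting point, and step count are arbitrary choices for illustration.

```python
# Sensitive dependence on initial conditions in the logistic map.
# At r = 2.5 the map settles to the fixed point x* = 1 - 1/r = 0.6;
# at r = 4.0 it is chaotic and tiny differences grow roughly twofold per step.

def logistic_trajectory(x0: float, r: float, steps: int) -> list[float]:
    """Iterate x -> r * x * (1 - x) from x0 for the given number of steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def max_divergence(r: float, x0: float = 0.2, eps: float = 1e-10, steps: int = 60) -> float:
    """Largest gap between two trajectories whose starting points differ by eps."""
    a = logistic_trajectory(x0, r, steps)
    b = logistic_trajectory(x0 + eps, r, steps)
    return max(abs(u - v) for u, v in zip(a, b))


if __name__ == "__main__":
    print(f"stable  (r=2.5): {max_divergence(2.5):.2e}")
    print(f"chaotic (r=4.0): {max_divergence(4.0):.2e}")
```

In the stable regime the gap stays on the order of the initial 1e-10; in the chaotic regime it grows to the size of the state itself, so any measurement error in the starting point eventually dominates what can be inferred about it.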

Though the outputs of ML models are referred to as “predictions,” ML is rarely used to make actual predictions about the future

NOTER_PAGE: (5 0.1896551724137931 . 0.8488453463960811)

EPISTEMIC FOUNDATIONS OF ML

NOTER_PAGE: (5 0.27370689655172414 . 0.09307207837648705)

The value-free ideal in science and ML

NOTER_PAGE: (5 0.49299568965517243 . 0.5073477956613016)

ML models use evidence, or training data, to form predictions or classifications, which generalize what they have learned from their training set to unseen instances (i.e., novel data). The field of ML strives to automate inductive inference. Thus learning is fundamentally about generalization.

NOTER_PAGE: (5 0.5964439655172413 . 0.10286913925822253)

What is “learned” by the “machine” is hence a mathematical function. The key advantage of these training procedures lies in their ability to discover correlations in very high-dimensional feature spaces.

NOTER_PAGE: (5 0.861530172413793 . 0.1245626312106368)
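The flip side of discovering correlations in very high-dimensional feature spaces is that, with many features and few examples, strong correlations appear by chance alone. A minimal sketch, assuming pure-noise Gaussian features and balanced random binary labels (the sample sizes are arbitrary and not from the paper): no feature is related to the label, yet the strongest feature-label correlation among 2,000 features on 30 examples is typically well above 0.4.

```python
import random
import statistics


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def strongest_spurious_correlation(n_samples: int = 30, n_features: int = 2000, seed: int = 0) -> float:
    """Max |r| between pure-noise features and pure-noise (balanced) binary labels."""
    rng = random.Random(seed)
    half = n_samples // 2
    labels = [0.0] * half + [1.0] * (n_samples - half)
    rng.shuffle(labels)  # labels carry no information about any feature
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, labels)))
    return best


if __name__ == "__main__":
    print(f"30 samples, 2000 features:  {strongest_spurious_correlation():.2f}")
    print(f"1000 samples, 200 features: {strongest_spurious_correlation(1000, 200):.2f}")
```

With many more samples than features the largest chance correlation shrinks toward zero; the point is only that a strong pattern, on its own, says nothing about whether it is worldly or an accident of the sample.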

talk of “letting the [raw] data speak for themselves.”

NOTER_PAGE: (6 0.14816810344827586 . 0.728481455563331)

belief that the algorithmic discovery of correlations in increasingly large datasets, sans any of the guardrails typical of the practices of trained scientists, is sufficient to count as scientific knowledge.

NOTER_PAGE: (6 0.20366379310344826 . 0.8250524842547236)

underdetermination of our knowledge relative to the results of our empirical efforts

NOTER_PAGE: (6 0.27316810344827586 . 0.23512946116165148)

Data, however, are always collected, ordered, deciphered, and interpreted in light of our theories.

NOTER_PAGE: (6 0.3448275862068966 . 0.5843247025892232)

if an epistemic pursuit is undertaken for the express purpose of direct intervention on human lives, then the epistemic task is ineliminably normatively laden

NOTER_PAGE: (6 0.3588362068965517 . 0.23303009097270816)

The necessity of theory is over-determined by such theses as the material theory of induction¹ in philosophy of science, which exposes how empirically established background facts are necessary to license any inductive inference, and its formal, learning-theoretic equivalent, the no-free-lunch theorems.

NOTER_PAGE: (6 0.4816810344827586 . 0.5318404478656402)
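The no-free-lunch point can be checked by brute force on a toy problem. The sketch assumes a domain of five inputs, a learner that sees the labels of the first three, and a uniform average over all 2^5 possible labellings of the domain; the three learners are hypothetical examples, not from the paper. Whatever rule a learner uses, its average accuracy on the two unseen inputs is exactly 0.5, because every pattern of training labels is consistent with every pattern of test labels.

```python
from itertools import product

DOMAIN = list(range(5))  # five possible inputs (toy assumption)
TRAIN = DOMAIN[:3]       # inputs whose labels the learner observes
TEST = DOMAIN[3:]        # off-training-set inputs


def majority_learner(train_labels):
    """Predict the majority training label everywhere."""
    guess = int(sum(train_labels) * 2 > len(train_labels))
    return lambda x: guess


def contrarian_learner(train_labels):
    """Predict the minority training label everywhere."""
    guess = int(sum(train_labels) * 2 < len(train_labels))
    return lambda x: guess


def nearest_neighbour_learner(train_labels):
    """Copy the label of the closest observed input (ties go to the lower index)."""
    return lambda x: train_labels[min(range(len(TRAIN)), key=lambda i: abs(TRAIN[i] - x))]


def mean_off_training_accuracy(learner) -> float:
    """Average accuracy on TEST, taken uniformly over every labelling of DOMAIN."""
    targets = list(product([0, 1], repeat=len(DOMAIN)))
    total = 0.0
    for target in targets:
        h = learner([target[x] for x in TRAIN])
        total += sum(h(x) == target[x] for x in TEST) / len(TEST)
    return total / len(targets)


if __name__ == "__main__":
    for learner in (majority_learner, contrarian_learner, nearest_neighbour_learner):
        print(learner.__name__, mean_off_training_accuracy(learner))
```

Beating chance off the training set requires assumptions that rule some labellings out in advance, which is the role the material theory of induction assigns to empirically established background facts.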

The theory-free ideal

NOTER_PAGE: (6 0.5517241379310345 . 0.09937018894331699)

an even starker vision of objectivity: a science free from theory.

NOTER_PAGE: (6 0.6352370689655172 . 0.437368789363191)

it is the very fact that data are not raw, that they are, in a sense, “impure,” that makes them able to serve the meaningful epistemic role they do.

NOTER_PAGE: (6 0.6400862068965517 . 0.5857242827151854)

Datafication

it is not in spite of, but owing to, the theory-ladenness of data that empirical science garners us its epistemic results.

NOTER_PAGE: (6 0.665948275862069 . 0.7438768369489153)

The “theory” in the theory-free ideal is to be understood as any prior commitment or conjecture as to the nature of the target system

NOTER_PAGE: (6 0.7214439655172413 . 0.12176347095871237)

We are not denying that the researchers in the above-outlined instances achieved high classifier accuracy on their targets or that they were able to generalize to holdouts. We are also not denying that achieving high accuracy and limited generalization implicate the presence of robust patterns in the data. What we are questioning is the validity of interpretations lent to these patterns.

NOTER_PAGE: (7 0.17618534482758622 . 0.6067179846046186)

Theory and value neutrality in physiognomy

NOTER_PAGE: (7 0.25323275862068967 . 0.09097270818754373)

That robust patterns exist in natural data should come as no surprise. The world is structured. It contains regularity.

NOTER_PAGE: (7 0.27424568965517243 . 0.5227431770468859)

Part of the necessary and difficult work of science is distinguishing between imposed regularity and worldly regularity.

NOTER_PAGE: (7 0.42780172413793105 . 0.6249125262421273)

open-ended number of causal hypotheses that might be explanatory of observed worldly regularity.

NOTER_PAGE: (7 0.4709051724137931 . 0.6389083275017494)

there is always both measurement-imposed regularity and objective or worldly regularity. A plurality of hypotheses are always available to explain both.

NOTER_PAGE: (7 0.49676724137931033 . 0.622113365990203)

what is being predicted is, straightforwardly, the judgment of the labeler. There is, therefore, no such thing as freedom from human bias

NOTER_PAGE: (7 0.5662715517241379 . 0.5458362491252624)
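A simulated illustration of this point, with every quantity invented: each hypothetical photo has a latent trait that no one observes and a visible cue that, by stipulation, is independent of the trait. Raters label from the cue and slip 5% of the time. A lookup-table classifier trained on the rater labels then agrees with held-out rater labels about 95% of the time while sitting at chance on the latent trait. The independence of cue and trait is an extreme assumption chosen for clarity; in real data the relationship is unknown, which is exactly what accuracy against human labels cannot settle.

```python
import random
from collections import Counter


def make_example(rng: random.Random, rater_noise: float):
    """One hypothetical photo: (latent trait, visible cue, rater's label)."""
    trait = rng.random() < 0.5        # latent trait; never shown to rater or model
    cue = rng.random() < 0.5          # visible cue, independent of the trait
    flip = rng.random() < rater_noise
    rater_label = cue != flip         # rater judges by the cue, with occasional slips
    return trait, cue, rater_label


def fit_lookup(train):
    """Learn, for each cue value, the majority rater label seen in training."""
    counts = {False: Counter(), True: Counter()}
    for _, cue, label in train:
        counts[cue][label] += 1
    return {cue: c.most_common(1)[0][0] for cue, c in counts.items()}


def evaluate(n_train: int = 5_000, n_test: int = 5_000, rater_noise: float = 0.05, seed: int = 1):
    """Return held-out agreement with (rater labels, latent trait)."""
    rng = random.Random(seed)
    train = [make_example(rng, rater_noise) for _ in range(n_train)]
    test = [make_example(rng, rater_noise) for _ in range(n_test)]
    model = fit_lookup(train)
    vs_rater = sum(model[cue] == label for _, cue, label in test) / n_test
    vs_trait = sum(model[cue] == trait for trait, cue, _ in test) / n_test
    return vs_rater, vs_trait


if __name__ == "__main__":
    vs_rater, vs_trait = evaluate()
    print(f"accuracy vs rater labels: {vs_rater:.3f}")
    print(f"accuracy vs latent trait: {vs_trait:.3f}")
```

The reported “accuracy” is a measure of agreement with the labeler; nothing in the training signal reaches the trait itself.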

Potential rebuttal

NOTER_PAGE: (7 0.9089439655172413 . 0.09027291812456263)

the space of confounds that cannot, in principle, be ruled out is effectively open-ended.

NOTER_PAGE: (8 0.17510775862068964 . 0.1644506648005598)

the concept of criminal propensity is a human construct, and the data, to the extent that they encode this concept, do so only by virtue of having been shaped by human judgment of criminality. There is no objective signal of criminality that can be discovered independently of human judgment.

NOTER_PAGE: (8 0.17564655172413793 . 0.5584324702589223)

NEO-PHYSIOGNOMY: EXHIBIT OF EPISTEMIC FAILINGS

NOTER_PAGE: (8 0.22683189655172414 . 0.10216934919524143)

there remain an open-ended number of possible confounds.

NOTER_PAGE: (8 0.5953663793103449 . 0.6368089573128061)

plausible that individuals who have been convicted of a crime would show in facial posture the hallmarks of unhappier affect

NOTER_PAGE: (8 0.7079741379310345 . 0.5605318404478656)

features of attire and grooming reliably track socioeconomic status, which correlates heavily

NOTER_PAGE: (8 0.7505387931034483 . 0.6200139958012596)

certain chromosomal disorders carry a moderately increased risk of ASD. If a classifier is trained on images of individuals with chromosomal disorders labeled as autistic, it will learn whatever visually perceptible phenotypic variations cluster with chromosomal disorders.

NOTER_PAGE: (8 0.7629310344827586 . 0.13925822253324002)

convicted criminals may also vary systematically by age.

NOTER_PAGE: (8 0.790948275862069 . 0.7893631910426872)

obvious confounds that are readily picked up by a classifier are clearly present

NOTER_PAGE: (8 0.9186422413793104 . 0.16235129461161651)

The hypothesis that liberal-leaning individuals are biologically disposed to large foreheads, narrow chins, inward slanted brows, and small mouths appears to be the one favored by the authors. A much more parsimonious explanation holds that all of these features are indicative of a forward tilt of the head.

NOTER_PAGE: (9 0.4978448275862069 . 0.8166550034989503)

The reasoning here, however, is clearly seen to be circular: the gender-prototypical facial morphology is predefined by the same measures as utilized in the original sexuality classification task. “Deviation” therefrom in the case of homosexual facial morphology is hence a given,

NOTER_PAGE: (9 0.6099137931034483 . 0.2099370188943317)

While the college student participants were instructed to ensure that the “chin is at a 90-degree angle to [the] body,” it might be ventured that the average 19-year-old falls short of a perfectly calibrated proprioceptive sensibility of 90° chin-to-body angle.

NOTER_PAGE: (9 0.6756465517241379 . 0.7382785164450665)

The confounds, in the case of this study, turned out to be, in the first place, grooming choices: in particular, the wearing of glasses versus contacts, the presence or absence of makeup, and the presence or absence of facial hair. The most powerful signal, however, came from head tilt and the angle from which the photo was taken.

NOTER_PAGE: (9 0.8448275862068966 . 0.27851644506648005)
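The head-tilt confound can be mimicked with a one-feature simulation. Everything here is hypothetical: the 8-degree gap in mean tilt between the two groups in the collected photos, the 5-degree spread, and the sample sizes are invented, and the “classifier” is just a threshold at the midpoint of the two group means. On held-out photos gathered the same way, the threshold scores close to 80% accuracy, so a standard holdout check passes; on photos where tilt no longer differs between groups, accuracy falls to chance.

```python
import random


def sample(rng: random.Random, n: int, tilt_gap: float):
    """Hypothetical photos as (head_tilt_degrees, group); group shifts mean tilt by tilt_gap."""
    data = []
    for _ in range(n):
        group = rng.random() < 0.5
        tilt = rng.gauss(tilt_gap if group else 0.0, 5.0)
        data.append((tilt, group))
    return data


def fit_threshold(data) -> float:
    """Threshold at the midpoint between the two groups' mean tilt."""
    g1 = [t for t, g in data if g]
    g0 = [t for t, g in data if not g]
    return (sum(g1) / len(g1) + sum(g0) / len(g0)) / 2


def accuracy(threshold: float, data) -> float:
    """Fraction of photos where 'tilt above threshold' matches the group label."""
    return sum((t > threshold) == g for t, g in data) / len(data)


def run(seed: int = 2):
    rng = random.Random(seed)
    train = sample(rng, 5000, tilt_gap=8.0)        # confounded data collection
    threshold = fit_threshold(train)
    same_source = sample(rng, 5000, tilt_gap=8.0)  # holdout with the same confound
    balanced = sample(rng, 5000, tilt_gap=0.0)     # tilt no longer tracks group
    return accuracy(threshold, same_source), accuracy(threshold, balanced)


if __name__ == "__main__":
    same, balanced = run()
    print(f"holdout, same collection process: {same:.3f}")
    print(f"photos with tilt balanced:        {balanced:.3f}")
```

Holdout accuracy alone cannot distinguish a worldly regularity from one imposed by how the photos were taken; only controlling the suspected confound does.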

THE VULNERABILITY OF ML TO PSEUDOSCIENCE

NOTER_PAGE: (10 0.13254310344827586 . 0.09937018894331699)

Both ML qua academic field and ML qua software engineering profession possess a culture that pushes to maximize output and quantitative gains at the cost of appropriate training and quality control.

NOTER_PAGE: (10 0.36853448275862066 . 0.6787963610916724)

Trained scientists possess an abundance of information concerning what is known and unknown in relation to their subject matter,

NOTER_PAGE: (10 0.49407327586206895 . 0.11406578026592022)

widely acknowledged that benchmarking is given undue import in the field of ML and, in many cases, is actively harmful in that it penalizes careful theorizing while rewarding kludgy or hardware-based solutions.

NOTER_PAGE: (10 0.6681034482758621 . 0.7445766270118964)

Increasingly, scholars and industry actors outsource the collection and labeling of their data to third parties. When—as we have argued—much of the theoretical commitments of a modeling exercise come in at the level of data collection and labeling, offloading these tasks can have damaging repercussions for the epistemic integrity of research.

NOTER_PAGE: (10 0.8329741379310345 . 0.6466060181945416)

ML has largely shrugged off the yoke of traditional peer-review mechanisms,

NOTER_PAGE: (10 0.9019396551724138 . 0.34359692092372285)

WHAT IS THE HARM IN FAILED INFERENCE?

NOTER_PAGE: (11 0.19504310344827586 . 0.08957312806158152)

Acting on model outputs is de facto causal interpretation.

NOTER_PAGE: (11 0.3286637931034483 . 0.13925822253324002)

DISCUSSION

NOTER_PAGE: (11 0.6567887931034483 . 0.5066480055983205)

Footnotes:

John D. Norton, “A Material Theory of Induction,” Philosophy of Science 70, no. 4 (2003): 647–70, https://doi.org/10.1086/378858.