Data Feminism

tags
Data

Notes

Introduction

why men with her credentials were placed in engineering positions, where they could be promoted through the ranks of the civil service, while women like herself were sent to the computing pools, where they languished until they retired or quit

NOTER_PAGE: (3 . 0.43270024772914945)

nobody’s ever complained

NOTER_PAGE: (3 . 0.5631709331131296)

data feminism: a way of thinking about data and its communication that is informed by direct experience, by a commitment to action, and by the ideas associated with intersectional feminist thought

NOTER_PAGE: (8 . 0.2981007431874484)

Champine knew to crunch the numbers only because Darden shared her personal experience of gender discrimination with her

NOTER_PAGE: (8 . 0.6597853014037984)

stats team brought together several datasets relating to property tax delinquency (an indicator of neglect), rat complaints (ditto), arrest locations (a proxy for poverty), and more, in order to rank the 25,000 complaints by fire risk

NOTER_PAGE: (10 . 0.5722543352601156)

inspectors issued five times more "vacate orders" than they had without the data-assisted ranking system

NOTER_PAGE: (10 . 0.6820809248554913)

Bring Back the Bodies

NOTER_PAGE: (15 . 0.17671345995045415)

“Our maternal data is embarrassing,”

NOTER_PAGE: (18 . 0.14368290668868702)

What we choose to measure is a statement of what we value

NOTER_PAGE: (18 . 0.21222130470685382)

Data science, as it is generally understood in the world today, has very little to do with bodies. But that is a fundamental misconception about the field, and about data more generally.

NOTER_PAGE: (18 . 0.8777869529314616)

individual experience, taken together, reveals a larger structural problem

NOTER_PAGE: (19 . 0.3988439306358381)

Data has been called “the new oil” for, among other things, its untapped potential for profit and its value once it’s processed and refined

NOTER_PAGE: (24 . 0.26754748142031376)
NOTER_PAGE: (28 . 0.6870355078447563)

how data is often presented as though it inhabits an omniscient, godlike perspective

NOTER_PAGE: (30 . 0.18001651527663087)

Haraway terms this “the view from nowhere.”

NOTER_PAGE: (30 . 0.28075970272502065)

narrowness of computationally conceived fairness

NOTER_PAGE: (32 . 0.5912469033856317)

We should be able to dream of data-driven systems that position co-liberation as their primary design goal

NOTER_PAGE: (33 . 0.7627245508982037)

Chapter Two: On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints

NOTER_PAGE: (37 . 0.18488023952095808)

Periscopic's work is framed around a singular emotion

NOTER_PAGE: (40 . 0.18338323353293415)

deliberately neutral emotional field, a blank page in effect, upon which viewers are more free to choose their own response

NOTER_PAGE: (40 . 0.41242514970059885)

the "data-ink" ratio. In his view, a visualization designer should strive to use ink only to display the data

NOTER_PAGE: (40 . 0.75374251497006)

is visual minimalism really more neutral?

NOTER_PAGE: (41 . 0.4764657308009909)

theorists and practitioners have come from technical disciplines aligned with engineering and computer science, and have not been trained in the most fundamental of all Western communication theories: rhetoric

NOTER_PAGE: (41 . 0.5995045417010735)

"a rhetorical dimension is present in every design,"

NOTER_PAGE: (41 . 0.7861271676300577)

visualizing data involves editorial choices – some things are necessarily highlighted, while others are necessarily obscured

NOTER_PAGE: (42 . 0.1263418662262593)

"strong objectivity" which acknowledges that regular-grade, vanilla objectivity is mainly made by mostly rich white guys in power

NOTER_PAGE: (42 . 0.5243600330305532)

numbers that are accurate, but not technically facts

NOTER_PAGE: (43 . 0.5293146160198183)

Data Visceralization research group

NOTER_PAGE: (43 . 0.6738232865400495)

From a data analysis perspective, “A Sort of Joy” consists of simple operations: only counting and grouping. The results could easily have been represented by a bar chart or a tree map of first names.

NOTER_PAGE: (45 . 0.7341040462427745)
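
Aside: the "only counting and grouping" claim is easy to make concrete. A minimal sketch, assuming a plain list of full names; the names below were picked for illustration and are not the dataset the piece used, and `count_first_names` is a hypothetical helper, not anything from the book.

```python
from collections import Counter

# Illustrative sample only: a stand-in for a catalog export of artist names.
artists = [
    "John Baldessari",
    "John Cage",
    "Mary Cassatt",
    "Paul Klee",
    "Paul Cézanne",
    "John Chamberlain",
]


def count_first_names(names):
    """Group full names by their first token and count each group."""
    return Counter(name.split()[0] for name in names)


counts = count_first_names(artists)

# A crude text "bar chart": one '#' per occurrence, most common name first.
for first_name, n in counts.most_common():
    print(f"{first_name:<6}{'#' * n} {n}")
```

This is the whole-picture-at-once summary that the performance deliberately withholds by presenting each datapoint one at a time.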

we do not see "the whole picture". We hear and see and experience each datapoint one at a time

NOTER_PAGE: (45 . 0.8364987613542526)

activating emotion, leveraging embodiment, and creating novel presentation forms help people learn more from data-driven arguments, and remember them more fully.

NOTER_PAGE: (46 . 0.5524360033030553)

four conventions of data visualization reinforce people's perceptions of its factual basis: 1) two-dimensional viewpoints, 2) clean layouts, 3) geometric shapes and lines, and 4) the inclusion of data sources at the bottom

NOTER_PAGE: (46 . 0.694467382328654)

feminist theory maintains that there is no such thing as a purely objective view of the world. Knowledge is always partial

NOTER_PAGE: (46 . 0.8166804293971923)

representing uncertainty is also a known problem in data journalism and visualization research

NOTER_PAGE: (47 . 0.15772089182493806)

visualization conventions reinforce those misjudgements

NOTER_PAGE: (48 . 0.2890173410404624)

role of emotion in interrupting those heuristics

NOTER_PAGE: (48 . 0.5210569777043765)

Hullman advocates for rendering experiences of uncertainty. In other words, leverage emotion and affect so that people experience uncertainty perceptually

NOTER_PAGE: (48 . 0.601981833195706)

The jittering election gauge was actually exhibiting current best practices for communicating uncertainty

NOTER_PAGE: (50 . 0.24607762180016515)

The fact that it unsettled so much of the Times readership probably had less to do with the ethics of the visualization and more to do with the outcome of the election

NOTER_PAGE: (50 . 0.2890173410404624)

"embellished" charts do not hinder people's ability to accurately read them, and in fact, they are actually better for memorability

NOTER_PAGE: (51 . 0.3534269199009083)

How did we arrive at conventions in data visualization that prioritize rationality, devalue emotion, and completely ignore the human body

NOTER_PAGE: (51 . 0.7093311312964492)

what is regarded as "excess" in any given system might possibly be the most interesting thing to explore because it tells us the most about what and who the system is trying to exclude.

NOTER_PAGE: (51 . 0.815028901734104)

Chapter Three: “What Gets Counted Counts”

NOTER_PAGE: (53 . 0.18909991742361684)

“data-driven” decisions are prioritized over anecdotal ones, and “evidence”–Fox News notwithstanding–is taken to mean “backed up by numbers and facts.”

NOTER_PAGE: (54 . 0.5491329479768786)

below the surface, Facebook continues to resolve users’ genders into one of either male or female

NOTER_PAGE: (56 . 0.2906688687035508)

most people believed that women were just inferior men, with penises located inside instead of outside of their bodies, and that–for reals!–could descend at any time in life.

NOTER_PAGE: (57 . 0.40132122213047067)

wat

there have always been more variations in gender identity than Anglo-Western societies have cared to outwardly acknowledge or collectively remember

NOTER_PAGE: (60 . 0.6515276630883567)

reliance on heuristics eventually leads to an accumulation of cognitive biases

NOTER_PAGE: (61 . 0.3137902559867878)

counting is not always an unmitigated good

NOTER_PAGE: (69 . 0.1593724194880264)

In O'odham tradition, however, the locations of burial sites constitute sacred knowledge, and cannot be shared with outsiders

NOTER_PAGE: (71 . 0.6176713459950454)

Chapter Four: Unicorns, Janitors, Ninjas, Wizards, and Rock Stars

NOTER_PAGE: (73 . 0.17010734929810073)

69% of no-fault evictions between 2011-13 occurred within four blocks of a tech bus stop

NOTER_PAGE: (77 . 0.1445086705202312)
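
Aside: a statistic of this shape is a proximity count divided by a total. A minimal sketch, assuming locations are already expressed on a street grid measured in blocks and that "within four blocks" means grid (Manhattan) distance; both are my assumptions, the coordinates are invented, and nothing here reflects how the original figure was actually computed.

```python
def within_blocks(point, stops, max_blocks=4):
    """True if `point` is at most `max_blocks` grid blocks from any stop."""
    return any(
        abs(point[0] - stop[0]) + abs(point[1] - stop[1]) <= max_blocks
        for stop in stops
    )


# Invented toy data: (x, y) positions in block units.
stops = [(0, 0), (10, 10)]
evictions = [(1, 2), (3, 1), (5, 5), (9, 12), (20, 20), (10, 6)]

near = sum(within_blocks(e, stops) for e in evictions)
share = near / len(evictions)

print(f"{near} of {len(evictions)} evictions ({share:.0%}) within 4 blocks of a stop")
```

The result depends on the distance definition and the threshold, which is one more editorial choice hiding inside a single percentage.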

The point of this map is not for the eyes to efficiently detect a correlation

NOTER_PAGE: (78 . 0.10074318744838975)

the visual point is simple and exhortative: "There are too many evictions"

NOTER_PAGE: (78 . 0.14616019818331957)

whose perspectives are lost in the process of dominating and disciplining data and whose perspectives are imposed on the results?

NOTER_PAGE: (78 . 0.8447563996696944)

In the perceived "messiness" of data there is actually rich information about the circumstances under which it was collected

NOTER_PAGE: (79 . 0.1791907514450867)

all data are "local," by which he means they are connected, sometimes inextricably, to the human and technical conditions under which they are collected and maintained

NOTER_PAGE: (79 . 0.4549958711808423)

Datafication

even though the outsider may be frustrated with the fact that the record doesn't use latitude and longitude, there is meaningful and precise geographic information contained in the "upstate" reference. Not only that, but there is meaningful metadata provided by this cultural insider reference: Only somebody collecting the data in South Carolina would have referred to that region as "upstate," so we can reason that the data was collected there

NOTER_PAGE: (79 . 0.630057803468208)

the production of "legible urban spaces." There is high economic value to legible spaces, particularly for large, international corporations

NOTER_PAGE: (80 . 0.14616019818331957)

one does not need street names for navigation until one has strangers in the landscape

NOTER_PAGE: (80 . 0.2254335260115607)

data does not need cleaning until there are strangers in the dataset

NOTER_PAGE: (80 . 0.24938067712634185)

negative externality of open data, APIs and the vast stores of training data sets available online: the data appear available and ready to mobilize, but what they represent is not always well-documented or easily understood by outsiders

NOTER_PAGE: (80 . 0.805945499587118)

The AEMP wanted to know more – about privacy protections and how the Eviction Lab would keep the data from falling into landlord hands. Instead of continuing the conversation, Eviction Lab turned to a real estate data broker and purchased data of lower quality

NOTER_PAGE: (81 . 0.3963666391412056)

embracing pluralism – as this concept is sometimes described – does not mean that everything is relative, nor that all truth claims have equal weight, nor that feminists don't believe in science. It simply means that when people make knowledge, they do so from a particular standpoint

NOTER_PAGE: (82 . 0.3955408753096614)

Applying this to computational systems design, Shaowen Bardzell calls for starting first and foremost with the perspective of the "marginal user."

NOTER_PAGE: (83 . 0.20231213872832368)

As Kim Tallbear says, "If we promiscuously account for standpoints, objectivity will be strengthened."

NOTER_PAGE: (83 . 0.42031379025598675)

as Kimmel articulates it: "privilege is invisible to those that have it."

NOTER_PAGE: (83 . 0.7522708505367465)

As whiteness scholar Robin DiAngelo says, "a significant aspect of white identity is to see oneself as an individual outside of or innocent of race, 'just human'."

NOTER_PAGE: (83 . 0.8166804293971923)

disclose your own project's methods – rather than sweeping them under the proverbial rug. This is called self-disclosure

NOTER_PAGE: (85 . 0.7101568951279934)

revealing other details about the human process of making decisions about data storytelling. Who was on the team? Which hypotheses were pursued but ultimately proved false? What were points of tension and disagreement? When did data need some ground-truthing by talking to data owners or domain experts?

NOTER_PAGE: (86 . 0.2890173410404624)

data appears so neutral because it is unclear who is the author

NOTER_PAGE: (86 . 0.4541701073492981)

Self-disclosure illustrates the feminist method of reflexivity—rigorous interrogation and transparency about one's own position in the world

NOTER_PAGE: (87 . 0.6540049545829892)

the choice to prioritize one idea over another would carry real weight and material consequences for the people of Boston, consequences that a natural language processing expert or a statistician could not understand simply by looking at word frequencies in the data

NOTER_PAGE: (93 . 0.6407927332782823)

as soon as data start to become information that can be operationalized for decision-making, they leave the technical domain

NOTER_PAGE: (94 . 0.12221304706853839)

Chapter Five: The Numbers Don’t Speak for Themselves

NOTER_PAGE: (96 . 0.17175887696118908)

even when you get past the marketing hype aimed at funders, the GDELT technical documentation is not quite forthright when it comes to whether it is counting media reports (as Simpson asserts) or events

NOTER_PAGE: (100 . 0.24690338563170933)

there's a larger problem at work here that has to do with context. One of the central tenets of feminist thinking, outlined by Donna Haraway, is that all knowledge is "situated."

NOTER_PAGE: (100 . 0.8166804293971923)

Rather than seeing knowledge artifacts – like datasets – as neutral and objective fodder to use for more knowledge making, a feminist perspective advocates for connecting them back to their context

NOTER_PAGE: (101 . 0.12303881090008258)

"Zombie data" is data that has been published without any purpose or clear use case in mind

NOTER_PAGE: (103 . 0.14368290668868702)

it is the corporation's responsibility to understand racism in page-linking. Correlation, without context, is not enough when it means that Google recirculates racism.

NOTER_PAGE: (103 . 0.7960363336085879)

"Institutions that have high numbers—it’s not always just that high incidents are happening. It’s that you’ve created a culture where people feel they can report and will be supported in that process."

NOTER_PAGE: (105 . 0.4616019818331957)

there are imbalances of power in the data setting, so we cannot take the numbers in the data set at face value

NOTER_PAGE: (107 . 0.2097440132122213)

assumption that data are a raw input rather than seeing them as artifacts that have emerged fully cooked into the world, birthed out of a complex set of social and political circumstances

NOTER_PAGE: (107 . 0.48885218827415355)

Datafication

But data is an output first

NOTER_PAGE: (107 . 0.6729975227085053)

in order to train an algorithm to understand the context of subjugated standpoints, significant human infrastructure and ethical navigation is required

NOTER_PAGE: (109 . 0.875309661436829)

Are you representing only the four numbers that we see in the chart? Or are you representing the context from which they emerged?

NOTER_PAGE: (111 . 0.7621800165152766)

I don't want to tell people what to think

NOTER_PAGE: (112 . 0.34516928158546656)

As the data journalist in this scenario, you are in a position of power

NOTER_PAGE: (112 . 0.3905862923203963)

you have a responsibility – precisely because of your position of privilege – to communicate both the data and the most accurate interpretation of the data. If you let the numbers speak for themselves, this is emphatically not more ethical and more democratic

NOTER_PAGE: (112 . 0.4549958711808423)

As applied to data science, an equity pause would involve questioning your research questions, questioning your categories and questioning your expectations, particularly as they relate to data about people

NOTER_PAGE: (113 . 0.5367464905037159)

time and space to research contemporary ideas about gender and mobile technology and incorporate them into your work

NOTER_PAGE: (113 . 0.7514450867052023)

advocate for data publishers to create a short, 3-5 page document that accompanies data sets

NOTER_PAGE: (114 . 0.2700247729149463)

Craveiro and her team created a tool to make this spending data more accessible to citizens by adding context to the presentation of the information

NOTER_PAGE: (114 . 0.518579686209744)

intermediaries who clean and contextualize the data for public use have potential (and have fewer conflicts of interest), but there would have to be a funding mechanism, significant capacity building, and professional norms-setting

NOTER_PAGE: (115 . 0.4913294797687861)

In fact, those of us who work with data must actively prevent numbers from speaking for themselves because when those numbers come from a data setting with a power imbalance or misaligned collection incentives (read: pretty much all data settings!), and especially when the numbers have to do with human beings, then they run the risk of being not only discriminatory, not only empirically wrong, but actually dangerous in their reinforcement of an unjust status quo.

NOTER_PAGE: (115 . 0.6267547481420314)

Chapter Six: Show Your Work

NOTER_PAGE: (118 . 0.18909991742361684)

invisible labor is what sustains the world of data science as well

NOTER_PAGE: (122 . 0.8241123038810899)

the invisible unpaid labor of our likes and tweets is precisely what enables the Facebooks and Twitters of the world to profit and thrive

NOTER_PAGE: (123 . 0.39801816680429397)

the work of data entry is profoundly undervalued in proportion to the knowledge it helps to create

NOTER_PAGE: (124 . 0.47729149463253506)

in 1781, the British slave ship, Zong, made a series of navigational errors while crossing the Atlantic, resulting in a shortage of drinking water for the 17 crew members and 133 captives on board. After performing a cost-benefit analysis, the crew decided to throw their enslaved human “cargo” overboard, calculating that they could collect enough insurance money on that loss of life to come out ahead

NOTER_PAGE: (126 . 0.45582163501238643)

When designing data products from feminist perspectives, we must aspire to show the work involved in the entire lifecycle of the project, even if it can be difficult to do

NOTER_PAGE: (129 . 0.35260115606936415)

darker areas of the chart don’t just indicate a larger number of books entered into the catalog, after all. They also indicate the people who typed in all of those records–millions and millions of them.

NOTER_PAGE: (131 . 0.28819157720891825)

Similarly, the step-like formations don’t just indicate a higher volume of data entry. They indicate strategic decisions

NOTER_PAGE: (131 . 0.33360858794384807)

As described by feminist sociologist Arlie Hochschild, emotional labor describes the work involved in managing one’s feelings, or someone else’s, in response to the demands of society or a particular job

NOTER_PAGE: (132 . 0.7762180016515277)
NOTER_PAGE: (132 . 0.8827415359207267)

the Atlas of Caregiving, an ongoing project aimed at documenting the work involved in caring for a chronically ill family member

NOTER_PAGE: (133 . 0.43270024772914945)

design her visualization, her goal was to “evoke empathy,” and make her audience “feel a part of a story of a human’s life.”

NOTER_PAGE: (135 . 0.5037159372419487)

as theorized by Folbre, care work is undertaken out of a sense of compassion with, or responsibility for others, rather than with a goal of monetary gain

NOTER_PAGE: (136 . 0.16845582163501238)

The Maintainers are trying to counter the current tendency to celebrate technological innovation and discovery. The work that should be celebrated, they argue, is the work that sustains and maintains the world we live in today; and not work that passes over the problems of the present in order to look ahead

NOTER_PAGE: (136 . 0.33278282411230387)

Chapter Seven: The Power Chapter

NOTER_PAGE: (137 . 0.16845582163501238)

Redlining began as a visual technique of red shading for all the neighborhoods in a city that were deemed "undesirable" for granting loans. All of Detroit's Black neighborhoods in 1940 fall in red areas on this map. Denying loans to Black residents set the stage for decades of structural racism and blight that was to follow.

NOTER_PAGE: (139 . 0.1866226259289843)

direct comparison between yesterday's redlining maps and today's risk assessment algorithms

NOTER_PAGE: (140 . 0.7167630057803468)

The values in evidence in redlining maps and risk assessment algorithms are about preserving a race- and class-based status quo. White, wealthy men working in powerful institutions adopt a focus on risk

NOTER_PAGE: (142 . 0.30718414533443433)

"Power concedes nothing without a demand."

NOTER_PAGE: (142 . 0.722543352601156)

Examining how power is wielded through data means doing projects that wield it back

NOTER_PAGE: (143 . 0.1568951279933939)

"it's not just about creating accurate algorithms but creating equitable systems," she says. We can't just build more precise surveillance apparatuses; we also need to look at the deployment, governance, use and impacts of these technologies

NOTER_PAGE: (146 . 0.6647398843930635)

data is by and large a tool of management, wielded by those institutions in power

NOTER_PAGE: (147 . 0.3129644921552436)

Joseph Weizenbaum, artificial intelligence trailblazer and creator of the famous ELIZA experiment in the 1960s, looked back on the history of computing and said it like this: "What the coming of the computer did, 'just in time,' was to make it unnecessary to create social inventions, to change the system in any way. So in that sense, the computer has acted as fundamentally a conservative force, a force which kept power or even solidified power where it already existed."

NOTER_PAGE: (147 . 0.35260115606936415)

data journalist Jonathan Stray asserts, "Quantification is representation."

NOTER_PAGE: (152 . 0.3542526837324525)

nobody needs to say "yellow banana" because it is implied by our shared concept of banana. This is called "reporting bias" in artificial intelligence research

NOTER_PAGE: (156 . 0.5995045417010735)

"Intersectional Media Equity Index," one could fairly easily quantify the collective privilege of an organization and then create a prediction score for just how likely that institution is to create racist, sexist data products

NOTER_PAGE: (157 . 0.20313790255986786)

report entitled Recommendations for Equitable Open Data

NOTER_PAGE: (158 . 0.36829066886870354)

recommendations for the City of Detroit to adopt to make their open data practices more equitable and more likely to benefit people of color and low-income communities

NOTER_PAGE: (158 . 0.5631709331131296)

blanket ethical logic is easy to code into large systems. But it's important to note that this approach was explicitly designed to exclude half of humanity

NOTER_PAGE: (159 . 0.6292320396366639)

a feminist ethics of care prioritizes responsibilities, issues in context, and, above all else, relationships

NOTER_PAGE: (159 . 0.7101568951279934)

rather than valuing impartiality, an ethics of care prioritizes intimacy and honors the deep, emotional, personal investment that comes with being responsible for the well-being of another

NOTER_PAGE: (160 . 0.14698596201486375)

accept that your privilege and power are not just an asset, but also a liability.

NOTER_PAGE: (160 . 0.35672997522708505)

reframe "doing good" with data as something more akin to "doing equity" or "doing co-liberation" with data to remove some of its paternalistic overtones

NOTER_PAGE: (160 . 0.5639966969446738)

If you have come here to help me, you are wasting your time. But if you have come because your liberation is bound up with mine, then let us work together.

NOTER_PAGE: (161 . 0.0990916597853014)

Following a logic of co-liberation leads to different metrics of success. The success of a single project would not only rest on whether the database was organized according to spec or whether the algorithm was able to classify things properly, but also on how much trust was built between institutions and communities, how effectively those with power and resources shared their power and resources, how much learning happened in both directions, how much the people and organizations were transformed in the process, and how much inspiration for future work, together, was co-conspired.

NOTER_PAGE: (161 . 0.14616019818331957)

Chapter Eight: Teach Data Like an Intersectional Feminist!

NOTER_PAGE: (165 . 0.3047068538398018)

Imagine teaching as a way to model the world

NOTER_PAGE: (172 . 0.5986787778695293)

Elite men lead.

NOTER_PAGE: (172 . 0.6193228736581338)

data science is abstract and technical

NOTER_PAGE: (172 . 0.6886870355078447)

the goal of learning data science is modeled as individual mastery of technical concepts and skills

NOTER_PAGE: (172 . 0.7274979355904211)

Becoming socialized into the CS109 model of the world means that one sets aside any concern with the social and political, with justice and fairness, with values and motivations

NOTER_PAGE: (172 . 0.8323699421965317)

Feels a lot like law school

feminist pedagogy of bell hooks draws from Freire to assert that if learning is to be a practice of freedom then it must be a two-way street – a process of mutual transformation

NOTER_PAGE: (173 . 0.49710982658959535)

equityXdesign framework that we discussed in The Numbers Don't Speak for Themselves, retools IDEO's human-centered design process with an explicit focus on oppression and deliberately centers equity as core value

NOTER_PAGE: (174 . 0.6465730800990916)

contact as the first step in fighting oppression

NOTER_PAGE: (176 . 0.2865400495458299)

there can be no "data for good" and no "ethical AI" without contact, relationship-building and trust-building between systems designers and the people with the least power in the system.

NOTER_PAGE: (176 . 0.37902559867877783)

Conclusion: Now Let’s Multiply

NOTER_PAGE: (194 . 0.1593724194880264)

data science version of the “refusal of legibility,” to borrow Jack Halberstam’s term, that characterizes much of queer life.

NOTER_PAGE: (196 . 0.4194880264244426)