- tags
- Data
Notes
Introduction
NOTER_PAGE: (3 . 0.43270024772914945)
nobody’s ever complained
NOTER_PAGE: (3 . 0.5631709331131296)
NOTER_PAGE: (8 . 0.2981007431874484)
Champine knew to crunch the numbers only because Darden shared her personal experience of gender discrimination with her
NOTER_PAGE: (8 . 0.6597853014037984)
stats team brought together several datasets relating to property tax delinquency (an indicator of neglect), rat complaints (ditto), arrest locations (a proxy for poverty), and more, in order to rank the 25,000 complaints by fire risk
NOTER_PAGE: (10 . 0.5722543352601156)
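Note to self: the highlight above only says that several indicator datasets were combined to rank complaints by fire risk; it does not say how. A minimal sketch of one way such a ranking could work, assuming an equal-weight sum of normalized indicators. The `Complaint` fields, the weights, and the sample records are all hypothetical and are not the method the city's team used.

```python
from dataclasses import dataclass


@dataclass
class Complaint:
    # All fields are hypothetical stand-ins for the indicators named in the note.
    complaint_id: str
    tax_delinquent: bool   # property tax delinquency (indicator of neglect)
    rat_complaints: int    # rat complaints at the address
    nearby_arrests: int    # arrests near the address (a proxy, with all its problems)


def risk_score(c: Complaint, max_rats: int, max_arrests: int) -> float:
    """Equal-weight sum of indicators, each scaled to the range 0..1 (an assumption)."""
    rats = c.rat_complaints / max_rats if max_rats else 0.0
    arrests = c.nearby_arrests / max_arrests if max_arrests else 0.0
    return (1.0 if c.tax_delinquent else 0.0) + rats + arrests


def rank_by_risk(complaints: list[Complaint]) -> list[Complaint]:
    """Return complaints sorted from highest to lowest score."""
    max_rats = max((c.rat_complaints for c in complaints), default=0)
    max_arrests = max((c.nearby_arrests for c in complaints), default=0)
    return sorted(
        complaints,
        key=lambda c: risk_score(c, max_rats, max_arrests),
        reverse=True,
    )


# Invented sample records, for illustration only.
complaints = [
    Complaint("A", tax_delinquent=False, rat_complaints=0, nearby_arrests=2),
    Complaint("B", tax_delinquent=True, rat_complaints=4, nearby_arrests=10),
    Complaint("C", tax_delinquent=True, rat_complaints=1, nearby_arrests=0),
]
ranked = rank_by_risk(complaints)
print([c.complaint_id for c in ranked])
```

Every choice here (which proxies count, how they are weighted) is an editorial decision, which is the kind of thing the later chapters argue should be disclosed.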
inspectors issued five times more "vacate orders" than they had without the data-assisted ranking system
NOTER_PAGE: (10 . 0.6820809248554913)
Bring Back the Bodies
NOTER_PAGE: (15 . 0.17671345995045415)
“Our maternal data is embarrassing,”
NOTER_PAGE: (18 . 0.14368290668868702)
What we choose to measure is a statement of what we value
NOTER_PAGE: (18 . 0.21222130470685382)
Data science, as it is generally understood in the world today, has very little to do with bodies. But that is a fundamental misconception about the field, and about data more generally.
NOTER_PAGE: (18 . 0.8777869529314616)
individual experience, taken together, reveals a larger structural problem
NOTER_PAGE: (19 . 0.3988439306358381)
Data has been called “the new oil” for, among other things, its untapped potential for profit and its value once it’s processed and refined
NOTER_PAGE: (24 . 0.26754748142031376)
Searching the millions of catalog entries for “black” yielded a rich array of objects related to Black people, Black culture, and Black history in the US: the civil rights movement, the jazz era, the history of enslavement, and so on. But searching for “white” yielded only white-colored visual art
NOTER_PAGE: (28 . 0.6870355078447563)
how data is often presented as though it inhabits an omniscient, godlike perspective
NOTER_PAGE: (30 . 0.18001651527663087)
Haraway terms this “the view from nowhere.”
NOTER_PAGE: (30 . 0.28075970272502065)
narrowness of computationally conceived fairness
NOTER_PAGE: (32 . 0.5912469033856317)
We should be able to dream of data-driven systems that position co-liberation as their primary design goal
NOTER_PAGE: (33 . 0.7627245508982037)
Chapter Two: On Rational, Scientific, Objective Viewpoints from Mythical, Imaginary, Impossible Standpoints
NOTER_PAGE: (37 . 0.18488023952095808)
Periscopic's work is framed around a singular emotion
NOTER_PAGE: (40 . 0.18338323353293415)
deliberately neutral emotional field, a blank page in effect, upon which viewers are more free to choose their own response
NOTER_PAGE: (40 . 0.41242514970059885)
the "data-ink" ratio. In his view, a visualization designer should strive to use ink only to display the data
NOTER_PAGE: (40 . 0.75374251497006)
is visual minimalism really more neutral?
NOTER_PAGE: (41 . 0.4764657308009909)
theorists and practitioners have come from technical disciplines aligned with engineering and computer science, and have not been trained in the most fundamental of all Western communication theories: rhetoric
NOTER_PAGE: (41 . 0.5995045417010735)
"a rhetorical dimension is present in every design,"
NOTER_PAGE: (41 . 0.7861271676300577)
visualizing data involves editorial choices – some things are necessarily highlighted, while others are necessarily obscured
NOTER_PAGE: (42 . 0.1263418662262593)
"strong objectivity" which acknowledges that regular-grade, vanilla objectivity is mainly made by mostly rich white guys in power
NOTER_PAGE: (42 . 0.5243600330305532)
numbers that are accurate, but not technically facts
NOTER_PAGE: (43 . 0.5293146160198183)
NOTER_PAGE: (43 . 0.6738232865400495)
From a data analysis perspective, “A Sort of Joy” consists of simple operations: only counting and grouping. The results could easily have been represented by a bar chart or a tree map of first names.
NOTER_PAGE: (45 . 0.7341040462427745)
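Note to self: the "only counting and grouping" in the highlight above can be sketched in a few lines. This is a minimal illustration, not the artwork's actual code; the names in `artists` are invented and `count_first_names` is a hypothetical helper.

```python
from collections import Counter

# Invented sample of full names, for illustration only.
artists = [
    "John Smith",
    "Mary Jones",
    "John Doe",
    "Pablo Ruiz",
    "Mary Brown",
    "John Lee",
]


def count_first_names(full_names):
    """Group full names by first name and count the size of each group."""
    return Counter(name.split()[0] for name in full_names if name.strip())


counts = count_first_names(artists)

# A crude text bar chart of first names, most common first.
for first_name, n in counts.most_common():
    print(f"{first_name:<8} {'#' * n} {n}")
```

The bar chart shows "the whole picture" at once, which is exactly what the performance described in the next highlight declines to do.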
we do not see "the whole picture". We hear and see and experience each datapoint one at a time
NOTER_PAGE: (45 . 0.8364987613542526)
NOTER_PAGE: (46 . 0.5524360033030553)
four conventions of data visualization reinforce people's perceptions of its factual basis: 1) two-dimensional viewpoints, 2) clean layouts, 3) geometric shapes and lines, and 4) the inclusion of data sources at the bottom
NOTER_PAGE: (46 . 0.694467382328654)
feminist theory maintains that there is no such thing as a purely objective view of the world. Knowledge is always partial
NOTER_PAGE: (46 . 0.8166804293971923)
representing uncertainty is also a known problem in data journalism and visualization research
NOTER_PAGE: (47 . 0.15772089182493806)
visualization conventions reinforce those misjudgements
NOTER_PAGE: (48 . 0.2890173410404624)
role of emotion in interrupting those heuristics
NOTER_PAGE: (48 . 0.5210569777043765)
Hullman advocates for rendering experiences of uncertainty. In other words, leverage emotion and affect so that people experience uncertainty perceptually
NOTER_PAGE: (48 . 0.601981833195706)
The jittering election gauge was actually exhibiting current best practices for communicating uncertainty
NOTER_PAGE: (50 . 0.24607762180016515)
The fact that it unsettled so much of the Times readership probably had less to do with the ethics of the visualization and more to do with the outcome of the election
NOTER_PAGE: (50 . 0.2890173410404624)
"embellished" charts do not hinder people's ability to accurately read them, and in fact, they are actually better for memorability
NOTER_PAGE: (51 . 0.3534269199009083)
How did we arrive at conventions in data visualization that prioritize rationality, devalue emotion, and completely ignore the human body
NOTER_PAGE: (51 . 0.7093311312964492)
what is regarded as "excess" in any given system might possibly be the most interesting thing to explore because it tells us the most about what and who the system is trying to exclude.
NOTER_PAGE: (51 . 0.815028901734104)
Chapter Three: “What Gets Counted Counts”
NOTER_PAGE: (53 . 0.18909991742361684)
“data-driven” decisions are prioritized over anecdotal ones, and “evidence”–Fox News notwithstanding–is taken to mean “backed up by numbers and facts.”
NOTER_PAGE: (54 . 0.5491329479768786)
below the surface, Facebook continues to resolve users’ genders into one of either male or female
NOTER_PAGE: (56 . 0.2906688687035508)
most people believed that women were just inferior men, with penises located inside instead of outside of their bodies, and that–for reals!–could descend at any time in life.
NOTER_PAGE: (57 . 0.40132122213047067)
wat
there have always been more variations in gender identity than Anglo-Western societies have cared to outwardly acknowledge or collectively remember
NOTER_PAGE: (60 . 0.6515276630883567)
reliance on heuristics eventually leads to an accumulation of cognitive biases
NOTER_PAGE: (61 . 0.3137902559867878)
counting is not always an unmitigated good
NOTER_PAGE: (69 . 0.1593724194880264)
In O'odham tradition, however, the locations of burial sites constitute sacred knowledge, and cannot be shared with outsiders
NOTER_PAGE: (71 . 0.6176713459950454)
Chapter Four: Unicorns, Janitors, Ninjas, Wizards, and Rock Stars
NOTER_PAGE: (73 . 0.17010734929810073)
69% of no-fault evictions between 2011-13 occurred within four blocks of a tech bus stop
NOTER_PAGE: (77 . 0.1445086705202312)
The point of this map is not for the eyes to efficiently detect a correlation
NOTER_PAGE: (78 . 0.10074318744838975)
the visual point is simple and exhortative: "There are too many evictions"
NOTER_PAGE: (78 . 0.14616019818331957)
whose perspectives are lost in the process of dominating and disciplining data and whose perspectives are imposed on the results?
NOTER_PAGE: (78 . 0.8447563996696944)
NOTER_PAGE: (79 . 0.1791907514450867)
all data are "local," by which he means they are connected, sometimes inextricably, to the human and technical conditions under which they are collected and maintained
NOTER_PAGE: (79 . 0.4549958711808423)
Datafication
even though the outsider may be frustrated with the fact that the record doesn't use latitude and longitude, there is meaningful and precise geographic information contained in the "upstate" reference. Not only that, but there is meaningful metadata provided by this cultural insider reference: Only somebody collecting the data in South Carolina would have referred to that region as "upstate," so we can reason that the data was collected there
NOTER_PAGE: (79 . 0.630057803468208)
the production of "legible urban spaces." There is high economic value to legible spaces, particularly for large, international corporations
NOTER_PAGE: (80 . 0.14616019818331957)
one does not need street names for navigation until one has strangers in the landscape
NOTER_PAGE: (80 . 0.2254335260115607)
data does not need cleaning until there are strangers in the dataset
NOTER_PAGE: (80 . 0.24938067712634185)
negative externality of open data, APIs and the vast stores of training data sets available online: the data appear available and ready to mobilize, but what they represent is not always well-documented or easily understood by outsiders
NOTER_PAGE: (80 . 0.805945499587118)
The AEMP wanted to know more – about privacy protections and how the Eviction Lab would keep the data from falling into landlord hands. Instead of continuing the conversation, Eviction Lab turned to a real estate data broker and purchased data of lower quality
NOTER_PAGE: (81 . 0.3963666391412056)
embracing pluralism – as this concept is sometimes described – does not mean that everything is relative, nor that all truth claims have equal weight, nor that feminists don't believe in science. It simply means that when people make knowledge, they do so from a particular standpoint
NOTER_PAGE: (82 . 0.3955408753096614)
Applying this to computational systems design, Shaowen Bardzell calls for starting first and foremost with the perspective of the "marginal user."
NOTER_PAGE: (83 . 0.20231213872832368)
As Kim TallBear says, "If we promiscuously account for standpoints, objectivity will be strengthened."
NOTER_PAGE: (83 . 0.42031379025598675)
as Kimmel articulates it: "privilege is invisible to those that have it."
NOTER_PAGE: (83 . 0.7522708505367465)
As whiteness scholar Robin DiAngelo says, "a significant aspect of white identity is to see oneself as an individual outside of or innocent of race, 'just human'."
NOTER_PAGE: (83 . 0.8166804293971923)
disclose your own project's methods – rather than sweeping them under the proverbial rug. This is called self-disclosure
NOTER_PAGE: (85 . 0.7101568951279934)
revealing other details about the human process of making decisions about data storytelling. Who was on the team? Which hypotheses were pursued but ultimately proved false? What were points of tension and disagreement? When did data need some ground-truthing by talking to data owners or domain experts?
NOTER_PAGE: (86 . 0.2890173410404624)
data appears so neutral because it is unclear who is the author
NOTER_PAGE: (86 . 0.4541701073492981)
Self-disclosure illustrates the feminist method of reflexivity—rigorous interrogation and transparency about one's own position in the world
NOTER_PAGE: (87 . 0.6540049545829892)
the choice to prioritize one idea over another would carry real weight and material consequences for the people of Boston, consequences that a natural language processing expert or a statistician could not understand simply by looking at word frequencies in the data
NOTER_PAGE: (93 . 0.6407927332782823)
as soon as data start to become information that can be operationalized for decision-making, they leave the technical domain
NOTER_PAGE: (94 . 0.12221304706853839)
Chapter Five: The Numbers Don’t Speak for Themselves
NOTER_PAGE: (96 . 0.17175887696118908)
NOTER_PAGE: (100 . 0.24690338563170933)
there's a larger problem at work here that has to do with context. One of the central tenets of feminist thinking, outlined by Donna Haraway, is that all knowledge is "situated."
NOTER_PAGE: (100 . 0.8166804293971923)
Rather than seeing knowledge artifacts – like datasets – as neutral and objective fodder to use for more knowledge making, a feminist perspective advocates for connecting them back to their context
NOTER_PAGE: (101 . 0.12303881090008258)
"Zombie data" is data that has been published without any purpose or clear use case in mind
NOTER_PAGE: (103 . 0.14368290668868702)
it is the corporation's responsibility to understand racism in page-linking. Correlation, without context, is not enough when it means that Google recirculates racism.
NOTER_PAGE: (103 . 0.7960363336085879)
"Institutions that have high numbers—it’s not always just that high incidents are happening. It’s that you’ve created a culture where people feel they can report and will be supported in that process."
NOTER_PAGE: (105 . 0.4616019818331957)
there are imbalances of power in the data setting, so we cannot take the numbers in the data set at face value
NOTER_PAGE: (107 . 0.2097440132122213)
NOTER_PAGE: (107 . 0.48885218827415355)
Datafication
But data is an output first
NOTER_PAGE: (107 . 0.6729975227085053)
in order to train an algorithm to understand the context of subjugated standpoints, significant human infrastructure and ethical navigation is required
NOTER_PAGE: (109 . 0.875309661436829)
Are you representing only the four numbers that we see in the chart? Or are you representing the context from which they emerged?
NOTER_PAGE: (111 . 0.7621800165152766)
I don't want to tell people what to think
NOTER_PAGE: (112 . 0.34516928158546656)
As the data journalist in this scenario, you are in a position of power
NOTER_PAGE: (112 . 0.3905862923203963)
you have a responsibility – precisely because of your position of privilege – to communicate both the data and the most accurate interpretation of the data. If you let the numbers speak for themselves, this is emphatically not more ethical and more democratic
NOTER_PAGE: (112 . 0.4549958711808423)
As applied to data science, an equity pause would involve questioning your research questions, questioning your categories and questioning your expectations, particularly as they relate to data about people
NOTER_PAGE: (113 . 0.5367464905037159)
time and space to research contemporary ideas about gender and mobile technology and incorporate them into your work
NOTER_PAGE: (113 . 0.7514450867052023)
advocate for data publishers to create a short, 3-5 page document that accompanies data sets
NOTER_PAGE: (114 . 0.2700247729149463)
Craveiro and her team created a tool to make this spending data more accessible to citizens by adding context to the presentation of the information
NOTER_PAGE: (114 . 0.518579686209744)
intermediaries who clean and contextualize the data for public use have potential (and have fewer conflicts of interest), but there would have to be a funding mechanism, significant capacity building, and professional norms-setting
NOTER_PAGE: (115 . 0.4913294797687861)
In fact, those of us who work with data must actively prevent numbers from speaking for themselves because when those numbers come from a data setting with a power imbalance or misaligned collection incentives (read: pretty much all data settings!), and especially when the numbers have to do with human beings, then they run the risk of being not only discriminatory, not only empirically wrong, but actually dangerous in their reinforcement of an unjust status quo.
NOTER_PAGE: (115 . 0.6267547481420314)
Chapter Six: Show Your Work
NOTER_PAGE: (118 . 0.18909991742361684)
invisible labor is what sustains the world of data science as well
NOTER_PAGE: (122 . 0.8241123038810899)
the invisible unpaid labor of our likes and tweets is precisely what enables the Facebooks and Twitters of the world to profit and thrive
NOTER_PAGE: (123 . 0.39801816680429397)
the work of data entry is profoundly undervalued in proportion to the knowledge it helps to create
NOTER_PAGE: (124 . 0.47729149463253506)
NOTER_PAGE: (126 . 0.45582163501238643)
When designing data products from feminist perspectives, we must aspire to show the work involved in the entire lifecycle of the project, even if it can be difficult to do
NOTER_PAGE: (129 . 0.35260115606936415)
darker areas of the chart don’t just indicate a larger number of books entered into the catalog, after all. They also indicate the people who typed in all of those records–millions and millions of them.
NOTER_PAGE: (131 . 0.28819157720891825)
Similarly, the step-like formations don’t just indicate a higher volume of data entry. They indicate strategic decisions
NOTER_PAGE: (131 . 0.33360858794384807)
As described by feminist sociologist Arlie Hochschild, emotional labor describes the work involved in managing one’s feelings, or someone else’s, in response to the demands of society or a particular job
NOTER_PAGE: (132 . 0.7762180016515277)
NOTER_PAGE: (132 . 0.8827415359207267)
the Atlas of Caregiving, an ongoing project aimed at documenting the work involved in caring for a chronically ill family member
NOTER_PAGE: (133 . 0.43270024772914945)
design her visualization, her goal was to “evoke empathy,” and make her audience “feel a part of a story of a human’s life.”
NOTER_PAGE: (135 . 0.5037159372419487)
as theorized by Folbre, care work is undertaken out of a sense of compassion with, or responsibility for others, rather than with a goal of monetary gain
NOTER_PAGE: (136 . 0.16845582163501238)
The Maintainers are trying to counter the current tendency to celebrate technological innovation and discovery. The work that should be celebrated, they argue, is the work that sustains and maintains the world we live in today; and not work that passes over the problems of the present in order to look ahead
NOTER_PAGE: (136 . 0.33278282411230387)
Chapter Seven: The Power Chapter
NOTER_PAGE: (137 . 0.16845582163501238)
Redlining began as a visual technique of red shading for all the neighborhoods in a city that were deemed "undesirable" for granting loans. All of Detroit's Black neighborhoods in 1940 fall in red areas on this map. Denying loans to Black residents set the stage for decades of structural racism and blight that was to follow.
NOTER_PAGE: (139 . 0.1866226259289843)
direct comparison between yesterday's redlining maps and today's risk assessment algorithms
NOTER_PAGE: (140 . 0.7167630057803468)
The values in evidence in redlining maps and risk assessment algorithms are about preserving a race- and class-based status quo. White, wealthy men working in powerful institutions adopt a focus on risk
NOTER_PAGE: (142 . 0.30718414533443433)
"Power concedes nothing without a demand."
NOTER_PAGE: (142 . 0.722543352601156)
Examining how power is wielded through data means doing projects that wield it back
NOTER_PAGE: (143 . 0.1568951279933939)
"it's not just about creating accurate algorithms but creating equitable systems," she says. We can't just build more precise surveillance apparatuses; we also need to look at the deployment, governance, use and impacts of these technologies
NOTER_PAGE: (146 . 0.6647398843930635)
NOTER_PAGE: (147 . 0.3129644921552436)
Joseph Weizenbaum, artificial intelligence trailblazer and creator of the famous ELIZA experiment in the 1960s, looked back on the history of computing and said it like this: "What the coming of the computer did, 'just in time,' was to make it unnecessary to create social inventions, to change the system in any way. So in that sense, the computer has acted as fundamentally a conservative force, a force which kept power or even solidified power where it already existed."
NOTER_PAGE: (147 . 0.35260115606936415)
data journalist Jonathan Stray asserts, "Quantification is representation."
NOTER_PAGE: (152 . 0.3542526837324525)
nobody needs to say "yellow banana" because it is implied by our shared concept of banana. This is called "reporting bias" in artificial intelligence research
NOTER_PAGE: (156 . 0.5995045417010735)
NOTER_PAGE: (157 . 0.20313790255986786)
NOTER_PAGE: (158 . 0.36829066886870354)
recommendations for the City of Detroit to adopt to make their open data practices more equitable and more likely to benefit people of color and low-income communities
NOTER_PAGE: (158 . 0.5631709331131296)
blanket ethical logic is easy to code into large systems. But it's important to note that this approach was explicitly designed to exclude half of humanity
NOTER_PAGE: (159 . 0.6292320396366639)
a feminist ethics of care prioritizes responsibilities, issues in context, and, above all else, relationships
NOTER_PAGE: (159 . 0.7101568951279934)
rather than valuing impartiality, an ethics of care prioritizes intimacy and honors the deep, emotional, personal investment that comes with being responsible for the well-being of another
NOTER_PAGE: (160 . 0.14698596201486375)
accept that your privilege and power are not just an asset, but also a liability.
NOTER_PAGE: (160 . 0.35672997522708505)
reframe "doing good" with data as something more akin to "doing equity" or "doing co-liberation" with data to remove some of its paternalistic overtones
NOTER_PAGE: (160 . 0.5639966969446738)
If you have come here to help me, you are wasting your time. But if you have come because your liberation is bound up with mine, then let us work together.
NOTER_PAGE: (161 . 0.0990916597853014)
NOTER_PAGE: (161 . 0.14616019818331957)
Chapter Eight: Teach Data Like an Intersectional Feminist!
NOTER_PAGE: (165 . 0.3047068538398018)
Imagine teaching as a way to model the world
NOTER_PAGE: (172 . 0.5986787778695293)
Elite men lead.
NOTER_PAGE: (172 . 0.6193228736581338)
data science is abstract and technical
NOTER_PAGE: (172 . 0.6886870355078447)
the goal of learning data science is modeled as individual mastery of technical concepts and skills
NOTER_PAGE: (172 . 0.7274979355904211)
Becoming socialized into the CS109 model of the world means that one sets aside any concern with the social and political, with justice and fairness, with values and motivations
NOTER_PAGE: (172 . 0.8323699421965317)
Feels a lot like law school
NOTER_PAGE: (173 . 0.49710982658959535)
NOTER_PAGE: (174 . 0.6465730800990916)
NOTER_PAGE: (176 . 0.2865400495458299)
NOTER_PAGE: (176 . 0.37902559867877783)
Conclusion: Now Let’s Multiply
NOTER_PAGE: (194 . 0.1593724194880264)
data science version of the “refusal of legibility,” to borrow Jack Halberstam’s term, that characterizes much of queer life.
NOTER_PAGE: (196 . 0.4194880264244426)