AI Is a Lot of Work

tags
Ghost Work

Notes

Remotasks is the worker-facing subsidiary of a company called Scale AI, a multibillion-dollar Silicon Valley data vendor that counts OpenAI and the U.S. military among its customers. Neither Remotasks’ nor Scale’s website mentions the other.

NOTER_PAGE: (1 0.7891420261754727 . 0.10602258469259723)

work stripped of all its normal trappings: a schedule, colleagues, knowledge of what they were working on or whom they were working for.

NOTER_PAGE: (2 0.16383906931652933 . 0.2672521957340025)

a sense among engineers that it’s a passing, inconvenient prerequisite to the more glamorous work of building models.

NOTER_PAGE: (2 0.6243334949103247 . 0.6254705144291091)

But annotation is never really finished. Machine-learning systems are what researchers call “brittle,” prone to fail when encountering something that isn’t well represented in their training data.

NOTER_PAGE: (2 0.6951042171594766 . 0.22710163111668757)

$1.20 an hour

NOTER_PAGE: (3 0.48860882210373247 . 0.2534504391468005)

This tangled supply chain is deliberately hard to map.

NOTER_PAGE: (4 0.24042656325739215 . 0.060225846925972396)

companies buying the data demand strict confidentiality.

NOTER_PAGE: (4 0.2447891420261755 . 0.39962358845671264)

Annotation reveals too much about the systems being developed, and the huge number of workers required makes leaks difficult to prevent.

NOTER_PAGE: (4 0.290838584585555 . 0.191969887076537)

leads work to be broken up and distributed across a system of specialized algorithms and to equally specialized humans.

NOTER_PAGE: (4 0.5671352399418323 . 0.16938519447929734)

“AI doesn’t replace work,” he said. “But it does change how work is organized.”

NOTER_PAGE: (4 0.6621425109064469 . 0.5169385194479297)

When AI comes for your job, you may not lose it, but it might become more alien, more isolating, more tedious.

NOTER_PAGE: (5 0.10615608337372759 . 0.28481806775407775)

Right away, it threw an ontological curveball: a picture of a magazine depicting photos of women in dresses. Is a photograph of clothing real clothing?

NOTER_PAGE: (6 0.44886088221037324 . 0.40087829360100374)

the instructions I’d been struggling to follow had been updated and clarified so many times that they were now a full 43 printed pages of directives.

NOTER_PAGE: (6 0.6282113427047988 . 0.178168130489335)

Where a human would get the concept of “shirt” with a few examples, machine-learning programs need thousands.

NOTER_PAGE: (6 0.782355792535143 . 0.17942283563362607)

“Who bears the cost for these fluctuations?” said Jindal of Partnership on AI. “Because right now, it’s the workers.”

NOTER_PAGE: (8 0.7721764420746486 . 0.4071518193224592)

“I made somebody a billionaire and I’m earning a couple of bucks a week.”

NOTER_PAGE: (8 0.8288899660688318 . 0.3400250941028858)

companies located vaguely elsewhere,

NOTER_PAGE: (9 0.2956858943286476 . 0.5395232120451694)

fully automated post-work future.

NOTER_PAGE: (9 0.4493456131846825 . 0.05708908406524466)

Certain types of specialist annotation can go for $50 or more per hour.

NOTER_PAGE: (9 0.7377605428986913 . 0.38644918444165616)

a Slack room of 1,500 people who were training a project code-named Dolphin, which she later discovered to be Google DeepMind’s chatbot, Sparrow, one of the many bots competing with ChatGPT. Her job is to talk with it all day.

NOTER_PAGE: (9 0.8162869607367911 . 0.835633626097867)

This circuitous technique is called “reinforcement learning from human feedback,” or RLHF.

NOTER_PAGE: (11 0.19049927290353855 . 0.06085319949811794)

The model is still a text-prediction machine mimicking patterns in human writing, but now its training corpus has been supplemented with bespoke examples, and the model has been weighted to favor them.

NOTER_PAGE: (11 0.27096461463887545 . 0.24027603513174403)
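The note above describes RLHF only loosely ("weighted to favor" human-preferred examples). One widely used ingredient, which the article itself does not spell out, is a reward model trained on human pairwise comparisons with a Bradley-Terry style loss, -log sigmoid(r_chosen - r_rejected). The sketch below assumes a toy linear reward model over hypothetical feature vectors, trained by plain gradient descent. It illustrates the comparison-based loss, not any lab's actual pipeline.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    # Hypothetical linear reward model: score = w . x
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(weights, chosen, rejected):
    # Bradley-Terry style loss: -log P(annotator prefers `chosen` over `rejected`)
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))

def train(comparisons, dim, lr=0.1, epochs=200):
    # comparisons: list of (chosen_features, rejected_features) pairs from annotators
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # dLoss/dmargin = -(1 - sigmoid(margin)); step against the gradient
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Hypothetical features per response, e.g. [factual_signal, verbosity]
comparisons = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.2], [0.1, 0.8])]
w = train(comparisons, dim=2)
```

With all-zero weights every comparison starts at a loss of log 2 (about 0.693); training pushes the preferred responses' scores up. In practice the reward model is a neural network over text and the language model is then optimized against it, neither of which is shown here.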

mimicking the confident style and expert jargon of the accurate text while writing things that are totally wrong.

NOTER_PAGE: (11 0.3436742607852642 . 0.8343789209535758)

researchers found they agreed only 60 percent of the time that a summary was good. “Unlike many tasks in [machine learning], our queries do not have unambiguous ground truth.”

NOTER_PAGE: (11 0.6223945710130878 . 0.42534504391468003)
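The "60 percent" figure above reads most naturally as a raw agreement rate; that interpretation is an assumption, since the excerpt does not say how agreement was measured. The sketch computes raw percent agreement between two labelers and, for contrast, Cohen's kappa, a standard chance-corrected measure. The labels are hypothetical.

```python
def percent_agreement(labels_a, labels_b):
    # Fraction of items on which two labelers gave the same label
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    # Agreement corrected for what the labelers' base rates would give by chance
    n = len(labels_a)
    p_o = percent_agreement(labels_a, labels_b)
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    if p_e == 1.0:
        return 1.0  # both labelers used one identical category throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical "is this summary good?" judgments from two labelers
a = ["good", "good", "bad", "good", "bad"]
b = ["good", "bad", "bad", "good", "good"]
```

Here the labelers agree on 3 of 5 items (60 percent), but because both say "good" most of the time, kappa is only about 0.17, which shows how weak a 60 percent raw agreement can be on a two-way judgment.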

Until recently, it was relatively easy to spot bad output from a language model. It looked like gibberish. But this gets harder as the models get better — a problem called “scalable oversight.”

NOTER_PAGE: (12 0.5797382452738731 . 0.059598494353826845)

This trajectory means annotation increasingly requires specific skills and expertise.

NOTER_PAGE: (12 0.6771691711100339 . 0.45106649937264737)

30 percent of the labels were wrong.

NOTER_PAGE: (14 0.30925836160930686 . 0.20639899623588456)

GPT-4-trained models may be learning to mimic GPT’s authoritative style with even less accuracy.

NOTER_PAGE: (14 0.7503635482307319 . 0.12923462986198242)

so far, when improvements in AI have made one form of annotation obsolete, demand for other, more sophisticated types of labeling has gone up.

NOTER_PAGE: (14 0.778477944740669 . 0.17314930991217062)

predicted AI labs will soon be spending as many billions of dollars on human data as they do on computing power.

NOTER_PAGE: (14 0.8259815802229763 . 0.35194479297365117)

he believes the path forward will involve AI systems helping humans oversee other AI.

NOTER_PAGE: (15 0.05719825496849249 . 0.4127979924717691)

Machine-learning systems are just too strange ever to fully trust.

NOTER_PAGE: (15 0.2845370819195347 . 0.14868255959849433)

“The companies shift from one region to another,” Joe said. “They don’t have infrastructure locally, so it makes them flexible to shift to regions that favor them in terms of operation cost.”

NOTER_PAGE: (15 0.49539505574406206 . 0.42597239648682556)

Another Kenyan annotator said that after his account got suspended for mysterious reasons, he decided to stop playing by the rules. Now, he runs multiple accounts in multiple countries, tasking wherever the pay is best. He works fast and gets high marks for quality, he said, thanks to ChatGPT. The bot is wonderful, he said, letting him speed through $10 tasks in a matter of minutes. When we spoke, he was having it rate another chatbot’s responses according to seven different criteria, one AI training the other.

NOTER_PAGE: (16 0.6160930683470675 . 0.060225846925972396)