Transformers

tags
Machine Learning

The transformer is an architecture for machine learning systems that handles NLP tasks especially well. It incorporates an "attention" mechanism that helps with handling long, complex sequences.

Introduced in Attention is All You Need, 2017

Train a transformer on a lot (maybe 1B words) of data and you get something your marketing department can call a "Large Language Model" (LLM), or a "foundation model" if you work for Stanford. BERT and GPT are, as of this writing, the most widely recognized examples.

Tools

LLM: A CLI utility and Python library for interacting with Large Language Models
llamafile is the new best way to run a LLM on your own computer
LlamaIndex 🦙
mlc-llm
GitHub - jmorganca/ollama: Get up and running with Llama 2 and other large language models locally
GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
Willow (DIY Alexa-ish)

Evaluation

Many widely-used LLM benchmarks are of dubious usefulness1
It's still easy to come up with simple reasoning tasks that even the largest LLMs choke on2
Legal research tools from large, well-funded vendors give incorrect or incomplete answers around 25% of the time3
GPT-4 and GPT-4V still don't have robust abstraction abilities4
Strengths and weaknesses of transformers make more sense when you remember they are trying to predict the likeliest next token, e.g. worse at predicting rare sequences even in deterministic contexts5
GPT-3.5e answers to StackOverflow questions contain errors more than half the time, but programmers often do not notice6
Transformers seem to do compositional reasoning by reducing it to linearized subgraph matching7

Confabulation

Transformer systems asked to generate text will basically produce what they determine to be the most likely continuation. Sometimes this produces statements that align with factual reality, but this is coincidence. People call the resulting tendency to make things up "hallucination", or "confabulation", or "hallucitation", or more prosaically "bullshit."8
There's some criticism of the "hallucination" label on the grounds that (1) it credits computers with mental process, and (2) the mental process in question is actually related to processing sensory input, not producing output, so the label doesn't even make sense to begin with.
Some work toward evaluating factual consistency9
“Model collapse”: Training transformers on transformer output produces irreversible defects10

Explanations

Transformer systems are typically pretty much black boxes, and explanation of their outputs is an open problem. Combined with the tendency toward confabulation, this is pretty bad.
Can't just ask for an explanation, because they'll make stuff up (convincingly!): Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Alignment

It would be nice if we could align a transformer's behaviour with some set of normative values. How exactly to do this and what the values should be to begin with remain open problems.
I suspect values are actually quite a bit harder to formalize than language.
The Waluigi Effect: Strongly-represented normative values are very easy to invert

Security

BEAST: Automated adversarial jailbreaking and membership testing11

Prompt injection

Because transformers operate on undifferentiated streams of text, separating commands from input is difficult (impossible?). AFAIK there are no reliable, robust defences against prompt injection.
Implications: Do not feed transformers untrusted input and be extremely careful about feeding them sensitive information (because they could be manipulated into coughing it up later).
Invisible Indirect Injection: A Puzzle for ChatGPT

Resources

How do Transformers work? - Hugging Face NLP Course
What Is ChatGPT Doing … and Why Does It Work?—Stephen Wolfram Writings
sannykim/transformers: A collection of resources to study Transformers in depth.
GitHub - f/awesome-chatgpt-prompts

Footnotes:

1

Jon Keegan, “Everyone Is Judging AI by These Tests. But Experts Say They’re Close to Meaningless,” The Markup, July 17, 2024, https://themarkup.org/artificial-intelligence/2024/07/17/everyone-is-judging-ai-by-these-tests-but-experts-say-theyre-close-to-meaningless.

2

Marianna Nezhurina et al., “Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models” (arXiv.org, June 4, 2024), https://arxiv.org/abs/2406.02061v2.

3

Varun Magesh et al., “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools,” May 30, 2024, https://doi.org/10.48550/arXiv.2405.20362.

4

Melanie Mitchell, Alessandro B. Palmarini, and Arseny Moskvichev, “Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks,” December 11, 2023, https://doi.org/10.48550/arXiv.2311.09247.

5

R. Thomas McCoy et al., “Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve,” September 24, 2023, https://doi.org/10.48550/arXiv.2309.13638.

6

Samia Kabir et al., “Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions,” in Proceedings of the CHI Conference on Human Factors in Computing Systems, CHI ’24 (New York, NY, USA: Association for Computing Machinery, 2024), 1–17, https://doi.org/10.1145/3613904.3642596.

7

Nouha Dziri et al., “Faith and Fate: Limits of Transformers on Compositionality,” October 31, 2023, https://doi.org/10.48550/arXiv.2305.18654.

8

Arvind Narayanan and Sayash Kapoor, “ChatGPT Is a Bullshit Generator. But It Can Still Be Amazingly Useful,” Substack newsletter (AI Snake Oil, December 6, 2022), https://aisnakeoil.substack.com/p/chatgpt-is-a-bullshit-generator-but.

9

Jing Fan, Dennis Aumiller, and Michael Gertz, “Evaluating Factual Consistency of Texts with Semantic Role Labeling,” May 22, 2023, https://doi.org/10.48550/arXiv.2305.13309.

10

Ilia Shumailov et al., “The Curse of Recursion: Training on Generated Data Makes Models Forget,” May 31, 2023, https://doi.org/10.48550/arXiv.2305.17493.

11

Vinu Sankar Sadasivan et al., “Fast Adversarial Attacks on Language Models In One GPU Minute,” February 23, 2024, https://doi.org/10.48550/arXiv.2402.15570.