Machine Learning

tags
Predictive Analytics


The idea of machine learning is to find solutions to information-processing problems that would be too complex or detailed for human beings to develop directly. In practice, it is rarely clear what anyone means when they say “machine learning”. Most likely they're trying to sell you something.

Critique

Sculley and Pasanek (2008), “Meaning and Mining”: ML assumptions, interpretation, and the humanities. Includes an interesting case study in evolving interpretations.
Temporality is a problem (see e.g. Hildebrandt): models are trained on historical data, and the target function they approximate is assumed not to change over time
Forget privacy: you're terrible at targeting anyway - apenwarr: ML is actually not that good at most of what it's used for, especially recommender systems
AI and the American Smile. How AI misrepresents culture through a… (jenka, Medium, March 2023)
“Money laundering for bias”
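The temporality point above can be made concrete with a toy example. This is a minimal sketch on made-up data, not taken from any of the linked sources: a one-feature threshold classifier is fitted on "historical" labels, then scored after the true cut-off has moved. The cut-offs 50 and 70 and all function names are assumptions chosen for illustration.

```python
def fit_threshold(xs, ys):
    """Return the cut-off t that maximises training accuracy of 'x >= t'."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def accuracy(t, xs, ys):
    """Accuracy of the rule 'x >= t' against labels ys."""
    return sum((x >= t) == y for x, y in zip(xs, ys)) / len(xs)


xs = list(range(100))                  # hypothetical integer scores 0..99
historical = [x >= 50 for x in xs]     # labels at training time
later = [x >= 70 for x in xs]          # labels after the world has changed

t = fit_threshold(xs, historical)
acc_then = accuracy(t, xs, historical)  # evaluated on the training-era labels
acc_now = accuracy(t, xs, later)        # same model, drifted target

print(t, acc_then, acc_now)
```

The fitted rule stays frozen at the historical cut-off, so every case between the old and new cut-offs is now misclassified even though nothing about the model changed.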

Cool applications

FunSearch: Making new discoveries in mathematical sciences using Large Language Models - Google DeepMind
Exa: Embeddings-based web search
Perplexity
VecFusion: Vector Font Generation with Diffusion
GitHub - samim23/polymath: Convert any music library into a music production sample-library with ML
[2102.07492] DOBF: A Deobfuscation Pre-Training Objective for Programming Languages
Runway: ML-powered creative computing
AI Art Machine by Hillel Wayne
Otter.ai: Automatic meeting notes
Gen AI landscape October 2022
@thesephist on fine-tuning GPT-3 on his own writing
Multi
Vector databases: Chroma, pgvector, Pinecone, OP
Marqo | Tensor-based Search and Analytics engine
Fermat: Generative infinite canvas

Tools

GitHub - hwchase17/langchain: ⚡ Building applications with LLMs through composability ⚡
Sequence labelling: doccano, brat rapid annotation tool, INCEpTION

Explanation and auditing

Black-Box Access is Insufficient for Rigorous AI Audits [1]
Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR
Counterfactual explanations might not be interpreted the way you'd hope [2]
Area under the curve is probably not a good metric despite being widely used [3]
CFPB Issues Guidance on Credit Denials by Lenders Using Artificial Intelligence
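One commonly noted limitation of area under the curve (not necessarily the specific argument of the cited paper) is that it depends only on how scores are ranked, so it says nothing about what happens at the threshold actually deployed. The sketch below uses made-up scores and labels; `auc` and `false_positive_rate` are illustrative helpers, not a library API.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def false_positive_rate(scores, labels, threshold):
    """Share of true negatives flagged when flagging 'score >= threshold'."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s >= threshold for s in neg) / len(neg)


labels = [0, 0, 1, 0, 1, 1]
# Two hypothetical models that rank the six cases identically
# but place their scores on different parts of the scale.
model_a = [0.10, 0.20, 0.30, 0.40, 0.60, 0.70]
model_b = [0.55, 0.60, 0.65, 0.70, 0.80, 0.90]

auc_a, auc_b = auc(model_a, labels), auc(model_b, labels)
fpr_a = false_positive_rate(model_a, labels, 0.5)
fpr_b = false_positive_rate(model_b, labels, 0.5)

print(auc_a, auc_b, fpr_a, fpr_b)
```

Both models get the same AUC, yet at a cut-off of 0.5 one flags no negatives and the other flags all of them.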

Fairness

Models display an alarming amount of "arbitrariness" (i.e. models trained on different subsets of the same data often disagree with each other), raising questions about the value of common fairness metrics for individual models. [4]
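The kind of disagreement described above can be illustrated with a small experiment. This is a simplified sketch on synthetic data, not the cited paper's exact procedure: a toy learner is refitted on bootstrap resamples, and for a given input we compute the fraction of model pairs that agree. The data, the learner, and the number of resamples (25) are all assumptions.

```python
import random


def fit_threshold(sample):
    """Toy learner: cut-off halfway between the two class means."""
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def pairwise_agreement(preds):
    """Fraction of distinct model pairs giving the same 0/1 prediction."""
    b = len(preds)
    b1 = sum(preds)
    b0 = b - b1
    return 1 - (2 * b0 * b1) / (b * (b - 1))


rng = random.Random(0)
# Synthetic one-feature data with overlapping classes.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)]
data += [(rng.gauss(1.0, 1.0), 1) for _ in range(50)]

# Refit the same learner on 25 bootstrap resamples of the same dataset.
thresholds = []
for _ in range(25):
    sample = [rng.choice(data) for _ in range(len(data))]
    thresholds.append(fit_threshold(sample))


def agreement_at(x):
    """Agreement among the refitted models for a single input x."""
    return pairwise_agreement([int(x >= t) for t in thresholds])


far = agreement_at(5.0)                                     # far from any cut-off
near = agreement_at((min(thresholds) + max(thresholds)) / 2)  # between cut-offs

print(far, near)
```

Inputs far from the decision boundary get the same answer from every refit, while inputs near it receive a label that depends on which resample happened to be drawn.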

Guides

What Are Embeddings?
Thread by @omarsar0, In case you missed them, here is a thread (in no special…
Introduction to Deep Learning
fast.ai · Making neural nets uncool again
Situated AI - Google Docs
Methods of prompt programming :: — Moire
Prompt Engineering Guide (DAIR)
SVM works better than kNN for lookups over embeddings: randomfun/knn_vs_svm.ipynb at master · karpathy/randomfun · GitHub
Alpaca Eval Leaderboard
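As a point of reference for the kNN entry above, nearest-neighbour lookup over embeddings amounts to ranking stored vectors by similarity to a query vector. The following is a minimal sketch with made-up three-dimensional vectors; real embeddings come from a trained model and have far more dimensions, and `cosine` and `knn` are illustrative names rather than any library's API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn(query, index, k=2):
    """Return the names of the k stored vectors most similar to query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy "embeddings".
index = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

result = knn([1.0, 0.0, 0.0], index, k=2)
print(result)
```

A vector database does essentially this lookup at scale, usually with an approximate index instead of a full sort.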

Footnotes:

1

Stephen Casper et al., “Black-Box Access Is Insufficient for Rigorous AI Audits,” January 25, 2024, https://doi.org/10.48550/arXiv.2401.14446.

2

Yaniv Yacoby et al., “‘If It Didn’t Happen, Why Would I Change My Decision?’: How Judges Respond to Counterfactual Explanations for the Public Safety Assessment,” Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 10 (October 14, 2022): 219–30, https://doi.org/10.1609/hcomp.v10i1.22001.

3

Kweku Kwegyir-Aggrey et al., “The Misuse of AUC: What High Impact Risk Assessment Gets Wrong,” May 29, 2023, https://doi.org/10.48550/arXiv.2305.18159.

4

A. Feder Cooper et al., “Is My Prediction Arbitrary? Measuring Self-Consistency in Fair Classification,” May 31, 2023, https://doi.org/10.48550/arXiv.2301.11562.