Machine Learning | Will O'Pedia

tags: Predictive Analytics

The idea is, given some information-processing problem(s), to find solutions that would be too complex or detailed for human beings to develop directly. In practice, it is rarely clear what anyone means when they say “machine learning”. Most likely they're trying to sell you something.

Critique

Could AI slow science? ML-based science, even if it increases productivity, won't necessarily lead to progress
None of 62 ML-based COVID-19 screening methods examined in a review were found to be clinically useful¹
Reproducibility seems to be a problem in ML-based science²
Even if average-case performance is quite good, models with weak worst-case performance can be vulnerable to adversarial attack,³ even after specific defensive effort.⁴
Sculley and Pasanek, “Meaning and Mining” (2008): ML assumptions, interpretation, and the humanities. Includes an interesting case study in evolving interpretations.
Temporality is a problem (see e.g. Hildebrandt): models are trained on historical data, and the target function is assumed not to change over time
Forget privacy: you're terrible at targeting anyway - apenwarr: ML is actually not that good at most of what it's used for, especially recommender systems
AI and the American Smile. How AI misrepresents culture through a…
“Money laundering for bias”
Driven Out By AI: On automated driver account deactivations at Uber

Cool applications

AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization): Apparently pretty good
aider: Terminal-based coding assistant
FunSearch: Making new discoveries in mathematical sciences using Large Language Models - Google DeepMind
Exa: Embeddings-based web search
Perplexity
VecFusion: Vector Font Generation with Diffusion
GitHub - samim23/polymath: Convert any music library into a music production sample-library with ML
[2102.07492] DOBF: A Deobfuscation Pre-Training Objective for Programming Languages
Runway: ML-powered creative computing
AI Art Machine by Hillel Wayne
Otter.ai: Automatic meeting notes
Gen AI landscape October 2022
@thesephist on fine-tuning GPT-3 on his own writing
Multi
Vector databases: Milvus, Chroma, pgvector, Pinecone (+ OpenAI = OP stack)
Marqo | Tensor-based Search and Analytics engine
Fermat: Generative infinite canvas

Tools

GitHub - hwchase17/langchain: ⚡ Building applications with LLMs through composability ⚡
Sequence labelling: doccano, brat rapid annotation tool, INCEpTION

Explanation and auditing

Trying to get explanations out of models optimized for prediction is a bad idea⁵
Black-Box Access is Insufficient for Rigorous AI Audits⁶
Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR
Counterfactual explanations might not be interpreted the way you'd hope⁷
Area under the curve is probably not a good metric despite being widely used⁸
CFPB Issues Guidance on Credit Denials by Lenders Using Artificial Intelligence
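
One general concern behind the AUC item above can be shown in a few lines. This is a minimal sketch, not a summary of the cited paper's argument: AUC only measures how well scores rank positives above negatives, so it says nothing about what happens at the threshold where decisions are actually made. The function names (`auc`, `flagged`), the two "models", the scores, and the 0.5 threshold are all made up for illustration.

```python
from itertools import product


def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This is the rank-based definition of AUC.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def flagged(scores, threshold):
    """Turn scores into decisions: 1 if the score reaches the threshold."""
    return [int(s >= threshold) for s in scores]


# Hypothetical data: both models rank the four people identically.
labels = [0, 0, 1, 1]
model_a = [0.10, 0.20, 0.30, 0.40]
model_b = [0.60, 0.70, 0.80, 0.90]

print(auc(model_a, labels), auc(model_b, labels))
print(flagged(model_a, 0.5), flagged(model_b, 0.5))
```

Both hypothetical models get the same AUC because they order people the same way, yet at a threshold of 0.5 one flags nobody and the other flags everybody.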

Fairness

Models display an alarming amount of "arbitrariness" (i.e. models trained on different subsets of the same data often disagree with each other), raising questions about the value of common fairness metrics for individual models⁹
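
The kind of arbitrariness described above can be sketched as follows: train several models on different resamples of the same data, then check how often pairs of them agree on the same individual. The code is a rough illustration under that assumption, not the exact metric from the cited paper; `pairwise_agreement` and the prediction lists are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(predictions):
    """Fraction of model pairs that assign the same label to one individual.

    `predictions` holds one predicted label per model for that individual.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        raise ValueError("need predictions from at least two models")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical labels from six models trained on different subsets.
stable_person = [1, 1, 1, 1, 1, 1]      # every model agrees
arbitrary_person = [1, 0, 1, 0, 0, 1]   # the models split evenly

print(pairwise_agreement(stable_person))
print(pairwise_agreement(arbitrary_person))
```

A low agreement score for an individual means the decision they receive depends heavily on which training subset happened to be used, which is hard to see when fairness metrics are computed for a single model.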

Guides

What Are Embeddings?
Thread by @omarsar0, In case you missed them, here is a thread (in no special…
Introduction to Deep Learning
fast.ai · Making neural nets uncool again
Situated AI - Google Docs
Methods of prompt programming - Moire
Prompt Engineering Guide (DAIR)
SVM works better than kNN: randomfun/knn_vs_svm.ipynb at master · karpathy/randomfun · GitHub
Alpaca Eval Leaderboard

Footnotes:

¹ Michael Roberts et al., “Common Pitfalls and Recommendations for Using Machine Learning to Detect and Prognosticate for COVID-19 Using Chest Radiographs and CT Scans,” Nature Machine Intelligence 3, no. 3 (2021): 199–217, https://doi.org/10.1038/s42256-021-00307-0.

² Sayash Kapoor and Arvind Narayanan, “Leakage and the Reproducibility Crisis in Machine-Learning-Based Science,” Patterns 4, no. 9 (2023): 100804, https://doi.org/10.1016/j.patter.2023.100804.

³ Tony T. Wang et al., “Adversarial Policies Beat Superhuman Go AIs,” prepublished July 13, 2023, https://doi.org/10.48550/arXiv.2211.00241.

⁴ Tom Tseng et al., “Can Go AIs Be Adversarially Robust?,” prepublished June 18, 2024, https://doi.org/10.48550/arXiv.2406.12843.

⁵ Marco Del Giudice, “The Prediction-Explanation Fallacy: A Pervasive Problem in Scientific Applications of Machine Learning,” Methodology: European Journal of Research Methods for the Behavioral and Social Sciences (Denmark) 20, no. 1 (2024): 22–46, https://doi.org/10.5964/meth.11235.

⁶ Stephen Casper et al., “Black-Box Access Is Insufficient for Rigorous AI Audits,” prepublished January 25, 2024, https://doi.org/10.48550/arXiv.2401.14446.

⁷ Yaniv Yacoby et al., “‘If It Didn’t Happen, Why Would I Change My Decision?’: How Judges Respond to Counterfactual Explanations for the Public Safety Assessment,” Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 10 (October 2022): 219–30, https://doi.org/10.1609/hcomp.v10i1.22001.

⁸ Kweku Kwegyir-Aggrey et al., “The Misuse of AUC: What High Impact Risk Assessment Gets Wrong,” prepublished May 29, 2023, https://doi.org/10.48550/arXiv.2305.18159.

⁹ A. Feder Cooper et al., “Is My Prediction Arbitrary? Measuring Self-Consistency in Fair Classification,” prepublished May 31, 2023, https://doi.org/10.48550/arXiv.2301.11562.