Scientists: Why Predictive AI Will Never Succeed
Trapped in a counterfeit drug system.

By Muxin Xu

In the late 19th century, a health tonic called snake oil appeared in the United States, claiming to cure every ailment and extend life, and it became wildly popular. This so-called miracle drug had its localized versions in China too. The film The Piano in a Factory includes a line describing the antics of counterfeit drug peddlers: "Two pounds of luguo biscuits mixed with one paracetamol tablet, stuffed into broken capsule shells. Won't fill you up, won't kill you either."

Today, two Princeton computer scientists, Arvind Narayanan and Sayash Kapoor, argue that the field of artificial intelligence is flooded with snake oil of the same variety. Nor is the problem confined to the American AI industry: after ChatGPT ignited the hype, numerous Chinese companies swiftly rebranded themselves as "AI companies" whose actual AI components were minuscule, sometimes amounting to little more than interns doing the work by hand. The harm goes beyond investor returns. Just as fake medicine finds its way into ordinary households, AI snake oil reaches every user, even trapping them inside these "counterfeit systems."
In their book AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference, the two authors examined fifty AI applications and found snake oil running rampant above all in predictive AI: applications that are already riddled with flaws and that, by their very operating logic, will never achieve what they claim.

AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference, Princeton University Press
Predictive AI is distinct from generative AI, the family that includes text generation (exemplified by ChatGPT), text-to-image, and text-to-video. Rather than producing content, predictive AI forecasts outcomes about people, and it spans numerous domains that touch everyday life, including law, medicine, finance, and education: AI-powered hiring, AI video interviews, AI assessment of insurance claims, and more.
The following is an excerpt from Waves —
1. In the United States, roughly three-quarters of employers use AI tools for recruiting, including AI resume screening and AI video interviews. After job seekers discovered this, they deployed a series of countermeasures. Applicants could sprinkle impressive keywords into their resumes — "Harvard graduate," "ten years of experience," "led a hundred-person team" — rendered in white text invisible to human eyes but legible to computers. Investigative journalists found that in AI video interviews, simply wearing a scarf or glasses could dramatically alter one's AI score. Other effective tactics included placing a bookshelf in the background, dimming the video, or merely converting a resume from PDF to plain text.
2. In summer 2022, Toronto deployed an AI tool to predict bacterial levels at public beaches and used its output to decide when to open or close them. The tool proved wrong 64% of the time, meaning that on days it cleared a beach you had roughly a six-in-ten chance of swimming in bacteria-laden waters. When questioned, the city government's defense was that the prediction tool was merely advisory and that human supervisors made the final call. Yet journalists discovered that supervisors never once overruled the AI's decisions.
3. In the United States, adults over 65 qualify for federally subsidized health insurance (Medicare). To cut costs, insurers began using AI to predict how long each patient would need to stay in the hospital. The logic seemed sound: without such a system, hospitals would theoretically want patients to stay as long as possible to maximize revenue. But in one case, the AI judged that a 75-year-old woman would be ready for discharge after 17 days. When the 17 days were up she still could not walk independently, yet she was pushed out of the hospital on the strength of the AI's estimate.
4. Allstate wanted to raise its insurance rates, so it used AI to work out which customers would tolerate premium hikes without switching providers. The result was an AI-generated "sucker list," composed predominantly of people over 62, because elderly customers were the least likely to shop around.
5. Pennsylvania once adopted a "family screening tool" to predict which children were at risk of abuse. If the tool flagged a child as likely being abused, social workers could remove the child and place them in foster care. The tool's flaw lay in its data: it drew on public welfare records, which exclude families with private insurance. In short, a model built on this data could never make predictions about affluent families; only poorer families were visible to it.
6. Datasets are the core of predictive AI. But as the noise in the samples increases, the number of samples needed to build an accurate model grows exponentially. And social datasets are extremely noisy: the underlying patterns of social phenomena aren't fixed; they vary enormously across environments, times, and places. A pattern successfully identified in one time and place becomes worthless after even a slight shift in conditions (the toy simulation after this list makes both problems concrete).
7. The authors previously launched a challenge: use roughly ten thousand sociological data points per child to predict whether the child's academic performance would improve. The predictions failed spectacularly. In the post-mortem, they found that much of the data most directly relevant to academic outcomes never makes it into a dataset. A child's sudden improvement in grades might stem from a neighbor who gave them blueberries and helped with homework; influences from outside the family matter too, and no variable records them.
8. So why does predictive AI exist at all? One major reason is our profound aversion to randomness. Psychological experiments have repeatedly demonstrated this — we even fantasize that we can predict things that are in fact random. But using AI for prediction only pushes us further from the future we actually want. After all, few would welcome a future where predictions have extremely limited success rates while systematically discriminating against the vulnerable.
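To make points 6 and 7 concrete, here is a toy simulation. It is my own sketch, not an analysis from the book, and every number in it (the feature count, effect sizes, noise levels, and "drift" parameter) is invented for illustration. It fits an ordinary least-squares model on synthetic "social" data and reports out-of-sample R², first as the noise level rises, then as the underlying pattern shifts slightly between the training environment and the deployment environment.

```python
# Toy illustration (not from the book): how noise and shifting patterns
# erode a predictive model. All constants here are invented for the sketch.
import numpy as np

rng = np.random.default_rng(0)
P = 50  # hypothetical number of survey variables recorded per person


def out_of_sample_r2(noise_sd: float, drift_sd: float, n_train: int = 20_000) -> float:
    """Train OLS in one 'environment', evaluate in a slightly shifted one."""
    beta_train = rng.normal(0.0, 0.3, P)                    # true pattern at training time
    beta_test = beta_train + rng.normal(0.0, drift_sd, P)   # pattern drifts by deployment time

    X_tr = rng.normal(size=(n_train, P))
    y_tr = X_tr @ beta_train + rng.normal(0.0, noise_sd, n_train)

    X_te = rng.normal(size=(5_000, P))
    y_te = X_te @ beta_test + rng.normal(0.0, noise_sd, 5_000)

    coef, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)       # plain least-squares fit
    resid = y_te - X_te @ coef
    return 1.0 - resid.var() / y_te.var()


print("more noise -> less predictable, even with 20,000 training samples")
for noise_sd in (1.0, 3.0, 6.0, 12.0):
    print(f"  noise_sd={noise_sd:>4}: out-of-sample R^2 = {out_of_sample_r2(noise_sd, 0.0):.3f}")

print("a shift in the underlying pattern between training and deployment erodes it further")
for drift_sd in (0.0, 0.1, 0.3, 0.6):
    print(f"  drift_sd={drift_sd:>4}: out-of-sample R^2 = {out_of_sample_r2(1.0, drift_sd):.3f}")
```

The exact numbers are meaningless; the shape of the result is the point. Noisier outcomes and drifting patterns, both endemic to social data, leave less and less for a model to predict, no matter how much training data is collected.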
References:
[1] Arvind Narayanan and Sayash Kapoor, AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference, Princeton University Press, 2024.
Image source: Still from The Piano in a Factory