You've seen this image before — or some version of it. A trolley barrels toward five people tied to the tracks. You stand at a switch. Pull the lever, and the trolley diverts to a side track where only one person lies. Do nothing, and five die. Act, and one dies by your hand.

The trolley problem was introduced by philosopher Philippa Foot in 1967 and later refined by Judith Jarvis Thomson. It was never meant to be a puzzle with a correct answer. It was meant to expose the fault lines between two ways of thinking about right and wrong.

The question isn't what you'd do. The question is why you'd do it — and whether your reasons survive the next variation.

Two frameworks, one lever

The utilitarian says: pull the lever. Five lives saved outweigh one life lost. The math is simple. Maximise well-being, minimise suffering. One is less than five. Done.

The deontologist says: wait. By pulling the lever, you're actively choosing to kill someone. There's a moral difference between allowing harm to happen and causing it. You are not the trolley. You didn't tie anyone to the tracks. But the moment you pull that lever, the death on the side track is yours.
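
To make the contrast concrete, here is a minimal sketch of the two positions as decision rules. The function names, the flat body-count scoring, and the blanket "never kill directly" constraint are simplifications invented for this example; neither tradition is actually this crude.

def utilitarian_choice(deaths_if_act, deaths_if_abstain):
    """Count bodies; pick whichever option leaves fewer people dead."""
    return "pull the lever" if deaths_if_act < deaths_if_abstain else "do nothing"

def deontological_choice(deaths_if_act, deaths_if_abstain, acting_kills_directly=True):
    """Refuse any option that requires killing someone directly, whatever the count."""
    if acting_kills_directly:
        return "do nothing"
    return "pull the lever" if deaths_if_act < deaths_if_abstain else "do nothing"

print(utilitarian_choice(1, 5))    # pull the lever: one death beats five
print(deontological_choice(1, 5))  # do nothing: the constraint overrides the arithmetic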

Most people, when asked casually, say they'd pull the lever. The interesting part comes with the variations.

The fat man on the bridge

Thomson's variation: there's no lever. Instead, you're standing on a bridge above the tracks. Next to you is a very large man. If you push him onto the tracks, his body will stop the trolley, saving the five. He will die. The math is identical — one death to prevent five — but suddenly almost everyone refuses.

Why? The numbers haven't changed. Only the mechanism has. Pulling a lever feels abstract. Pushing a person off a bridge feels like murder. Our moral intuitions, it turns out, aren't doing arithmetic. They're doing something closer to pattern matching — and the patterns care about physical contact, directness, and intent in ways that pure consequentialism doesn't.

We don't reason our way to moral judgments. We feel them first, then construct reasons afterward — like a lawyer defending a client who's already decided to plead not guilty.

Why it matters now

The trolley problem used to be a classroom exercise. Then self-driving cars happened. Suddenly the thought experiment became an engineering specification. If a car's brakes fail, should the algorithm swerve into a wall (killing the passenger) to avoid a crowd of pedestrians? Who decides? Who's liable?

The same tension shows up in AI alignment. When a model has to choose between conflicting values — privacy vs. safety, individual benefit vs. collective harm, honesty vs. kindness — it's running a trolley problem at scale, thousands of times per second, without ever touching a lever.

And here's the uncomfortable part: we're asking machines to be more consistent about these tradeoffs than we've ever been ourselves.
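
What does "choosing between conflicting values" look like mechanically? In the crudest possible form, something like a weighted score. The value names, ratings, and weights below are invented purely for illustration and bear no relation to any real system:

def weighted_choice(options, weights):
    """Score each option as a weighted sum over value dimensions; highest score wins."""
    def score(option):
        return sum(weights[value] * rating for value, rating in option["ratings"].items())
    return max(options, key=score)

options = [
    {"name": "share the data",    "ratings": {"privacy": -0.8, "safety": 0.9}},
    {"name": "withhold the data", "ratings": {"privacy": 0.7,  "safety": -0.4}},
]

print(weighted_choice(options, {"privacy": 1.0, "safety": 1.0})["name"])  # withhold the data
print(weighted_choice(options, {"privacy": 1.0, "safety": 3.0})["name"])  # share the data

The verdict flips with the weights, and the weights are exactly what no one agrees on.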

Here's a rough simulation — same math, wildly different answers depending on directness:

import random

def would_divert(lives_saved, directness):
    """Directness penalty: people resist hands-on harm."""
    # directness: 0.0 for a lever pulled at a distance, 1.0 for pushing someone with your hands
    # Decision: net benefit minus the penalty, compared against a noisy personal threshold.
    return (lives_saved - directness * 3.0) > random.gauss(0.5, 1.2)

# Both scenarios save a net four lives; only the directness of the act changes.
for name, args in [("lever", (4, 0.0)), ("bridge push", (4, 1.0))]:
    rate = sum(would_divert(*args) for _ in range(1000)) / 1000
    print(f"{name}: {rate:.0%} would divert")

The real lesson

The trolley problem doesn't teach us what to do. It teaches us that we don't know why we do what we do. Our moral intuitions are powerful but inconsistent. They evolved for small-scale social life — face-to-face interactions, tribal dynamics, immediate consequences. They weren't designed for lever-pulling at a distance, algorithmic triage, or policy decisions affecting millions.

The trolley is always already moving. The only question is whether you'll admit you're standing at the lever.