Dispatches · The 37th Chamber

How to read your AI

A confident bluffer is more dangerous than an obvious liar. Three concrete grading moves you can run on any AI output in under two minutes — before you ship it to a client, a regulator, or yourself.

An AI gave you something. A summary, an email, a chunk of code, a research brief. It looks clean. Maybe even elegant. The temptation is to act on it.

Don’t. Not yet.

The thing about modern language models is they speak the same language whether they’re right or wrong. Confidence is a vibe, not a signal. The same paragraph that nailed a tricky derivation can, a sentence later, invent a citation that doesn’t exist — in the same calm voice. That’s not a flaw to be patched out next quarter; it’s the medium itself. If you’re going to use these tools, the work of reading the output is part of using them. It’s not optional.

Here’s the operating principle:

You don’t use what you can’t see into. If the output can’t be inspected — if you can’t check it without re-reading it through your own brain — it isn’t ready to ship.
The glass-box rule.

Two minutes. Three moves. You’ll catch most of the trouble.

The three moves

01 · Independent re-derivation of one claim

Pick a specific fact. Find it yourself, in one click, from a primary source.

Not all of the claims — one. The one a real reader would push back on. If the AI says “Sennrich et al. introduced BPE to NLP in 2016”, open a tab, search the title, read the abstract. If the AI says “the law in Texas requires X”, open the statute. If it says “the patient’s LDL was 132”, open the chart.

If the claim you tested holds, that’s a positive signal, but not clearance: one verified fact doesn’t vouch for the claims you didn’t check. If it doesn’t hold (the citation is fabricated, the statute is misnamed, the number is wrong), treat the whole output as untrusted and start over. Hallucinations cluster: a model that invented one thing is a model whose calibration slipped, and calibration failures rarely appear in isolation. One bad claim in confident prose is a warning about the pour, not a stray drop: it tells you the model was generating beyond what it reliably knew.

02 · Read the code, not just the comment

For code: ignore the explanation. Read the body.

AI-written code comes with a story: “This function does X by doing Y.” The story is often correct and pleasant. But the model writes it without running or testing the code it just produced. In standard inference there is no checking step between the two, and even when a model is asked to self-review, that loop is not visible to you. So the two can drift: the story can describe behavior the code doesn’t implement, or paper over a subtle bug the code has.

The move is simple: cover the comments and explanations and read what the code actually does, line by line. Trace one input through it by hand. If it does what the story says, fine. If it doesn’t, you’ve just caught the drift — before production caught it for you.
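Here is a small sketch of what drift looks like. The function `dedupe_keep_latest` and its sample records are made up for illustration; they don't come from any real codebase. The docstring plays the part of the story, and the two-record input is the one you trace by hand.

```python
def dedupe_keep_latest(records):
    """Return one record per id, keeping the most recent version."""
    seen = {}
    for rec in records:
        if rec["id"] not in seen:
            seen[rec["id"]] = rec
    return list(seen.values())


# Trace one input: two versions of the same id, oldest first.
records = [
    {"id": 1, "version": 1},
    {"id": 1, "version": 2},
]
result = dedupe_keep_latest(records)
print(result)  # [{'id': 1, 'version': 1}]
```

The story promises version 2. The body stores a record only the first time it sees an id, so version 1 survives. Nothing in the docstring would have told you that; the trace does.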

03 · Check the meta-claims about what the AI did

When the AI tells you what it did, verify the doing — not the report.

“I ran the tests and they pass.” “I searched the database and found three matches.” “I read the file and updated section 3.” These are claims about actions — and an AI describing actions is just as prone to confident error as an AI describing facts. Maybe more.

If the AI says it ran tests, look for the test output. If it says it searched, look at the search results. If it says it edited a file, open the file and look at the diff. The minute and a half this costs is the cheapest insurance you’ll buy that day. And if you cannot find the artifact at all — no output, no log, no diff — treat the action as unverified, not done.
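For the file-edit case, the check can be sketched in a few lines using Python's standard `difflib`. The helper names `diff_lines` and `edit_status` and the sample text are hypothetical, and the sketch assumes you have the file's contents from before and after the claimed edit.

```python
import difflib


def diff_lines(before: str, after: str) -> list[str]:
    """Return a unified diff of two versions of a text (empty if identical)."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


def edit_status(before: str, after: str) -> str:
    """No diff means no evidence of the claimed edit: call it unverified."""
    return "changed" if diff_lines(before, after) else "unverified"


original = "Section 3\nOld wording.\n"

# The AI reported "I updated section 3", but the file is byte-for-byte the same.
print(edit_status(original, original))  # unverified

# Here something did change; print the diff and read it.
updated = "Section 3\nNew wording.\n"
print(edit_status(original, updated))  # changed
print("\n".join(diff_lines(original, updated)))
```

A status of "changed" only tells you that something moved. Whether it is the change the AI described is still yours to read in the diff.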

The honest limit

These three moves do not catch everything. They catch most of the trouble — the confident hallucination, the drifted comment, the falsely reported action — in under two minutes per artifact, for most work. In regulated domains — legal, medical, financial, anything touching compliance — two minutes is the floor, not the ceiling; the moves above are still the right moves, just the beginning of a longer audit. That’s the honest limit. If the work matters more than two minutes’ worth of trouble, the moves pay for themselves immediately. If it doesn’t matter that much, you’re probably not reading this dispatch.

The point isn’t suspicion. The point is partnership. AI is a tool that’s vastly more useful when you can see into it — and a tool you should not use, on anything that matters, until you’ve built the habit of looking. The work of looking is the work. There’s no shortcut, but there’s also no excuse for not doing it.

If you’re using AI on anything regulated, anything published, anything that costs someone money or reputation, you don’t need a vendor; you need a discipline. The three moves above are the floor. Everything else is upholstery.

Filed from the 37th Chamber · The Woodlands, TX · 2026.06.06

The worked example’s real paper — Sennrich, Haddow & Birch, ACL 2016 · On hallucination as inherent — Xu et al., 2024 (arXiv:2401.11817)