The canary trick: catch Claude before it starts hallucinating, Quentin IA Marketing

In the 80s, Van Halen slipped an absurd clause into every venue's contract: a bowl of M&Ms backstage, with all the brown ones removed. A rockstar tantrum? Not at all. The technical contract ran dozens of pages, full of safety requirements critical for a massive show. If they spotted a single brown M&M backstage, they knew the venue hadn't read the whole contract, which meant everything had to be re-checked before going on stage. The candy was a test, not a whim.

You can do exactly the same with Claude. When you work for a long time inside a single conversation, Claude eventually drifts: it forgets your conventions, makes assumptions, and at worst invents file paths that do not exist. The real trap is that it happens silently. You only notice once the damage is done.

The canary does not fix hallucinations. It warns you they are coming, while you can still act.

Claude AI Lab

The Claude AI Lab is my Skool community where I share my Claude systems and the more advanced modules. Entry is free.

Join the Lab →

✅What you need

Somewhere Claude reads your instructions. A CLAUDE.md file (Claude Code), a project's instructions (Claude.ai), or the custom instructions field. Any of them works.

Thirty seconds. It is a single line to add, once.

A habit. Getting used to checking the first word of every response. That is the whole job.

The trivial-test principle

🎸 Van Halen's brown M&Ms

The idea is counterintuitive: to check that a complex system works, you do not test the complex part. You test the trivial one.

Van Halen could not inspect every cable and every scaffold in every venue. So they planted one tiny, impossible-to-miss detail in the contract. If that detail was respected, the rest probably was too. If it was not, that was an immediate red flag.

With Claude, it is the same. You cannot re-read every line of its reasoning to verify it still follows your rules. But you can ask it for something trivial and watch whether it holds. That something is your canary.

Why a canary

Miners used to lower a caged canary into the shaft. The bird, more sensitive than humans, reacted to toxic gas before the miners did. If it stopped singing, they climbed back out. Your trivial instruction plays the same role: it fails first, as an early warning.

Why Claude degrades along the way

📉 context dilution

It is not a bug, it is mechanical. The longer a conversation runs, the more the context fills up: your messages, the responses, the files read, the tool results. Your original instructions stay in the same place but weigh less and less in the mass. They lose priority.

The drift almost always follows the same sequence:

1️⃣

The small rule drops

Claude ignores a minor instruction. The canary stops singing.

2️⃣

Conventions slip

It no longer respects your code style, your naming, your output format.

3️⃣

Assumptions run wild

It fills the gaps on its own, with hypotheses you never approved.

4️⃣

Phantom files

It invents paths, functions, files that do not exist. The session is done.

The trap is that stages 2, 3 and 4 are expensive to detect and repair. Stage 1 is free to spot: the first word just has to be missing.

The key idea

You are not trying to prevent drift. You are making it visible at stage 1, before it costs you an hour of debugging at stage 4.

Plant your canary

🐦 one line in your instructions

You add a single, deliberately silly instruction, right at the top of your instructions file:

Begin EVERY response with the word: Pixel.

That is it. Pick a word that is unlikely to show up naturally in a response (a name, a made-up word, an icon). From then on, every healthy response starts with "Pixel". The day Claude answers without it, you know your highest-priority rule just dropped.

⚙️Where to put it

Claude Code: at the very top of your CLAUDE.md file, at the project root.

Claude Cowork: in your project or agent instructions (the rules field the agent re-reads on every response). This is where it matters most: on long working sessions with your agents, drift hits hardest, so this is where the canary saves you the most time.

Claude.ai (Projects): in the project instructions.

Simple usage: in your account's custom instructions field.

Claude Cowork tip

In Cowork you often run several agents over long sessions. Give them all the same canary, or a different word per agent: you can tell at a glance which one checked out, without re-reading the whole history.

The good canary

Avoid a common word like "OK" or "Here". You want a binary signal, present or absent, never ambiguous. A name works well: the trick actually comes from there, from people who gave their AI agent a name and noticed it "forgot" that name when the session went sideways.

Read the signal and react

🚨 the canary stops singing

The rule is simple and merciless: the moment the word disappears, you do not continue.

You do not ask Claude to "refocus", you do not scold it, you do not rewrite your prompt ten times. Once the context is saturated, you do not recover it by talking. You start a fresh session.

🟢

The word is there

Healthy session. You keep going as normal.

🔴

The word dropped

Drift signal. You open a new conversation, paste the essential context, and start clean.

It is frustrating to throw away a conversation in the moment. But it is far cheaper than debugging for an hour code that is based on a file Claude invented three responses earlier.

The right move

Before restarting, ask Claude for a summary of what was done and decided. You paste that summary into the new session: you keep the progress, you drop the rotten context.

Three useful variants

Once the habit sticks, you can fine-tune your canary to your usage.

🐤

The emoji over the word

An emoji at the start of a response is even faster to spot than a word. Ideal if you skim.

🪺

Several canaries

If you have two or three vital rules, give each one its own signal. You then know which dropped first.

🔚

The closing canary

Ask it to begin AND end with the word. On a long response, you catch drift that appears mid-generation.

Agentic engineering as a habit

The canary is a perfect example of what we call agentic engineering: you do not just use AI, you build observable signals into the way you work with it. One trivial line that tells you, in real time, whether you can still trust what is in front of you.

It costs nothing, it slows nothing down, and it spares you the worst disaster with an AI assistant: believing it is right when it checked out ten minutes ago.