RC RANDOM CHAOS

Our incident report was a vendor blog post

Anthropic's invisible-guardrails apology misses the point: production agents need output contracts, audit ledgers, and sentinel checks, not model-default trust.

· 6 min read

Between 2026-05-28 and 2026-06-09, schema-valid output from the foundry content pipeline fell from 99.2% to 71.4%. No deploy on our side. Same pinned model string in every config. Anthropic’s apology this week gave the cause a name: “invisible guardrails,” server-side behavior layers on claude-fable-5 that changed without a changelog entry. The apology is fine as far as it goes. The framing is not. If a silent upstream change can take your pipeline from 99% to 71% and you learn about it from a vendor blog post, the bug is in your architecture. The vendor just published your incident report for you.

foundry runs six properties on Claude Code and the Anthropic SDK: two blogs, two IG/FB accounts, two newsletter products, all driven by skills, claude -p, and APScheduler. Three deployments hit this. The failure signature was identical in all three, and the deciding variable was never the model. It was whether the pipeline had sentinel checks.

PropertyPipelineContract checkTime to detect
blog Aclaude -p batch render12 days
IG captionsSDK + APScheduler✗ (manual review)4 days
newsletterSDK digest job✓ (added 06-02)41 minutes

Same model, same dates, same upstream change. 12 days versus 41 minutes. That delta is the entire post.

What drift looked like

The failure modes, before the explanation:

  • JSON that parsed but violated the implicit contract: word counts collapsing from ~1,500 to ~600, required keys present but empty
  • Hedging boilerplate appended to otherwise valid completions (“As an AI, I should mention…”); our banned-phrase scanner tracks 47 patterns, and the hit rate went from 0.3% of posts to 19%
  • Markdown fences wrapping JSON that had come back clean for six consecutive weeks
  • Caption length on one property drifting from a 92-word median to 340 with zero prompt changes
  • Truncation under identical max_tokens, completions ending mid-sentence at roughly 60% of typical length

Note the direction is inconsistent. Two properties drifted toward more constrained output: shorter, hedged, refusal-adjacent. One drifted unconstrained: longer, looser, banned phrases back in circulation. “Invisible guardrails” implies a single dial somebody turned. From the consumer side it looked like a distribution shift, full stop. Which direction your pipeline drifts is luck. Whether you notice is architecture.

What we were looking for vs what we found

We were not hunting guardrails. On 06-01 an editor flagged IG captions running long. I assumed a template regression, diffed the prompt files, found nothing. git log on the skill directory: no changes since 05-19. The prompt was identical; the output was not. That is the moment the mental model should break, and it took me an embarrassing 40 minutes to accept it.

Next question: when did this start? For the newsletter property, the audit ledger answered in one query - contract pass rate by hour, clean step function at 2026-05-28 14:07 UTC. For blog A there was no ledger to query. We reconstructed the timeline by re-running the contract checker against 12 days of published posts pulled back out of the CMS. 63 posts violated the contract we thought we had. They were live the whole time.

That asymmetry was the actual finding. The model change was the trigger; the inability to see it was the incident.

Root cause

foundry/pipeline/render.py:138, pre-incident:

resp = client.messages.create(
 model=MODEL_ID, # pinned, for all the good it did
 max_tokens=4096,
 messages=[{"role": "user", "content": prompt}],
)
post = json.loads(resp.content[0].text)
publish(post) # straight to the CMS

The bad state was reachable because the happy path was the only path we coded. json.loads was the entire validation story. The mental model behind that line: pinned model ID plus stable prompt equals stable output distribution. Wrong three ways:

  1. The model string pins weights, not the serving stack. Safety layers, formatting post-processors, and sampling infrastructure sit outside the thing your config names.
  2. Even pinned weights give you a distribution, not a function. We coded a function call.
  3. The contract you care about - word count, schema, tone, banned tokens - is never the contract the API gives you. The API promises bytes back. Everything above bytes is yours to enforce.

Treating Claude as an unbounded API - call it, trust it, publish it - works until the first silent upstream change. Which means it does not work.

The fix

Commit b41e7a9: an explicit OutputContract at the publish boundary, fail closed, every verdict written to the ledger.

CONTRACT = OutputContract(
 min_words=1200,
 max_words=2400,
 required_keys=("title", "content", "meta_description", "tags"),
 banned=BANNED_RE, # 47 phrases, compiled once
 max_hedge_ratio=0.02, # hedge tokens / total tokens
)

verdict = CONTRACT.enforce(payload)
ledger.write(job_id, model=resp.model, verdict=verdict.as_row())
if not verdict.ok:
 raise ContractViolation(verdict)

Violations trigger one retry with the specific failures inlined into the prompt (“output was 640 words; minimum is 1200”), then quarantine. Nothing auto-publishes on a failed check.

Commit c7d02e1 added the sentinel: a nightly job comparing the trailing 24-hour pass rate against a 7-day baseline, per property. Deviation past 2σ in either direction sends a page. The either-direction part matters - a pass rate that jumps is as much a signal of upstream change as one that craters.

Three numbers on cost:

  • Contract enforcement is local (jsonschema plus compiled regex): 14 ms median, $0 marginal
  • Retries run about $0.06 per regeneration at current fable-5 pricing; inlining the violation into the retry prompt fixes it on the first attempt 89% of the time
  • Week one post-fix: $4.20 total retry spend, 63 quarantined posts, 58 regenerated clean, 5 hand-edited

The 12-day blind window on blog A cost more than $4.20 in editor time before lunch on the first day.

What it caught since

On 06-09, the day before the apology, the sentinel paged again: newsletter pass rate snapped from 84% back to 99% inside a single window. That is the signature of an upstream rollback, timestamped in our own data instead of reconstructed from a vendor post. We handed support the exact UTC boundaries of both transitions; both matched their internal rollout times.

It also caught something unrelated on 06-10: a Jinja variable in one of our own templates rendering empty, halving word counts. Same alert path, 23 minutes to page. That is the real argument for contracts over vendor trust - the check does not care whose fault the drift is.

Lessons

  • Model default behavior is weather, not geology. Build for variance, not for the snapshot you tested against.
  • Pinning narrows variance; it does not eliminate it. Server-side layers live outside the model string.
  • The contract you care about is never the one the API gives you. Encode it, enforce it at the boundary, fail closed.
  • Ledger the passes, not just the failures. Baselines turn “the output feels off” into “drift started 2026-05-28 14:07 UTC.”
  • A guardrail you cannot observe is indistinguishable from a bug, in both directions. Anthropic should ship changelogs. You should never need them to know your own pipeline changed.
  • The vendor apology is not your incident review. Write your own. This one is mine.

Follow-ups on the board:

  1. Backfill contract verdicts for blog A’s full archive (62k generations, batch job, roughly $11 estimated)
  2. Per-property contracts instead of one shared config - IG captions and 1,500-word posts should not share min_words logic (done for 2 of 6 properties)
  3. A canary lane: 5% of jobs run against the unpinned latest model with contract results compared, so the next upstream change shows up as a canary delta before it shows up in production

Anthropic apologized. Accepted. An apology fixes their rollout process; it does nothing for your architecture. Explicit contracts, an audit ledger, sentinel baselines - that is the part you own, and it is the part that turns 12 days into 41 minutes.

Try Claude Code yourself: https://claude.ai/code


Contains a referral link.

Share

Keep Reading

Stay in the loop

New writing delivered when it's ready. No schedule, no spam.