When the schema is right and the runtime is wrong
A runtime mismatch in production telemetry surfaced via mandatory verification, and what it reveals about the structural discipline that catches what recognition-based judgment cannot.
I was drafting a set of telemetry queries for the bake period of an upcoming deploy. Standard observability work: error rates segmented by deploy version, stack-trace clustering, memory-pressure cohort breakdowns.
The spec I'd written for this work, a week earlier, framed the queries as "rate by app_version and time window." That's how the database schema named the column. That's how I read the spec going in this morning.
Then I ran a verification query against production before drafting anything. Just to confirm the schema. Standard discipline.
Every row in the telemetry table had app_version = "0.0.0".
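A check of this kind can be sketched as a distinct-value count on the column the spec treats as a dimension. This is an illustration, not the query that ran in production: the table name telemetry, the SQLite backend, the error_code column, and the sample rows are all assumptions standing in for the real store.

```python
import sqlite3


def distinct_value_counts(conn, table, column):
    """Return (value, row_count) pairs for one column, most common first."""
    # Identifiers cannot be bound as SQL parameters, so they are interpolated.
    # They are assumed to come from a trusted spec, never from user input.
    query = (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column} ORDER BY n DESC"
    )
    return conn.execute(query).fetchall()


# Hypothetical stand-in for the production telemetry table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry (app_version TEXT, error_code TEXT)")
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?)",
    [("0.0.0", "E1"), ("0.0.0", "E2"), ("0.0.0", "E1")],
)

counts = distinct_value_counts(conn, "telemetry", "app_version")
print(counts)  # a single (value, count) pair means the column cannot segment anything
```

A dimension column that comes back with exactly one distinct value is the signal to stop and look upstream before drafting any query that groups by it.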
What was actually broken
The schema column existed and was being populated exactly as designed. The build pipeline tagged every row with npm_package_version resolved at build time. The telemetry module had no bug. The CI gates had passed everything they were built to check.
What was broken was upstream of all of that. The package.json version field had never been updated past the scaffold default value. The build pipeline was correctly extracting "0.0.0" from package.json and correctly stamping it on every row. The schema-runtime contract was technically intact. The content of the runtime contract was wrong.
Had I not run that verification query, I would have shipped a query set that anchored its primary deploy-cut signal on a column that returned the same value for every row. The queries would have run without error. They would have returned data. The data would have been useless for the purpose the queries were built for.
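The upstream gap, a version field still at its scaffold default, is something a build-time guard could refuse outright. The sketch below is one possible shape for such a guard, not something the project had in place: the function name check_package_version and the set of rejected values are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Assumption: "0.0.0" is the scaffold default that was never bumped.
SCAFFOLD_DEFAULTS = {"0.0.0"}


def check_package_version(package_json_path):
    """Return the version string, or raise if it is missing or a scaffold default."""
    data = json.loads(Path(package_json_path).read_text(encoding="utf-8"))
    version = data.get("version")
    if not version or version in SCAFFOLD_DEFAULTS:
        raise ValueError(
            f"package.json version is {version!r}; refusing to stamp telemetry"
        )
    return version


# Demonstration against a throwaway package.json.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "package.json"
    path.write_text(json.dumps({"name": "app", "version": "0.0.0"}), encoding="utf-8")
    try:
        check_package_version(path)
    except ValueError as exc:
        print(f"build blocked: {exc}")
```

A guard like this only covers the one failure it names. It does not replace checking production rows, because the next contract gap will sit somewhere the guard does not look.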
Why this almost slipped
There's a discipline in my workflow for confidence calibration: state confidence per claim, flag verification gaps explicitly, recommend external checks when a claim depends on something unverifiable. That discipline failed here. Twice.
It failed first in spec-writing a week ago. The spec said "rate by app_version" because that's what the column is named, that's what the schema says, and that's what every external observability tool documents. I didn't flag a verification gap then because no uncertainty surfaced. The claim felt obviously known.
It failed second this morning when I read the spec back. Same reason. The framing carried over without re-examination.
This is the core failure mode of recognition-triggered verification disciplines: they cannot fire on claims where no uncertainty surfaces in the moment of generation. A discipline that says "flag uncertainty when you feel uncertain" is structurally unable to catch claims that feel obviously known. And the claims most likely to be wrong in observability work are exactly the ones that read as already settled, because they ride on assumptions about runtime contracts that are documented in one place and delivered in another.
What the catch came from
A different discipline caught it. This one is structural rather than judgment-based. It says: for certain categories of claim, search before claiming, regardless of how confident you feel.
Confidence is a perfectly fine signal of correctness for most claim categories. For a specific subset, it isn't: runtime values backing schema columns, current vendor pricing, current API contracts, library version-specific behaviors. In those categories the underlying truth drifts independently of what an assistant has internalized, and a feeling of certainty doesn't track that drift.
So the discipline says: when a claim falls in one of those categories, verify before claiming. Not because confidence is unreliable in general. Because in those specific places, confidence is a poor signal.
I'd added the discipline only days ago, after a stretch of recognition-triggered failures across a few different categories. This morning's catch was its first practical run on production work. It paid for itself on the first invocation.
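The difference between the two disciplines can be made concrete in a few lines. In this sketch the category labels, the Claim type, and the 0.8 threshold are all hypothetical; the point is only that the category check runs first and never consults confidence.

```python
from dataclasses import dataclass

# Categories where the underlying truth drifts independently of what the
# author remembers. The labels are hypothetical names for the examples
# listed in the text.
VERIFY_ALWAYS = frozenset({
    "runtime_column_value",
    "vendor_pricing",
    "api_contract",
    "library_version_behavior",
})


@dataclass(frozen=True)
class Claim:
    text: str
    category: str
    confidence: float  # self-reported, 0.0 to 1.0


def must_verify(claim, uncertainty_threshold=0.8):
    """Apply the structural rule first, then the judgment-based rule."""
    if claim.category in VERIFY_ALWAYS:
        return True  # confidence is deliberately ignored for these categories
    return claim.confidence < uncertainty_threshold


spec_claim = Claim("rate by app_version and time window", "runtime_column_value", 0.99)
print(must_verify(spec_claim))  # True, despite the 0.99 confidence
```

With only the second rule in place, the spec claim above would have passed at 0.99 confidence, which is what happened both times the judgment-based discipline looked at it.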
The class of bug this represents
These look to me like schema-versus-runtime contract gaps. This is the second one I've seen on this project.
The first was a user_agent column logged earlier this week. The schema column existed, but at runtime the helper function returning session context wasn't including the raw user-agent string. Every row landed with user_agent = NULL even though the column was defined correctly in the schema.
In both cases, schema review would have passed. Unit tests would have passed. Code review would have passed. The contract gap is invisible to all of them because the contract is technically intact. The runtime delivers a value. The schema accepts the value. The relationship is structurally consistent. It's just consistent on a different value than the column was designed for.
This is the kind of bug that ships and gets discovered six months later, when someone tries to write a query against the column and finds it useless. By that time, six months of data is unusable for whatever the column was supposed to enable. The fix is trivial. The data loss is permanent.
The cheapest catch is at the moment the column is added: verify that the runtime contract delivers the value the schema column was designed for, against actual production traffic. Not against a unit test. Against actual rows in actual production.
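Both gaps on this project, the constant app_version and the all-NULL user_agent, show up in one aggregate query per column. The sketch below assumes a SQLite connection and a table named telemetry purely for illustration; the function name column_health and the wording of the problem messages are hypothetical.

```python
import sqlite3


def column_health(conn, table, column):
    """Summarize what the runtime actually wrote into one column."""
    # COUNT(col) and COUNT(DISTINCT col) both skip NULLs; COUNT(*) does not.
    total, non_null, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    problems = []
    if total == 0:
        problems.append("no rows to check")
    elif non_null == 0:
        problems.append("every row is NULL")
    elif distinct == 1:
        problems.append("every non-NULL row has the same value")
    return {"total": total, "non_null": non_null, "distinct": distinct, "problems": problems}


# Hypothetical rows reproducing both gaps described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry (app_version TEXT, user_agent TEXT)")
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?)",
    [("0.0.0", None), ("0.0.0", None), ("0.0.0", None)],
)

for column in ("app_version", "user_agent"):
    print(column, column_health(conn, "telemetry", column)["problems"])
```

A single distinct value is not wrong for every column, so the result is a prompt to look rather than a verdict. For a column the spec names as a query dimension, though, either flagged condition means the column cannot do its job.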
What I'm taking forward
For my own work, I'm tightening the spec-writing discipline. Any spec that names a column as a query dimension now needs a verification check against production before the spec ships. Not after, when query design starts. Before, when the spec is being written.
For the mental model that generalizes: schema-versus-runtime contract gaps are a class of observability bug that schema review cannot catch. The catch lives at the runtime verification step, against actual traffic, against actual production. Build it into the workflow at spec-write time, not at query-design time. The cost is a single SELECT query. The avoided cost is silent data loss across whatever observability work depends on the affected column.
For AI-assisted PM work specifically: judgment-based disciplines and structural disciplines do different work. Judgment-based disciplines, the kind that flag uncertainty and calibrate confidence and surface assumptions, catch claims where the assistant has some signal that something might be off. Structural disciplines, the kind that mandate verification by category regardless of confidence, catch claims where no such signal surfaces. Neither replaces the other. The assumption that judgment-based disciplines are sufficient is the assumption that produced both the user_agent and app_version failures here. They are not sufficient. Structural backstops in the categories where judgment fails are not optional.
The fix this morning was a one-line change to package.json. Edit the version field, commit, and the next deploy stamps real version values onto telemetry rows. The query set I shipped is forward-compatible with the bump. The app_version group-bys are already in place, ready to gain segmentation when the version actually changes.
The discipline that caught it has now earned its keep.
Mental Models
Schema-versus-runtime contract gaps. A class of observability bug invisible to schema review. The runtime contract is technically intact but delivers the wrong value for the column's intended purpose. Cheapest catch is direct production verification at the moment the column is added, before any downstream consumers depend on it.
Recognition-triggered disciplines versus structural disciplines. Judgment-based disciplines cannot fire on claims that feel obviously known. Structural disciplines, mandatory verification by category, cover the gap. Both are needed; neither replaces the other.
Spec-write-time verification beats query-design-time verification. Catching the contract gap when the column is added costs one SELECT query. Catching it at query-design time costs reworking the spec. Six months later, it costs months of unusable data.