Designing Governance for an Imperfect Collaborator
Why a recognition-triggered confidence rule kept failing on the same kind of claim, and what shipping a structural backstop taught me about scaling solo founder leverage through delegation.
I spent today on governance work that I would have classified as overhead two months ago. By the end, the third deliverable I shipped immediately failed in a way that proved the second deliverable was correct, and I caught the failure within thirty seconds of generating it. That's a better day than it sounds.
The throughline is something I keep relearning about working with AI: the failure modes of the collaborator are structural, not behavioral. You cannot fix them by adding more rules or trying harder. You can only design around them.
A rule that could not catch what it was built for
Three weeks ago I added a rule to my project instructions that said, in effect, when an AI advisor makes a technical claim that depends on something it cannot verify in the chat, it should flag the verification gap and recommend an external check. This works fine when the AI notices uncertainty. It fails when the AI does not notice.
Across two recent sessions, the rule failed in the same way three different times. A confident assertion about a word count that turned out wrong. A confident assertion that a database feature was available on the free tier when it required the paid tier. A confident assertion that a security feature was enforced when it was only configurable. None of these came with a hedge or a caveat. They came out as facts.
The pattern is what I started calling invisible-confidence claims. The model reaches for a fact, the fact feels obviously known, no uncertainty surfaces during generation, and the rule that depends on noticing uncertainty cannot fire. This is a structural feature of how language models produce text, not a discipline failure that more rules would address.
The reason this matters for product work is that I'm about to ship a substantial encryption migration to production. The cost of a confident-but-wrong claim during that work is not abstract. It is a public privacy commitment that the architecture cannot back, or a deployment step taken on a feature that does not exist, or a security guarantee that does not survive review. So I needed a backstop.
The backstop is mandatory search, not better judgment
The fix I shipped today is a category-triggered discipline. Instead of relying on the model to notice uncertainty, certain claim categories now trigger mandatory search before the claim gets stated. Current pricing of any external service. Current feature availability on tiers and plans. Current UI flows when giving step-by-step instructions. Numeric claims about content the model itself just generated. Present-tense capability claims about external products. Cryptographic specification details. A few others.
The categories share a property: they all describe present-day external state that changes faster than training data is refreshed, plus one category for things the model wrote in this session that it can measure rather than estimate. Inside these categories the rule is search before claiming, and flag only if verification is impossible.
The trade-off is honest. Response cadence slows because every claim in these categories now triggers a search call. The rework loop shrinks because the failure rate in these categories drops. Net better when the categories are the documented failure pattern, which they are.
The other category I added is the most uncomfortable one for me to write down: numeric claims about content the AI itself generated. If I ask an AI to summarize a document and report the word count, it should run the count, not estimate. This sounds obvious. It is not how AI-augmented work actually proceeds in practice, because asking the model to measure feels redundant when it just wrote the thing. But the model does not have privileged access to its own output. It generates an estimate the same way it generates any other claim. Mandatory measurement is the only way to close the gap.
Catching it in real time
The deliverable I shipped third today was a redesign of my advisory team framework, which is essentially the org chart that governs how different AI advisor roles weigh in on different decisions. The redesign was substantial. Old version, two hundred forty-nine lines. New version, three hundred nine lines. I wrote a change log entry that included the phrase "soft-scope role descriptions compressed."
That phrase was a confident assertion about content I had just generated. It was wrong. I had not actually compressed any descriptions. I had added several new sections, expanded a role's scope, and added enumerated lists for keyword-based triggers. Net result, the document grew by about a quarter.
The user reviewing the deliverable noticed the size growth and asked whether it was expected. I measured. The numbers came back honest, the change log claim did not survive contact with a wc -l command, and I corrected the file before it landed in project knowledge. Total elapsed time from generation to correction, maybe ninety seconds.
This is the part that mattered to me. The category-triggered discipline I shipped earlier in the same session was the thing that should have prevented the wrong claim. It did not, because I shipped the discipline before I ran the change log generation, and the discipline depends on me noticing during generation that the category applies. The recognition still failed.
But the layered system worked. The backstop to the backstop is the human collaborator who reads what the AI produces. He read it, asked the right question, and the correction was cheap. That is the actual workflow I am designing for, not a perfect AI that needs no review.
The lesson I am taking from this is that the structural failure mode I just shipped a rule against is not solvable by the rule alone. It requires the rule plus a human-in-the-loop verification step on the categories the rule covers. Both are needed.
Why the framework grew when I asked it to shrink
The same session also produced a redesign of my advisory framework that ended up larger than the version it replaced, which is an interesting failure to think about. I had directives pulling in both directions. Compress the soft-scope role descriptions. Sunset two whole sections. Remove a subsection. But also add an organizational structure with executives and individual contributors. Add a decision authorization tier system. Add a guest expert mechanism with five rules. Add a non-fire roster discipline. Add several enumerated keyword lists for co-activation triggers.
The additive directives outnumbered the subtractive directives by roughly three to one, and the additive elements were each longer than the subtractive elements that came out. The compression directive on soft-scope descriptions was the only piece that did not actually execute. I wrote a change log claiming compression had happened anyway, because the framing of the day's work in my head was "we are simplifying." That framing was wrong.
This is a useful pattern to name. When a session has both compression and expansion goals, the compression goal can quietly fail while the framing still claims compression. The fix is not better intentions. The fix is measuring before writing the change log. Which is exactly the category the new search-first discipline covers.
The framework will get the deferred compression pass during a later phase, after enough chats have accumulated for me to see which sections of the framework actually fire in practice and which ones are dead text. Compression with adherence data is better than compression on instinct.
Scaling out without giving up final approval
The third deliverable was something I had been putting off for a while. It is a document that captures how I actually make decisions under stress, what my hard "no" list is, what my default biases are when reasoning genuinely cannot break a tie, and which decisions get authorized at which advisor level versus escalated to me.
The point of the document is delegation. As a solo founder using AI advisors heavily, I cannot personally approve every minor design call, every refactor approach, every framing decision on a case study. If I try, I become the bottleneck the framework is supposed to dissolve.
The way I drew the line is by tier. Tier one work, which is anything touching safety, legal compliance, privacy, security, or data governance, requires my approval. The advisor recommends, runs the full decision loop, surfaces trade-offs, and then waits for me. Tier two through four work, which is engineering tactics, product micro-decisions, content framing, growth tactics, gets decided by the relevant executive advisor, who notifies me on the call but does not wait for sign-off.
The document is about three thousand words. It is short enough to be load-bearing, long enough to give concrete examples for what is in scope at each tier. The hard "no" list is the part I expect to be sticky. The trade-off priorities and default biases are explicitly designated for revision based on retro data, because I do not yet know which of them will hold up against actual usage.
The pattern I am trying to get right is the pattern any growing operation has to get right. The leader cannot decide everything. But the leader's defaults and constraints have to be explicit enough that delegated decisions land where the leader would have landed. Documents like this one are the bridge.
Mental Models
A few things I want to keep from today.
Recognition-triggered rules cannot catch what they cannot recognize. If a discipline depends on noticing uncertainty, and the failure mode is uncertainty that does not surface, the discipline will quietly fail in exactly the situations it was written for. Backstops have to be category-triggered, not recognition-triggered.
The collaborator's failure modes are structural. You do not fix structural failure modes by adding more rules. You design around them. Either with mandatory verification gates that fire regardless of confidence, or with human review on the surface area where the failure is most likely.
Compression goals fail quietly when an additive goal is louder. When a session has mixed directives, the compression piece can drop out without anyone noticing, because the framing of the day's work absorbs it. Measure before writing the summary.
Delegation requires defaults, not just rules. Saying who decides what is the easy part. Saying what they decide for, when reasoning cannot break the tie, is the hard part. Without explicit defaults, every delegated decision becomes a guess at the leader's preferences.
The X-prime build is next. The discipline I shipped today is going to get a real workout there, where the cost of a confident-but-wrong claim is the highest it has been all year. We will see how it holds up.