When You Own the Vision but Not the Code · Journal

Today I shipped the last piece of a privacy architecture I could not verify visually. It was a CI check, a small one, the kind of thing that adds a few lines to a config file and wires a script into a build hook. Nothing about the code was exotic. But the decisions the code encoded were load-bearing for a public privacy claim, and I do not read code well enough to catch a quiet mistake.

That meant someone else had to own the technical judgment. That someone was the AI assistant I was working with. By the time the session ended, we had surfaced, refined, and codified a working model for how that delegation should run. The model is still provisional, but it was built live from a real problem, and the underlying structure generalizes past the specific decision that produced it.

The gap that made the working model necessary

Solo founders get told to "own the whole stack." For product and UX, that is how I actually work. I can feel when a flow is wrong before I can explain why. I can tell when a copy choice erodes trust. Product and user experience judgment is mine.

Code judgment is not. I did not come up through engineering. When my AI collaborator tells me a regex pattern is calibrated correctly, I cannot verify that claim by reading the pattern. I can verify it indirectly (did the test pass, does the build succeed, does the deployed system behave correctly) but I cannot verify it at the level of the decision that produced the pattern.

For most of the project, this has not mattered. The decisions were either small enough that getting them wrong was reversible, or big enough that the AI could explain them to me at the level of product consequence. But the piece I shipped today sat in between. It was small in scope (one config file, one script, one build-hook wiring) and large in consequence (the check enforces an invariant underlying the app's public privacy posture). I needed the AI to own the technical judgment without me rubber-stamping whatever came out.

The working model that emerged has five elements, plus one meta-check. I will walk through each, then show how they played against a concrete decision that came up during the session.

Pre-applied lens checks

When I delegate a technical decision, I ask the AI to run five checks on its own recommendation before surfacing it to me. Each check gets named in the response, even when the answer is "no impact."

The lenses are the ones that matter for this project: privacy posture, feature flexibility for features I have not built yet, interaction with currently-open questions, reversibility, and priority posture against my existing framework. A different project would have a different set. What matters is that the set is written down somewhere and the AI applies it without being asked.

The reason to name the lens even when the finding is "no impact" is trust-surface. Saying "no impact on the privacy posture" out loud is different from silently assuming it. The first is a statement I can hold the AI to later. The second is a gap I have to detect on my own, which I probably cannot.

Explicit confidence calibration per component

High, medium, low, with the reasoning stated on anything below high. Confidence is not the same as claim strength. The AI can be highly confident in a genuinely ambiguous trade-off. It can be medium-confident in a seemingly simple choice if an assumption about the code structure might be wrong.

The effect of calibrating openly is that I get something I can pressure-test without needing the technical grounding to do it myself. A medium-confidence flag is a cue to ask "what would change if the assumption fails." That conversation is shorter and more productive than a full technical review, and I can run it from my side of the table.

Named assumptions

What is the AI assuming about the codebase, the deploy workflow, or my intent, any of which would change the recommendation if wrong?

These are the things I can poke at. I do not always know if an assumption is correct. I often know which ones feel load-bearing, and naming them gives me somewhere to point when my gut says something is off.

Escalation triggers

Cases where the AI stops and asks rather than deciding. The line I draw is not "technical versus non-technical." It is "best answer within constraints" versus "what do you value more."

Technical decisions have a best answer within the stated constraints. Priority decisions require stating the values that rank one outcome above another. The AI cannot decide the second kind on my behalf, no matter how technical the surface looks. If a decision couples to product vision, irreversibility on a public claim, cost implications beyond the current stack, or a Tier 1 trade-off where the question is really "how do you weigh rigor against velocity," it comes back to me.

This trigger list is also where I caught myself almost undercooking this session. The question "do we match identifier names loosely or strictly" sounded technical. It was, in the sense that the regex syntax was the surface. But it was also a trade-off between false-positive friction and false-negative leak risk, and "what do you trade against what" is my lane. The AI surfaced options; I picked the posture.

Plain-language translation on load-bearing technical terms

When a user-experience or product-vision consequence is hiding behind a technical term, translate the term inline, in the same breath as the technical framing. When a term is just scaffolding (build tooling, shell syntax, framework internals) leave it technical. I do not need to understand build hooks to approve a build-hook decision.

The test is whether I can answer one question. If I approve this without understanding the technical details, what product or UX consequence am I approving? Whatever the answer is, that is the thing to state plainly.

For cryptography specifically, translate the property, not just the term. "AES-256-GCM provides confidentiality and integrity" becomes "the encryption scrambles the entry so only the key can unscramble it, and it detects if anyone tampered with the scrambled version." One is a fact I can verify at the level of product trust claim. The other is a jargon wrapper.

Translation is not over-explaining. I do not want a crypto tutorial every time AES comes up. I want the piece of the term that matters for the decision I am making right now.

The governance-rule check that almost got skipped

The meta-check is the piece of the working model that surprised me. Partway through the session, the AI recommended a model setting based on a rule I had written down at a previous working session. The rule was correct at the time it was written. It balanced cost and capability in a reasonable way for a project running under tight budget constraints.

The rule was also, I realized, no longer applying to my actual situation. My subscription plan had changed, which meant the cost-sensitivity trade-off that originally drove the rule no longer applied. The AI had applied the rule mechanically without checking whether the rule's premise still held.

The fix is small. When invoking an existing governance rule, state the rule's underlying trade-off and verify the trade-off still applies to current constraints. Rules capture the trade-off you saw when you wrote them. When the constraints shift, the rule can produce wrong answers while looking correct on paper.

This is the part of the working model I did not see coming. I was worried about the AI overreaching on decisions it should not own. I was not worried about the AI under-thinking a call it should own, by leaning on rules I had given it in a different context. Both failure modes matter. The meta-check catches the second.

What happens when the loop is the default

The session ran across eleven per-operation approval gates. Three times, the interface offered a "yes, and do not ask again" option. Three times, we declined it. The discipline pattern only works if it stays costly enough to mean something each time.

The output was clean. The gate is live in production. The five-element loop plus the meta-check is codified in my governance documents as a provisional behavioral rule. If it holds through the next two ships, it promotes to the advisory team framework as a defined working mode alongside planning and execution.

The deeper thing the session surfaced is that "owning the vision" is not a clean boundary. Some fraction of every technical decision has vision in it. The question is not whether I should be in the loop (I am, always) but what shape the loop has to take so that I can make the calls that are genuinely mine, and trust the calls that are not.

Mental Models

Delegation is not abdication. When an AI collaborator owns a technical decision, your job shifts from deciding to pressure-testing the decision surface. Named assumptions, confidence calibration, and pre-applied lens checks are the pressure-test levers. If any of them are missing, you do not have enough surface area to push against.

Governance rules are snapshots of a trade-off. Every rule you write captures the constraints you saw at the time. When constraints change, rules can produce answers that look correct but aren't. Make it a habit for your collaborator to state the underlying trade-off whenever the rule is invoked, and to check whether the trade-off still applies.

"Best answer within constraints" is delegable. "What do you value more" is not. The line between a technical decision and a priority decision runs along this distinction. The technical surface can be deceptive. Surface the priority question explicitly whenever the decision reduces to what you weigh against what.

Translate crypto claims at the level of product trust, not cryptographic theory. A user-facing privacy claim is defensible or not based on the architecture it rests on. The architecture is describable in plain terms. If you cannot explain the claim in those terms to yourself, do not ship the claim.