On Silent Decisions · Journal

Today closed a UI bug that had been live in production for 88 chats. Not because anyone newly noticed it. Because I finally saw it, looked back at where it was supposedly fixed, and realized the entry in my archive read "resolved on their own." That note had been a closing of the open item, not a closing of the bug. The mechanism was always there.

The same day produced two other patterns I want to name. An AI agent surfacing a scope tension instead of silently choosing. A governance rule whose underlying premise had quietly expired. Three different shapes of the same underlying thing: what happens to silent decisions in a build loop where one person is doing the watching.

"Resolved on its own" is a flag, not a status

Today's bug was a faint plus-pattern tiled across the background of the Luna chat surface and the sign-in screen. When I caught it, I traced it to a CSS utility class whose name suggested leather texture and whose actual content was a synthetic plus-tile SVG at 0.02 opacity. Two wrappers in the codebase were applying that class. Removing the class from those two wrappers fixed it everywhere it was visible. The fix was small.

What's interesting isn't the fix. It's what I'd written down 88 chats ago when I first saw this pattern. The note read: "resolved on their own." I went back and re-read it after today's diagnosis, trying to remember what that meant at the time. The most generous reading is that I'd loaded the page, hadn't seen the pattern, and assumed the bug had cleared itself. The more honest reading is that I stopped looking at the surface where it was happening and the absence-of-noticing felt like a fix.

For me, the lesson is that "resolved on its own" should never have been a valid close on an open item. CSS classes don't unapply themselves. SVG patterns don't dissolve. When something stops being visible without a documented mechanism, two possibilities exist: the actual cause was identified and removed, or you stopped checking. Only one of those is a fix.

This generalizes. In a solo build, the open item tracker is doing two jobs at once. It's a list of things to do, and it's a memory of things noticed. The first job rewards velocity: close items fast, ship the next thing. The second job rewards skepticism: close items slowly, only when the mechanism is actually known. Without separation between those jobs, velocity wins by default and the tracker quietly becomes a graveyard of bugs that were forgotten rather than fixed.

Going forward, the discipline is small but structural: a closed open item needs a documented mechanism. "Resolved on its own" is the failure mode I was unconsciously tolerating. Today rewrites that note with the actual mechanism, and the close finally means what it should have meant the first time.

Surface tension instead of silent choice

There's a parallel version of this problem in the AI build loop. Earlier in the day, I'd asked Claude Code to fix the cosmetic regression by removing the misapplied class. The spec said "default to minimal change" and named the visible symptom (the Luna chat surface). When the agent searched the codebase, it found the same misapplication on a second surface, the sign-in screen, that wasn't part of the symptom but was part of the same bug.

The spec's literal reading would have removed the class from one site and left the second one untouched. The "consistent fix" reading would treat both as one bug. Neither was obviously right. The agent's job, per the discipline I want it to have, was to surface the tension and let me decide rather than choose silently.

It did. The session paused and produced a two-option breakdown: minimal (one site, leaves sign-in untouched) or consistent (both sites, treats them as one bug). It named what each option preferred. It even named which it would default to and flagged that it didn't want to silently extend scope.

That moment is what I'd been trying to engineer for over the last two weeks. There's been a model-selection question hovering over the Duskglow build: an earlier rule said use a smaller, faster model for one-shot executions and a larger model for reusable artifacts. The premise was cost-efficiency. Under that rule, the cosmetic fix would have been a one-shot execution and a smaller model would have been correct.

Empirically, that hasn't been holding up. The smaller-model sessions produce technically correct work, and they collapse multi-step decisions and resolve scope ambiguities silently. The larger-model sessions, even on the same kind of work, surface those ambiguities back to me. Not always. But measurably more often. And the difference is exactly the difference between getting a fix that's right by literal spec and getting a fix that's right by the spec's intent.

For me, the question wasn't which model is more capable in some abstract sense. The question was which one defaults to behavior that catches my mistakes before they ship. After enough sessions to count, the answer is the larger model. Which brings up the third pattern.

Rules expire when their premises do

I set the model-selection rule in mid-April when I was being thoughtful about cost. At the time, that thoughtfulness was load-bearing. I was paying per token and paying attention to it. The rule was correct for that constraint.

Two things changed in the weeks since. I moved to a plan with enough headroom that per-session cost is no longer a binding constraint for this kind of work. And empirical evidence accumulated showing that the larger model's discipline is meaningfully different on multi-step delegated work, which is what most of Duskglow is.

The rule's underlying trade-off (cost vs. quality) made sense when I was cost-sensitive. It produces wrong answers when I'm not. The mistake I almost made today was applying the rule mechanically because it was already written down. Written-down rules feel authoritative. They become a kind of structural inertia that biases against re-derivation.

So the discipline, again small but structural, is to re-state a rule's underlying trade-off whenever I invoke it and check whether the trade-off still applies. If the constraint that motivated the rule has changed, the rule needs to change with it. The trade-off is the artifact, and the rule is just a compressed expression of it. Compression loses information when the world moves.

Today's reconciliation locks the larger model as the default for the kind of work where surfacing-tension-instead-of-choosing-silently is the discipline I want. I'm not calling it the better model in some abstract sense. I'm saying it defaults to the behavior I need given the current constraints. If those constraints shift again, this lock should shift with them.

What this is really about

A pattern runs under all three of these that I'm still trying to name precisely. Something like: the failure mode in delegated work is the accumulation of silent decisions. Bad decisions get caught because they produce visibly bad outcomes. Silent decisions accumulate because they produce outcomes that look fine in the moment and only reveal themselves much later, often as drift.

The disciplines that keep delegated work honest, at least for me, all share the same shape. They're small structural rules that prevent silent autonomous resolution: don't close an open item without a documented mechanism, surface scope ambiguities back to the human, restate a rule's trade-off before invoking it. None of them are about working faster or being smarter. They're about not letting decisions slip past unmarked.

Velocity in a solo build is real. The temptation to take silent resolutions is also real. What I'm noticing today is that the cost of silent resolutions is back-loaded: it doesn't hit until you're deep enough in to have built on top of decisions you can't fully reconstruct. Catching those decisions while they're happening, even when it slows the loop down, is the thing.

Mental models

A closed open item without a documented mechanism is a forgotten bug, not a fixed one. "Resolved on its own" is a closing convention, not a status report.

Delegated work fails through silent decisions, not bad ones. The discipline is structural: surface scope ambiguities back to the decision-maker rather than choose silently.

Governance rules expire when their premises do. Re-state the underlying trade-off before invoking a rule, and check whether the trade-off still applies under current constraints.

Velocity rewards silent resolution; durability rewards friction. Calibrated friction in delegated work, asking before resolving, is the cost paid for catching the things that wouldn't otherwise be caught.