What 'done' means when you don't write the code
How a non-engineer PM verifies delegated technical work, and why the cleanest 'all done' report is often the one worth checking hardest.
I spent this week's build time on something a user will never see and that I can't show off in a demo. Not a feature. A safety fix, the kind of work that only becomes visible when it's missing. Near the end of it, the work came back reported as done, with nothing errored. I read that, felt the pull to mark it complete, and went and checked the system myself anyway. The reasoning behind that turned out to be more useful than the result, so that's what this is about.
I can't read my way to "done"
There's a part of my situation I've made peace with. I set the direction for Duskglow and make the product and safety calls, but I don't write the code that turns those calls into the actual system. So when work comes back marked done, I'm holding a claim I can't verify by reading the thing itself. I don't have the training to audit database logic line by line and feel sure I caught everything.
For a long time I treated that as a gap to apologize for. I've stopped. Pretending I can read the code would be dishonest, and it wouldn't help anyone. What actually works is building a way to check that doesn't depend on me reading the code at all.
Why a clean report made me nervous
The report said no errors, and that is the thing that made me want to look closer. It took me a second to articulate why.
Some of the checks I had set up were supposed to fail. They were guardrail tests, where the system is meant to refuse the thing I'm asking it to do, and the refusal is the whole point. If a check like that comes back with no error, that's the opposite of good news. It means the protection I just added isn't doing anything. So a blanket "nothing errored" wasn't reassurance for me. At best it was ambiguous, and at worst it pointed straight at the failure I was trying to rule out.
So I didn't verify against the summary. I went and read the system's own state. Was the safeguard actually in place? Did the stored values look the way they should? And when I deliberately tried to do the thing the system is supposed to forbid, did it stop me? The two checks that mattered most came back as errors, which is what I was hoping for. An error message was the pass.
I want to name the thing I caught in myself here, because that's the reusable part. The moment a task gets reported as finished, there's a real pull toward closure, toward filing it and moving on. For me that pull is strongest when the report is cleanest. A messy report, I dig into by reflex. The clean ones are the ones I have to talk myself into doubting.
This was the product work
None of this showed up on a roadmap a user would recognize. No screen changed. And spending a pre-launch stretch on it, rather than on one more visible feature, is the product call I'd defend, not a detour from product work.
My view on this has gotten less hedged the more time I spend building in this category. On an AI product, owning the safety question is the PM's job, not something to pass to legal or compliance or leave for future-me. The field is shipping faster than it's checking, and that gap is producing real harm to real people now, with the lawsuits and the regulation following behind it. From my experience, the comfortable version of being a builder is to keep adding to the surface that people can see and click. The version I'd stand behind is the one where, before you let more people in, you go make sure the floor holds.
Mental models I'm taking from this
A report of "done" is a claim, not a fact. When you can't verify it by inspecting the work directly, verify it against the system's own state.
Build some checks so the pass condition can't be misread. A guardrail test where success looks like a refusal is about as unambiguous as it gets.
The cleanest report earns the most scrutiny, because the pull toward calling something finished is strongest when nothing looks wrong.
Work no user will ever see can still be the real product work. On an AI product, the floor holding is a feature, even when nobody can point to it.