The House Style Runs Deeper Than the Colors
An exploration session testing whether you can prompt your way out of an AI tool's house style, run on the tool itself.
The premise behind today's session was uncomfortable. Anthropic's own documentation describes Claude Opus 4.7's default visual style as warm cream backgrounds, serif display type, italic accents, and terracotta or amber accents. That description is also approximately what the product I have been building looks like. Not because I copied it. Because the AI tools I have been using during the build all converge on the same aesthetic when given default instructions. The aesthetic Duskglow inherited is the aesthetic the model wanted to give every product.
That observation, on its own, is a finding worth holding. What I tested today is whether you can do anything about it.
The hypothesis under the hypothesis
The session was framed as a directional design exploration. Three briefs, each pushing in a different direction, each carrying explicit anti-pattern language about what to avoid. The hypothesis at the surface was the obvious one: pushing harder against the defaults produces more differentiated output. The hypothesis underneath was the one I actually cared about. If I write briefs that name the model's house style as the thing to avoid, will the model push back against itself?
I expected one of three outcomes. Either the briefs would bite and I would get something visibly distinct. Or the house style would reassert itself in subtler ways, with the colors changing but the underlying composition staying the same. Or the directions would all collapse into the same generic alternative, which would be its own kind of evidence that the defaults run deeper than the surface treatments.
What I got from Direction A was the first outcome, but only after meaningful work. The system did not fight me, but it also did not arrive at the differentiated version on its own. The first generation came back recognizably executed but compositionally close to what I already had. It took three rounds of chat refinement to reframe the home screen as the cover of a journal rather than another page inside one. That reframe was the thing that made the direction read as paper rather than as another warm app.
What the iteration loop actually cost
The token budget tells a story the visual output does not. Direction A consumed seventy-nine percent of my weekly Claude Design allowance, which forced an honest decision about how to spend the remaining capacity. The session plan called for three directions in roughly equal slices. The actual cost of one direction was three slices' worth of allowance. Stretching the remaining twenty-one percent across two more directions would have produced shallower exploration of both.
The decision I made was to defer Directions B and C to the next allowance cycle rather than enable extra usage to compress them into tonight. The reasoning is worth saying plainly because it generalizes. The session goal was rubric evidence across three directions, not a fast finish. Compressed B and C runs would have produced thinner data, and the thing the rubric is supposed to evaluate is whether the tool can do this kind of work well. Running them on a tight token budget that I induced by overrunning Direction A would have been measuring the wrong thing.
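The arithmetic behind that decision is simple enough to write down. This is a minimal sketch using only the two figures from the session (seventy-nine percent spent, two directions left). The even split of the remainder is my assumption about how the leftover would have been shared, and the function name is made up for illustration.

```python
# Figures from the session write-up, expressed as percent of the weekly allowance.
WEEKLY_ALLOWANCE_PCT = 100.0
DIRECTION_A_SPENT_PCT = 79.0
REMAINING_DIRECTIONS = 2  # Directions B and C


def remaining_share_per_direction(spent_pct, remaining_directions,
                                  allowance_pct=WEEKLY_ALLOWANCE_PCT):
    """Split what is left of the allowance evenly across the remaining directions.

    The even split is an assumption for illustration, not a rule of the tool.
    """
    if remaining_directions <= 0:
        raise ValueError("need at least one remaining direction")
    return (allowance_pct - spent_pct) / remaining_directions


per_direction = remaining_share_per_direction(DIRECTION_A_SPENT_PCT, REMAINING_DIRECTIONS)
depth_ratio = per_direction / DIRECTION_A_SPENT_PCT

print(f"Each remaining direction would get {per_direction:.1f}% of the weekly allowance")
print(f"That is about {depth_ratio:.0%} of what Direction A consumed")
```

Roughly an eighth of Direction A's budget per direction is the gap that made deferring look better than compressing.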
One pattern surfaced inside the session that I want to remember: the best refinement controls in this tool are the ones that cost zero tokens. Sliders for typography, spacing, and element-level color sat at zero token cost across the entire session. Inline comments on specific elements were cheap. Full chat refinements, where you describe a layout-level change in prose and let the system regenerate, were the expensive operations and the ones that did the heavy lifting on directional reframes. The right discipline is to use chat sparingly for the moves that actually require it, then drive the rest of the iteration with sliders and comments. I was not disciplined enough about that on Direction A.
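The triage rule can be sketched as a lookup that tries the cheapest control first and falls back to chat only when nothing cheaper covers the change. This is an illustration of the discipline, not a description of how the tool works internally. The change labels and the `pick_control` helper are hypothetical, and the ordering encodes only the relative costs observed in the session (zero, cheap, expensive), not measured token counts.

```python
# Controls ordered cheapest first, each paired with the kinds of change it
# handled in the session. The labels are hypothetical shorthand.
CHEAPER_CONTROLS = [
    ("slider", {"typography", "spacing", "element color"}),  # zero tokens
    ("inline comment", {"single element"}),                   # cheap
]

# Chat regeneration was the expensive operation, so it is the last resort.
FALLBACK_CONTROL = "chat refinement"


def pick_control(change_kind):
    """Return the cheapest control that covers the requested kind of change."""
    for name, kinds in CHEAPER_CONTROLS:
        if change_kind in kinds:
            return name
    return FALLBACK_CONTROL


for change in ("spacing", "single element", "layout reframe"):
    print(f"{change}: {pick_control(change)}")
```

Only the layout reframe reaches chat, which is the split I should have enforced on Direction A.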
Where the model fights you and where it does not
The system surprised me in two directions. It did not push back at all on the anti-pattern language in the brief. The deep walnut palette I asked for landed cleanly, the warm cream defaults stayed away, and sparkle iconography never appeared. That is the surface battle, and the model lets you win it without much resistance once you name the patterns.
It did push back, indirectly, on the underlying compositional patterns. The first generation of the home screen was a page from a book opened mid-chapter, with the same hierarchy of elements I already had. The model interpreted "make it feel like a journal" as "render the existing layout in journal-style chrome." The reframe to "the home screen is the cover of the journal, the closed book before you open it" was the move that broke the compositional inertia. The visual treatment was downstream of that reframe; the typography and texture choices were not actually the load-bearing decisions.
The lesson, tentatively, is that the documented house style is not just a color palette. It is a set of compositional defaults that the model carries even when the surface treatments are different. Pushing against the colors is the easy part. Pushing against the compositions is where the actual differentiation work lives, and the brief language has to address compositions explicitly or the defaults will quietly persist.
Three friction points worth naming
The session produced specific evidence about where this kind of tool struggles, separate from the directional question. Three patterns showed up that I did not expect.
The first was confabulation in the clarifying round. The model invented details that sounded grounded but were not. "Volume the third" appeared as cover language with no basis in product reality. The system defaulted the user's initials to M.M. rather than asking which middle initial to use. These are small individually, but they signal that the system fills in plausible details rather than flagging gaps, and you have to catch them on review.
The second was that some instructions need explicit exceptions baked in. I wrote "recede the UI" into the brief, and the system applied that uniformly, including to the primary call to action. The first generation hid the most important affordance on the screen behind the same recessive treatment as everything else. Telling the model to recede the UI without saying "primary CTA stays prominent" produces an interface that buries the action you want users to take. That is a brief-craft lesson, and it generalizes to any "less is more" instruction given to a generative system.
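One way to make that lesson mechanical is to treat each brief instruction as a record with its own exceptions and to flag any sweeping instruction that has none before the brief is sent. The sketch below is a hypothetical pre-flight check I could run on my own brief text. The `Directive` structure and both helpers are invented for illustration and have nothing to do with the tool's own API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Directive:
    """One brief instruction plus the elements it must leave alone."""
    instruction: str
    applies_globally: bool = False
    exceptions: List[str] = field(default_factory=list)


def unguarded(directives):
    """Return global directives that name no exceptions, for review before sending."""
    return [d for d in directives if d.applies_globally and not d.exceptions]


def render_brief(directives):
    """Render directives as plain brief text, with exceptions spelled out inline."""
    lines = []
    for d in directives:
        line = d.instruction
        if d.exceptions:
            line += " Exceptions: " + "; ".join(d.exceptions) + "."
        lines.append(line)
    return "\n".join(lines)


first_draft = [Directive("Recede the UI.", applies_globally=True)]
revised = [
    Directive(
        "Recede the UI.",
        applies_globally=True,
        exceptions=["primary CTA stays prominent"],
    )
]

print([d.instruction for d in unguarded(first_draft)])  # flags the sweeping instruction
print(render_brief(revised))
```

The check cannot tell me which exceptions matter. It only forces the question for every instruction that applies to the whole screen.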
The third was that the system does not know what your existing product looks like as deeply as the assets suggest it should. I uploaded screenshots and a tokens file, and the system confidently rendered a deep walnut interpretation that was darker than what I actually ship. The interpretation was good, but it was an interpretation, not an extraction. When I weighed whether to accept the darker baseline or push back toward what was actually shipping, I chose to accept it because the session goal was exploration. In a session where the goal was production handoff, that mismatch would have been a real problem.
Mental Models
The house style runs deeper than the colors. When the model produces a visual default, the colors and typography are the shallowest layer. The compositional patterns underneath (what gets emphasized, what hierarchy elements take, how cards stack, how primary actions get framed) run deeper and survive surface changes. Briefs that only address the surface will produce reskinned defaults.
Iteration discipline is a token strategy. The best refinement controls in any tool are the ones that cost the least to use. When chat regenerations are the expensive operation, the discipline is to use them only for the moves that actually require them and to drive the rest of the iteration with cheaper controls. Failing to triage produces shallow exploration across more surface area.
Confabulation is the cost of fluency. Generative systems that fill in plausible details rather than flagging gaps produce outputs that read polished but contain invented specifics. The work is to catch them on review, not to design briefs that prevent them. Expecting the system to ask is not the right model.
Differentiation is now an active practice for solo builders. The work that used to come from picking your own colors and typography is being pre-done by the model, identically across every product the model touches. If the goal is for the product to read as itself rather than as another instance of the model's defaults, the differentiation work is now part of the PM job, and it has to be done deliberately. There is no neutral starting position anymore.