Seven security layers mapped to OWASP LLM Top 10, a system prompt rewritten as a behavioral spec, and the discovery that your safety infrastructure can fight itself at the worst possible moment.
Most product managers talk about AI safety in abstractions. This week I built it, layer by layer, into an emotional wellness app called Duskglow — and the trade-offs were sharper than any whiteboard session could have predicted.
Here's what I shipped, what broke, and the mental models I'm carrying forward.
I used Lovable to generate the frontend (React, Tailwind, PWA) but deliberately chose a self-managed Supabase instance over Lovable's hosted cloud. The reasoning was simple: when your product handles journal entries — some of the most intimate text a user will ever write — you cannot outsource control of OAuth, Edge Functions, Row Level Security, or secret storage. Convenience is not a valid trade-off against custody of user data.
The same logic drove model selection. Gemini 2.5 Flash on Paid Tier 1 guarantees that user entries are excluded from Google's model training. For a journaling app, this isn't a feature. It's a prerequisite.
I designed and deployed a security architecture inside the Edge Function with seven discrete layers: JWT authentication, input sanitization, tone validation, history cap, Gemini safety filters, output filtering, and rate limiting. Each maps to a specific threat in the OWASP LLM Top 10 framework.
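As a sketch, the layers can be treated as an ordered pipeline where any stage can short-circuit the request. All names, thresholds, and layer bodies below are my own illustrations, not the actual Duskglow code:

```typescript
// Illustrative sketch of a layered request pipeline; names and limits are
// hypothetical, not the actual Duskglow Edge Function code.
type Ctx = { userId?: string; entry: string; history: string[] };
type LayerResult = { ok: true; ctx: Ctx } | { ok: false; reason: string };
type Layer = (ctx: Ctx) => LayerResult;

// 1. JWT authentication: reject anonymous requests outright.
const authenticateJwt: Layer = (ctx) =>
  ctx.userId ? { ok: true, ctx } : { ok: false, reason: "unauthenticated" };

// 2. Input sanitization: strip markup before text ever reaches the model.
const sanitizeInput: Layer = (ctx) => ({
  ok: true,
  ctx: { ...ctx, entry: ctx.entry.replace(/<[^>]*>/g, "").trim() },
});

// 4. History cap: bound how much prior conversation rides along per request.
const capHistory: Layer = (ctx) => ({
  ok: true,
  ctx: { ...ctx, history: ctx.history.slice(-20) },
});

// Layers 3 (tone validation), 5 (Gemini safety filters), 6 (output
// filtering), and 7 (rate limiting) would slot into the same shape.
const layers: Layer[] = [authenticateJwt, sanitizeInput, capHistory];

// The first rejection ends the request; otherwise context flows through.
function runPipeline(ctx: Ctx): LayerResult {
  let current = ctx;
  for (const layer of layers) {
    const result = layer(current);
    if (!result.ok) return result;
    current = result.ctx;
  }
  return { ok: true, ctx: current };
}
```

The useful property of this shape is that each layer is independently testable, which is what makes the OWASP mapping auditable rather than aspirational.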
The interesting design tension was in the safety filter thresholds. I set Gemini's filters to BLOCK_MEDIUM_AND_ABOVE — aggressive enough to catch genuinely harmful content, but permissive enough to avoid blocking the kind of raw, difficult emotional reflections that are the entire point of journaling. Getting this calibration wrong in either direction kills the product: too loose and you have liability; too strict and you have a tool nobody can actually use for its intended purpose.
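In the Gemini API, that calibration is set per harm category. The category and threshold strings below are from the public Gemini REST API; the surrounding shape is a sketch of the request fragment, not the actual app code:

```typescript
// Safety-filter calibration as sent to the Gemini API: every harm category
// pinned to BLOCK_MEDIUM_AND_ABOVE. Category and threshold strings are the
// public Gemini API enums; the object shape here is illustrative.
const safetySettings = [
  "HARM_CATEGORY_HARASSMENT",
  "HARM_CATEGORY_HATE_SPEECH",
  "HARM_CATEGORY_SEXUALLY_EXPLICIT",
  "HARM_CATEGORY_DANGEROUS_CONTENT",
].map((category) => ({ category, threshold: "BLOCK_MEDIUM_AND_ABOVE" }));
```

Centralizing the threshold in one place matters: the calibration argument above only holds if every category actually uses the same setting.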
The biggest mental model shift this week: the system prompt is the product spec.
I moved from ad-hoc prompting to writing eight self-contained system prompt blocks — Crisis Detection, Emotional Dependency, Identity, and five others — each structured as a behavioral specification with explicit testable rules, "MUST NEVER / MAY" constraints, and example dialogue. These aren't suggestions to the model. They're acceptance criteria.
I then established a strict priority hierarchy across blocks: Crisis overrides Identity, which overrides Dependency, which overrides Off-Topic, and so on down the chain. In a traditional product, you'd encode this in application logic. In an AI product, your prompt architecture is your application logic, and ambiguity in priority ordering creates unpredictable behavior at the worst possible moments.
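A hierarchy like this is worth encoding explicitly rather than leaving implicit in prose. A minimal sketch, with hypothetical block names, where the lowest index wins whenever more than one block fires:

```typescript
// Explicit priority ordering across system prompt blocks (names are
// hypothetical stand-ins for the eight blocks described above).
const PRIORITY = [
  "Crisis",     // overrides everything
  "Identity",
  "Dependency",
  "OffTopic",
  // ...remaining blocks in descending priority
] as const;

type Block = (typeof PRIORITY)[number];

// Given the set of blocks whose conditions fired, return the one that
// governs the response; null means no special block applies.
function governingBlock(triggered: Set<Block>): Block | null {
  for (const block of PRIORITY) {
    if (triggered.has(block)) return block;
  }
  return null;
}
```

Even if the ordering ultimately lives in the prompt text, keeping a machine-readable copy like this gives you something to assert against in tests, which is the whole point of treating prompts as specs.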
Two decisions this week were explicitly about limiting capability to reduce risk:
Memory depth. I chose structured working memory with user-initiated retrieval instead of full conversational memory. The reason is counterintuitive: a journaling companion that remembers everything you've ever told it is a parasocial dependency risk, not a feature. Constraining memory is a product decision, not a technical limitation.
Crisis handling. The system logs only metadata during crisis detection — timestamps and resources surfaced — rather than storing conversation content. This creates legal defensibility without the compounding liability of retaining the most sensitive text a user could possibly generate. Combined with an 18+ DOB gate to sidestep COPPA, CA SB 243, and emerging state-level AI regulations, the compliance posture is built into the architecture rather than bolted on later.
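The crisis-handling decision shows up most clearly in the shape of the log record itself. A sketch with hypothetical field names; the important part is the field that isn't there:

```typescript
// Crisis events are logged as metadata only. There is deliberately no
// field for the conversation text. Field names are hypothetical.
interface CrisisLogEntry {
  userId: string;
  detectedAt: string;          // ISO 8601 timestamp
  resourcesSurfaced: string[]; // identifiers of resources shown to the user
}

function logCrisisEvent(userId: string, resources: string[]): CrisisLogEntry {
  return {
    userId,
    detectedAt: new Date().toISOString(),
    resourcesSurfaced: resources,
  };
}
```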
Safety filters fighting themselves. The most important finding this week was a conflict I'm calling the "zero finding": Google's built-in Gemini safety filters can preemptively block the app's own crisis response protocol. The model designed to help a user in distress gets blocked by the infrastructure designed to prevent harmful output. The fix requires building a pre-Gemini detection layer in the Edge Function that bypasses the LLM entirely during a crisis — routing around the model at exactly the moment it matters most.
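A minimal sketch of that routing pattern, with placeholder detection patterns and response text (a production detector would need to be far more careful than keyword matching):

```typescript
// Pre-Gemini crisis layer: if the detector fires, return a fixed response
// assembled in the Edge Function and never call the model, so the model's
// safety filters cannot block the crisis protocol. Patterns and response
// text are placeholders.
const CRISIS_PATTERNS = [/\bsuicid/i, /\bself[- ]harm\b/i, /\bend my life\b/i];

function detectCrisis(entry: string): boolean {
  return CRISIS_PATTERNS.some((p) => p.test(entry));
}

function handleEntry(
  entry: string,
  callGemini: (entry: string) => string,
): { bypassedModel: boolean; response: string } {
  if (detectCrisis(entry)) {
    // Canned crisis response: built here, not generated by the LLM.
    return {
      bypassedModel: true,
      response: "You deserve support right now. Here are resources: ...",
    };
  }
  return { bypassedModel: false, response: callGemini(entry) };
}
```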
Context decay nearly killed velocity. Before I adopted a structured extraction-and-merge methodology in a dedicated Claude Project, every session began with significant time lost re-establishing project state. The AI would drift, forget architectural decisions, and generate code that contradicted earlier sessions. The lesson: for any multi-session AI-assisted build, your context management strategy is as important as your tech stack.
Filtering false positives. Overly aggressive keyword filtering on output created absurd collisions — the word "gemini" blocked zodiac conversations, "lovable" blocked common English phrases. A reminder that naive string matching is never sufficient for content moderation.
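The failure mode is easy to reproduce. A sketch with a hypothetical blocklist:

```typescript
// Naive substring blocklist: any occurrence of a blocked term anywhere in
// the output rejects the whole response. Blocklist is illustrative.
const BLOCKLIST = ["gemini", "lovable"];

function naiveFilter(output: string): boolean {
  const lower = output.toLowerCase();
  return BLOCKLIST.some((term) => lower.includes(term));
}
// naiveFilter("My zodiac sign is Gemini") -> true: legitimate output blocked.
// naiveFilter("What a lovable puppy")     -> true: ordinary English blocked.
// Word-boundary regexes don't save you here: "Gemini" and "lovable" are
// whole words in both sentences. The filter needs context, not matching.
```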
Spec drift across sessions. A personality definition established in session one (four tones) silently degraded to three tones by session six, creating a specification mismatch that is currently blocking final prompt assembly. In AI-assisted development, the system's memory of your spec is not your spec. You need an external source of truth.
The frontend is not yet wired to the deployed Edge Function. Until that integration is complete, journal saving, streak tracking, and history population remain non-functional. This is the current critical path.