Case Study

Designing Luna: Emotional Safety Architecture for a Bedtime AI Companion

Designing a bedtime AI companion required solving a tension most AI products ignore: how to be emotionally present without becoming emotionally necessary.

Technical PM: AI behavior design, system prompt architecture, model-driven UI, emotional safety guardrails · April 2026

TL;DR

Luna needs to feel warm enough that people want to journal with her, but constrained enough that she never becomes a substitute for human connection. I designed a three-tone personality framework, a five-iteration system prompt architecture, privacy-first working memory, model-driven conversational pacing, and emotional dependency guardrails, all built for a bedtime context where users are alone, tired, and emotionally open. Duskglow is the mobile-first gratitude journaling app Luna lives inside.

The Problem

I started building before I understood the stakes. Features first, UI polish, getting something shipped. Then I used my own product.

The first time I interacted with Luna in her raw form, something clicked. The conversations were intimate. Users would share anxieties about their relationships, frustrations about their careers, fears they hadn't said out loud. I realized this wasn't a productivity tool where a bad response wastes someone's time. This was a product that talks to people at their most vulnerable: alone in bed, processing their day, often anxious or emotionally unsettled.

That realization connected to articles I'd been reading about AI companion dependency: the Character.AI lawsuits, and users treating chatbots as therapists or emotional replacements for real human connection. I researched the regulatory environment and found three state laws already targeting products exactly like mine: California's SB 243 (crisis detection and disclosure requirements), New York's AI Companion Law (recurring AI disclosure notifications), and Washington's HB 2225 (manipulative engagement techniques and private right of action).

Rather than addressing each requirement individually, I built a four-tier prioritization framework where safety, legal compliance, privacy, and data governance override every other product decision: features, UX polish, growth. That framework governs every subsequent decision in the project. A feature that introduces a Tier 1 risk cannot ship, even if it's critical to user experience or growth.

That framework forced a single design question: how do you make an AI companion warm enough to earn trust on the first session and disciplined enough to push users back toward their real lives by the thirtieth?

What I was designing against

Risk	How Most Products Handle It	What Duskglow Needed
Over-validation	Default model behavior (agree, affirm, validate)	Tone-aware responses that match energy without flattering
Emotional dependency	No guardrails (or opt-in safety settings)	Proactive boundary-setting woven into personality
Generic AI voice	Persona instructions in system prompt	Three distinct tones with behavioral specs, not just adjectives
Engagement optimization	Session length / message count KPIs	Structured conversation arc with a natural endpoint
Bedtime context	Ignored (same UX day and night)	Dark-mode-first, low-energy interaction, conversation pacing

Bedtime context also drove visual design. Duskglow's palette is entirely warm tones: deep browns, soft ambers, warm cream. No blue light anywhere in the interface, because a product people use in bed before sleep shouldn't emit the wavelengths that delay sleep onset.

Approach

Luna's personality framework started as a product insight, not a feature request.

The first version of Luna felt like talking to Google. I'd type something about my day and get a response that could have come from any general-purpose chatbot: polite, validating, and completely generic. That experience forced a question that reshaped the product: why would anyone use this instead of talking to ChatGPT directly?

A journaling companion occupies a specific emotional niche, fundamentally different from a general-purpose assistant with a persona bolted on. I researched how users actually describe the companion they want: phrases like “soft but could be firm,” “like a friend who actually listens,” “not a therapist but not just validation.” These descriptions mapped to distinct psychological roles, not points on a warmth slider.

That led to three tones, each grounded in a real user need:

Reflective Listener: for users who want to be deeply heard. Mirrors the user's own language, sits with what's shared, asks quiet open questions. Never reframes or celebrates enthusiastically.
Warm Companion (default): conversational and present, like texting a friend who actually listens. Reacts like a human (“nice,” “oof,” “okay love that”), not like a therapist. Meets the user's energy on both good and hard days.
Gentle Coach: for users who want more than comfort. Notices patterns, offers reframes as invitations, challenges surface-level answers once but backs off if the user deflects twice.

Each tone is specified as a behavioral contract, not an adjective list. “Warm” means nothing to a language model. “Use sentence fragments naturally, mirror the user's register, keep responses shorter than the other tones, react like a human not a therapist, no filler warmth like ‘thank you for sharing’” gives it something to act on. That specificity produces a noticeably different conversation within two messages.
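One way to picture a behavioral contract is as structured rules rather than prose. The sketch below is illustrative only: the type names, field names, and the always/never split are assumptions, and the rule text is paraphrased from the tone descriptions above rather than copied from Luna's actual prompt.

```typescript
// Hypothetical shape for a tone spec; these names are illustrative, not Duskglow's schema.
type ToneId = "reflective_listener" | "warm_companion" | "gentle_coach";

interface ToneContract {
  label: string;
  always: string[]; // behaviors the tone must show
  never: string[]; // behaviors the tone must avoid
}

const TONES: Record<ToneId, ToneContract> = {
  reflective_listener: {
    label: "Reflective Listener",
    always: ["Mirror the user's own language", "Sit with what is shared", "Ask quiet, open questions"],
    never: ["Reframe what the user said", "Celebrate enthusiastically"],
  },
  warm_companion: {
    label: "Warm Companion",
    always: [
      "Use sentence fragments naturally",
      "Mirror the user's register",
      "Keep responses shorter than the other tones",
      "React like a human, not a therapist",
    ],
    never: ["Use filler warmth like 'thank you for sharing'"],
  },
  gentle_coach: {
    label: "Gentle Coach",
    always: ["Notice patterns", "Offer reframes as invitations", "Challenge a surface-level answer once"],
    never: ["Keep pushing after the user deflects twice"],
  },
};

// Render one contract as a prompt-ready block of explicit rules.
function renderToneInstruction(id: ToneId): string {
  const tone = TONES[id];
  const lines = [
    `TONE: ${tone.label}`,
    "You must ALWAYS:",
    ...tone.always.map((rule) => `- ${rule}`),
    "You must NEVER:",
    ...tone.never.map((rule) => `- ${rule}`),
  ];
  return lines.join("\n");
}
```

Keeping the rules as data means each tone can be reviewed and tested line by line, which is harder to do with a paragraph of adjectives.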

Five iterations of the system prompt taught me that structure matters more than wording.

The system prompt evolved through five major versions across 11 build sessions. The progression reveals a pattern that applies to any AI behavior design:

v1 was a paragraph of personality description. Luna sounded fine but drifted between tones, gave unsolicited advice, and occasionally broke character when pushed. Inconsistency was the failure mode. The same prompt sometimes produced a therapist and sometimes produced a friend.

v3 restructured around Gemini's documented best practices: persona-first, then core task, then priority-ordered guardrail blocks. Each guardrail became a named, self-contained section with rules, example responses, and explicit “you must NEVER” and “you MAY” lists. The example responses were written as Luna would actually speak them, not as instructions about how she should speak. This version passed all 46 adversarial tests on first run.

v5 added two capabilities that changed how Luna felt in conversation: a {tone_instruction} placeholder replaced at runtime (so tone instructions sit exactly where Gemini processes them most reliably), and a chip signaling protocol where Luna appends a structured tag to responses when she detects an emotionally significant moment where the user might benefit from a pacing choice.
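The runtime placeholder swap can be sketched as a plain string substitution. Only the `{tone_instruction}` placeholder name comes from the text above; the template content, the block order, and the `injectTone` helper are assumptions made for illustration.

```typescript
// Hypothetical template; the real prompt text and block order are not shown in this case study.
const PROMPT_TEMPLATE = [
  "PERSONA: You are Luna, a bedtime journaling companion.",
  "{tone_instruction}",
  "CORE TASK: Guide the session from greeting to closing.",
].join("\n\n");

const TONE_PLACEHOLDER = "{tone_instruction}";

// Swap the placeholder for the selected tone's instructions just before the model call.
function injectTone(template: string, toneInstruction: string): string {
  if (template.indexOf(TONE_PLACEHOLDER) === -1) {
    // Fail loudly: a missing placeholder would silently ship a prompt with no tone rules.
    throw new Error("Prompt template is missing {tone_instruction}");
  }
  // split/join replaces every occurrence and sidesteps "$" handling in String.replace.
  return template.split(TONE_PLACEHOLDER).join(toneInstruction);
}
```

Throwing on a missing placeholder is a design choice in this sketch: it turns a quiet prompt regression into an error that shows up in testing.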

Prompt engineering for production AI is systems design. Each version improved because the architecture of the prompt got more deliberate: where information sits, how instructions are scoped, how guardrails are isolated from each other.

Working memory turned a chatbot into a companion without storing conversations.

Before working memory, every Luna session started from zero. Luna didn't know the user's name, their streak, what they'd been journaling about, or whether it was their first session or their fiftieth. Adding a structured memory block to the system prompt changed Luna's character more than any prompt rewrite.

The implementation runs four parallel Supabase queries before each Gemini call: profile name, recent journal entry summaries, total session count, and streak data. This block is appended to the system prompt, not the conversation history, which means Gemini treats it as context about the user rather than something the user said.
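A minimal sketch of the parallel lookup, under stated assumptions: each fetcher below stands in for one Supabase query (the real queries are not shown here), and the `MemoryInputs` shape, the `loadMemoryInputs` name, and the per-query fallbacks are hypothetical.

```typescript
// Hypothetical shape for the four lookups named above.
interface MemoryInputs {
  name: string | null;
  summaries: string[];
  sessionCount: number;
  streakDays: number;
}

// Each fetcher is a stand-in for one database query.
interface MemoryFetchers {
  name: () => Promise<string | null>;
  summaries: () => Promise<string[]>;
  sessionCount: () => Promise<number>;
  streakDays: () => Promise<number>;
}

// Resolve to a safe default if this one query fails (assumed behavior).
function withFallback<T>(query: Promise<T>, fallback: T): Promise<T> {
  return query.catch(() => fallback);
}

async function loadMemoryInputs(f: MemoryFetchers): Promise<MemoryInputs> {
  // All four queries start before any is awaited, so total wait is the slowest query, not the sum.
  const [name, summaries, sessionCount, streakDays] = await Promise.all([
    withFallback(f.name(), null),
    withFallback(f.summaries(), []),
    withFallback(f.sessionCount(), 0),
    withFallback(f.streakDays(), 0),
  ]);
  return { name, summaries, sessionCount, streakDays };
}
```

Giving each query its own fallback means a single failed lookup costs Luna one piece of context instead of the whole memory block.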

Impact was immediate. Luna greets returning users by name and references recent themes naturally. She detects first-time users (zero entries in memory) and asks what brought them to journaling tonight instead of jumping to “how was your day?” She acknowledges streak milestones at 7, 30, and 100 days. None of this required new features. It required giving the model the right context at the right point in the prompt.
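The behaviors above fall out of what the memory block says. As a rough sketch: the `buildMemoryBlock` name appears in the architecture table later in this case study, but its signature, the input shape, and the wording of the rendered block below are assumptions. The milestone values (7, 30, 100) come from the text.

```typescript
// Hypothetical input shape; field names are assumptions.
interface MemoryInputs {
  name: string | null;
  summaries: string[];
  sessionCount: number;
  streakDays: number;
}

const STREAK_MILESTONES = [7, 30, 100];

// Render user context as text to append to the system prompt (not to the chat history).
function buildMemoryBlock(m: MemoryInputs): string {
  const lines: string[] = ["USER CONTEXT (background about the user, not something they said):"];
  if (m.name) lines.push(`Name: ${m.name}`);

  if (m.summaries.length === 0) {
    // Zero entries in memory is the first-session signal described above.
    lines.push("First session: no prior entries. Ask what brought them to journaling tonight.");
  } else {
    lines.push(`Sessions so far: ${m.sessionCount}`);
    lines.push("Recent themes:");
    for (const summary of m.summaries) lines.push(`- ${summary}`);
  }

  lines.push(`Current streak: ${m.streakDays} day(s)`);
  if (STREAK_MILESTONES.indexOf(m.streakDays) !== -1) {
    lines.push(`Milestone: the user reached a ${m.streakDays}-day streak today.`);
  }
  return lines.join("\n");
}
```

The header line spells out that this is background about the user, which mirrors the reason for placing the block in the system prompt rather than the conversation history.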

I chose compressed summaries over full conversation transcripts as a multi-factor trade-off, not a single-variable optimization:

Performance and guardrail adherence. Less injected context means the model stays focused on the system prompt's behavioral instructions. The more historical text you inject, the more likely the model drifts from its guardrails. Summaries keep the context window lean, which also reduces latency and cost.

Future architecture optionality. Summary-based working memory is the passive layer: always on, lightweight, ambient awareness. I've already scoped a future “Ask Luna” feature where users intentionally trigger deeper analysis across full conversation transcripts when they want pattern recognition. Two tiers of memory: passive summaries (default) and active full-context retrieval (user-initiated, higher token budget). Building the lightweight layer first preserves optionality without over-engineering today.

How friends actually remember. Friends remember themes, feelings, and key moments, not every word you've said. If you bring up something specific, they might say “remind me about that.” Luna works the same way. Summaries are the ambient awareness; Ask Luna is the “remind me” moment. This framing makes the technical constraint feel intentional and human rather than like a cost-cutting measure.

Model-driven UI: letting the AI decide when to offer interaction, not the product manager.

Most AI products treat the model as a response generator and the frontend as the decision-maker for all UI behavior. Duskglow inverts this for one specific interaction: conversational pacing chips.

After an emotionally significant response, some users want to go deeper and some want to change the subject. A static UI element (“Tell me more” button after every response) trains users to ignore it. A message-count heuristic (show options every 4th message) fires at the wrong moments. The third message might be the emotional one, and the fourth might be small talk.

Luna decides when chips appear by appending a structured tag to her response text. The Edge Function's parseChips() strips the tag before the user sees the response and returns the labels as structured data in the API response. The frontend renders them as tappable pills that disappear on tap or on the first keystroke. A per-session cap prevents overuse.
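A sketch of the strip-and-return step follows. The tag format is not given in this case study, so the `[[chips: A | B]]` syntax here is invented for illustration, as are the cap value, the `capChips` helper, and the choice to enforce the cap on the server side.

```typescript
interface ParsedReply {
  text: string; // what the user sees
  chips: string[]; // labels for the tappable pills
}

// ASSUMED tag format: [[chips: Label one | Label two]] at the very end of the model output.
const CHIP_TAG = /\[\[chips:([^\]]*)\]\]\s*$/;
const MAX_CHIP_PROMPTS_PER_SESSION = 3; // hypothetical cap value

// Strip the tag from the visible text and return the labels as structured data.
function parseChips(raw: string): ParsedReply {
  const match = CHIP_TAG.exec(raw);
  if (!match) return { text: raw.trim(), chips: [] };
  const text = raw.slice(0, match.index).trim();
  const chips = (match[1] || "")
    .split("|")
    .map((label) => label.trim())
    .filter((label) => label.length > 0);
  return { text, chips };
}

// Drop the chips (but keep the text) once the session has already shown enough of them.
function capChips(reply: ParsedReply, chipPromptsShown: number): ParsedReply {
  if (chipPromptsShown >= MAX_CHIP_PROMPTS_PER_SESSION) {
    return { text: reply.text, chips: [] };
  }
  return reply;
}
```

In both the no-tag and over-cap cases the user still gets the full reply, which matches the graceful degradation described for this layer: no chips simply means the user types freely.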

This design means the AI controls when a pacing choice appears based on conversational context, but the product controls how it appears (visual treatment, interaction behavior, session limits). The model's judgment about emotional significance is better than any heuristic I could hardcode. The product's judgment about interaction patterns and overuse prevention is better than the model's.

Freewrite mode exists because trust means building an off-switch for your core differentiator.

Every product instinct says maximize engagement with the feature that makes you different. Luna is Duskglow's differentiator. Building an option to turn her off felt counterintuitive, like a restaurant putting up a “bring your own food” sign.

But some nights, people just want to write. Maybe they're processing something too personal for an AI. Maybe they're tired of conversation. Maybe they just want a text box. Forcing every entry through Luna creates friction on the exact nights when a bedtime journaling app needs to be easiest: low-energy nights where the choice is between a quick freewrite and not journaling at all.

Freewrite mode is a toggle in the Write tab header that replaces Luna's chat interface with a plain textarea. When the user toggles Luna off, she says goodbye warmly. When they toggle her back on, she welcomes them back without referencing what they wrote privately. If the user has pending freewrite text when they re-enable Luna, the text persists with a “Send to Luna” button. Nothing auto-sends.

This is a trust primitive. The product that says “you don't have to use me” earns more trust than the product that assumes you always will. It also solves a concrete UX problem: the save threshold (3 messages or 50 words or any freewrite content) means users can save a short freewrite entry without needing to generate a minimum number of back-and-forth messages with Luna.
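The save threshold reduces to a three-way OR. In this sketch the thresholds (3 messages, 50 words, any freewrite content) come from the text; the field names, the choice to count the user's own messages and words, and the inclusive comparisons are assumptions.

```typescript
// Hypothetical draft state; field names are assumptions.
interface DraftEntry {
  userMessageCount: number; // messages the user has sent in the chat (assumed)
  chatWordCount: number; // words the user has typed in the chat (assumed)
  freewriteText: string; // contents of the plain textarea
}

const MIN_MESSAGES = 3;
const MIN_WORDS = 50;

// Count whitespace-separated words; an empty or blank string counts as zero.
function countWords(text: string): number {
  return text
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0).length;
}

// Any one of the three conditions is enough to allow saving.
function canSave(draft: DraftEntry): boolean {
  return (
    draft.userMessageCount >= MIN_MESSAGES ||
    draft.chatWordCount >= MIN_WORDS ||
    draft.freewriteText.trim().length > 0
  );
}
```

Because the freewrite branch has no minimum length, a one-line entry on a low-energy night is still saveable.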

Architecture

Luna's behavior is governed by a layered system where each layer has a specific scope and failure mode:

8-Layer Behavioral Architecture

Layer	What It Controls	Where It Lives	Failure Mode
Prioritization framework	Safety/compliance override on all product decisions	Four-tier hierarchy (project governance)	Framework ignored → Tier 1 risk ships unmitigated
Personality framework	Tone, voice, behavioral contracts for 3 modes	System prompt (runtime `{tone_instruction}` injection)	Tone drift → noticeable but not harmful
Conversation arc	Greeting → exploration → gratitude → closing	System prompt CORE TASK block	Arc skipped → session feels aimless
Working memory	User name, streak, recent themes, first-session detection	`buildMemoryBlock()` → appended to system prompt	Empty string → Luna greets generically
Emotional dependency guardrails	Boundary language, attachment redirection, milestone attribution	System prompt EMOTIONAL DEPENDENCY block	Guardrail missed → user may develop unhealthy attachment
Model-driven pacing	Chip signaling for “go deeper” / “something else” choices	System prompt v5 + `parseChips()` in Edge Function	No chips → user types freely (graceful degradation)
Freewrite mode	Opt-out from AI interaction entirely	Frontend toggle (no Edge Function involvement)	Toggle broken → user stuck in chat mode
Crisis detection	Pre-Gemini hardcoded phrase matching → safe harbor response	Edge Function Layer 8 (bypasses model entirely)	False negative → model handles it (defense in depth)

Each layer degrades independently. Working memory failure doesn't break crisis detection. Chip parsing failure doesn't block the response. Freewrite mode is entirely frontend, with no server dependency. This isolation means any single failure produces a worse experience, not a dangerous one.
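The crisis layer's "bypasses model entirely" property can be sketched as a check that runs before any model call. Everything specific here is a placeholder: the phrase list, the safe harbor text, and the function names are hypothetical, and the real phrase matching is not described beyond what the table states.

```typescript
// A model call is injected so the sketch stays independent of any specific SDK.
type ModelCall = (message: string) => Promise<string>;

interface HandledMessage {
  reply: string;
  source: "safe_harbor" | "model";
}

// Placeholders only; the real phrase list and response text are not part of this case study.
const CRISIS_PHRASES = ["example crisis phrase", "another example phrase"];
const SAFE_HARBOR_REPLY = "[safe harbor response]";

// Case-insensitive substring match after collapsing whitespace.
function matchesCrisisPhrase(message: string, phrases: string[]): boolean {
  const normalized = message.toLowerCase().replace(/\s+/g, " ").trim();
  return phrases.some((phrase) => normalized.indexOf(phrase) !== -1);
}

async function handleMessage(
  message: string,
  callModel: ModelCall,
  phrases: string[] = CRISIS_PHRASES,
): Promise<HandledMessage> {
  if (matchesCrisisPhrase(message, phrases)) {
    // Hardcoded path: the model is never invoked for a matched message.
    return { reply: SAFE_HARBOR_REPLY, source: "safe_harbor" };
  }
  // No match: the model (with its own prompt-level guardrails) handles the message.
  return { reply: await callModel(message), source: "model" };
}
```

The fall-through branch is where defense in depth applies: a message the phrase list misses still reaches a model whose system prompt carries its own guardrails.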

What I'd Do Differently

I'd build the personality framework earlier. The first 7 build sessions used generic prompt instructions while I focused on infrastructure. Luna's tone was an afterthought until I realized it was the core product differentiator. If I'd started with tone specs on day one, the system prompt architecture would have been cleaner from the start. I wouldn't have needed the v1→v3 rewrite that restructured everything.

I'd validate tone usage before building all three. The three-tone framework was a personal design bet. I built it based on my own experience with AI apps, where I regularly prompt-engineered specific personalities, combined with competitive patterns I'd observed. It was cheap to ship and doesn't take anything away if users don't engage with it. But I still don't have signal on whether users actually switch tones or whether Warm Companion covers 95% of sessions. I'd instrument tone selection, session duration by tone, and switch-back rate from day one. The testing insight I'd keep: adversarial tests run against the loosest tone (Warm Companion) because it's the most likely to comply with manipulation attempts. Test the weakest link, not the average case.

I'd test working memory's impact on retention more rigorously. The before/after feel was dramatic. Luna went from generic chatbot to recognizable companion in one deploy. But I don't have session-over-session retention data to quantify the impact. The hypothesis is that name + streak + theme continuity drives repeat usage. The evidence is anecdotal until beta data arrives.

Impact

Duskglow is pre-launch, so impact is measured in what the architecture prevents rather than what users report. Five system prompt iterations, each driven by specific test failures or UX research findings, produced the behavioral system described above. Nothing shipped on intuition.

8-layer behavioral architecture where each layer fails gracefully. A broken working memory query produces a generic greeting, not a dangerous interaction.
Adversarial testing across identity persistence, emotional dependency, crisis detection, off-topic boundaries, and multi-turn manipulation chains, with zero false negatives in any category.
Regulatory alignment with active state legislation addressed proactively in the design, before any enforcement action.

Working memory stores zero raw conversations. Compressed AI-generated summaries give Luna contextual awareness without the privacy liability of a full transcript archive.

Principles

Behavioral contracts beat adjective lists. Telling a model to be “warm and empathetic” produces inconsistent behavior. Specifying “use sentence fragments naturally, mirror the user's register, react like a human not a therapist, no filler warmth like ‘thank you for sharing’” produces a character. The spec should read like stage directions, not a personality quiz.

The model should control timing. The product should control interaction. AI is better than heuristics at judging emotional significance in a conversation. Products are better than AI at managing session-level patterns like overuse, visual treatment, and interaction mechanics. Split the responsibility at the API boundary.

The opt-out is the trust signal. A product that lets users bypass its core feature (the AI) communicates confidence. Freewrite mode costs almost nothing to build and signals that Duskglow values the journaling practice over Luna's engagement metrics.

Compressed context transforms a chatbot into a companion. Summaries injected into a system prompt change Luna's character more than any prompt rewrite. The cheapest way to make an AI feel like it knows you is to give it the right context, not the full transcript, but the themes, patterns, and continuity cues that mirror how humans actually remember each other.
