Engineering

We built a running app where your heart rate writes the story

How a 5-node LangGraph pipeline, BLE debouncing, and a three-float world state vector turn real-time biometrics into a live narrative — every 7 seconds.


Daniel Blanco

Software developer · Ultra runner · Founder, Runnory

Last updated: April 2026


Most running apps record what you did. Runnory uses what you're doing — right now, this heart rate, this pace, this terrain — to generate a story beat. Not a pre-scripted one. One that wouldn't exist if your heart rate were five BPM lower.


The problem with audio running experiences

Zombies, Run! pioneered the genre in 2012 and it's good. But its narrative is fixed. You run, the story plays. Your 190 BPM sprint and your recovery jog hear the same next chapter.

We wanted the story to be a direct function of your physiology. Not "you're in chapter 4" but "your heart rate just crossed Zone 4 — the world reacts."

That's a harder problem. It requires real-time biometric ingestion, a state model that converts raw BPM into narrative intensity, and a language model prompted so that it produces 30 spoken words of coherent dark fantasy on demand, every 7 seconds, without repeating itself.


The architecture

Every 7 seconds, the mobile app fires a POST to /session/ping:

{
  "session_id": "uuid",
  "lat": 40.7128,
  "lon": -74.0060,
  "heart_rate": 145,
  "pace": 5.8,
  "elapsed_s": 420
}

The backend is a 5-node LangGraph pipeline that runs on each ping:

START
 ├─ environment_scanner ─┐
 └─ biometric_analyst ───┴─→ safety_guard ──[blocked]──→ END
                                          └─[safe/warn]─→ narrative_director
                                                                  ↓
                                                       audio_synthesizer → END

The two leading nodes run in parallel. environment_scanner does an O(1) Redis lookup on the runner's H3 cell — we pre-cache OpenStreetMap data so we know if the runner is on a bridge, in a forest, near a cliff, or on a motorway. biometric_analyst computes the HR zone and handles debouncing. They converge at safety_guard, which can suppress the LLM entirely if the terrain is hazardous (railway crossing, active motorway). Then narrative_director calls Gemini. Then the TTS script goes back to the phone.
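
Here's a minimal sketch of how that wiring might look in LangGraph. The node bodies are stubbed and the state keys are our guess at the real schema; the fan-out, join, and conditional routing match the diagram above.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class PingState(TypedDict, total=False):
    heart_rate: int
    pace: float
    lat: float
    lon: float
    terrain: str      # written by environment_scanner
    hr_zone: int      # written by biometric_analyst
    safety: str       # "safe" | "warn" | "blocked"
    beat_text: str
    audio_url: str

def environment_scanner(state: PingState) -> dict:
    return {"terrain": "forest"}       # stub: H3 cell -> Redis lookup

def biometric_analyst(state: PingState) -> dict:
    return {"hr_zone": 3}              # stub: BPM -> zone + debounce

def safety_guard(state: PingState) -> dict:
    return {"safety": "safe"}          # stub: hazard check on terrain

def narrative_director(state: PingState) -> dict:
    return {"beat_text": "..."}        # stub: Gemini call

def audio_synthesizer(state: PingState) -> dict:
    return {"audio_url": "..."}        # stub: TTS

g = StateGraph(PingState)
for name, fn in [("environment_scanner", environment_scanner),
                 ("biometric_analyst", biometric_analyst),
                 ("safety_guard", safety_guard),
                 ("narrative_director", narrative_director),
                 ("audio_synthesizer", audio_synthesizer)]:
    g.add_node(name, fn)

g.add_edge(START, "environment_scanner")   # fan-out: both branches run in parallel
g.add_edge(START, "biometric_analyst")
g.add_edge(["environment_scanner", "biometric_analyst"], "safety_guard")  # join
g.add_conditional_edges(
    "safety_guard",
    lambda s: END if s["safety"] == "blocked" else "narrative_director",
)
g.add_edge("narrative_director", "audio_synthesizer")
g.add_edge("audio_synthesizer", END)
pipeline = g.compile()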

Latency budget: we need to be done in well under 7 seconds so there's no silent gap between beats. In practice the graph completes in 600–900ms end-to-end, most of which is the Gemini call.


Converting heart rate into a narrative world

We map BPM to one of five zones using the runner's estimated HRmax (220 − age, with manual override):

Zone   % HRmax   Narrative mode   What the world does
1      0–57%     Lore             Calm. The world reveals its secrets.
2      57–64%    Setup            Something stirs. Foreshadowing.
3      64–77%    Tension          Enemies spotted. The sky changes.
4      77–95%    Chase            Active pursuit. Run.
5      95%+      Crisis           Last stand.
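
As a concrete sketch, the zone computation is a handful of comparisons. The band boundaries come straight from the table; the function name and signature are ours:

def hr_zone(bpm: int, age: int, hr_max_override: int | None = None) -> int:
    """Map BPM to a zone 1-5 using the bands above (illustrative sketch)."""
    hr_max = hr_max_override or (220 - age)   # manual override wins
    pct = bpm / hr_max
    if pct < 0.57:
        return 1
    if pct < 0.64:
        return 2
    if pct < 0.77:
        return 3
    if pct < 0.95:
        return 4
    return 5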

Each zone maps to a WorldState — three floats that encode narrative intensity:

_ZONE_WORLD_MAP: dict[int, WorldState] = {
    1: WorldState(storm_intensity=0.0, enemy_proximity=0.0, visibility=1.0),
    2: WorldState(storm_intensity=0.2, enemy_proximity=0.1, visibility=0.9),
    3: WorldState(storm_intensity=0.5, enemy_proximity=0.4, visibility=0.6),
    4: WorldState(storm_intensity=0.8, enemy_proximity=0.75, visibility=0.2),
    5: WorldState(storm_intensity=1.0, enemy_proximity=1.0, visibility=0.0),
}
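
WorldState itself isn't shown in the post; a plausible shape is a small frozen dataclass:

from dataclasses import dataclass

@dataclass(frozen=True)
class WorldState:
    storm_intensity: float   # 0.0 calm .. 1.0 full storm
    enemy_proximity: float   # 0.0 alone .. 1.0 on top of you
    visibility: float        # 1.0 clear .. 0.0 blind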

These three values go into the LLM prompt. The model doesn't see "Zone 4." It sees storm_intensity: 0.8, enemy_proximity: 0.75, visibility: 0.2 — and it knows, from the system prompt, that high enemy proximity means immediate danger and the narrator should convey that without using game language.

Pace modulates the state too. If the runner slows below their baseline pace, enemy proximity increases regardless of heart rate — because slowing when you're being chased should feel dangerous:

# pace is in min/km, so running slower than baseline gives pace_ratio > 1.0
pace_ratio = pace / baseline_pace
proximity_delta = min(0.25, max(0.0, (pace_ratio - 1.0) * 0.5))
world_state["enemy_proximity"] = min(1.0, base + proximity_delta)

At 20% slower than baseline, proximity bumps by 0.10. At 50% slower, it caps at +0.25.


The BLE noise problem

Bluetooth heart rate sensors are noisy. A single corrupt packet can spike BPM by 30 points. If we committed every zone change immediately, the narrator would thrash — "enemies spotted" followed immediately by "silence returns" followed immediately by "they've found you."

The fix is debouncing: a pending zone transition must hold for N consecutive pings before it's committed.

def zone_debounce_threshold(from_zone: int, to_zone: int) -> int:
    if to_zone < from_zone:
        return 3   # descending: ~21s — avoids "you're safe" firing mid-effort
    if to_zone - from_zone == 1:
        return 2   # +1 zone: 2 pings (~14s) — reactive but not jittery
    return 3       # +2 zones: 3 pings (~21s) — large jumps are almost always noise

Descending transitions (recovery) require three pings because runners frequently dip in HR mid-effort without actually recovering. We don't want to tell someone "the threat fades" when they're 30 seconds into a 5-minute tempo block.
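
Putting the threshold together with the pending_zone and zone_hold_count counters kept in session state (see the Redis section below), the commit logic is roughly this sketch:

def update_zone(state: dict, observed_zone: int) -> int | None:
    """Sketch of the commit logic: returns the newly committed zone,
    or None while the transition is still pending."""
    current = state["prev_hr_zone"]
    if observed_zone == current:                 # back to the committed zone: reset
        state["pending_zone"], state["zone_hold_count"] = None, 0
        return None
    if observed_zone != state["pending_zone"]:   # new candidate: start counting
        state["pending_zone"], state["zone_hold_count"] = observed_zone, 1
        return None
    state["zone_hold_count"] += 1                # candidate held for another ping
    if state["zone_hold_count"] >= zone_debounce_threshold(current, observed_zone):
        state["prev_hr_zone"] = observed_zone    # commit the transition
        state["pending_zone"], state["zone_hold_count"] = None, 0
        return observed_zone
    return None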

When a zone transition is committed, it bypasses the LLM entirely and fires an instant pre-written beat:

TRANSITION_BEATS = {
    (2, 3): "The sky darkens. Something moves in the trees.",
    (3, 4): "They've found you. RUN.",
    (4, 5): "Nowhere left to run. Turn. Face it.",
    (5, 4): "It retreats. You survived. Keep moving.",
    (3, 2): "The threat fades. The world exhales. Well done.",
}

Zero latency, no LLM call, immediate delivery. These lines are short enough to read in under two seconds. The next regular ping picks up the new zone's world state.


Prompting the narrator

The LLM constraint that shaped everything: 30 spoken words maximum, not counting bracketed pause tags. At a comfortable narration pace, 30 words is about 12 seconds — long enough to be meaningful, short enough not to overlap with the next ping.

The narrative_director node builds a structured prompt from the session state and calls Gemini 2.5 Flash Lite with temperature=1.2 and max_output_tokens=200. High temperature because we want genuine variety; capped tokens because we don't need it to write a novel, just a line.
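
As a sketch with the google-genai SDK (the helper name is ours; the prompt assembly is described below):

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def generate_beat(prompt: str) -> str:
    """Illustrative call shape, not Runnory's actual client code."""
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=prompt,
        config=types.GenerateContentConfig(
            temperature=1.2,         # high on purpose: variety between beats
            max_output_tokens=200,   # a line, not a novel
        ),
    )
    return response.text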

The prompt combines three things: the zone's world-state floats, the active story world's narrator persona, and a narrative seed.

We have five story worlds, each with a distinct narrator voice and pacing style:

Story             Setting              Narrator
Shadow Realm      Dark fantasy ruins   Ancient herald — gravelly, ominous
Wandering Wilds   Enchanted nature     Warm naturalist — soft, observant
Last Signal       Sci-fi survival      Mission control — cold, accepting
Blood & Olympus   Greek mythology      Blind epic poet — commanding
Neon Fugitive     Cyberpunk thriller   Street hacker — sardonic, fast

Each world has ~15 narrative seeds — (opening line, imagery category) pairs — that the director rotates through, never picking a category used in the last two beats:

_NARRATIVE_SEEDS["shadow_realm"] = [
    ("Frost-rimed stone. Silence older than the ruins themselves.", "stone"),
    ("Something vast moves beneath the earth — a breath, not a tremor.", "earth"),
    ("What manner of creature left these marks upon the bark?", "creature"),
    # ... ~15 total
]
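
The rotation itself is simple. A sketch (the picker name and fallback behavior are ours):

import random

def pick_seed(world: str, recent_categories: list[str]) -> tuple[str, str]:
    """Choose a seed whose imagery category wasn't used in the last two beats."""
    fresh = [s for s in _NARRATIVE_SEEDS[world] if s[1] not in recent_categories]
    return random.choice(fresh or _NARRATIVE_SEEDS[world])  # fall back if exhausted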

The seed gives the model a starting image. The zone's world state tells it how intense to make the response. The result is a beat that's coherent, contextual, and different every time.


What we keep between pings (Redis)

The LangGraph nodes are stateless. State lives in Redis with a 2-hour TTL per session:

{
    "narrative_arc":             "intro",
    "beat_history":              [...],    # last 5 TTS lines
    "prev_hr_zone":              3,
    "pending_zone":              None,
    "zone_hold_count":           0,
    "recent_imagery_categories": [],      # last 2 seed categories
}
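
With redis-py, the load/save pair is a few lines. A sketch; the key naming and default_state helper are our assumptions:

import json
import redis

r = redis.Redis()
SESSION_TTL_S = 2 * 60 * 60   # state expires 2 hours after the last write

def load_state(session_id: str) -> dict:
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else default_state()  # default_state: hypothetical

def save_state(session_id: str, state: dict) -> None:
    r.setex(f"session:{session_id}", SESSION_TTL_S, json.dumps(state))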

This means a runner can pause their run, take a call, and resume 20 minutes later. The story knows where it left off. The debounce counters are intact. The narrator won't repeat the last thing it said.


What we got wrong (and fixed)

Temperature too low. We started at 0.8. The beats were coherent but repetitive: at that setting the model kept reaching for the same sentence constructions. 1.2 produces more variance without becoming incoherent at 30-word lengths.

Debounce threshold too aggressive. Early builds used 4 pings for all transitions (~28 seconds). For a runner doing intervals, that meant the zone change landed after the sprint was already over. Dropping ascending transitions to 2 pings made the system feel reactive without adding jitter.

LLM for transition beats. First version called Gemini for zone transitions too. The 600ms latency meant the "they've found you" line arrived noticeably late, right when the runner had already been in Zone 4 for several seconds. Pre-written instant beats fixed this.

WorldState as discrete prompts. Before encoding world state as three floats, we used strings like "the enemies are close." The model treated these as hard facts and wrote beats that made no narrative sense when the zone changed ("the enemy, who was just here, has vanished"). Numerical values let the model interpret intensity without hard anchoring to specific story facts.


What's next

Right now geospatial context is limited to terrain type (forest, urban, water, elevated). Specific OSM features would do more — running past a cemetery, a clock tower, a river — and having the narrator acknowledge them without breaking immersion. A graveyard in Shadow Realm should feel different from a graveyard in Neon Fugitive.

We're also building a post-run summary that reconstructs the narrative arc from the session: which zones you hit, when, for how long, and which beats fired — so after a hard tempo run you can read back the story your heart rate wrote.



Common questions

How does Runnory use LangGraph?

Runnory uses a 5-node LangGraph directed graph to coordinate its real-time narrative pipeline. Two nodes run in parallel — EnvironmentScanner (GPS, pace, terrain) and BiometricAnalyst (heart rate, HR zone, zone history) — then a SafetyGuard node merges their outputs, a NarrativeDirector node calls Gemini 2.5 Flash Lite to generate the next story beat, and an AudioSynthesizer node converts the text to speech. The full graph runs every 7 seconds during an active run.

What AI model does Runnory use for narrative generation?

Runnory uses Gemini 2.5 Flash Lite (gemini-2.5-flash-lite) with max_output_tokens set to 200 and temperature 1.2. The model generates 30-word narrative beats with 600–900ms end-to-end latency, which fits within the 7-second cycle time.

How does Runnory convert heart rate into a story?

Heart rate is mapped to one of five zones (Zone 1 at 0–57% HRmax through Zone 5 at 95%+ HRmax). Each zone maps to a WorldState — three floating-point values representing storm_intensity, enemy_proximity, and visibility. This WorldState is injected into the Gemini prompt every 7 seconds, so the narrative responds directly to what the runner's body is doing.

Does Runnory work offline?

Not currently. Each narrative beat requires a live LLM inference call to Gemini, taking 600–900ms end-to-end. Offline mode is on the roadmap but is not yet available.


Wondering why we chose a generative model instead of a pre-written story? Read our philosophical comparison with Zombies, Run! here.

Runnory is in pre-release. Join the waitlist at runnory.com.


Daniel Blanco

Software developer · Ultra runner · Founder, Runnory

Daniel built Runnory because he wanted his training runs to feel like more than data. He's a software developer and ultra runner who writes about the engineering behind biometric storytelling and what it takes to make a narrative respond to your body in real time.
