chore: bootstrap lean sysadmin-chronicles repo

Import the runnable game code, content, docs, scripts, and repo guidance while leaving local agent state, dependency installs, build output, and backup copies out of the published tree.
2026-05-02 11:49:07 -04:00
commit 0265afa054
252 changed files with 37574 additions and 0 deletions
@@ -0,0 +1,702 @@
+# SYSADMIN CHRONICLES — ARCHITECTURE DOCUMENT
+> Version 5.0 | Status: Active development
+>
+> Changelog:
+>   v5.0 — GDScript/Godot codebase removed. Node.js + Svelte is the only codebase.
+>   v4.0 — Full architecture pivot to Node.js game server + Svelte web HUD.
+>   v3.x — Save system, world flags, trust, incidents, pressure system (GDScript era).
+>   v2.0 — Native Godot 4 + libvirt design (superseded).
+>   v1.0 — Browser/v86 prototype (superseded).
+
+---
+
+## 1. PROJECT OVERVIEW
+
+**Sysadmin Chronicles** is a native Linux-only game where the player works as a
+junior sysadmin at Axiom Works, handling tickets inside **real Linux virtual
+machines** managed by **QEMU/KVM via libvirt**.
+
+The runtime stack (introduced in v4.0; the only codebase since v5.0):
+- **Game server** — Node.js / Express + WebSocket (`server/`). Owns all game
+  logic: quest state, trust, validation, VM lifecycle, incidents, save state.
+- **Web HUD** — Svelte single-page app (`frontend/`). Tickets, mail, Sage, docs,
+  trust bar. Served from the game server at `http://192.168.100.1:3000`.
+- **Workstation VM** — XFCE desktop (Debian 12, sc-workstation). Player's desk.
+  Chromium auto-opens the HUD. Tilix provides a real terminal for SSH to target VMs.
+- **Target VMs** — Headless Debian (hermes) and Arch (vulcan). Quest objectives
+  live here. Player investigates and fixes via SSH from the workstation terminal.
+
+The player experience:
+- Sits at the workstation VM (via SPICE/remote-viewer fullscreen on the host)
+- Reads tickets and mail in the Chromium HUD
+- Opens Tilix, SSHes to hermes or vulcan, fixes real problems
+- Clicks "Mark Complete" in the HUD — game server SSHes in and validates VM state
+- World reacts, trust shifts, new mail arrives via WebSocket push
+
+No simulated terminal. No fake SSH sessions.
+
+---
+
+## 2. CORE DESIGN PRINCIPLES
+
+- Realism over simulation
+- Native Linux execution only
+- CLI-first development and asset wiring
+- Minimal, stable scenes; behavior lives in scripts
+- Data-driven content for quests, tickets, incidents, and dialogue
+- State-based validation only; never command-sequence checking
+- Multiple valid solutions where possible
+- Pressure comes from evolving systems, not arbitrary timers
+- Progression unlocks access, tools, and scope, not RPG stats
+- Deterministic systems so content is testable and agent-friendly
+- The dirty VM state is the game — preserve it, do not erase it
+
+---
+
+## 3. HIGH-LEVEL ARCHITECTURE
+
+```
+HOST MACHINE
+├── server/               Node.js/Express + WebSocket  (server/src/)
+│   ├── ContentLoader     loads content/ JSON at startup
+│   ├── QuestEngine       quest state machine
+│   ├── TicketService     ticket state, mark-complete handler
+│   ├── ValidationEngine  SSH into VMs, evaluates rules
+│   ├── VMManager         virsh start/stop/snapshot wrappers
+│   ├── TrustSystem       score, unlock evaluation, revocation
+│   ├── ProgressionSystem unlocked docs, VMs, access
+│   ├── EmailService      inbox, follow-up emails, reply options
+│   ├── SageService       rule-based knowledge base / dialogue
+│   ├── ShiftTimer        shift clock, pressure tick schedule
+│   ├── IncidentScheduler incident injection
+│   └── SaveState         ~/.local/share/sysadmin-chronicles/save.json
+│
+├── frontend/             Svelte web HUD  (frontend/src/)
+│   ├── TicketsPanel      ticket list, detail, "Mark Complete" button
+│   ├── MailPanel         inbox, message view, reply buttons
+│   ├── DocsPanel         trust-gated internal docs
+│   ├── SagePanel         chat / knowledge base search
+│   └── HeaderBar         trust indicator, shift timer, unread count
+│
+└── content/              JSON content — quests, tickets, dialogue, etc.
+
+NETWORK: sc-internal (libvirt bridge 192.168.100.0/24)
+  192.168.100.1  host  (game server port 3000)
+
+VMs on sc-internal
+├── sc-workstation (ares)   Debian 12 XFCE — player's desk
+│   ├── Chromium → http://192.168.100.1:3000  (HUD, always open)
+│   └── Tilix → SSH to hermes/vulcan          (real terminal)
+├── sc-web-server (hermes)  headless Debian   (Q002–Q005, Q007)
+└── sc-build-machine (vulcan) headless Arch   (Q006, Q008)
+
+PLAYER FLOW:
+  Host starts game server → boots sc-workstation → attaches via SPICE
+  Player sees XFCE desktop → Chromium with HUD auto-open
+  Reads ticket → opens Tilix → SSH hermes → fixes problem
+  Clicks "Mark Complete" → server SSHes hermes → validates
+  Trust updates → WebSocket pushes to browser → new mail arrives
+```
+
+---
+
+## 4. RUNTIME MODEL
+
+### 4.1 Game Server — Node.js
+
+The game server (`server/src/index.js`) is a Node.js/Express application:
+- Serves `frontend/dist/` as static files at `/`
+- WebSocket server on the same port (real-time event push to HUD)
+- On startup: loads all content JSON, hydrates services from save file,
+  ensures workstation VM is live via VMManager
+
+The server is responsible for:
+- All game logic (quest state, trust, progression, incidents)
+- VM lifecycle management (virsh via child_process)
+- Validation — SSH into target VMs and evaluate rules
+- Save/load (single JSON file at `~/.local/share/sysadmin-chronicles/save.json`)
+- WebSocket broadcast of trust changes, new mail, shift ticks, incident alerts
+
+### 4.2 Frontend — Svelte
+
+The web HUD (`frontend/src/`) is a Svelte single-page app:
+- Built with Vite; output lands in `frontend/dist/` and is served by the game server
+- All data fetched from the game server API; no local state beyond UI
+- WebSocket client for real-time updates
+- Does not run validation — only displays results
+
+### 4.3 Target Platform
+
+- Host OS: Linux
+- Supported deployment model: start game server on host, view workstation via SPICE
+- Required on the host: KVM, libvirt, virsh, Node.js 18+, virt-viewer
+- Required install model: one-time host setup with clean uninstall path
+
+No Windows, macOS, or browser target is planned for the host. The HUD is a web
+app served locally — it is never exposed to the internet.
+
+---
+
+## 5. VIRTUAL MACHINE SYSTEM
+
+### 5.1 Required Stack
+
+- `qemu-system-*`
+- `KVM`
+- `libvirtd`
+- `virsh`
+- libvirt virtual networks
+- qcow2-backed VM images
+
+Runtime policy:
+- The shipped game should not require broad `sudo` usage during normal play
+- One-time host setup may require admin approval
+- Ongoing gameplay should run as a regular user against a prepared VM runtime
+
+### 5.2 Core Behavior
+
+The game controls VMs through libvirt, not by emulating them internally.
+
+Responsibilities:
+- Ensure required domains and networks exist
+- Start the active VM
+- Stop or suspend inactive VMs
+- Revert to known snapshots for resets
+- Query runtime state for evaluation
+- Attach the player to the appropriate VM workflow
+
+The workstation and at least one target VM must be able to run at the same
+time. This is required for real SSH-based play and for background incidents to
+continue evolving while the player works elsewhere.
+
+Operational guidance:
+- `workstation` stays live during normal play
+- At least one target VM stays live with it
+- Later phases may keep all major quest VMs active simultaneously
+- Resource budgets should be documented and enforced conservatively
+
+Lab finding:
+- Small headless target VMs were inexpensive on the test host
+- The workstation became materially heavier once a real graphical session and
+  browser were added
+- Budget the workstation separately from server-style quest VMs
+
+### 5.3 Initial VM Roles
+
+| ID | Role | Distro | Hostname | Purpose |
+|----|------|--------|----------|---------|
+| `workstation` | Player desktop | Debian 12 | `ares` | XFCE + Chromium HUD + Tilix terminal |
+| `web_server` | Service host | Debian 12 | `hermes` | Web/service quests (Q002–Q005, Q007) |
+| `build_machine` | Build box | Arch | `vulcan` | Package/build/update quests (Q006, Q008) |
+
+### 5.3.1 Workstation Profile
+
+The workstation is a full XFCE desktop (Debian 12, 768–1536 MB RAM):
+- **Chromium** — opens `http://192.168.100.1:3000` on login (game HUD)
+- **Tilix** — split-pane terminal, set as default; player SSHes to hermes/vulcan from here
+- **Full sysadmin CLI toolkit** pre-installed (vim, htop, tmux, curl, nmap, tcpdump, etc.)
+- SPICE display with QXL video — dynamic resolution via vdagent; fullscreen via `remote-viewer`
+- `always_live: true` — stays running between shifts; suspended on game quit, resumed on next launch
+
+The player never needs to interact with the workstation VM's internal file system for
+game objectives; all quest work happens on the target VMs via SSH.
+
+### 5.3.2 Why XFCE + Chromium (not terminal-only)
+
+Earlier iterations used a terminal-only workstation. The game was redesigned
+because a terminal-only approach would require building a fake terminal and fake SSH.
+The XFCE + real browser approach is simpler, more realistic, and requires no
+terminal simulation at all:
+
+- Player uses a real Tilix terminal — no simulation
+- Player SSHes with real SSH — no protocol emulation
+- The HUD is a real web app — no custom UI framework needed for game chrome
+- Downside: workstation VM costs ~480–768 MB RAM; budget accordingly
+
+### 5.4 Snapshot Strategy
+
+Snapshots are the reset primitive and the save primitive.
+
+Named snapshot tiers per VM:
+
+| Name | Purpose |
+|------|---------|
+| `baseline.clean` | Authored starting state for a fresh quest arc |
+| `baseline.recovery` | Fallback if live state is unrecoverable |
+| `checkpoint.shift-{N}` | Auto-saved at start of each in-game shift |
+
+Rules:
+- Snapshot names are deterministic
+- Quest scripts may declare required baseline snapshots
+- Validation never depends on snapshot history; only current observed state
+- The game retains a maximum of 5 shift checkpoints per VM; older ones are pruned
+- `baseline.clean` and `baseline.recovery` are never pruned by the game
+
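The retention rules above can be sketched as a small pure function. This is a minimal illustration, not the project's actual code: `selectCheckpointsToPrune` is a hypothetical helper, and it assumes a higher shift number in `checkpoint.shift-{N}` always means a more recent checkpoint.

```javascript
// Hypothetical helper: decide which snapshots of one VM may be pruned.
// Assumption: a higher N in `checkpoint.shift-{N}` is a more recent checkpoint.
const CHECKPOINT_PATTERN = /^checkpoint\.shift-(\d+)$/;

function selectCheckpointsToPrune(snapshotNames, retention = 5) {
  const checkpoints = snapshotNames
    .map((name) => ({ name, match: CHECKPOINT_PATTERN.exec(name) }))
    .filter((entry) => entry.match !== null)
    .map((entry) => ({ name: entry.name, shift: Number(entry.match[1]) }))
    .sort((a, b) => b.shift - a.shift); // newest first

  // Everything past the retention window is prunable. Baseline snapshots
  // never match the pattern, so they can never appear in the result.
  return checkpoints.slice(retention).map((entry) => entry.name);
}

const snapshotNames = [
  'baseline.clean',
  'baseline.recovery',
  'checkpoint.shift-1',
  'checkpoint.shift-2',
  'checkpoint.shift-3',
  'checkpoint.shift-4',
  'checkpoint.shift-5',
  'checkpoint.shift-6',
  'checkpoint.shift-7',
];

console.log(selectCheckpointsToPrune(snapshotNames));
// → [ 'checkpoint.shift-2', 'checkpoint.shift-1' ]
```

Keeping the selection separate from the deletion itself means the retention policy can be tested without touching libvirt.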
+### 5.5 Networking Model
+
+Networking is host-controlled through libvirt.
+
+Supported modes:
+- `quest`: constrained, deterministic virtual networks and fixtures
+- `sandbox`: broader connectivity for experimentation
+
+Examples:
+- Internal-only network between workstation and target VM
+- Broken DNS as part of a quest
+- Deliberately degraded service reachability
+- Optional outbound package mirror access for selected scenarios
+
+### 5.6 VM Provisioning Hooks
+
+Quest-specific VM state — broken configs, missing files, log histories — is
+authored into the VM baseline before the snapshot is taken. This is done via
+idempotent provisioning scripts:
+
+```
+tools/vm/quest-prep/Q0XX-prep.sh
+```
+
+These scripts run against the target VM before the quest's `baseline.clean`
+snapshot is taken. They are never run at quest activation time. See
+QUEST_AUTHORING.md for the full provisioning workflow.
+
+---
+
+## 6. OBSERVATION AND VALIDATION
+
+### 6.1 Validation Philosophy
+
+Quest completion is based on **system state**, not on how the player got there.
+
+Allowed evidence includes:
+- Files and directory contents
+- Ownership and permissions
+- Service state
+- Process state
+- Open ports
+- Package state
+- Mount state
+- Disk utilization
+- System configuration values
+
+Disallowed as primary success conditions:
+- Specific commands typed
+- Specific files opened
+- UI click history
+
+### 6.2 Observation Sources
+
+Primary sources:
+- `virsh domstate`, `domifaddr`, and domain metadata
+- Host-driven inspection tooling such as libguestfs where practical
+- SSH-based read-only checks initiated by the host when needed
+- Quest-specific host probe scripts for higher-level state summaries
+
+Authoritative rule:
+- Quest validation must use host-authoritative checks only
+- In-guest helpers may improve responsiveness, but cannot decide success
+
+In-guest helpers should use neutral names (examples: `atlas-index`, `yardd`,
+`ops-telemetry-cache`) and must not be trusted as a security boundary.
+
+Operational note:
+- Routine package operations inside guests may emit maintenance or virtualization
+  notices that break immersion
+- Base images should suppress or tune guest maintenance messaging where safe
+  for the authored environment
+- Validation and incident design should not rely on noisy package-manager side
+  effects being visible to the player
+
+### 6.3 Validation Rule Model
+
+Core rule families:
+- `file_exists` / `file_contains` / `file_mode` / `file_owner`
+- `directory_exists`
+- `service_state` / `service_enabled`
+- `process_running` / `process_user`
+- `port_listening`
+- `package_installed`
+- `mount_present`
+- `disk_usage_below` / `disk_usage_above`
+- `command_assert` — fallback only, must verify state not behavior
+- `and` / `or` / `not`
+
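To illustrate how the rule families compose, here is a minimal sketch of a recursive evaluator. It is not the project's `ValidationEngine`. The rule field names (`path`, `name`, `state`, `port`, `rules`, `rule`) and the shape of the `observed` object are assumptions made for this example; in the real system the observed state would come from host-driven SSH checks.

```javascript
// Leaf checks read from an `observed` snapshot that the host is assumed to
// have collected itself. Only a few rule families are sketched here.
const leafChecks = {
  file_exists: (rule, observed) => Object.hasOwn(observed.files, rule.path),
  file_mode: (rule, observed) => observed.files[rule.path]?.mode === rule.mode,
  service_state: (rule, observed) => observed.services[rule.name] === rule.state,
  port_listening: (rule, observed) => observed.listeningPorts.includes(rule.port),
};

function evaluateRule(rule, observed) {
  switch (rule.type) {
    case 'and':
      return rule.rules.every((child) => evaluateRule(child, observed));
    case 'or':
      return rule.rules.some((child) => evaluateRule(child, observed));
    case 'not':
      return !evaluateRule(rule.rule, observed);
    default: {
      if (!Object.hasOwn(leafChecks, rule.type)) {
        throw new Error(`Unknown rule type: ${rule.type}`);
      }
      return leafChecks[rule.type](rule, observed);
    }
  }
}

// Hypothetical observed state and quest rule, for illustration only.
const observedHermes = {
  files: { '/etc/nginx/nginx.conf': { mode: '0644' } },
  services: { nginx: 'active' },
  listeningPorts: [22, 80],
};

const questRule = {
  type: 'and',
  rules: [
    { type: 'service_state', name: 'nginx', state: 'active' },
    { type: 'port_listening', port: 80 },
    { type: 'not', rule: { type: 'file_exists', path: '/var/www/html/maintenance.flag' } },
  ],
};

console.log(evaluateRule(questRule, observedHermes)); // true
```

Because the evaluator only sees observed state, it cannot tell which commands the player typed, which matches the state-based validation principle.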
+### 6.4 Trust Boundary
+
+The player may gain root access on some machines. The guest is not trusted. The
+host validation layer is trusted. Anti-cheat is achieved through external
+validation, not secrecy.
+
+---
+
+## 7. GAMEPLAY SYSTEMS
+
+### 7.1 Core Loop
+
+1. Ticket arrives with incomplete context
+2. Player evaluates urgency against other active problems
+3. Player enters or connects to the relevant VM
+4. Player investigates using real Linux tools
+5. Player applies a fix
+6. Game validates resulting state
+7. World reacts
+8. Trust shifts
+9. Future conditions reflect earlier choices
+
+### 7.2 System Pressure
+
+Pressure is systemic, not a countdown bar. Examples:
+- Disk usage keeps climbing
+- A log fills with worsening symptoms
+- A degraded service starts affecting another team
+- A quick fix suppresses one symptom while creating later instability
+
+Pressure is authored as state transitions and event chains via incident files.
+
+### 7.3 Trust / Reputation
+
+Trust measures how much the organization relies on the player.
+
+Trust affects:
+- sudo scope
+- accessible machines
+- diagnostic tooling
+- ticket sensitivity
+- documentation visibility
+
+**Trust increases** when the player resolves problems cleanly, finds root causes,
+and avoids collateral damage.
+
+**Trust decreases** when the player breaks unrelated systems, applies fragile
+fixes, ignores urgent incidents, or resolves symptoms but not causes.
+
+**Trust revocation**: if trust falls below a declared threshold in the trust
+unlock table, specific access strings are revoked. A subsequent trust increase
+does not automatically restore revoked access — the player must re-earn the
+unlock tier. Revocation rules must be explicitly declared per unlock tier.
+
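One way to read the revocation rule is as hysteresis: access is granted at one threshold and revoked at a lower one, and nothing changes in between. The sketch below assumes that reading. The field names `unlockAt`, `revokeBelow`, and `access`, and the access string `sudo:systemctl`, are hypothetical and not taken from the trust unlock table.

```javascript
// Hypothetical sketch of trust-tier evaluation with explicit revocation.
// Assumption: "re-earning" a tier means reaching its unlockAt score again.
function updateAccess(tiers, grantedAccess, score) {
  const next = new Set(grantedAccess);
  for (const tier of tiers) {
    if (score >= tier.unlockAt) {
      tier.access.forEach((entry) => next.add(entry));
    } else if (score < tier.revokeBelow) {
      tier.access.forEach((entry) => next.delete(entry));
    }
    // Between revokeBelow and unlockAt the current state is kept, so a
    // small trust increase after a revocation does not restore access.
  }
  return next;
}

const tiers = [
  { id: 'tier-1', unlockAt: 40, revokeBelow: 25, access: ['sudo:systemctl'] },
];

let granted = new Set();
for (const score of [45, 30, 20, 30, 40]) {
  granted = updateAccess(tiers, granted, score);
  console.log(score, [...granted]);
}
```

In this run the access string is held at 45 and 30, lost at 20, still missing at 30, and restored only at 40.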
+### 7.4 Multiple Valid Solutions
+
+Quests support realistic alternatives where possible:
+- quick workaround
+- operationally acceptable fix
+- proper long-term fix
+
+Branch resolution rule:
+- multiple branches may match the same final state
+- each branch must declare a numeric `priority`
+- the highest matching priority wins
+- ties are a content error and fail validation during authoring checks
+
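The branch resolution rule can be sketched in a few lines. This is an illustration under stated assumptions: each branch is modelled with a hypothetical `matches` predicate over observed state, and the authoring check treats any duplicated `priority` within a quest as a tie, which may be stricter than the real content validator.

```javascript
// Authoring-time check: report every priority value used more than once.
function findDuplicatePriorities(branches) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { priority } of branches) {
    if (seen.has(priority)) duplicates.add(priority);
    seen.add(priority);
  }
  return [...duplicates];
}

// Runtime resolution: among the branches that match, the highest priority wins.
function resolveBranch(branches, observed) {
  const matching = branches.filter((branch) => branch.matches(observed));
  if (matching.length === 0) return null;
  return matching.reduce((best, branch) => (branch.priority > best.priority ? branch : best));
}

// Hypothetical branches for one quest.
const branches = [
  { id: 'workaround', priority: 10, matches: (s) => s.serviceActive },
  { id: 'proper_fix', priority: 30, matches: (s) => s.serviceActive && s.configFixed },
];

console.log(findDuplicatePriorities(branches)); // []
console.log(resolveBranch(branches, { serviceActive: true, configFixed: true }).id); // proper_fix
```

Rejecting duplicate priorities at authoring time keeps runtime resolution deterministic, since two matching branches can then never share the top priority.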
+### 7.5 Dynamic Events
+
+Dynamic events inject prioritization pressure and are authored in incident files.
+Events are selected from authored pools and activated by progression, trust,
+current system state, and world flags.
+
+Each incident declares a `blast_radius_quests` list so the incident scheduler
+can avoid activating an incident that would corrupt active quest evidence or
+simultaneously interfere with an in-progress objective.
+
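As a rough sketch of how a scheduler might use `blast_radius_quests`, the filter below drops any incident whose blast radius overlaps a quest that is currently active. The function name and the incident IDs are hypothetical; only the `blast_radius_quests` field comes from the text above.

```javascript
// Hypothetical filter: keep only incidents that are safe to activate now.
function eligibleIncidents(incidents, activeQuestIds) {
  const active = new Set(activeQuestIds);
  return incidents.filter(
    (incident) => !incident.blast_radius_quests.some((questId) => active.has(questId)),
  );
}

// Illustrative pool; the incident IDs are made up for this example.
const incidentPool = [
  { id: 'INC-disk-fill', blast_radius_quests: ['Q003'] },
  { id: 'INC-dns-flap', blast_radius_quests: ['Q006', 'Q008'] },
];

console.log(eligibleIncidents(incidentPool, ['Q003']).map((incident) => incident.id));
// → [ 'INC-dns-flap' ]
```

Selection by progression, trust, system state, and world flags would then run over the filtered pool.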
+### 7.6 Investigation Quality
+
+Clues must be legible and grounded. Every quest declares a `clue_fingerprint`
+documenting what evidence exists in the VM baseline. Content validation checks
+that the fingerprint is plausible. The player should feel rewarded for competent
+debugging rather than guessing.
+
+### 7.7 Progression
+
+Progression unlocks:
+- broader sudo access
+- new servers
+- more dangerous responsibilities
+- better internal docs
+- helper scripts and diagnostics
+
+This is institutional progression, not character stats.
+
+### 7.8 Mentor Thread
+
+Marcus is the primary mentor character. His dialogue runs across the full game
+as a `series_id: marcus-main` thread. Each dialogue file that belongs to an
+ongoing character relationship declares `series_id` and `series_position`.
+
+The dialogue system tracks series state so Marcus remembers what happened in
+earlier quests and can reference it in later ones. This is the primary vehicle
+for institutional memory and character continuity.
+
+### 7.9 Tone and Humor
+
+The tone is dry, realistic, and slightly dysfunctional. Examples:
+- contradictory runbooks
+- tickets that misidentify the problem
+- passive-aggressive internal notes
+- perfect urgency attached to trivial formatting requests
+
+Humor must support immersion, not break it.
+
+---
+
+## 8. COMMAND AND ACCESS MODEL
+
+Access is controlled realistically through:
+- user accounts and group membership
+- sudoers configuration
+- reachable hosts
+- available packages and tooling
+
+If a player cannot run `systemctl`, the reason is that the VM account lacks the
+required privileges, not that the game disabled the verb.
+
+---
+
+## 9. PRESENTATION LAYER
+
+The player's view is the workstation VM desktop, viewed fullscreen via SPICE:
+
+```bash
+scripts/start-game.sh
+# → starts game server
+# → virsh start sc-workstation (if not already running)
+# → remote-viewer --full-screen spice://127.0.0.1:<port>
+```
+
+The player sees an XFCE desktop with Chromium pre-opened to the HUD.
+
+### 9.1 VM Display
+
+- **Protocol**: SPICE with QXL video driver
+- **Client**: `remote-viewer` (from `virt-viewer` package) in fullscreen mode
+- **Resolution**: dynamic — guest vdagent resizes to match host display
+- **Cursor release**: `Ctrl+Alt`; fullscreen toggle: `F11`
+- **Clipboard sharing**: via spice-vdagent in the guest
+
+No VNC, no custom viewer widget. The host runs `remote-viewer` and the player
+works inside the workstation VM.
+
+### 9.2 HUD (Svelte Web App)
+
+The game HUD is a Svelte single-page app served at `http://192.168.100.1:3000`:
+
+- **TicketsPanel** — ticket list, detail view, "Mark Complete" button
+- **MailPanel** — inbox, message body, reply buttons (where applicable)
+- **DocsPanel** — trust-gated internal docs, rendered from content/docs/
+- **SagePanel** — chat interface to SageService knowledge base
+- **HeaderBar** — trust indicator (no number, behavior only), shift timer, unread badge
+
+The HUD is a company intranet portal in look and feel — dark, monospace, minimal.
+
+### 9.3 One-Time Setup and Uninstall
+
+Host-side setup is unavoidable (KVM, libvirt, VM images). It must be simple.
+
+Principles:
+- one-time setup only (`tools/setup/first-run-setup.sh`)
+- plain-language explanation of what will be installed
+- managed resources use the `sc-` prefix (never touch other libvirt domains)
+- full uninstall removes all game-owned domains, networks, storage, helper files
+- normal gameplay does not require broad `sudo`
+
+---
+
+## 10. DATA MODEL
+
+Authoring formats:
+- JSON for quests, tickets, incidents, dialogue, documentation metadata
+- Shell helper scripts where CLI integration is necessary
+
+Top-level content domains:
+
+| Domain | Purpose |
+|--------|---------|
+| `quests/` | Objective chains and validation rules |
+| `tickets/` | Player-facing problem statements |
+| `incidents/` | Dynamic system pressure events |
+| `dialogue/` | Workplace messages, hints, follow-ups |
+| `docs/` | Internal documentation metadata/content |
+| `progression/` | Trust thresholds, unlocks, access tiers |
+| `vm_profiles/` | Domain names, snapshots, networks, probe config |
+| `helpers/` | Non-obvious guest helper naming/config data |
+| `world_flags/` | Central registry of all world state flags |
+
+Each authored scenario must declare:
+- `required_vms` — all VMs the quest touches
+- `baseline_snapshot` — starting snapshot for this quest
+- `clue_fingerprint` — evidence declared in the VM baseline
+- validation rules and branch priorities
+- escalation behavior
+- trust impact
+- `blast_radius` — incident IDs the quest may interact with
+- follow-on world effects
+
+---
+
+## 11. SAVE MODEL
+
+### 11.1 Dirty State Model
+
+The game uses a **dirty state model**. VM disk state is preserved across
+sessions as-is. The game does not revert to a clean baseline on load — it
+resumes from whatever state the VMs are currently in.
+
+This is intentional. The player's history of changes is part of the game. A
+machine they fixed stays fixed. A machine they damaged stays damaged until they
+repair it or request reimage.
+
+Two persistence layers:
+
+**Game State Layer** — saved as JSON:
+- Trust score and history
+- Unlocked access, sudo scopes, docs, tools
+- Active/completed quest and ticket state
+- World flags (current values and change history)
+- Incident scheduler state
+- In-world clock and shift counter
+
+**VM State Layer** — saved as libvirt snapshot references:
+- Per-VM reference to current snapshot tier or live disk
+- Per-VM managed recovery checkpoint list
+- Reimage history per VM
+
+### 11.2 Shift Checkpoints
+
+At the start of each in-game shift:
+1. Game state JSON is saved
+2. A named snapshot is created per active VM: `checkpoint.shift-{N}`
+3. The checkpoint reference is recorded in the save file
+4. Shift checkpoints beyond the retention limit (default: 5) are pruned
+
+Shift checkpoint rollback is an explicit player action ("start this shift
+over") with a confirmation prompt. It does not undo trust changes or dialogue
+already delivered.
+
+### 11.3 Load-Time Reconciliation
+
+On load, the observation service validates current VM state against saved world
+flags. Minor drift is handled silently. Major drift — missing snapshots,
+unbootable VMs — triggers the recovery flow.
+
+If a referenced snapshot is missing:
+- If `baseline.recovery` exists, offer resume from recovery
+- If `baseline.recovery` is also gone, the VM is treated as unrecoverable
+
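The missing-snapshot decision above reduces to a three-way outcome. The sketch below is illustrative only: `reconcileSnapshot` and its return values (`resume`, `offer_recovery`, `unrecoverable`) are hypothetical names, not identifiers from the codebase.

```javascript
// Hypothetical load-time check for one VM's saved snapshot reference.
function reconcileSnapshot(referencedSnapshot, existingSnapshots) {
  const existing = new Set(existingSnapshots);
  if (existing.has(referencedSnapshot)) return 'resume';
  if (existing.has('baseline.recovery')) return 'offer_recovery';
  return 'unrecoverable';
}

console.log(reconcileSnapshot('checkpoint.shift-4', ['baseline.clean', 'baseline.recovery']));
// → offer_recovery
```

An `unrecoverable` result is the case that leads into the reimage flow.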
+### 11.4 Recovery / Reimage Flow
+
+When a VM is unrecoverable, the player can report it for reimage through an
+in-world mechanic:
+
+1. Player submits a reimage request (ticket to management)
+2. In-world delay is imposed (one in-game shift)
+3. Machine is restored from `baseline.recovery` or `baseline.clean`
+4. Trust penalty is applied based on severity
+5. In-progress quests on that VM are reset
+6. Evidence from before the reimage is gone — acknowledged in-world
+
+This is the designed escape valve. It has visible consequences but allows
+forward progress.
+
+### 11.5 Host Storage Management
+
+qcow2 images with many snapshots can balloon. The game enforces:
+- Maximum of 5 shift checkpoints per VM (configurable in vm_profile)
+- Authored baseline and recovery snapshots are never pruned by the game
+- `resource_budget` in vm_profile declares expected disk footprint
+
+### 11.6 Developer Reset
+
+Not available in the shipped game. CLI only:
+
+```bash
+bash tools/vm/snapshot-all.sh --revert-to baseline.clean
+```
+
+Completely resets all VMs to authored baseline. Used during content authoring
+and automated test runs.
+
+---
+
+## 12. MODULE BREAKDOWN
+
+### Server (`server/src/`)
+
+| Module | Responsibility |
+|--------|----------------|
+| `index.js` | Express + WebSocket entry point; service wiring; static file serving |
+| `ContentLoader` | Loads all content/ JSON at startup; never writes |
+| `QuestEngine` | Quest state machine (pending → active → resolved) |
+| `TicketService` | Ticket state, mark-complete handler, branch resolution |
+| `ValidationEngine` | SSH into VMs, evaluates all rule types against real state |
+| `VMManager` | virsh start/stop/snapshot/getIP wrappers |
+| `TrustSystem` | Score tracking, unlock evaluation, revocation |
+| `ProgressionSystem` | Unlocked docs, VMs, access strings |
+| `EmailService` | Inbox, follow-up emails, reply options, WebSocket push |
+| `SageService` | Rule-based dialogue / knowledge base |
+| `ShiftTimer` | Shift clock, broadcasts shift:tick via WebSocket |
+| `IncidentScheduler` | Pressure tick loop, incident injection |
+| `ShiftReviewService` | End-of-shift performance review email generation |
+| `CertificationService` | Awards internal certs after quest chain completion |
+| `SaveState` | Read/write `~/.local/share/sysadmin-chronicles/save.json` |
+| `lib/ssh.js` | Promisified SSH command execution (node-ssh) |
+| `lib/virsh.js` | virsh command wrappers |
+| `lib/eventBus.js` | Internal Node.js EventEmitter for service coordination |
+
+### Frontend (`frontend/src/`)
+
+| Component | Responsibility |
+|-----------|----------------|
+| `App.svelte` | Root component; WebSocket connection; panel routing |
+| `TicketsPanel` | Ticket list, detail, mark-complete flow |
+| `MailPanel` | Inbox, message body, reply buttons |
+| `DocsPanel` | Trust-gated doc list and content viewer |
+| `SagePanel` | Chat interface, follow-up prompts |
+| `VmsPanel` | Live VM status indicators |
+| `HeaderBar` | Trust display, shift timer, mail unread count |
+| `lib/api.js` | Fetch wrapper for all REST API calls |
+
+---
+
+## 13. SECURITY AND SAFETY
+
+Requirements:
+- Scope libvirt resources to dedicated game domains/networks/storage pools
+- Never operate on arbitrary host VMs by default
+- Use explicit naming/prefixing for all game-managed resources (`sc-` prefix)
+- Separate quest-mode constrained networks from broader sandbox networks
+- Prefer least-privilege host integration
+- Provide a dry-run and diagnostic mode for development scripts
+
+The game manages only the resources it created or was explicitly pointed at
+during setup.
+
+---
+
+## 14. TECHNOLOGY DECISIONS
+
+| Technology | Role | Reason |
+|-----------|------|--------|
+| Node.js / Express | Game server | Async I/O, virsh via child_process, SSH via node-ssh, easy JSON |
+| Svelte / Vite | Web HUD | Lightweight, no virtual DOM overhead, fast build |
+| WebSocket (`ws`) | Real-time push | Trust changes, mail, incidents without polling |
+| QEMU/KVM | Virtualization backend | Real Linux environments |
+| libvirt / virsh | VM lifecycle control | Standard Linux automation surface |
+| SPICE + QXL | Workstation display | Dynamic resolution, clipboard sharing, fullscreen |
+| `remote-viewer` | Host-side SPICE client | Ships with virt-viewer; fullscreen with F11 |
+| JSON | Content authoring | Data-driven, easy to diff, unchanged from prior design |
+| node-ssh | SSH execution in validation | Clean Promise API; BatchMode, key-based auth |
+
+Not in scope: v86, WebAssembly, browser-only runtime, service-worker networking.
+
+---
+
+## 15. DEVELOPMENT PRIORITIES
+
+1. Native architecture consistency
+2. VM control integration
+3. Observation and validation
+4. Core gameplay loop
+5. Pressure, trust, and dynamic event systems
+6. Presentation polish
+
+If a design choice improves presentation but weakens VM realism or maintainable
+automation, reject it.
@@ -0,0 +1,459 @@
+# Characters — Sysadmin Chronicles
+
+Story design reference. All characters, bios, relationships, and open story hooks.
+For company/world context see `COMPANY_LORE.md`. This file focuses on the people.
+
+---
+
+## Active Characters
+
+These characters have an established in-game voice and presence. Any new quest work
+should treat their characterization here as canonical.
+
+---
+
+### The Player
+**Role:** New junior sysadmin hire, day one  
+**Identity:** Unnamed. Player-selected portrait (5 options).
+
+Hired to replace Dale. Nobody will explain what Dale did. Badge number is still
+pending — temp credentials were handled by someone in Finance on their first day.
+The player is a competent professional, not a bumbling intern. They may not know
+every answer but they know how to look.
+
+The player has no spoken lines. Their character is expressed entirely through the
+choices they make when fixing things — whether they understand root causes or just
+clear symptoms, whether they leave systems better or just less broken.
+
+---
+
+### Marcus Webb
+**Role:** Senior Systems Administrator  
+**Email:** `m.webb@axiomworks.internal`  
+**Reports to:** Dave Kowalski (Director of IT)
+
+Six years at Axiom Works. Hired by Kowalski. Knows where everything is, why it's
+there, and which parts were a mistake. Communicates in short, precise messages.
+Does not explain things twice. Trusts competence over credentials — he will give
+the player more rope as they demonstrate they know what to do with it. If they
+don't, the rope gets shorter.
+
+He was the one who onboarded the player. He assigned their first ticket. He will
+assign most of the tickets that follow. His messages range from brief task
+assignments to late-night observations about something that's been on his mind —
+the latter usually mean something is about to become a problem.
+
+He knows what Dale did. He has decided not to discuss it.
+
+**Personality:** Dry. Technically precise. Does not perform enthusiasm. Occasionally
+wry but never jokey. Respects players who fix root causes. Mildly annoyed by
+players who fix symptoms and call it done.
+
+**Relationships:**
+- Kowalski: reports to him; respectful but not deferential
+- Sarah: professional; takes her tickets seriously, occasionally tells her quietly when she's wrong
+- Priya: mutual professional respect; they operate in the same zone of "things that matter when they go wrong"
+- Phil Ruiz (Sales VP): warm; Phil owes Marcus for saving a demo once and Marcus has never mentioned it
+
+---
+
+### Sarah Chen
+**Role:** Product Manager, AxiomFlow  
+**Email:** `s.chen@axiomworks.internal`
+
+Owns the AxiomFlow product roadmap. Coordinates between sales, engineering, and
+customers. Emails Monday mornings. Cares intensely about the demo and staging
+environments because those are the parts of the product she can actually see and
+touch. Not wrong about their importance.
+
+She files tickets when things break on the product-facing side. Her descriptions of
+problems are accurate about symptoms and often wrong about causes — she will
+confidently diagnose a permissions issue as a script bug, or a package problem as a
+config error. She is not incompetent; she just doesn't have the full picture. When
+the player fixes the underlying cause rather than the surface symptom, she notices.
+
+She has a sharp edge when things get worse after someone touches them. She will say
+so, clearly, without being melodramatic about it.
+
+**Personality:** Direct. Metric-oriented. Not patient with vague timelines or "we're
+looking into it." Appreciates being told what the actual problem was, not just that
+it's fixed.
+
+**Relationships:**
+- Marcus: professional; trusts that her tickets will be handled, doesn't ask for much
+- Player: initially impersonal (they're new); warms or cools based on outcomes
+- Nikhil Sharma: upstream dependency — his build pipeline affects her deployments
+
+---
+
+### Priya Nair
+**Role:** Head of Security & Compliance  
+**Email:** `p.nair@axiomworks.internal`  
+**Direct report:** James Osei (Security Analyst)
+
+Leads all security reviews, access audits, and compliance programmes. Has a standing
+Thursday meeting with David Park (CTO) that has existed since 2017. Was brought in
+after an incident nobody discusses in public. Has been building the security function
+from something informal into something that can survive a SOC 2 audit.
+
+She frames everything in terms of what happens when things go wrong, not whether they
+will. She assumes breach. She assumes misconfiguration. She is often right. She is
+not someone who appreciates hearing about a production change after it has already
+happened.
+
+She will tell the player when a fix is correct and why. She will also tell them when
+a fix works but leaves the environment in a worse position than before. She is not
+punitive about this — she just states it.
+
+She does shift reviews at end-of-shift and grades the player's overall performance.
+Her criteria: did the work move forward, did the environment stay stable, did the
+player create extra problems.
+
+**Personality:** Precise. Consequence-focused. Calm in tone even when the content
+is not calm. Economical with words. Does not use exclamation marks.
+
+**Relationships:**
+- Player: evaluative; her trust is earned by demonstrating that security is a
+  consideration, not an afterthought
+- Marcus: peer respect; they operate in different domains with overlapping concerns
+- Dave Kowalski: reports indirectly up through him for infrastructure decisions
+- David Park: standing Thursday meeting; she has the CTO's ear
+
+> **Name note for developers:** The in-game email service and some ticket files
+> previously used "Priya Kapoor" and the onboarding doc used "Priya Singh."
+> These are all the same character. **Priya Nair** is the canonical name.
+> Email should be `p.nair@axiomworks.internal`. Update references in
+> `server/src/services/EmailService.js`, `content/tickets/T007.json`, and
+> `content/docs/onboarding.json`.
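+
+One way to find the stale names is a recursive grep from the repo root. This is a
+minimal sketch: the function name `find_stale_priya_refs` is hypothetical, and
+searching all of `server/src` and `content` (rather than only the three files
+listed above) is an assumption made to catch references the note does not mention.
+
+```bash
+# Hypothetical helper: list every line that still uses a retired surname.
+# Prints file:line:match for each hit; returns 1 when nothing is found.
+find_stale_priya_refs() {
+    local root="${1:-.}"
+    grep -rnE 'Priya (Kapoor|Singh)' "$root/server/src" "$root/content" 2>/dev/null
+}
+```
+
+Run it as `find_stale_priya_refs .` from the project root. Old email addresses may
+not contain the first name at all, so they would need a separate search.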
+
+---
+
+### Dave Okonkwo
+**Role:** Internal employee, non-technical  
+**Email:** `d.okonkwo@axiomworks.internal`
+
+A regular Axiom Works employee who notices when things aren't working and files
+tickets about it. He doesn't know enough to diagnose the problem — he reports
+symptoms accurately and assumes the wrong cause. His reports are useful precisely
+because they represent what a non-technical user actually experiences.
+
+He is not on the company website (280 employees, most of them aren't). He's
+somewhere in operations or general staff. He's not in Finance, not in IT.
+
+> **Open decision:** Dave Okonkwo is currently the only employee-level character who
+> submits tickets. The company website has Dave Kowalski as Director of IT Operations
+> (Marcus's boss), which is a completely different person. This is not a naming
+> inconsistency — they're two different people. However: if the story wants Kowalski
+> to become an active character who also files tickets or escalates issues, that's a
+> separate thread. Okonkwo and Kowalski coexist.
+
+---
+
+## Named Background Characters
+
+On the company website. No current in-game presence. Available for story use —
+they can send emails, appear on CC lines, be referenced in dialogue, or become
+active characters in new quests.
+
+Listed in rough order of story relevance to the IT/sysadmin context.
+
+---
+
+### Dave Kowalski — Director of IT Operations
+Marcus's manager. The player's skip-level. Background is network engineering —
+has Cisco certifications he will not volunteer unless provoked. Oversees systems
+(Marcus's domain), networking (Tom Malaney), and IT support. Has been at Axiom
+Works since 2015. Describes the infrastructure as "mature." Sends weekly status
+emails in bullet points that never quite answer the question. When things go wrong
+he schedules a meeting to "talk through the situation," which everyone has learned
+is worse than a direct message.
+
+Has said "we should really document that" more times than he can count. Has
+documented very little personally. Maintains a mysterious Tuesday 2–3pm calendar
+block.
+
+Story use: source of policy pressure, indirect escalation, the person who asks
+questions that reveal Marcus hasn't told the player everything.
+
+---
+
+### Nikhil Sharma — Platform Engineer
+Owns the internal build and release pipeline, the CI infrastructure, and the
+parts of deployment that nobody else wants to think about. Strong opinions about
+reproducible builds. Sends Slack messages at 6am. Occasionally at 11pm.
+
+He is the engineer most directly connected to what happens on vulcan — if a build
+is broken, it's probably something Nikhil built or maintains. He has never met the
+player. He almost certainly doesn't know the player exists.
+
+Story use: the author of broken packages the player has to debug; a character who
+can explain (or fail to explain) what went wrong upstream; an escalation path when
+a build problem is genuinely his fault.
+
+---
+
+### Tanya Okafor — Head of Customer Success
+Manages post-sale relationships for all AxiomFlow customers and the twelve legacy
+AxiomSync accounts that haven't migrated. Uses the word "partnership" a lot.
+
+Usually the first person to know when something is wrong in production, because a
+customer has already called her before IT knows there's a problem. Her call log
+is an early warning system. She is not hostile to IT but she has learned that
+"we're looking into it" is not an answer she can give a customer.
+
+Story use: pressure vector from the customer direction; source of urgency that
+doesn't come from Marcus or the ticket queue; demonstrates real-world stakes when
+things go down.
+
+---
+
+### Phil Ruiz — VP of Sales
+Has been promising features to prospects since 2016. Maintains a warm relationship
+with the infrastructure team because Marcus once fixed the staging environment with
+twenty minutes to spare before a major demo — Phil has never forgotten this. Travels
+frequently. Expense reports submitted promptly, which Marcus has noted approvingly.
+
+Story use: indirect beneficiary when demos work; pressure source when a sales demo
+is scheduled and something is broken; the person who will tell the CTO what IT did
+right in a room the player will never be in.
+
+---
+
+### Yusuf Halabi — Engineering Manager
+Reports to David Park (CTO). Manages the core AxiomFlow platform team. Runs the
+Thursday architecture review. Has opinions about test coverage. Leaves pull request
+comments that are technically correct and diplomatically suboptimal.
+
+Story use: engineering-side escalation; source of tickets about internal tooling;
+the person who will ask why a config change broke a downstream process.
+
+---
+
+### Derek Ashford — Financial Controller
+Does not appear at team meetings. Does appear on CC lines of every email that
+mentions cloud costs, hardware procurement, or infrastructure budget. Always
+replies-all. His manager is Rachel Brandt (CFO).
+
+Story use: background texture on procurement requests; the voice that makes any
+infrastructure spending feel like a negotiation.
+
+> **Note on "Dave from Finance":** Marcus's day-one message references "Dave from
+> Finance" as the person holding the player's temp credentials. The likeliest
+> candidate is Derek Ashford, the only Finance character who would plausibly hold
+> IT credentials. But his first name is Derek, not Dave, so either Marcus is being
+> loose with names, the message is a continuity error and should be corrected, or
+> "Dave from Finance" is a separate, unnamed Finance employee.
+
+---
+
+### Rachel Huang — Systems Administrator
+Marcus's peer on the IT team. Handles provisioning, patch cycles, and the ongoing
+negotiation with Finance over cloud consolidation. Came from a managed services
+background. Has strong opinions about monitoring dashboards, most of which are
+correct.
+
+Story use: the person who set something up that the player now has to maintain;
+a colleague who can provide context Marcus won't; someone whose provisioning
+decisions the player will encounter as infrastructure.
+
+---
+
+### Tom Malaney — Network Engineer
+Responsible for network infrastructure across the office and hosted environments.
+On-call for more holiday weekends than he would like. Thorough in documentation
+when he finds time for it.
+
+Story use: DNS, firewall, or routing problems that are not the player's fault
+but become the player's problem; someone who can be reached but is slow to
+respond.
+
+---
+
+### James Osei — Security Analyst
+Priya's direct report. Handles vulnerability assessments, access reviews, and
+quarterly compliance reporting. Methodical. Has a spreadsheet for everything,
+which is not a criticism.
+
+Story use: the person who runs the actual audit that Priya will summarize to the
+player; a source of detailed (sometimes overwhelming) security findings.
+
+---
+
+### Ellen Marsh — CEO & Co-Founder
+Built the first version of AxiomFlow after a decade in operations. No CS background.
+Attends all-hands twice a year. Does not use Slack. Has final say on pricing and
+major customer commitments.
+
+Story use: the distant authority whose priorities shape everything; never interacts
+with the player directly, but her decisions land as constraints.
+
+---
+
+### David Park — CTO & Co-Founder
+Wrote the original rules engine in 2011. Now manages engineering managers. Still has
+opinions about the data model. Has a standing Thursday meeting with Priya that hasn't
+moved since 2017.
+
+Story use: architectural decisions from above; the person Priya reports significant
+security findings to.
+
+---
+
+### Karen Volkov — COO
+Joined 2014. Responsible for the fact that the company has documented processes for
+anything at all. Has opinions about infrastructure costs that surface in IT's world
+via Finance. Prefers decisions with clear owners and deadlines.
+
+---
+
+### Rachel Brandt — CFO
+Joined 2016. Approves all capital expenditure over $5,000. Working to consolidate
+cloud spend. Does not enjoy surprises in the infrastructure budget. Derek Ashford
+reports to her.
+
+---
+
+### Mei Lin — Senior Software Engineer
+Has maintained AxiomSync's integration layer since 2018. Knows more about it than
+anyone would prefer, including herself. Currently leading the migration tooling
+project for the remaining legacy accounts.
+
+---
+
+### Cora Reyes — Software Engineer
+Works on the AxiomDash reporting pipeline. Has submitted more internal RFCs than
+anyone else on the team in the past year. Moving toward senior.
+
+---
+
+### Ben Portillo — Product Manager, AxiomDash
+Leads product development for the analytics add-on. Works closely with large
+accounts to understand what they actually want from dashboards (usually different
+from what they asked for).
+
+---
+
+### Annika Gosse — UX Designer
+Responsible for AxiomFlow's interface. Has been advocating for a redesign of the
+workflow builder since 2022. Patient.
+
+---
+
+### Sandra Wu — HR Manager
+Manages hiring, onboarding, and employee relations since 2016. Runs the new-hire
+onboarding process (three days, thorough). Sends birthday emails on time, every time.
+
+---
+
+### Owen Blake — Office Manager
+Keeps the office running. Has fixed more things than his job title implies. The
+person to contact if conference room equipment stops working.
+
+---
+
+### Mike Kawamoto — Account Executive
+Handles mid-market manufacturing accounts in the northeast. Believes strongly in
+the demo environment. Closes more deals in Q4 than any other quarter.
+
+---
+
+### Lisa Ferreira — Customer Success Manager
+Manages onboarding for new AxiomFlow deployments. Has a talent for understanding
+what customers mean rather than what they say.
+
+---
+
+## Unresolved Characters (Story Hooks)
+
+These are referenced in existing content but never defined. They represent the
+strongest open narrative threads.
+
+---
+
+### Dale — The Previous Sysadmin
+**Reference:** Marcus's day-one message — "You're replacing Dale. Nobody will tell you
+what Dale did because it's complicated."
+
+Dale is gone. The player has Dale's desk, Dale's access provisioning slot, and
+apparently Dale's reputation: people know the player is "Dale's replacement" before
+they know the player's name. The systems the player inherits are the systems Dale
+last touched.
+
+What Dale did is unknown. It is described as "complicated." Marcus knows. Possibly
+Kowalski knows. Possibly Priya knows, if it was security-related.
+
+This is the strongest existing narrative mystery in the game. It has setup and no
+payoff. Dale's story could be:
+- A technical incident (something Dale broke and couldn't fix)
+- A policy violation (something Dale did that wasn't malicious but wasn't right)
+- A trust collapse (competent but burned bridges)
+- Something personal
+- Any combination
+
+The player finding out what Dale did — gradually, through the systems they work on,
+through things people let slip — is a natural story spine for the whole game.
+
+---
+
+### "Dave from Finance" — Day One Reference
+**Reference:** Marcus's day-one message — "Dave from Finance has your temp credentials.
+He's on three today."
+
+Almost certainly Derek Ashford (Financial Controller), referred to informally. But
+Derek's first name is Derek, not Dave — this is either Marcus being casual with
+names, a continuity error, or a genuinely separate unlisted Finance employee.
+
+Needs a decision: correct "Dave" to "Derek" in Marcus's message, or introduce a
+separate "Dave from Finance" as a minor character.
+
+---
+
+## Key Relationships Map
+
+```
+Ellen Marsh (CEO)
+  ├── David Park (CTO)
+  │     └── Yusuf Halabi (Eng Manager)
+  │           ├── Mei Lin
+  │           ├── Cora Reyes
+  │           └── Nikhil Sharma
+  ├── Karen Volkov (COO)
+  ├── Rachel Brandt (CFO)
+  │     └── Derek Ashford (Financial Controller)
+  └── Phil Ruiz (VP Sales)
+        ├── Mike Kawamoto
+        └── Tanya Okafor
+              └── Lisa Ferreira
+
+Dave Kowalski (Director of IT)
+  ├── Marcus Webb  ←── Player's manager
+  │     └── [Player]
+  ├── Rachel Huang
+  └── Tom Malaney
+
+Priya Nair (Head of Security)
+  └── James Osei
+
+Sarah Chen (Product, AxiomFlow)  ←── frequent ticket source
+Ben Portillo (Product, AxiomDash)
+Annika Gosse (UX)
+```
+
+---
+
+## Tone Notes for New Story Work
+
+- **Marcus talks like someone who has answered this question before.** Precise, low
+  affect, no wasted words. Never condescending — just efficient.
+- **Sarah talks like a PM: outcome-focused, slightly impatient, specific about
+  what she needs.** She is not a villain. She has real deadlines.
+- **Priya talks like someone who has already thought about what goes wrong.** She
+  doesn't speculate — she states. She's not alarming, she's matter-of-fact.
+- **Dave Okonkwo talks like someone who doesn't know what the problem is** but is
+  trying to be helpful by reporting exactly what he observed. He should never be
+  made to look stupid — he's doing the right thing.
+- **The company takes itself seriously.** Humor comes from the gap between official
+  language and reality, not from anyone being a cartoon.
+- **Problems have plausible causes.** Systems broke because someone made a
+  reasonable decision under time pressure, not because they were careless idiots.
+  The player should feel like a professional, not a janitor.
@@ -0,0 +1,165 @@
+# Axiom Works — Company Lore Reference
+
+> For quest authors, dialogue writers, and ticket copy. Keep the tone dry and
+> believable. The company should feel real, slightly dysfunctional, and just
+> plausible enough that players recognise the type.
+
+---
+
+## Who They Are
+
+**Axiom Works** is a B2B enterprise software company founded in 2011. Headquarters
+is in a three-floor office park that is technically "downtown adjacent" depending
+on how charitable you are with the map. They have about 280 employees. The
+Glassdoor rating is 3.8 stars and management checks it obsessively.
+
+Their flagship product is **AxiomFlow** — a workflow automation platform aimed at
+mid-size manufacturers, logistics companies, and anyone who got a 90-minute demo
+and thought it looked easy. Most customers are still on the workflow they set up
+in 2019. The platform does what it says. Marketing says it does considerably more.
+
+---
+
+## Products
+
+| Product | Description | Status |
+|---------|-------------|--------|
+| **AxiomFlow** | Workflow automation platform | Active, main revenue |
+| **AxiomDash** | Reporting and analytics add-on | Active, profitable, under-resourced |
+| **AxiomSync** | Legacy data integration layer | End-of-sale since 2021, still maintained for 12 customers who refuse to migrate |
+
+The current marketing tagline is *"Streamline. Scale. Succeed."* It replaced
+*"Work smarter, not harder"* in Q3 of last year. The one before that mentioned
+AI. Nobody is sure what the AI was.
+
+---
+
+## Infrastructure
+
+The company runs a mix of on-prem servers (named after Greek gods — a choice made
+by a contractor in 2017 who left before documenting anything) and a handful of
+cloud instances that accounting keeps trying to consolidate.
+
+| Host | Role | Notes |
+|------|------|-------|
+| **ares** | Player workstation | XFCE desktop, where the player works |
+| **hermes** | Web/app server | nginx, staging and demo environment for AxiomFlow |
+| **vulcan** | Build machine | Arch Linux, compiles artifacts, runs scheduled jobs |
+
+### Planned future systems
+As the game grows, additional machines will be added. Candidates:
+
+| Proposed host | Role | Greek connection |
+|---|---|---|
+| **poseidon** | Database server | Foundation, depths, reliability |
+| **apollo** | Mail / notification server | Messenger, communication |
+| **athena** | Internal tooling (ticketing, wiki) | Wisdom, knowledge management |
+| **argus** | Monitoring / alerting | The hundred-eyed watcher |
+| **mnemosyne** | Backup / storage | Memory, persistence |
+
+---
+
+## Characters
+
+### Dave Kowalski — Director of IT Operations
+The player's skip-level manager. Has been at Axiom Works since 2015. Hired Marcus.
+Oversees three teams: systems (Marcus's domain), networking, and IT support. Background
+is originally networking — has Cisco certifications he won't bring up unless someone else
+brings up Cisco certifications first. Sends weekly status emails formatted in bullet
+points that never quite answer the question you were asking. When things go wrong he
+schedules a meeting to "talk through the situation," which everyone has learned is
+worse than an email. Maintains a calendar block from 2–3pm on Tuesdays that nobody
+has ever asked about. Has said "we should really document that" approximately 400 times.
+Describes the infrastructure as "mature."
+
+### Marcus Webb — Senior Sysadmin
+The player's manager and the person who assigned them the ticket. Has been at
+Axiom Works for six years. Knows where all the bodies are buried. Communicates
+primarily in terse Slack messages and occasionally very long emails sent at 11pm.
+Trusts competence over process. Gets irritated by people who confuse symptoms
+with root causes.
+
+### Priya Nair — Security / Compliance
+Runs security reviews and has opinions about everything. Usually right. Tends to
+frame concerns in terms of what will happen when things go wrong rather than
+whether they will. Was brought in after an incident nobody talks about in public.
+
+### Sarah Chen — Product Manager
+Represents the product team's perspective in the ticket queue. Cares about demo
+environments more than production ones because demos are what she can see. Not
+technically wrong about their importance. Emails at 8am on Mondays.
+
+### Derek Ashford — Financial Controller
+Does not appear in person. Appears on CC lines of emails where infrastructure
+costs are being discussed. Always replies-all. His full name is Derek Ashford.
+His manager is Rachel Brandt (CFO).
+
+---
+
+## Background Characters (non-interactive, for world texture)
+
+These characters exist on the company website and in lore but do not appear in
+quests or dialogue. Use them for verisimilitude — email headers, CC lines, internal
+wiki author credits, that sort of thing.
+
+### Ellen Marsh — CEO & Co-Founder
+Built AxiomFlow after a decade in operations. Not technical. Attends all-hands
+twice a year. Has final say on pricing and major customer commitments. Does not
+use Slack. The player will never interact with her.
+
+### David Park — CTO & Co-Founder
+Wrote the original rules engine. Now manages engineering managers. Still has
+opinions about the data model. Has a standing Thursday meeting with security
+that hasn't moved since 2017.
+
+### Karen Volkov — COO
+Joined 2014. Responsible for the fact that Axiom Works has documented processes
+for anything. Has opinions about infrastructure costs. Prefers decisions with
+clear owners and deadlines.
+
+### Rachel Brandt — CFO
+Joined 2016. Approves all capital expenditure over $5,000. Does not enjoy
+surprises in the infrastructure budget. Derek reports to her.
+
+### Phil Ruiz — VP of Sales
+Has been promising features to prospects since 2016. Has a warm relationship
+with the infrastructure team because Marcus once saved a demo with 20 minutes to
+spare. Expense reports submitted promptly.
+
+### Tanya Okafor — Head of Customer Success
+Manages all post-sale customer relationships including the twelve AxiomSync
+holdouts. Usually the first to know when something is wrong in production,
+because a customer has already called her.
+
+### Yusuf Halabi — Engineering Manager
+Reports to the CTO. Manages the core AxiomFlow platform team. Has opinions
+about test coverage. Runs the Thursday architecture review.
+
+### Mei Lin — Senior Software Engineer
+Has maintained AxiomSync's integration layer since 2018. Knows more about it
+than anyone would prefer.
+
+### Nikhil Sharma — Platform Engineer
+Owns the build and release pipeline and internal CI infrastructure. Occasionally
+sends Slack messages at 6am.
+
+### Sandra Wu — HR Manager
+Manages hiring, onboarding, and employee relations since 2016. Sends birthday
+emails on time, every time. Runs the new-hire onboarding process that takes
+three days.
+
+---
+
+## Tone Guidelines
+
+- **Dry, not sarcastic.** The company takes itself seriously. The humour comes
+  from the gap between how they describe things and what's actually happening.
+- **Specific, not generic.** "The AxiomSync customer in Cincinnati keeps calling"
+  is better than "a client is upset."
+- **Plausible dysfunction.** Problems happen because of reasonable decisions made
+  under time pressure, not because people are incompetent. The player should feel
+  like a real professional, not a janitor.
+- **No cartoon villains.** Derek from Finance is not evil. The product team is not
+  stupid. They have different priorities.
+- **The infrastructure has history.** It was built over time. Some parts are good.
+  Some parts were good in 2017. The player's job is to keep it working.
@@ -0,0 +1,641 @@
+# Installer & Distribution Plan
+> Status: Planning — not yet implemented.
+> Covers: installer, uninstaller, VM rebuild, save management, modular script architecture.
+
+---
+
+## Goals
+
+- Download zip from GitHub/Gitea, run `install.sh`, done.
+- Friendly tone throughout — this is a game, not a server deployment.
+- No jargon (libvirt, pool, domain, NAT) in any user-facing output.
+- Power users can follow the Manual Install section in README instead.
+- VM images live wherever the user puts the game (portable, large-drive friendly).
+- Full uninstall with explicit choices about what gets removed.
+- Users can rebuild individual VMs if something goes wrong.
+- Save data is resettable; save slots available for experimenting.
+
+---
+
+## `start-game.sh` Fixes
+
+The current launcher works but has two real bugs, several fragile assumptions, and
+no user-friendly output. Fix this in the same pass as the rest of the scripts since
+it will share `lib/ui.sh` and `lib/config.sh`.
+
+### Bugs to fix
+
+**Orphaned server process**
+The script ends with `exec remote-viewer`, which replaces the shell. The `trap`
+that was set to kill the server on EXIT disappears with the shell — so when the
+player closes the SPICE window, the game server keeps running silently.
+
+Fix: don't `exec`. Run `remote-viewer` normally, capture its PID, wait for it to
+exit, then kill the server cleanly.
+
+```bash
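+# Instead of replacing the shell, which discards the EXIT trap:
+#   exec remote-viewer "$spice_uri"
+
+# run the viewer as a child and wait for it, so the trap still fires:
+remote-viewer "$spice_uri" &
+VIEWER_PID=$!
+trap 'kill "$SERVER_PID" "$VIEWER_PID" 2>/dev/null || true' EXIT INT TERM
+wait "$VIEWER_PID"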
+```
+
+**`sleep 1` server readiness check**
+One second is a race. On a slow machine or if npm install just ran, the server
+may not be up. On a fast machine it's wasted time.
+
+Fix: poll in a tight loop with a timeout.
+
+```bash
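+wait_for_server() {
+    local port="$1" timeout=15 i=0
+    local max_tries=$((timeout * 10 / 3))   # one try every 0.3s
+    # Poll until something is listening on the port.
+    while ! ss -tlnp 2>/dev/null | grep -q ":${port} "; do
+        sleep 0.3
+        i=$((i + 1))   # ((i++)) returns non-zero when i is 0, which trips `set -e`
+        # Use `if`, not `[ ... ] && return 1`: the latter leaves status 1 as the
+        # loop's result, so the function would report failure once the port opens.
+        if [ "$i" -ge "$max_tries" ]; then
+            return 1   # timed out
+        fi
+    done
+    return 0
+}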
+```
+
+### Fragile assumptions to fix
+
+- **`lsof` for port check** — not universal. Replace with `ss -tlnp` (iproute2,
+  present on virtually all modern Linux distributions).
+- **No network check** — if the `sc-internal` libvirt network is inactive, the VM
+  starts but has no network. The HUD loads but shows nothing. Check the network is
+  active (and start it if not) before starting the VM.
+- **No images-dir check** — once portable installs land, `SC_IMAGES_DIR` might be
+  on an unmounted game drive. Check it exists before trying virsh ops.
+- **Frontend build at launch** — `"Building frontend..."` at game launch is odd UX.
+  Move this guard to install time. The launcher should only verify `dist/index.html`
+  exists and fail clearly if it doesn't (don't silently trigger a build).
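+
+The network and images-dir checks above could look roughly like this. It is a
+sketch under assumptions: `network_is_active` and `preflight` are hypothetical
+names, the message text is illustrative, and `SC_IMAGES_DIR` and `SC_LIBVIRT_URI`
+are expected to come from `lib/config.sh`. Only `virsh net-info` and
+`virsh net-start` are real commands here.
+
+```bash
+# Sketch only. Assumes config.sh has already set SC_IMAGES_DIR and SC_LIBVIRT_URI.
+NETWORK_NAME="sc-internal"
+
+network_is_active() {
+    # `virsh net-info` prints a line such as "Active:         yes".
+    # LC_ALL=C keeps that label in English regardless of the user's locale.
+    LC_ALL=C virsh -c "$SC_LIBVIRT_URI" net-info "$NETWORK_NAME" 2>/dev/null \
+        | awk '$1 == "Active:" { found = ($2 == "yes") } END { exit !found }'
+}
+
+preflight() {
+    # Fail early if the images directory is missing (e.g. an unmounted drive).
+    if [ ! -d "$SC_IMAGES_DIR" ]; then
+        echo "The game's world files are missing. Is the game drive connected?" >&2
+        return 1
+    fi
+    # Start the game network if it exists but is not running.
+    if ! network_is_active; then
+        virsh -c "$SC_LIBVIRT_URI" net-start "$NETWORK_NAME" >/dev/null || return 1
+    fi
+}
+```
+
+The launcher would call `preflight` before any other virsh operation and stop
+with a friendly message if it returns non-zero.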
+
+### UX improvements
+
+- Source `lib/ui.sh` and `lib/config.sh` once they exist.
+- Replace raw `echo "ERROR: ..."` with friendly messages. Examples:
+
+| Current | Replacement |
+|---|---|
+| `ERROR: virsh is required.` | `Your system is missing the virtual machine tools.\nRun install.sh to set up the game.` |
+| `ERROR: missing workstation domain: sc-workstation` | `Your game world hasn't been built yet.\nRun install.sh to finish setup.` |
+| `ERROR: node is required. Install Node.js 18+.` | `Node.js is required but wasn't found.\nRun install.sh to set up the game.` |
+
+- Show brief startup status so the player isn't staring at a blank terminal:
+
+```
+  Starting Sysadmin Chronicles...
+  ✓ Game server running
+  ✓ Workstation online
+  Opening your desk...
+```
+
+- Add `--manage-saves` and `--reset-save` flags (forward to `tools/save/manage-saves.sh`).
+
+### New flag: `--stop`
+
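+The current launcher leaves the server running after the viewer closes, and even
+with the trap fix a launcher killed with `SIGKILL` never runs its cleanup. Add
+`start-game.sh --stop` to kill any running game server process. Useful if
+something gets stuck.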
+
+### Summary of changes to `start-game.sh`
+
+| Area | Change |
+|---|---|
+| Server shutdown | `exec` → normal run + `wait`, trap covers both server and viewer |
+| Server readiness | `sleep 1` → poll loop with 15s timeout |
+| Port check | `lsof` → `ss -tlnp` |
+| Network check | Add: verify `sc-internal` active, start if not |
+| Images dir check | Add: verify `SC_IMAGES_DIR` exists before virsh ops |
+| Frontend build | Remove from launcher; fail clearly if dist missing |
+| Error messages | Replace all with plain-English + fix instructions |
+| Startup output | Add brief status lines before opening SPICE |
+| New flags | `--manage-saves`, `--reset-save`, `--stop` |
+
+---
+
+## Script Architecture
+
+All user-facing scripts share a common library layer. No logic is duplicated.
+
+```
+tools/
+  lib/
+    ui.sh          # colored output, prompts, spinners, progress bars
+    deps.sh        # distro detection, package name map, dep check/install
+    libvirt.sh     # virsh wrappers: network, pool, domain, snapshot ops
+    vm.sh          # build, rebuild, snapshot, revert per VM
+    config.sh      # read/write install config (~/.config/sysadmin-chronicles/config)
+    save.sh        # save slot management, reset helpers
+
+install.sh         # project root — the entry point for new users
+uninstall.sh       # project root — removal with options
+start-game.sh      # project root — launcher (checks env, starts server, opens SPICE)
+
+tools/
+  setup/
+    check-host.sh       # kept, improved UX, used internally by install.sh
+    first-run-setup.sh  # kept as internal lib target or merged into install.sh
+    seed-vms.sh         # kept as internal lib target, called by install.sh and rebuild
+  vm/
+    rebuild-vms.sh      # new: rebuild all or specific VMs
+  save/
+    manage-saves.sh     # new: list/switch/reset save slots
+```
+
+### `lib/ui.sh`
+- `sc_step "label"` — numbered step header
+- `sc_ok "msg"`, `sc_warn "msg"`, `sc_fail "msg"` — status lines
+- `sc_prompt "question" "default"` — interactive prompt, returns answer
+- `sc_confirm "question"` — yes/no, returns 0/1
+- `sc_spinner "label"` / `sc_spinner_stop` — background spinner for long ops
+- `sc_progress "label" current total` — simple fraction display
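+
+A few of these helpers are small enough to sketch. The colors, symbols, and the
+"empty answer means yes" default are illustrative choices, not decisions recorded
+in this plan.
+
+```bash
+# Sketch of four lib/ui.sh helpers; colors and symbols are illustrative.
+sc_ok()   { printf '  \033[32m✓\033[0m %s\n' "$1"; }
+sc_warn() { printf '  \033[33m!\033[0m %s\n' "$1"; }
+sc_fail() { printf '  \033[31m✗\033[0m %s\n' "$1" >&2; }
+
+# Ask a yes/no question. Returns 0 for yes, 1 for no.
+# Pressing Enter on an empty line counts as yes; end-of-input counts as no.
+sc_confirm() {
+    local answer
+    read -r -p "$1 [Y/n] " answer || return 1
+    case "$answer" in
+        ""|[Yy]|[Yy][Ee][Ss]) return 0 ;;
+        *) return 1 ;;
+    esac
+}
+```
+
+Usage: `sc_confirm "Install them now?" && install_deps "$missing"`, where
+`$missing` is whatever `check_deps` reported.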
+
+### `lib/deps.sh`
+- `detect_distro` — sets `$SC_DISTRO` (arch, debian, ubuntu, fedora, opensuse)
+- `map_packages` — translates canonical dep names to distro package names
+- `check_deps` — returns list of missing deps
+- `install_deps "pkg1 pkg2 ..."` — runs the right package manager with sudo, logs what was installed
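+
+`detect_distro` can probably be built on `/etc/os-release`, which the listed
+distros all ship. A minimal sketch follows; `SC_OS_RELEASE` is a hypothetical
+override added here so the function can be pointed at a fixture file, and
+derivative distros (which usually need `ID_LIKE`) are deliberately not handled.
+
+```bash
+# Sketch: set SC_DISTRO from the ID field of os-release.
+# SC_OS_RELEASE is a hypothetical override for testing.
+detect_distro() {
+    local file="${SC_OS_RELEASE:-/etc/os-release}" id
+    [ -r "$file" ] || return 1
+    # Source in a subshell so os-release variables do not leak into the caller.
+    id=$( . "$file" && printf '%s' "${ID:-}" )
+    case "$id" in
+        arch|debian|ubuntu|fedora) SC_DISTRO="$id" ;;
+        opensuse*)                 SC_DISTRO="opensuse" ;;  # opensuse-leap, opensuse-tumbleweed
+        *)                         return 1 ;;              # unsupported or unknown
+    esac
+}
+```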
+
+### `lib/libvirt.sh`
+- `ensure_network name xml_path`
+- `ensure_pool name path`
+- `pool_path name` — returns the pool's target directory
+- `domain_exists name`, `domain_state name`
+- `snapshot_exists domain name`
+- `snapshot_create domain name description`
+- `snapshot_revert domain name`
+- `snapshot_delete domain name`
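+
+The query helpers in this list map closely onto existing virsh subcommands. A
+sketch, assuming a hypothetical `sc_virsh` wrapper that applies the connection
+URI from the install config:
+
+```bash
+# Sketch only. sc_virsh is a hypothetical wrapper; SC_LIBVIRT_URI comes from the
+# install config and falls back to the system connection.
+sc_virsh() { LC_ALL=C virsh -c "${SC_LIBVIRT_URI:-qemu:///system}" "$@"; }
+
+# Succeeds when the domain is defined, whether or not it is running.
+domain_exists()   { sc_virsh dominfo "$1" >/dev/null 2>&1; }
+
+# Prints the state reported by virsh, e.g. "running" or "shut off".
+domain_state()    { sc_virsh domstate "$1" 2>/dev/null; }
+
+# Succeeds when the domain has a snapshot with exactly this name.
+snapshot_exists() { sc_virsh snapshot-list "$1" --name 2>/dev/null | grep -qxF -- "$2"; }
+```
+
+Matching the snapshot name with `grep -x` avoids treating `base` as a hit when
+only `baseline` exists.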
+
+### `lib/vm.sh`
+- `vm_build profile [--dry-run] [--force]` — wraps `build-vm.sh`
+- `vm_rebuild profile [--dry-run]` — destroy + rebuild from cloud image
+- `vm_revert vm_id snapshot_name` — revert to named snapshot
+- `vm_status vm_id` — running / stopped / missing
+- `vm_start vm_id`, `vm_stop vm_id`
+
+### `lib/config.sh`
+Config file lives at `~/.config/sysadmin-chronicles/config` (survives game dir moves).
+
+Variables stored:
+```bash
+SC_GAME_DIR=/home/user/Games/sysadmin-chronicles
+SC_IMAGES_DIR=/home/user/Games/sysadmin-chronicles/images
+SC_LIBVIRT_URI=qemu:///system
+SC_INSTALL_DATE=2026-04-27
+SC_INSTALLED_DEPS="libvirt qemu-system-x86 ..."  # what we added, for the log
+```
+
+- `config_read` — sources the config file
+- `config_write key value`
+- `config_show` — pretty-prints current config
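+
+Because the config file is plain `KEY=value` shell, reading and writing it needs
+very little code. The sketch below is one possible shape; `SC_CONFIG_FILE` is a
+hypothetical override, and rewriting the file through a temp file is a design
+choice made here, not something this plan specifies.
+
+```bash
+# Sketch. SC_CONFIG_FILE is a hypothetical override used for testing.
+SC_CONFIG_FILE="${SC_CONFIG_FILE:-$HOME/.config/sysadmin-chronicles/config}"
+
+# Load all stored variables into the current shell.
+config_read() {
+    [ -r "$SC_CONFIG_FILE" ] && . "$SC_CONFIG_FILE"
+}
+
+# Set KEY=value: drop any existing line for KEY, then append the new one.
+config_write() {
+    local key="$1" value="$2" tmp
+    mkdir -p "$(dirname "$SC_CONFIG_FILE")"
+    touch "$SC_CONFIG_FILE"
+    tmp=$(mktemp) || return 1
+    grep -v "^${key}=" "$SC_CONFIG_FILE" > "$tmp" || true   # grep exits 1 when no lines remain
+    printf '%s=%q\n' "$key" "$value" >> "$tmp"               # %q quotes the value for safe sourcing
+    mv "$tmp" "$SC_CONFIG_FILE"
+}
+```
+
+An updated key moves to the end of the file, which is harmless for a file that is
+only ever sourced.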
+
+### `lib/save.sh`
+- `save_list` — lists all save slots with name, date, trust score, quest progress
+- `save_switch slot_name` — switch active save
+- `save_new slot_name` — create a new empty save slot
+- `save_reset [slot_name]` — wipe a slot back to new-game state
+- `save_export slot_name path` — export save JSON for backup
+- `save_import path slot_name` — import a save JSON
+
+---
+
+## Installer Design (`install.sh`)
+
+### Phase 1 — Welcome
+
+```
+╔══════════════════════════════════════════╗
+║       SYSADMIN CHRONICLES — SETUP       ║
+╚══════════════════════════════════════════╝
+
+Welcome! This installer will:
+  • Install a few system tools (KVM, QEMU, libvirt)
+  • Set up a private virtual network for the game
+  • Build three virtual machines (~30 minutes, once only)
+
+Where would you like to install the game?
+  [default: ~/Games/sysadmin-chronicles]  >
+```
+
+### Phase 2 — System check (silent)
+
+Internally calls `check_deps`. If all present, skip to Phase 4 silently.
+
+### Phase 3 — Dependency install (only if needed)
+
+```
+Your system is missing the following tools:
+  • KVM virtualization support (qemu-system-x86)
+  • Virtual machine manager (libvirt, virt-install)
+  • SPICE display viewer (virt-viewer)
+  • Cloud image tools (cloud-image-utils, genisoimage)
+
+Install them now? You'll be asked for your password.  [Y/n]
+```
+
+After install:
+- Log installed packages to `~/.local/share/sysadmin-chronicles/install.log`
+- Format: timestamp, package name, version, distro. Human-readable.
+- Note at end: "This log is kept so you know exactly what was added. See it at: ..."
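+
+A log line in that format might be written as follows. The function name, the
+`SC_INSTALL_LOG` override, and the column widths are all assumptions made for
+illustration.
+
+```bash
+# Sketch: append one human-readable line per installed package.
+# SC_INSTALL_LOG is a hypothetical override; the column layout is an assumption.
+log_installed_package() {
+    local pkg="$1" version="$2" distro="$3"
+    local log="${SC_INSTALL_LOG:-$HOME/.local/share/sysadmin-chronicles/install.log}"
+    mkdir -p "$(dirname "$log")"
+    # Columns: timestamp, package name, version, distro.
+    printf '%s  %-24s %-20s %s\n' \
+        "$(date '+%Y-%m-%d %H:%M:%S')" "$pkg" "$version" "$distro" >> "$log"
+}
+```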
+
+### Phase 4 — One-time network and storage setup
+
+```
+── Setting up game network ──────────────────
+  ✓ Private game network created
+  ✓ VM image storage configured at ~/Games/sysadmin-chronicles/images
+  ✓ Game access keys generated
+```
+
+User never sees "libvirt", "storage pool", "sc-internal", "sc-images".
+
+### Phase 5 — VM build
+
+```
+── Building your game world ─────────────────
+  This happens once and takes about 30 minutes.
+  You can leave this running in the background.
+
+  Building workstation (1/3) ........... ✓  8m 14s
+  Building web server   (2/3) ........... ✓  4m 02s
+  Building build server (3/3) ........... ✓  5m 31s
+  Setting up quest scenarios ........... ✓  1m 48s
+```
+
+### Phase 6 — Desktop entry
+
+```
+Create a desktop launcher so the game appears in your app menu?  [Y/n]
+```
+
+Creates `~/.local/share/applications/sysadmin-chronicles.desktop` if yes.
+
+### Phase 7 — Done
+
+```
+╔══════════════════════════════════════════╗
+║              SETUP COMPLETE!            ║
+╚══════════════════════════════════════════╝
+
+Start the game:
+  bash ~/Games/sysadmin-chronicles/start-game.sh
+  (or from your app menu if you created a launcher)
+
+If you ever need to rebuild the virtual machines:
+  bash ~/Games/sysadmin-chronicles/tools/vm/rebuild-vms.sh
+
+Install log saved at:
+  ~/.local/share/sysadmin-chronicles/install.log
+```
+
+---
+
+## Uninstaller Design (`uninstall.sh`)
+
+Improvements over the current uninstaller: it shows sizes, explains consequences, and offers three preset removal tiers plus a custom option.
+
+### Menu approach
+
+```
+╔══════════════════════════════════════════╗
+║     SYSADMIN CHRONICLES — UNINSTALL     ║
+╚══════════════════════════════════════════╝
+
+What would you like to remove?
+
+  1) Everything — full uninstall (recommended)
+  2) Game world only — remove VMs, keep game files
+  3) Save data only — reset to new game
+  4) Custom — choose what to remove
+
+  q) Cancel
+
+>
+```
+
+### "Everything" breakdown (shows before confirming)
+
+```
+This will remove:
+
+  Game virtual machines (3 VMs + all snapshots)   ~38 GB
+  VM image files on disk                           ~38 GB  ← ask separately
+  Game network and storage configuration           <1 MB
+  Game access keys (~/.ssh/sc_host_key)            <1 KB
+  Desktop launcher (if created)                    <1 KB
+
+  System packages (libvirt, QEMU, etc.)            NOT removed
+  ↑ These were installed by your package manager.
+    See ~/.local/share/sysadmin-chronicles/install.log
+    if you want to remove them manually.
+
+Keep VM image files? If you ever reinstall, keeping them
+saves the 30-minute rebuild.  [Y/n — default: keep]
+
+Type REMOVE to confirm:  >
+```
+
+### What is never auto-removed
+
+- System packages (libvirt, qemu, virt-viewer, etc.)
+- Anything not prefixed with `sc-` in libvirt
+- Any other libvirt VMs or networks not owned by this game
+
+---
+
+## VM Rebuild Tool (`tools/vm/rebuild-vms.sh`)
+
+For when something goes wrong with a VM or the user wants a clean reset.
+
+```
+Usage:
+  rebuild-vms.sh                  Rebuild all VMs from scratch
+  rebuild-vms.sh --vm workstation Rebuild a single VM
+  rebuild-vms.sh --revert         Revert all VMs to baseline snapshot (fast, ~30s)
+  rebuild-vms.sh --revert --vm workstation
+
+Menu (interactive):
+  1) Revert all to last known good  (fast — restores baseline snapshot)
+  2) Rebuild workstation            (~8 min — rebuilds from cloud image)
+  3) Rebuild web server             (~4 min)
+  4) Rebuild build server           (~5 min)
+  5) Rebuild everything             (~20 min)
+  q) Cancel
+```
+
+Key behavior:
+- Always confirm before destroying a VM
+- Show what quest progress will be affected
+- Offer to back up save data before proceeding
+- After rebuild, re-runs the appropriate quest-prep scripts and re-takes baseline snapshot
+
+---
+
+## User Snapshots
+
+Players can take their own named snapshots of any VM — useful before attempting
+something risky, or to bookmark a state they want to return to.
+
+These are distinct from the game's automatic shift checkpoints and baseline
+snapshots. User snapshots are never pruned automatically.
+
+### Via `manage-saves.sh` (recommended)
+
+The save management menu will include a **VM Snapshots** section:
+
+```
+VM Snapshots
+
+  workstation (ares)
+    1) before-ssh-experiment   2026-05-01 19:14
+    2) checkpoint.shift-3      2026-05-01 22:00  [auto]
+    3) baseline.day-one                          [protected]
+
+  web server (hermes)
+    1) my-nginx-fix            2026-05-02 11:30
+    2) checkpoint.shift-3      2026-05-01 22:00  [auto]
+    3) baseline.clean                            [protected]
+
+  Actions: [t]ake snapshot  [r]evert  [d]elete  [q]uit
+```
+
+Taking a snapshot prompts for a name (letters, numbers, hyphens only).
+Reverting shows a confirmation with the snapshot date.
+Protected snapshots (baseline.*, checkpoint.*) cannot be deleted from this menu.
+
+### Via `tools/vm/rebuild-vms.sh --snapshot`
+
+For scripting or quick one-liners:
+
+```bash
+rebuild-vms.sh --snapshot --vm workstation --name before-risky-thing
+rebuild-vms.sh --snapshot --all --name pre-shift-4
+rebuild-vms.sh --revert  --vm workstation --name before-risky-thing
+```
+
+### Storage note
+
+Each VM snapshot is an internal qcow2 differential — typically 100 MB–2 GB
+depending on how much disk has changed since the baseline. The uninstaller shows
+the total size of user snapshots separately so the user can decide whether to
+keep them.
+
+### `lib/vm.sh` additions needed
+
+- `vm_snapshot_create vm_id name` — with name validation
+- `vm_snapshot_list vm_id` — returns name, date, size, protection flag
+- `vm_snapshot_revert vm_id name`
+- `vm_snapshot_delete vm_id name` — refuses if name matches `baseline.*` or `checkpoint.*`
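+
+The name rule (letters, numbers, hyphens only) and the protection rule
+(`baseline.*`, `checkpoint.*`) could be checked with two small helpers. This is
+a minimal sketch, not the actual `lib/vm.sh` code: the function names
+`snapshot_name_is_valid` and `snapshot_is_protected` are hypothetical.
+
+```bash
+# Hypothetical helpers; the names are illustrative, not the real lib/vm.sh API.
+
+# Succeeds if NAME contains only letters, digits, and hyphens (and is not empty).
+snapshot_name_is_valid() {
+  local re='^[A-Za-z0-9-]+$'
+  [[ "$1" =~ $re ]]
+}
+
+# Succeeds if NAME belongs to a game-managed snapshot that must not be deleted.
+snapshot_is_protected() {
+  case "$1" in
+    baseline.*|checkpoint.*) return 0 ;;
+    *) return 1 ;;
+  esac
+}
+```
+
+Because user-chosen names may not contain a dot, a name that passes
+`snapshot_name_is_valid` can never collide with the protected prefixes.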
+
+---
+
+## Save Management
+
+### Save file layout
+
+```
+~/.local/share/sysadmin-chronicles/
+  saves/
+    autosave.json          ← always-present auto save (current session)
+    slot-1.json
+    slot-2.json
+    slot-3.json
+  install.log
+```
+
+### Save slot semantics
+
+Save slots store JSON state only:
+- Trust score and history
+- Quest and ticket state
+- World flags
+- Inbox
+- In-world clock
+
+**VM state is not per-slot.** The shift checkpoint snapshots (checkpoint.shift-N) are the VM save mechanism and are independent of JSON slots. This is a known limitation but keeps disk usage manageable.
+
+When switching slots: if the VM state doesn't match the JSON slot's expected state, warn the user. They may need to revert VMs manually.
+
+### `tools/save/manage-saves.sh`
+
+```
+Usage:
+  manage-saves.sh                 Show save slot menu
+  manage-saves.sh --reset         Reset current save to new game
+  manage-saves.sh --reset slot-1  Reset a specific slot
+  manage-saves.sh --list          List all slots
+
+Interactive menu:
+  Current save: autosave  (Day 3, Trust: 67, 4/8 quests)
+
+  1) autosave   Day 3  Trust 67  Q4/8  [active]
+  2) slot-1     Day 1  Trust 50  Q1/8
+  3) slot-2     —empty—
+  4) slot-3     —empty—
+
+  Actions: [s]witch  [n]ew  [r]eset  [e]xport  [i]mport  [q]uit
+```
+
+### Reset save (standalone, accessible from start-game.sh)
+
+The launcher `start-game.sh` should have an escape hatch:
+
+```
+start-game.sh --manage-saves     → opens save management menu
+start-game.sh --reset-save       → confirms and resets to new game
+```
+
+---
+
+## Launcher Improvements (`start-game.sh`)
+
+Current issues to fix:
+- Silently fails if images drive not mounted
+- No check that the libvirt network is up before starting
+- `sleep 1` to wait for server is fragile
+
+Improvements:
+- `config_read` to get `SC_IMAGES_DIR`, check it exists and is writable
+- Check libvirt network is active, start it if not (with clear message)
+- Poll server readiness on `/healthz` instead of sleeping
+- Show a brief status before launching SPICE: "Starting your workstation..."
+- On failure, show a plain-English error and the fix
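+
+Polling `/healthz` instead of sleeping could be built on a generic retry helper.
+The sketch below is an assumption about how that might look: `wait_until` is a
+hypothetical name, and the host, port, and retry counts in the usage comment are
+placeholders rather than values taken from the launcher.
+
+```bash
+# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
+# Runs COMMAND up to ATTEMPTS times, pausing DELAY_SECONDS between tries.
+# Succeeds as soon as COMMAND succeeds; fails if every attempt fails.
+wait_until() {
+  local attempts=$1 delay=$2 i
+  shift 2
+  for ((i = 1; i <= attempts; i++)); do
+    if "$@"; then
+      return 0
+    fi
+    # No need to pause after the final failed attempt.
+    if ((i < attempts)); then
+      sleep "$delay"
+    fi
+  done
+  return 1
+}
+
+# Possible use in the launcher (URL is an assumption):
+#   wait_until 30 1 curl -fsS -o /dev/null http://127.0.0.1:3000/healthz
+```
+
+A bounded retry count keeps the launcher from hanging forever, which makes it
+possible to print the plain-English error when the server never comes up.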
+
+---
+
+## Portable Installation Notes
+
+The `sc-images` libvirt pool target can be any path the host OS can write to. The installer configures it to `$SC_IMAGES_DIR` (inside the game dir by default). 
+
+If the user puts the game on a game drive (`/mnt/gamesdrive/sysadmin-chronicles/`):
+- `SC_IMAGES_DIR=/mnt/gamesdrive/sysadmin-chronicles/images`
+- The libvirt pool points there
+- All qcow2 files live on the game drive
+- The launcher checks the drive is mounted before starting
+
+If the drive is unmounted:
+```
+  ✗ Can't find your game world.
+    The VM images are stored at /mnt/gamesdrive/sysadmin-chronicles/images
+    but that location isn't available right now.
+
+    Is your game drive plugged in and mounted?
+    Once it's mounted, run start-game.sh again.
+```
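+
+The launcher-side check for the images location might be as small as the helper
+below. It is a sketch under assumptions: `images_dir_ready` is a hypothetical
+name, and the real launcher may test more than existence and writability.
+
+```bash
+# Succeeds if DIR exists, is a directory, and is writable by the current user.
+images_dir_ready() {
+  local dir=$1
+  [[ -d "$dir" && -w "$dir" ]]
+}
+
+# Possible use, with SC_IMAGES_DIR read from the install config:
+#   images_dir_ready "$SC_IMAGES_DIR" || echo "Can't find your game world."
+```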
+
+---
+
+## Dependency Log Format
+
+`~/.local/share/sysadmin-chronicles/install.log`
+
+```
+# Sysadmin Chronicles — Install Log
+# Created: 2026-04-27 14:32:01
+# Distro:  arch (6.19.12-arch1-1)
+# Game dir: /home/aaron/Games/sysadmin-chronicles
+# Images:   /home/aaron/Games/sysadmin-chronicles/images
+
+[INSTALLED] libvirt                  12.2.0   via pacman
+[INSTALLED] qemu-system-x86         11.0.0   via pacman
+[INSTALLED] qemu-hw-display-qxl     11.0.0   via pacman
+[INSTALLED] qemu-hw-display-virtio-gpu  11.0.0  via pacman
+[INSTALLED] qemu-ui-spice-core      11.0.0   via pacman
+[INSTALLED] qemu-chardev-spice      11.0.0   via pacman
+[INSTALLED] qemu-audio-spice        11.0.0   via pacman
+[INSTALLED] virt-install            5.1.0    via pacman
+[INSTALLED] virt-viewer             11.0     via pacman
+[INSTALLED] cloud-image-utils       0.33     via pacman
+[INSTALLED] cdrtools                3.02a09  via pacman
+[INSTALLED] libisoburn              1.5.8    via pacman
+[SKIPPED]   nodejs                           already installed
+
+# To remove manually:
+# sudo pacman -Rns libvirt qemu-system-x86 qemu-hw-display-qxl ...
+```
+
+---
+
+## File Layout After Install
+
+```
+~/Games/sysadmin-chronicles/     ← SC_GAME_DIR
+  install.sh
+  uninstall.sh
+  start-game.sh
+  content/
+  server/
+  frontend/
+  docs/
+  tools/
+    lib/
+      ui.sh
+      deps.sh
+      libvirt.sh
+      vm.sh
+      config.sh
+      save.sh
+    setup/
+      check-host.sh
+      first-run-setup.sh
+      seed-vms.sh
+    vm/
+      rebuild-vms.sh
+      build-vm.sh
+      ...
+    save/
+      manage-saves.sh
+
+  images/                        ← SC_IMAGES_DIR (libvirt pool points here)
+    sc-workstation.qcow2         (~20 GB)
+    sc-web-server.qcow2          (~8 GB)
+    sc-build-machine.qcow2       (~10 GB)
+
+~/.config/sysadmin-chronicles/config      ← install config (survives game dir moves)
+~/.local/share/sysadmin-chronicles/
+  saves/
+    autosave.json
+    slot-1.json ...
+  install.log
+```
+
+---
+
+## Implementation Order
+
+1. `tools/lib/ui.sh` — all other scripts depend on this
+2. `tools/lib/config.sh` — needed by installer and launcher
+3. `tools/lib/deps.sh` — needed by installer
+4. `tools/lib/libvirt.sh` — needed by installer and rebuild tool
+5. `tools/lib/vm.sh` — needed by installer and rebuild tool
+6. `tools/lib/save.sh` — needed by save manager
+7. `install.sh` — assembles libs 1–5
+8. `tools/vm/rebuild-vms.sh` — assembles libs 1, 4, 5
+9. `tools/save/manage-saves.sh` — assembles libs 1, 2, 6
+10. `uninstall.sh` — assembles libs 1, 2, 4
+11. `start-game.sh` (improved) — assembles libs 1, 2
+12. Update `check-host.sh` UX
+13. README — manual install section, quick start
+
+---
+
+## README Structure
+
+```markdown
+## Quick Install
+
+curl -fsSL .../install.sh | bash
+# or
+bash install.sh   # from downloaded zip
+
+## Manual Install
+
+<details>
+<summary>For users who want full control or are troubleshooting</summary>
+...per-distro dep tables, step-by-step...
+</details>
+```
@@ -0,0 +1,76 @@
+# SYSADMIN CHRONICLES — PRESSURE PROFILES
+> Version 1.1
+>
+> Pressure profiles define how an unresolved situation degrades over time.
+> They are referenced by name from quest files and live in
+> `content/pressure_profiles/`.
+>
+> A pressure profile is NOT an incident. An incident is a discrete event with
+> a trigger, escalation chain, and resolution. A pressure profile describes the
+> passive degradation behavior of the environment while a quest is active and
+> unresolved. Incidents may be spawned by pressure profiles, but are separate.
+
+---
+
+## SCHEMA
+
+```json
+{
+  "id": "web_outage_escalation",
+  "label": "Web Service Outage",
+  "description": "Gentle escalation for Tier 1 web outage quests. Creates narrative urgency without punishing new players.",
+  "intensity": 2,
+  "escalation_steps": [
+    {
+      "trigger_after_seconds": 900,
+      "notification": "Hermes is still showing errors. Is someone on this?",
+      "notification_severity": "warning"
+    },
+    {
+      "trigger_after_seconds": 1800,
+      "notification": "Site has been down thirty minutes. Ticket priority is going up.",
+      "notification_severity": "warning",
+      "escalate_linked_ticket": "high"
+    },
+    {
+      "trigger_after_seconds": 3600,
+      "notification": "Hour down. Priya has been copied in.",
+      "notification_severity": "error",
+      "escalate_linked_ticket": "critical"
+    }
+  ]
+}
+```
+
+---
+
+## FIELD REFERENCE
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | string | Unique identifier. Must match the string used in the quest's `pressure_profile` field. |
+| `label` | string | Short human-readable name for tooling and authoring. |
+| `description` | string | Internal description for authors. |
+| `intensity` | int | Relative urgency / pressure level. |
+| `escalation_steps` | array | Ordered list of timed escalation notices or ticket priority changes. |
+
+### Escalation Step Fields
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `trigger_after_seconds` | Yes | Seconds after activation before the step fires. |
+| `notification` | Yes | Player-facing escalation message. |
+| `notification_severity` | Yes | Severity label used by the UI and notifier. |
+| `escalate_linked_ticket` | No | Optional linked-ticket priority escalation. |
+
+---
+
+## AUTHORING NOTES
+
+- `trigger_after_seconds` is relative to quest activation time, not real wall time.
+  In-game time compression applies.
+- Steps must be ordered by `trigger_after_seconds` ascending. Authoring tools will
+  warn on out-of-order steps.
+- Pressure profiles should create urgency, not guaranteed punishment.
+- If a pressure profile escalates a linked ticket, it should do so in a way that
+  matches the authored ticket priority curve.
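+
+The ordering rule for `trigger_after_seconds` can be checked mechanically. The
+helper below is a shell sketch only; the function name `steps_are_ascending` is
+hypothetical, and it assumes ties count as out of order, which the rule above
+does not state.
+
+```bash
+# Succeeds if the integer arguments are strictly ascending.
+# Assumption: two steps with the same trigger time are treated as an error.
+steps_are_ascending() {
+  local prev=-1 t
+  for t in "$@"; do
+    if ((t <= prev)); then
+      return 1
+    fi
+    prev=$t
+  done
+  return 0
+}
+
+# One way to feed it from a profile file, if jq is available:
+#   steps_are_ascending $(jq '.escalation_steps[].trigger_after_seconds' profile.json)
+```
+
+Applied to the schema example, the values 900, 1800, 3600 pass the check.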
@@ -0,0 +1,377 @@
+# SYSADMIN CHRONICLES — PROJECT MAP
+> Version 5.1 | Living document. Update when files are added, moved, or removed, or when the architecture changes.
+
+---
+
+## ROOT STRUCTURE
+
+```
+sysadmin-chronicles/
+│
+├── server/                     ← NEW: Node.js game server
+│   ├── src/
+│   │   ├── index.js            Entry point — Express + WebSocket
+│   │   ├── routes/             auth, state, tickets, mail, docs, sage, vms
+│   │   ├── services/           ContentLoader, QuestEngine, TicketService,
+│   │   │                       ValidationEngine, VMManager, TrustSystem,
+│   │   │                       ProgressionSystem, EmailService, SageService,
+│   │   │                       ShiftTimer, IncidentScheduler, ShiftReviewService,
+│   │   │                       CertificationService, SaveState
+│   │   └── lib/                ssh.js, virsh.js, command.js, eventBus.js, session.js
+│   └── package.json
+│
+├── frontend/                   ← NEW: Svelte web HUD
+│   ├── src/
+│   │   ├── App.svelte          Root component, WebSocket, panel routing
+│   │   ├── components/         TicketsPanel, MailPanel, DocsPanel, SagePanel,
+│   │   │                       VmsPanel, ProfilePanel, HeaderBar, SidebarTabs
+│   │   ├── lib/api.js          REST API fetch wrapper
+│   │   └── main.js
+│   ├── dist/                   Built output (served by game server)
+│   └── package.json
+│
+├── scripts/
+│   └── start-game.sh           One-shot: start server + open SPICE workstation viewer
+│
+├── docs/
+│   ├── ARCHITECTURE.md               System architecture
+│   ├── CHARACTERS.md                 All characters — bios, relationships, story hooks
+│   ├── COMPANY_LORE.md               World, company, products, tone guidelines
+│   ├── INSTALLER_PLAN.md             Installer design and packaging
+│   ├── PRESSURE_PROFILES.md          Time-pressure escalation schema and authoring guide
+│   ├── PROJECT_MAP.md                ← this file
+│   ├── ROADMAP.md                    Development phases and content status
+│   ├── RUNTIME_DEPENDENCIES.md       Host dependencies and version requirements
+│   ├── SAVE_SYSTEM.md                Save model, VM persistence policy, recovery flows
+│   ├── SNAPSHOT_CHAIN.md             VM snapshot chain and baseline management
+│   ├── STORY_DESIGN_CONTEXT.md       How story works — narrative arc, quest model, design constraints
+│   ├── VM_BUILD_SYSTEM.md            VM build and provisioning system
+│   ├── WORKSTATION_POLISH_BACKLOG.md Outstanding UX polish items
+│   └── codex-specs/
+│
+├── content/                    ← data-driven content loaded by Node.js server
+│   ├── quests/         quest JSON files (being reworked — see STORY_DESIGN_CONTEXT.md)
+│   ├── tickets/        ticket JSON files (being reworked)
+│   ├── incidents/      incident JSON files (being reworked)
+│   ├── pressure_profiles/  escalation profiles (schema in PRESSURE_PROFILES.md)
+│   ├── dialogue/       character dialogue JSON files (being reworked)
+│   ├── world_flags/    world_flags.json (central registry)
+│   ├── docs/           onboarding, sage_content, internal_docs, etc.
+│   ├── progression/    trust_unlocks.json, access_tiers.json
+│   └── vm_profiles/    workstation.json, web_server.json, build_machine.json
+│
+├── tools/
+│   ├── setup/          check-host.sh, seed-vms.sh, first-run-setup.sh, uninstall.sh
+│   ├── vm/             build-vm.sh, build-*.sh, snapshot-all.sh, suppress-maintenance-noise.sh
+│   │   ├── profiles/   workstation.sh, web-server.sh, build-machine.sh
+│   │   └── quest-prep/ Q001–Q008 prep/post scripts
+│   └── content/        validate-content.js (zero-error gate), verify-clue-fingerprints.js
+│
+├── company-website/            Axiom Works public website (static HTML/CSS)
+│   ├── index.html              Home — hero, product highlights, stats
+│   ├── about.html              Company story, values, contact
+│   ├── people.html             Team page — Dave, Marcus, Priya, Sarah + filler staff
+│   ├── products.html           AxiomFlow, AxiomDash, AxiomSync product pages
+│   ├── style.css               Shared corporate CSS (navy/blue scheme)
+│   └── assets/                 logo.png, portrait photos for each NPC
+│
+├── vm/                         images/, snapshots/, cloud-init/, probes/
+├── package.json
+└── README.md
+```
+
+
+
+---
+
+## COMPANY WEBSITE
+
+Static HTML/CSS site serving as the public-facing Axiom Works company website, accessible from the workstation VM.
+
+**URL inside the VM:** `http://www.axiomworks.corp/` (no port)
+
+**How it works:**
+- The game server serves `company-website/` at `/company/` (port 3000)
+- nginx is installed in the workstation VM and proxies `axiomworks.corp` and `www.axiomworks.corp` (port 80) → game server port 3000 at `/company/`
+- `/etc/hosts` in the workstation maps both hostnames to `127.0.0.1` (localhost → nginx)
+- Result: the player sees a clean `http://www.axiomworks.corp/` URL in Chromium with no port number
+
+**Pages:** Home (`index.html`), About (`about.html`), Our Team (`people.html`), Products (`products.html`)
+
+**Team page portraits:** NPC photos live in `company-website/assets/`. The player is not featured on the website.
+
+**Domain note:** `axiomworks.corp` uses the `.corp` TLD. ICANN has indefinitely deferred delegating `.corp` because of name-collision risk, so the name is not expected to resolve on the public internet and needs no registration. The in-VM `/etc/hosts` + nginx approach is sufficient for any build.
+
+**Player portraits** (for the HUD profile panel) are separate from the website portraits. They live in `server/public/portraits/` and are served at `/public/portraits/`. The player selects one via the Profile panel; the choice persists in `save.json` as `player_portrait`.
+
+---
+
+## BOOT FLOW (Node.js Server)
+
+```
+bash scripts/start-game.sh
+  ↓
+node server/src/index.js
+  1. ContentLoader.load()      — reads all content/**/*.json into memory
+  2. SaveState.load()          — reads ~/.local/share/sysadmin-chronicles/save.json
+                                  or creates fresh save
+  3. TrustSystem.initialize()  — hydrates trust score + unlock state
+  4. ProgressionSystem.initialize()
+  5. QuestEngine.initialize()  — restores quest states from save
+  6. TicketService.initialize()
+  7. EmailService.initialize() — restores inbox, seeds T001 email on fresh save
+  8. ShiftTimer.start()        — starts shift clock
+  9. IncidentScheduler.start() — begins pressure tick loop (every 30s)
+  10. VMManager.ensureWorkstationLive() — virsh start sc-workstation if needed
+  ↓
+Express + WebSocket listening on PORT (default 3000)
+  ↓
+remote-viewer opens SPICE connection to sc-workstation
+Player sees XFCE desktop → Chromium opens HUD → game is live
+```
+
+---
+
+## TICKET COMPLETION FLOW
+
+```
+Player clicks "Mark Complete" on ticket in HUD
+  ↓
+POST /api/tickets/:id/complete
+  ↓
+TicketService.markComplete(ticketId)
+  → load ticket + linked quest JSON
+  → for each solution_branch (sorted by priority DESC):
+      ValidationEngine.check(vmId, branch.validation.rules)
+        → VMManager.getIP(vmId)
+        → SSH as opsbridge using sc_host_key
+        → run each rule check (file_exists, service_state, etc.)
+      if all rules pass → winning branch found
+  → TrustSystem.adjust(branch.trust_delta)
+  → WorldFlags.set(branch.world_flags)
+  → QuestEngine.completeQuest(questId)
+  → EmailService.send(follow-up NPC email if negative branch)
+  → SaveState.write()
+  → broadcast trust:changed, mail:new via WebSocket
+  ↓
+Response: { passed, branch, trust_delta, failures }
+HUD shows success toast or failure details
+```
+
+---
+
+## VM IDENTITY TABLE
+
+| vm_id | SC constant | libvirt domain | hostname | distro | ssh_user | mgmt_user | always_live | Quests |
+|-------|-------------|----------------|----------|--------|----------|-----------|-------------|--------|
+| `workstation` | `SC.VM_WORKSTATION` | `sc-workstation` | `ares` | Debian 12 | `player` | `opsbridge` | yes | Q001 |
+| `web_server` | `SC.VM_WEB_SERVER` | `sc-web-server` | `hermes` | Debian 12 | `player` | — | no | Q002–Q005, Q007 |
+| `build_machine` | `SC.VM_BUILD_MACHINE` | `sc-build-machine` | `vulcan` | Arch Linux | `player` | — | no | Q006, Q008 |
+
+See `docs/VM_BUILD_SYSTEM.md` for full build system documentation and profile authoring guide.
+
+**SSH key**: all host→guest connections use `~/.ssh/sc_host_key` (BatchMode, no password).
+
+**Baseline snapshots**:
+- workstation: `baseline.day-one`
+- web_server, build_machine: `baseline.clean`
+
+---
+
+## TERMINAL ARCHITECTURE
+
+The player uses a real **Tilix** terminal inside the workstation VM (sc-workstation / ares).
+No terminal simulation. SSH to target VMs is real SSH. There is no in-game terminal widget.
+
+```
+Player opens Tilix on the workstation XFCE desktop
+  → types: ssh hermes
+  → real SSH to sc-web-server using player's authorized_keys
+  → works directly on the target VM
+
+Host-side validation (triggered by "Mark Complete" in HUD):
+  ValidationEngine.js SSHes as 'opsbridge' → sudo -H -i -u player
+  Runs rule checks (file_exists, service_state, etc.)
+  Returns pass/fail to game server
+```
+
+Host SSH options (used by ValidationEngine.js and VMManager.js):
+```
+-o StrictHostKeyChecking=no
+-o BatchMode=yes
+-o ConnectTimeout=5
+-o LogLevel=ERROR
+-i ~/.ssh/sc_host_key
+```
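+
+For shell tooling that needs the same connection behavior, the option list
+above could be kept in one array. This is an illustrative sketch: the server
+itself sets these options from `ssh.js`, and the names `SC_SSH_OPTS` and
+`sc_ssh` are assumptions made for the example.
+
+```bash
+# Assumed variable name; mirrors the host SSH options listed above.
+# "$HOME" is used instead of "~" because tilde does not expand inside quotes.
+SC_SSH_OPTS=(
+  -o StrictHostKeyChecking=no
+  -o BatchMode=yes
+  -o ConnectTimeout=5
+  -o LogLevel=ERROR
+  -i "$HOME/.ssh/sc_host_key"
+)
+
+# Run a command on a guest.
+# Usage: sc_ssh USER HOST COMMAND [ARGS...]
+sc_ssh() {
+  local user=$1 host=$2
+  shift 2
+  ssh "${SC_SSH_OPTS[@]}" "$user@$host" "$@"
+}
+```
+
+Keeping the options in an array avoids word-splitting problems that appear when
+they are stored in a single string.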
+
+---
+
+## SERVICE DEPENDENCY GRAPH (Node.js server)
+
+```
+eventBus.js (Node.js EventEmitter — no deps)
+  └─ consumed by: all services
+
+ContentLoader
+  └─ consumed by: QuestEngine, TicketService, ValidationEngine, TrustSystem,
+                  ProgressionSystem, IncidentScheduler, EmailService, SageService
+
+VMManager
+  ← wraps virsh.js + ssh.js
+  ← called by QuestEngine (start required VMs on quest activation)
+  ← called by ValidationEngine (get VM IP for SSH)
+
+ValidationEngine
+  ← calls VMManager.getIP(vmId)
+  ← SSHes as opsbridge → runs rule checks (file_exists, service_state, etc.)
+  ← called by TicketService on mark-complete
+
+QuestEngine
+  ← calls VMManager to start required VMs
+  ← calls ValidationEngine via TicketService
+  ← calls TrustSystem, WorldFlags, EmailService on resolution
+  → emits via eventBus: quest:activated, quest:resolved, ticket:received
+
+IncidentScheduler
+  ← reads WorldFlags for trigger conditions
+  ← tick drives escalation step advancement
+  → emits via eventBus: incident:activated, incident:escalated, incident:resolved
+
+TrustSystem
+  ← called by QuestEngine on branch resolution
+  ← called by IncidentScheduler for ignored incident penalties
+  → emits via eventBus: trust:changed
+
+SaveState
+  ← called by QuestEngine, TrustSystem, ProgressionSystem
+  ← reads/writes ~/.local/share/sysadmin-chronicles/save.json
+```
+
+---
+
+## KEY MODULES
+
+### Server (`server/src/`)
+
+| Module | File | Responsibility |
+|--------|------|----------------|
+| Entry point | index.js | Express + WS, service wiring, static serving |
+| ContentLoader | services/ContentLoader.js | Load all content/ JSON at startup |
+| QuestEngine | services/QuestEngine.js | Quest state machine |
+| TicketService | services/TicketService.js | Ticket state, mark-complete, branch resolution |
+| ValidationEngine | services/ValidationEngine.js | SSH rule evaluation (all rule types) |
+| VMManager | services/VMManager.js | virsh wrappers, IP resolution |
+| TrustSystem | services/TrustSystem.js | Score, unlocks, revocation |
+| ProgressionSystem | services/ProgressionSystem.js | Unlocked VMs, docs, access |
+| EmailService | services/EmailService.js | Inbox, follow-ups, reply options |
+| SageService | services/SageService.js | Rule-based dialogue / KB |
+| ShiftTimer | services/ShiftTimer.js | Shift clock, 30s tick broadcasts |
+| IncidentScheduler | services/IncidentScheduler.js | Pressure tick, incident injection |
+| ShiftReviewService | services/ShiftReviewService.js | End-of-shift review email |
+| CertificationService | services/CertificationService.js | Cert awards after quest chains |
+| SaveState | services/SaveState.js | Read/write save.json |
+| ssh.js | lib/ssh.js | Promisified SSH execution |
+| virsh.js | lib/virsh.js | virsh command wrappers |
+| eventBus.js | lib/eventBus.js | Node.js EventEmitter for service coordination |
+
+### Frontend (`frontend/src/`)
+
+| Component | File | Responsibility |
+|-----------|------|----------------|
+| Root | App.svelte | Panel routing, WebSocket connection |
+| Tickets | TicketsPanel.svelte | List, detail, mark-complete |
+| Mail | MailPanel.svelte | Inbox, message, reply buttons |
+| Docs | DocsPanel.svelte | Trust-gated doc viewer |
+| Sage | SagePanel.svelte | Chat / KB search |
+| VMs | VmsPanel.svelte | Live VM status indicators |
+| Header | HeaderBar.svelte | Trust, shift timer, mail badge |
+| API | lib/api.js | REST fetch wrapper |
+
+---
+
+## CONTENT DOMAINS
+
+| Domain | Purpose |
+|--------|---------|
+| `quests/` | Objective chains, clue fingerprints, validation rules, branch priorities |
+| `tickets/` | Player-facing problem statements with initial/current priority |
+| `incidents/` | Dynamic pressure events with blast_radius and escalation steps |
+| `dialogue/` | Workplace messages, hints, follow-ups, series threads |
+| `pressure_profiles/` | Reusable escalation templates referenced by quest branches |
+| `world_flags/` | Central registry — all world state flags declared here |
+| `docs/` | Internal documentation + Sage/help content (trust-gated) |
+| `progression/` | Trust thresholds, unlocks, revocation rules, access tiers |
+| `vm_profiles/` | Domain names, hostnames, snapshots, networks, resource budgets |
+
+---
+
+## FILE NAMING CONVENTIONS
+
+- Quest files: `Q{NNN}-{kebab-case-title}.json`
+- Ticket files: `T{NNN}.json`
+- Incident files: `I{NNN}-{kebab-case-title}.json`
+- Dialogue files: `{character}-Q{NNN}.json` or `{character}-Q{NNN}-{variant}.json`
+- Quest prep scripts: `Q{NNN}-prep.sh`
+- VM profiles: `{snake_case}.json`
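+
+Two of these conventions can be expressed as patterns. The helpers below are a
+sketch with hypothetical names, and they assume `{NNN}` means exactly three
+digits (as in Q001 through Q008) and that kebab-case titles use lowercase
+letters and digits only.
+
+```bash
+# Succeeds for names like Q001-some-title.json.
+is_quest_filename() {
+  local re='^Q[0-9]{3}-[a-z0-9]+(-[a-z0-9]+)*\.json$'
+  [[ "$1" =~ $re ]]
+}
+
+# Succeeds for names like T001.json.
+is_ticket_filename() {
+  local re='^T[0-9]{3}\.json$'
+  [[ "$1" =~ $re ]]
+}
+```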
+
+---
+
+## CONTENT VALIDATION CHECKS
+
+Run: `node tools/content/validate-content.js` — must exit 0 (zero errors).
+
+| Check | Rule |
+|-------|------|
+| JSON well-formed | All content files parse without error |
+| No duplicate IDs | Unique across quests, tickets, incidents, pressure profiles, dialogue |
+| World flags | Every referenced flag exists in `world_flags/world_flags.json` |
+| required_vms | Every entry maps to a valid VM profile |
+| blast_radius | Every entry maps to an existing incident file |
+| linked_quest | Every ticket's linked_quest maps to an existing quest |
+| ticket_id | Every quest's ticket_id maps to an existing ticket |
+| Branch priority | Priorities unique per quest (no ties) |
+| follow_up_incident | Maps to an existing incident file |
+| pressure_profile | Maps to an existing pressure profile file |
+| series_id | Every series_id has at least two dialogue members |
+| revokes | Trust unlock revoke entries reference valid unlock strings |
+| clue_fingerprint | Evidence rule types are valid |
+
+---
+
+## KNOWN GAPS (Post-Redesign)
+
+These are gaps in the v4.0 Node.js + Svelte implementation.
+All content is authored, validator-clean, and reused unchanged.
+
+### P0 — Blocking for first playable shift
+
+| Gap | Notes |
+|-----|-------|
+| Phase 7 workstation VM verification | Confirm SPICE display, Chromium autostart, Tilix as default work end-to-end on a freshly seeded VM |
+| Phase 10 full playtest | Boot all VMs, play Q001→Q002, validate full server→SSH→HUD loop |
+
+### P1 — Required before broader testing
+
+| Gap | Notes |
+|-----|-------|
+| Clue quality as system degrades | Evidence should remain legible as incidents escalate (I001/I002/I003 escalation pass) |
+| Viewer smoothness | `remote-viewer` SPICE path is functional but not final-UX smooth; lower priority with real XFCE desktop |
+
+### P2 — Polish / completeness
+
+| Gap | Notes |
+|-----|-------|
+| WORKSTATION_POLISH_BACKLOG.md items | See that file for outstanding desktop UX polish |
+
+---
+
+## GENERATED / LARGE ASSETS
+
+Created by CLI tooling, not hand-managed:
+
+- `vm/images/*.qcow2`
+- Imported libvirt domain XML
+- Baseline snapshot exports or manifests
+- Shift checkpoint snapshots
+- Packaged Linux build artifacts
@@ -0,0 +1,58 @@
+# SYSADMIN CHRONICLES — DEVELOPMENT ROADMAP
+> Version 5.0 | Status: Active development
+>
+> Changelog:
+>   v5.0 — GDScript/Godot removed. Node.js + Svelte is the only codebase.
+>   v4.0 — Full architecture pivot to Node.js + Svelte.
+>   v3.x — GDScript/Godot era (superseded).
+
+---
+
+## IMPLEMENTATION PHASES (Node.js + Svelte)
+
+| Phase | Description | Status |
+|-------|-------------|--------|
+| 1 | Game server skeleton — Express, ContentLoader, SaveState, GET /api/state | [x] done |
+| 2 | TrustSystem, ProgressionSystem, QuestEngine, TicketService, ticket routes | [x] done |
+| 3 | ValidationEngine — SSH into VMs, all rule types | [x] done |
+| 4 | EmailService — inbox, follow-up emails, reply options, mail routes | [x] done |
+| 5 | WebSocket broadcasts — trust:changed, mail:new, shift:tick, incident:alert | [x] done |
+| 6 | Svelte frontend — all panels built, dist/ served by game server | [x] done |
+| 7 | XFCE workstation VM — cloud-init, SPICE/QXL, Chromium, Tilix, autostart | [x] done |
+| 8 | SageService + docs routes + SagePanel + DocViewer | [x] done |
+| 9 | IncidentScheduler + ShiftTimer + pressure tick loop | [x] done |
+| 10 | Full playtest — boot all VMs, play Q001→Q002 end to end | [ ] pending |
+
+**Phase 7 details:** `workstation.sh` profile provisions the full XFCE desktop via
+cloud-init: SPICE+virtio display with spicevmc channel for vdagent resize, Chromium
+autostart via `open-portal` wrapper (waits for game server before launching), Tilix
+as default terminal (`update-alternatives` + `helpers.rc`), dark theme, screensaver
+off, desktop icons executable. Snapshot chain: `baseline.day-one`, `baseline.recovery`
+taken by `seed-vms.sh`.
+
+---
+
+## CONTENT STATUS
+
+The quest system and story are being completely reworked. All existing quest,
+ticket, dialogue, and incident content (Q001–Q008, T001–T008, I001–I003) is
+considered legacy and will be replaced.
+
+### Story Design Assets
+
+| File | Purpose |
+|------|---------|
+| `docs/CHARACTERS.md` | All characters — bios, relationships, story hooks, unresolved threads |
+| `docs/STORY_DESIGN_CONTEXT.md` | How story works in this game — narrative arc, quest structure, character model, design constraints |
+| `docs/COMPANY_LORE.md` | World, company, products, tone guidelines |
+
+
+---
+
+## QUEST TIER DEFINITIONS
+
+| Tier | Label | Characteristics |
+|------|-------|-----------------|
+| 1 | Tutorial Arc | Single VM, clear symptoms, one obvious fix, one better fix, no time pressure |
+| 2 | Workday Arc | Multi-symptom, one quest affects another, trust pressure, incidents active |
+| 3 | Stretch | Multi-VM, ambiguous root cause, political pressure, real prioritization stakes |
@@ -0,0 +1,54 @@
+# Runtime Dependencies
+
+This file tracks host and guest dependency expectations for Sysadmin Chronicles.
+Keep it updated when provisioning scripts, VM display backends, or installer
+requirements change.
+
+## Host Packages
+
+| Capability | Arch package / command | Minimum tested version | Notes |
+| --- | --- | --- | --- |
+| Godot runtime | `godot` | 4.6.2 | Used for the current Godot client path. |
+| Libvirt CLI | `libvirt` / `virsh` | 12.2.0 | Use `qemu:///system` for game VMs. Socket activation is supported. |
+| QEMU system emulator | `qemu-system-x86` / `qemu-system-x86_64` | 11.0.0 | Must match the split QEMU module package versions. |
+| QEMU disk tools | `qemu-img` | 11.0.0 | Used by VM builders for qcow2 images. |
+| QXL display module | `qemu-hw-display-qxl` | 11.0.0 | Required for `virt-install --video qxl`. |
+| Virtio GPU modules | `qemu-hw-display-virtio-gpu`, `qemu-hw-display-virtio-gpu-pci`, `qemu-hw-display-virtio-vga` | 11.0.0 | Required for the default SPICE + virtio workstation display path. |
+| SPICE UI module | `qemu-ui-spice-core` | 11.0.0 | Required for SPICE graphics in libvirt domain capabilities. |
+| SPICE channel module | `qemu-chardev-spice` | 11.0.0 | Required for SPICE agent channels. |
+| SPICE audio module | `qemu-audio-spice` | 11.0.0 | Required for SPICE-backed guest audio. |
+| VM installer | `virt-install` | 5.1.0 | Creates imported cloud-image domains. |
+| SPICE viewer | `remote-viewer` / `virt-viewer` | 11.0 | Used for desktop workstation display. |
+| Cloud image tools | `cloud-image-utils`, `cdrtools`, `libisoburn` | cloud-image-utils 0.33, cdrtools 3.02a09, libisoburn 1.5.8 | Used to generate seed ISOs. |
+| SSH client | `ssh` | OpenSSH 10.3p1 | Used by the game and setup scripts to reach guests. |
+| Node.js | `node` | 22.22.2 | Required by the redesigned browser HUD/server path. |
+
+## Libvirt Resources
+
+| Resource | Required shape | Notes |
+| --- | --- | --- |
+| Network | `sc-internal`, bridge `sc-br0`, subnet `10.42.0.0/24`, NAT forwarding | NAT is required during VM image provisioning so Debian cloud-init can install packages. The network remains private to libvirt guests for inbound access. |
+| Storage pool | `sc-images` | For `qemu:///system`, defaults to `/var/lib/libvirt/images/sysadmin-chronicles`. |
+| SSH key | `~/.ssh/sc_host_key` | Injected into guests for game automation and bridge access. |
+
+## Workstation Guest Packages
+
+The workstation image currently targets Debian 12 Bookworm and installs:
+
+- Desktop/display: `xfce4`, `xfce4-goodies`, `lightdm`, `lightdm-gtk-greeter`, `spice-vdagent`, `qemu-guest-agent`, `accountsservice`, `linux-image-amd64`
+- Desktop metadata: `gvfs`, `gvfs-daemons`, `libglib2.0-bin` for trusted desktop launchers and GVFS metadata writes
+- User tools: `tilix`, `chromium`, `thunar`, `geany`, `meld`, `vim`, `nano`, `tmux`, `htop`
+- Sysadmin tools: `openssh-server`, `openssh-client`, `sudo`, `curl`, `wget`, `rsync`, `git`, `jq`, `python3`, `nmap`, `netcat-openbsd`, `dnsutils`, `traceroute`, `mtr`, `tcpdump`, `strace`, `lsof`, `openssl`, `whois`, `iperf3`, `logwatch`
+- Fonts/completion: `fonts-hack`, `fonts-firacode`, `bash-completion`
+
+## Version Capture
+
+Before cutting an installer or release, capture current versions with:
+
+```bash
+tools/setup/check-host.sh
+virsh --connect qemu:///system version
+qemu-system-x86_64 --version
+virt-install --version
+pacman -Q libvirt qemu-system-x86 qemu-hw-display-qxl qemu-hw-display-virtio-gpu qemu-hw-display-virtio-gpu-pci qemu-hw-display-virtio-vga qemu-ui-spice-core qemu-chardev-spice qemu-audio-spice virt-install virt-viewer spice-gtk cloud-image-utils cdrtools libisoburn
+```
@@ -0,0 +1,330 @@
+# SYSADMIN CHRONICLES — SAVE SYSTEM DESIGN
+> Version 1.3 | Status: Active development
+>
+> Changelog:
+>   v1.3 — Defined `persists: false` flag semantics (shift boundary reset).
+>           Added world flag persistence rules section.
+>
+> This document covers the save model, VM persistence policy, dirty state
+> handling, recovery flows, and the design decisions behind them.
+
+---
+
+## THE CORE TENSION
+
+The game wants real VMs. Real VMs have real state. That state changes as the
+player works. The question is: what do we save, when, and what happens when
+things go wrong?
+
+Two broad approaches exist:
+
+**Approach A — Replay Model**
+Save authored flags and game state only. On load, restore a baseline snapshot
+and replay authored events to reconstruct the world. Simple, cheap, predictable.
+
+**Approach B — Dirty State Model**
+Preserve actual VM disk state as-is. Save references to the current snapshot or
+live qcow2 state. On load, the VM resumes exactly where it was.
+
+This game uses **Approach B**, with structured recovery fallbacks. Here is why,
+and what that means in practice.
+
+---
+
+## WHY DIRTY STATE
+
+The replay model breaks the design contract. If the player spent forty minutes
+debugging a broken service, leaving behind log entries, partial edits, and
+useful breadcrumbs, restoring a clean baseline erases all of that. The world
+forgets. That is not how real systems work.
+
+The dirty state model means:
+- The player's workstation remembers what they did
+- Target VMs remember fixes applied and mistakes made
+- Evidence persists — good and bad
+- A machine the player damaged stays damaged until they fix it or request reimage
+- A machine they set up correctly stays correct
+
+Operational note:
+- The workstation should be treated as a curated terminal-first appliance image
+  whose shell history, local config, and jump-box state persist like any other VM state
+- Desktop-like company tools live in the game state layer, not inside a VM browser session
+- Rebuilding the workstation runtime on every reset would create slow, noisy,
+  and inconsistent recovery behavior
+
+This is more expensive. It is also the point of the game.
+
+---
+
+## WHAT GETS SAVED
+
+### Game State Layer
+Saved as structured JSON. Cheap, fast, always consistent.
+
+- Player trust score and history
+- Unlocked VMs, sudo scopes, internal docs, tools
+- Active and completed ticket/quest state
+- World flags (current values and change history)
+- Incident scheduler state (active incidents, escalation timers)
+- Per-quest authored consequence records
+- Shift timestamp and in-world clock
+
+### VM State Layer
+Saved as libvirt snapshot references or qcow2 state references. Expensive but
+necessary.
+
+- Per-VM: reference to current named snapshot or live disk state
+- Per-VM: list of managed recovery checkpoints
+- Per-VM: reimage eligibility and reimage history
+- Per-VM: last-known observation data (advisory, not authoritative)
+
+The game does not store VM disk images in the save file. It stores references to
+named snapshots managed by libvirt. The actual disk data lives where libvirt
+puts it.
+
+---
+
+## WORLD FLAG PERSISTENCE RULES
+
+Every world flag in `world_flags/world_flags.json` declares a `persists` field.
+This controls how the flag behaves across shift boundaries and game loads.
+
+### `persists: true`
+The flag is written to the save file and survives indefinitely. It is cleared
+only when a quest or incident explicitly sets it to false, or when the VM is
+reimaged. Most flags are persistent — they represent stable facts about the
+world (nginx is configured correctly, logrotate is healthy, etc.).
+
+### `persists: false`
+The flag is **reset at the start of each new shift**, regardless of its current
+value. It is NOT reset on game load within the same shift.
+
+Non-persistent flags represent transient pressure states that should not carry
+forward into the next working session:
+- `hermes_disk_healthy` — disk state that may change overnight without the player's intervention
+- `web_disk_pressure_active` — active disk pressure event currently escalating
+
+**On shift boundary**: all `persists: false` flags are cleared before the new
+shift's checkpoint is taken. Their cleared state is what gets saved.
+
+**On game load mid-shift**: `persists: false` flags are loaded from the save
+file as-is. They are not reset on load, only on shift boundary.
+
+**Implementation note for `SaveSystem`**: When writing the shift checkpoint,
+iterate all world flags and zero out any with `persists: false` before
+serializing. Do not zero them in the live `WorldFlagRegistry` until the
+checkpoint write is complete, to avoid mid-write state corruption.
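+
+The ordering in the implementation note can be sketched as follows. This is an
+illustration only, not the actual `SaveSystem` code: the function names and
+in-memory shapes are hypothetical, and it assumes flags are booleans, so that
+"zeroing" a flag means writing `false`.
+
+```javascript
+// Hypothetical shapes: `definitions` mirrors world_flags.json entries,
+// `live` holds the current in-memory flag values.
+function buildCheckpointFlags(definitions, live) {
+  const out = {};
+  for (const [name, value] of Object.entries(live)) {
+    const def = definitions[name];
+    // Flags declared `persists: false` are serialized as false; others are copied.
+    out[name] = def && def.persists === false ? false : value;
+  }
+  return out;
+}
+
+// Clears non-persistent flags in the live registry. Call only after the write.
+function clearNonPersistentFlags(definitions, live) {
+  for (const name of Object.keys(live)) {
+    const def = definitions[name];
+    if (def && def.persists === false) live[name] = false;
+  }
+}
+
+const definitions = {
+  nginx_stable: { persists: true },
+  web_disk_pressure_active: { persists: false },
+};
+const live = { nginx_stable: true, web_disk_pressure_active: true };
+
+// Step 1: build the serialized copy. The live registry is not modified here.
+const checkpoint = buildCheckpointFlags(definitions, live);
+// Step 2: write `checkpoint` to disk (omitted), and only once that succeeds:
+clearNonPersistentFlags(definitions, live);
+```
+
+Building a separate copy means a failed write leaves the live registry exactly
+as it was before the shift boundary.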
+
+---
+
+## SNAPSHOT STRATEGY FOR SAVE/LOAD
+
+### Named Snapshot Tiers
+
+Each VM maintains three tiers of named snapshots, plus the live disk state:
+
+```
+baseline.clean          — Authored starting state for a fresh quest arc
+baseline.recovery       — Fallback if live state is unrecoverable
+checkpoint.shift-{N}    — Auto-saved at start of each in-game shift
+live                    — Current working state (no snapshot, just disk)
+```
+
+On save: the game records which snapshot tier is current per VM and any
+divergence from it (live state is implicitly the disk, not a snapshot).
+
+On load: the game checks that referenced snapshots still exist and are
+consistent with the saved game state flags. If they are, it resumes from live
+disk state and continues normally.
+
+### What "Resume" Means
+
+The game does not revert to a snapshot on load. It resumes from whatever state
+the VMs are currently in. The save file describes what the game *thinks* the
+world looks like. On load, the observation service validates current VM state
+against saved world flags and reconciles any drift.
+
+Minor drift (service restarted, log rotated by the OS) is handled silently.
+Major drift (a VM that should be running is gone, a snapshot reference is
+missing) triggers the recovery flow.
+
+---
+
+## DIRTY STATE RISKS AND MITIGATIONS
+
+### Risk 1: Snapshot Reference Goes Stale
+A named snapshot the game references is deleted or corrupted outside the game.
+
+Mitigation: On load, the save system checks all referenced snapshots exist
+before resuming. If a checkpoint snapshot is missing but baseline.clean exists,
+offer to resume from baseline with authored-flag reconstruction where possible.
+If baseline.clean is also gone, the VM is treated as unrecoverable and the
+reimage flow is offered.
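+
+The fallback order in this mitigation can be sketched as a small decision
+function. The function name and the returned labels are hypothetical, and the
+list of existing snapshot names is assumed to come from libvirt.
+
+```javascript
+// Decide what to offer on load for one VM, given the snapshot names that
+// currently exist and the checkpoint the save file references.
+function chooseRecoveryPath(existingSnapshots, checkpointName) {
+  const has = (name) => existingSnapshots.includes(name);
+  if (has(checkpointName)) return "resume";
+  // Checkpoint is missing: fall back to the authored baseline if it survives.
+  if (has("baseline.clean")) return "offer-baseline-resume";
+  // Nothing usable is left: the VM is treated as unrecoverable.
+  return "offer-reimage";
+}
+
+const healthy = chooseRecoveryPath(
+  ["baseline.clean", "checkpoint.shift-6"], "checkpoint.shift-6");
+const stale = chooseRecoveryPath(["baseline.clean"], "checkpoint.shift-6");
+const gone = chooseRecoveryPath([], "checkpoint.shift-6");
+```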
+
+### Risk 2: Live Disk State is Unbootable
+The player damaged the VM beyond booting — corrupted bootloader, deleted
+critical system files, broke networking in a way that prevents observation.
+
+Mitigation: The game detects unbootable VMs through libvirt domain state and
+failed SSH probes. The player is notified in-world ("hermes is not responding")
+and the reimage flow is offered. The game does not attempt to force-boot or
+auto-repair.
+
+### Risk 3: Multiple VMs Diverge from Each Other
+The player fixed hermes but their notes reference a service that is now
+configured differently. Cross-VM state is inconsistent with authored
+expectations.
+
+Mitigation: World flags are the source of truth for cross-VM consequences, not
+raw VM state. If the flags say nginx_stable but hermes currently has nginx
+failed, the validation service surfaces this on next observation pass and raises
+an in-world event. The player is not penalized for drift that happens while they
+are offline — but they are informed.
+
+### Risk 4: Disk Space on Host
+qcow2 images with many snapshots can balloon. Long save histories consume real
+host storage.
+
+Mitigation: Managed checkpoint retention policy. The game keeps a maximum of N
+shift checkpoints per VM (default: 5) and prunes the oldest on new checkpoint
+creation. Authored baseline and recovery snapshots are never pruned by the game.
+A storage budget field in vm_profiles allows per-VM tuning.
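+
+A minimal sketch of the pruning rule, assuming checkpoint names follow the
+`checkpoint.shift-{N}` pattern used in this document. The function name is
+hypothetical; the default of 5 comes from the policy above.
+
+```javascript
+// Return the shift checkpoints to delete so that at most `maxKept` remain.
+// Baseline and recovery snapshots never match the pattern, so they are never pruned.
+function selectCheckpointsToPrune(snapshotNames, maxKept = 5) {
+  const checkpoints = snapshotNames
+    .map((name) => ({ name, match: /^checkpoint\.shift-(\d+)$/.exec(name) }))
+    .filter((s) => s.match !== null)
+    .map((s) => ({ name: s.name, shift: Number(s.match[1]) }))
+    .sort((a, b) => a.shift - b.shift); // numeric sort, oldest shift first
+  const excess = Math.max(0, checkpoints.length - maxKept);
+  return checkpoints.slice(0, excess).map((s) => s.name);
+}
+
+const names = [
+  "baseline.clean", "baseline.recovery",
+  "checkpoint.shift-2", "checkpoint.shift-10", "checkpoint.shift-3",
+  "checkpoint.shift-4", "checkpoint.shift-5", "checkpoint.shift-6",
+  "checkpoint.shift-7",
+];
+const toPrune = selectCheckpointsToPrune(names);
+```
+
+Sorting on the parsed shift number rather than the raw name keeps
+`checkpoint.shift-10` from being ordered before `checkpoint.shift-2`.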
+
+Resource budget note:
+- Budget the workstation separately from server VMs, even when its profile is modest
+- Save/recovery tooling should assume workstation snapshots are the most
+  storage-expensive routine snapshots in the fleet
+- Earlier lab builds showed that browser-capable workstation images can exceed
+  small cloud-image defaults quickly; the terminal-first plan avoids much of
+  that pressure, but disk budgets still need to be explicit
+
+---
+
+## THE REIMAGE FLOW
+
+When a VM is unrecoverable, the player can report it for reimage through an
+in-world mechanic (ticket to management or ops channel).
+
+Flow:
+1. Player submits a reimage request for the affected machine
+2. An in-world delay is imposed (e.g., 1 in-game shift)
+3. The machine is restored from baseline.recovery or baseline.clean
+4. Trust penalty is applied based on severity
+5. Any in-progress quests on that VM are reset to their baseline state
+6. Evidence from before the reimage is gone — acknowledged in-world as "we
+   had to wipe the machine"
+
+This is not a free reset. It has visible consequences. But it allows the game
+to continue rather than becoming permanently stuck.
+
+The reimage flow is the designed escape valve, not a hidden automatic recovery.
+
+---
+
+## SHIFT CHECKPOINTS
+
+At the start of each in-game shift, the game:
+1. Clears all `persists: false` world flags
+2. Saves all game state JSON (with non-persistent flags already zeroed)
+3. Creates a named snapshot for each active VM: `checkpoint.shift-{N}`
+4. Records the checkpoint reference in the save file
+5. Prunes shift checkpoints beyond the retention limit
+
+This gives the player a rollback option at shift granularity if they want to
+undo a disastrous session, at the cost of losing that shift's work entirely.
+
+Shift checkpoint rollback is an explicit player action, not automatic. It is
+presented as "start this shift over" and requires confirmation. It does not
+undo trust changes or world flag consequences that have already reached other
+characters (e.g., dialogue already delivered, tickets already closed).
+
+---
+
+## DEVELOPER RESET
+
+For authoring and testing, a separate CLI tool exists outside the game:
+
+```bash
+bash tools/vm/snapshot-all.sh --revert-to baseline.clean
+```
+
+This is not accessible in the shipped game. It completely resets all VMs to
+their authored baseline. Used during content authoring and automated test runs.
+
+---
+
+## SAVE FILE STRUCTURE (DRAFT SCHEMA)
+
+```json
+{
+  "save_version": 1,
+  "player": {
+    "trust": 14,
+    "trust_history": [],
+    "unlocks": ["sudo:systemctl", "vm:build_machine"],
+    "current_shift": 7
+  },
+  "world": {
+    "flags": {
+      "player_ssh_configured": true,
+      "nginx_stable": true,
+      "hermes_logrotate_healthy": false,
+      "hermes_log_pressure_pending": true,
+      "hermes_disk_healthy": false
+    },
+    "flag_history": [],
+    "_note": "persists:false flags are zeroed at shift boundary before this snapshot is written. They survive game load within the same shift."
+  },
+  "quests": {
+    "completed": ["Q001", "Q002"],
+    "failed": [],
+    "active": ["Q003"],
+    "branch_outcomes": {
+      "Q002": "config-fixed-enabled"
+    }
+  },
+  "tickets": {
+    "active": ["T003"],
+    "closed": ["T001", "T002"]
+  },
+  "incidents": {
+    "active": [
+      {
+        "id": "I001",
+        "started_at_shift": 6,
+        "escalation_step_reached": 1
+      }
+    ],
+    "resolved": []
+  },
+  "vms": {
+    "workstation": {
+      "current_snapshot_tier": "live",
+      "last_checkpoint": "checkpoint.shift-6",
+      "recovery_snapshot": "baseline.recovery",
+      "reimage_count": 0,
+      "last_observation": {}
+    },
+    "web_server": {
+      "current_snapshot_tier": "live",
+      "last_checkpoint": "checkpoint.shift-6",
+      "recovery_snapshot": "baseline.recovery",
+      "reimage_count": 0,
+      "last_observation": {}
+    }
+  }
+}
+```
+
+---
+
+## DESIGN PRINCIPLES SUMMARY
+
+- The dirty state is the game. Preserving it is the point.
+- Snapshots are structured fallbacks, not the primary save mechanism.
+- The game never silently reverts VM state without player awareness.
+- Recovery from failure is in-world and has consequences.
+- The host disk cost is real and must be managed with a retention policy.
+- Developers get clean-reset tooling outside the shipped game.
+- `persists: false` flags reset at shift boundary, not on load.
@@ -0,0 +1,103 @@
+# SYSADMIN CHRONICLES — SNAPSHOT CHAIN
+> Version 1.0
+>
+> This document defines what each named baseline snapshot represents,
+> how the snapshot chain is built, and what assumptions quest authors
+> can make about VM state at each snapshot.
+
+---
+
+## POLICY
+
+Each `baseline.post-qXXX` snapshot represents the **canonical clean-branch
+outcome** of quest QXXX — meaning all prior quests were resolved via their
+highest-priority (best) solution branch.
+
+Player state diverges from the baseline during play. The baseline is always
+the authored "good state" for that point in the arc, built independently of
+any player's actual save.
+
+**A baseline snapshot is never built from a bad or partial branch outcome.**
+If a player took the wrong branch, their VM state differs from the baseline
+for all subsequent quests. That divergence is intentional and is the game.
+
+---
+
+## SNAPSHOT CHAIN TABLE
+
+| Snapshot Name | VM(s) | Built After | Represents |
+|---------------|-------|-------------|------------|
+| `baseline.day-one` | workstation | fresh image | Brand new ares workstation. No player account SSH key. Provisioning script ran but authorized_keys absent. |
+| `baseline.clean` | web_server | fresh image | Fresh hermes. nginx installed, no config errors, logrotate present, web root owned by www-data. Ready for Q002 to break it. |
+| `baseline.clean` | build_machine | fresh image | Fresh vulcan. NTP disabled (for Q006 scenario). Arch base install, pacman configured to use internal repo. |
+| `baseline.post-q001` | workstation | Q001 clean branch | Player SSH key in authorized_keys with correct permissions (0600 file, 0700 dir). Used as the implied state for all subsequent quests requiring SSH access. Not an explicit snapshot — workstation just stays live from Q001 onward. |
+| `baseline.post-q004` | web_server | Q004 clean branch | hermes with: nginx stable+enabled, logrotate configured, web root owned by www-data recursively. All of Q002–Q004 resolved cleanly. Used as starting state for Q005 and Q007. |
+| `baseline.post-q006` | build_machine | Q006 clean branch | vulcan with NTP enabled and healthy, archlinux-keyring refreshed, builds working. Used as starting state for Q008. |
+
+---
+
+## HOW SNAPSHOTS ARE BUILT
+
+Snapshots are produced by `tools/vm/seed-vms.sh` in sequence:
+
+```
+1. Build base VM images from cloud-init or preseed
+2. Run base configuration (hostname, users, packages, game helpers)
+3. Run suppress-maintenance-noise.sh
+4. Take baseline.clean snapshot
+5. Run Q001-prep.sh → take no snapshot (workstation stays live)
+6. Run Q002-prep.sh through Q004-prep.sh sequentially on web_server
+7. Apply clean-branch outcome state manually or via a post-quest-state script
+8. Take baseline.post-q004 snapshot on web_server
+9. Run Q006-prep.sh on build_machine
+10. Apply clean-branch outcome state on build_machine
+11. Take baseline.post-q006 snapshot on build_machine
+```
+
+Steps 7 and 10 ("apply clean-branch outcome state") are done via dedicated
+scripts in `tools/vm/quest-prep/`:
+
+```
+Q004-post-clean.sh   — sets web root ownership, confirms logrotate, enables nginx
+Q006-post-clean.sh   — enables systemd-timesyncd, refreshes archlinux-keyring
+```
+
+These post-clean scripts are the authoritative definition of what "clean
+branch" means for snapshot purposes.
+
+---
+
+## WHAT QUEST AUTHORS CAN ASSUME
+
+When authoring a quest against `baseline.post-q004`, you can assume:
+- nginx is active and enabled on hermes
+- /etc/logrotate.d/nginx exists and is correct
+- /var/www/axiomworks is owned by www-data recursively
+- The deploy service runs as www-data and can write to /var/www/axiomworks
+- No Q002/Q003/Q004 broken state exists
+- Q005 and Q007 both build on this clean hermes state
+
+When authoring a quest against `baseline.post-q006`, you can assume:
+- Everything in post-q004 (hermes state)
+- systemd-timesyncd is active and enabled on vulcan
+- archlinux-keyring is up to date
+- pacman -Syu works without signature errors
+- Q008 uses this as its clean starting baseline
+
+If your quest needs to break something that was fixed in a prior quest,
+your prep script must re-break it after the post-clean baseline is applied.
+Document this explicitly in your prep script's header comment.
+
+---
+
+## DEVELOPER RESET
+
+To rebuild all baselines from scratch:
+
+```bash
+bash tools/vm/snapshot-all.sh --revert-to baseline.clean
+bash tools/vm/seed-vms.sh
+```
+
+This is destructive and should only be run during authoring or CI.
+It is not available in the shipped game.
@@ -0,0 +1,423 @@
+# Story Design Context — Sysadmin Chronicles
+
+For story designers and AI agents creating new quests and narrative content.
+
+**Related docs:**
+- `CHARACTERS.md` — character bios, relationships, story hooks
+- `COMPANY_LORE.md` — world, company, tone
+- `QUEST_AUTHORING.md` — technical JSON spec for implementers
+
+This document answers: *how does story actually work in this game, and what does a quest
+concept need to contain to be usable?*
+
+---
+
+## The Core Premise
+
+The player is a new junior sysadmin at Axiom Works, a mid-size B2B software company.
+They are replacing someone named Dale. Nobody will explain why Dale is gone.
+
+The game is played entirely through a simulated work environment: a terminal, an email
+inbox, and a company website. There are no cutscenes, no narration, no inventory, no
+combat. Everything that happens is expressed through:
+
+- **Tickets** — the player receives a ticket describing a problem
+- **The terminal** — the player SSHes into VMs, investigates, and fixes things
+- **Character dialogue** — characters react to how the player solved the problem
+- **The next ticket** — the world moves on, and the consequences of what the player
+  did are baked into the next situation
+
+That's it. Story is not told — it is accumulated from the choices the player makes
+when fixing real Linux problems on real virtual machines.
+
+---
+
+## The Three Machines (VMs)
+
+Every quest happens on one or more of these machines. Their narrative identities
+matter as much as their technical roles.
+
+### ares — the Workstation
+The player's home machine. Ubuntu 24.04. Quests here are onboarding-flavored —
+establishing access, learning the environment. It's the only machine the player
+can reach on day one.
+
+*Narrative identity:* Where you start. Safe-ish. The first one you break is here.
+
+### hermes — the Web / App Server
+Debian 12. Runs nginx and the AxiomFlow demo/staging application. This is the
+machine that Sarah Chen cares about, that customers can feel, and that Priya Nair
+watches for security posture. Most of the early-game quests are here.
+
+*Narrative identity:* The product's face to the world. Breaking this makes noise
+immediately. The most politically visible machine.
+
+### vulcan — the Build Machine
+Arch Linux. Compiles packages, runs the internal build pipeline, serves packages
+to hermes via an internal apt repo. Nikhil Sharma owns this in principle but nobody
+manages it daily. Things here break silently until hermes starts serving bad software.
+
+*Narrative identity:* The machine nobody watches until something downstream fails.
+Quests here reveal that problems have upstream causes the player didn't expect.
+
+### Planned future machines
+As the story expands, new machines can be added. Each should have a clear narrative
+role before it's introduced. (See `COMPANY_LORE.md` for the candidate list.)
+
+---
+
+## How Story Is Delivered
+
+### Tickets as Act One
+Every quest begins with a ticket in the player's inbox. The ticket is a short email
+from a character describing a symptom — not a cause. The sender's perception of the
+problem is usually incomplete and sometimes wrong. This is intentional: the player's
+job is to investigate, not to execute instructions.
+
+Good ticket writing:
+- Describes what the sender experienced, not what the cause is
+- Has the sender's voice and perspective (Sarah is outcome-focused; Dave is confused;
+  Priya is terse and specific)
+- Does not hint at the solution
+- Creates genuine stakes (site is down, builds are failing, someone is locked out)
+
+Bad ticket writing:
+- Explains the root cause ("the log file is too big")
+- Has no character voice (generic IT help desk language)
+- Stakes are unclear or low
+
+### The Terminal as Act Two
+The player investigates. They SSH in, run commands, read logs, check configs, look at
+file ownership. The evidence is seeded into the VM baseline — it is genuinely there
+to find, not procedurally generated. A good quest has a natural clue trail:
+
+- The most obvious thing points to a second thing
+- The second thing reveals the actual problem
+- The fix is achievable with real Linux knowledge
+
+The player cannot be told what to do. They can ask Marcus for hints (via dialogue
+choices), but good players don't need to.
+
+### Branching Resolution as Act Three
+When the player has made changes to the VM, the game checks the state of the
+system against the quest's solution branches. The branch that matches determines:
+
+- What dialogue fires (Marcus's reaction, Sarah's reaction, Priya's follow-up)
+- What trust delta the player receives
+- What world flag is set (persistent story state)
+- Whether an incident is triggered (a future consequence of a partial fix)
+- What ticket comes next
+
+**This is the central story mechanic.** Every quest should be designed with at
+least two and ideally three resolution branches:
+
+| Branch type | What it means |
+|-------------|---------------|
+| **Clean fix** | Player understood the root cause and solved it properly. High trust, no downstream risk. |
+| **Acceptable fix** | Problem is solved but with a tradeoff — brittle approach, future maintenance burden, or incomplete cleanup. Lower trust. |
+| **Regression** | Player fixed the symptom but made something else worse. Negative trust. Story consequences. |
+
+The **regression branch** is not about punishment — it's about realism. A real
+sysadmin who removes all SSH restrictions to restore one person's access has
+technically solved the ticket while creating a larger problem. The story should
+treat this as realistic professional consequence, not a game-over failure.
+
+Players on a clean-fix path get more trust, unlock more access, and receive warmer
+character reactions. Players on a regression path continue playing but face the
+downstream effects of their choices.
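+
+For implementers, branch matching can be pictured roughly as below. This is a
+sketch under assumptions, not the game's validation code: the `priority` and
+`matches` fields, the observed-state keys, and the trust values are all
+hypothetical (see `QUEST_AUTHORING.md` for the real spec).
+
+```javascript
+// Pick the first branch, in priority order, whose check matches the VM state.
+// Lower `priority` numbers are checked first, so the best branch wins ties.
+function resolveBranch(branches, observedState) {
+  const ordered = [...branches].sort((a, b) => a.priority - b.priority);
+  return ordered.find((b) => b.matches(observedState)) ?? null;
+}
+
+const branches = [
+  { id: "acceptable", priority: 2, trust_delta: 1,
+    matches: (s) => s.diskFreePct > 20 && s.nginxRunning },
+  { id: "clean", priority: 1, trust_delta: 3,
+    matches: (s) => s.diskFreePct > 20 && s.nginxRunning && s.logrotateConfigured },
+  { id: "regression", priority: 3, trust_delta: -2,
+    matches: (s) => s.diskFreePct > 20 && !s.nginxRunning },
+];
+
+// The player freed disk space but never restored the logrotate config.
+const resolved = resolveBranch(branches, {
+  diskFreePct: 40, nginxRunning: true, logrotateConfigured: false,
+});
+```
+
+Here `resolved.id` is `"acceptable"`: the clean branch is checked first and
+fails on the missing logrotate config, so the next matching branch applies.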
+
+---
+
+## World Flags — Persistent Story State
+
+World flags are string keys set when a quest's branch resolves. They persist for
+the entire playthrough and can be read by later quests, incidents, and dialogue.
+
+Examples:
+- `hermes_logrotate_healthy` — set when the player properly fixed log rotation
+- `hermes_ssh_allowusers_fragile` — set when the player restored SSH access using
+  the brittle AllowUsers approach instead of the robust AllowGroups approach
+- `player_ssh_configured` — set when the player successfully set up SSH on day one
+
+World flags are how story continuity works. A later quest can check whether the
+player fixed something correctly earlier and behave differently. Marcus can reference
+a past fix. Priya can flag a previously introduced risk in a later audit. A problem
+that was "solved" with a quick fix can recur.
+
+**When designing a new quest, ask:** what flag should this set, and what future quests
+or dialogue might reference it?
+
+---
+
+## Trust — The Narrative Currency
+
+Trust is a numeric score that tracks the player's professional standing with Marcus
+and the IT team. It affects:
+
+- **VM access** — the player gains SSH access to hermes and vulcan as trust increases.
+  If trust drops badly, access can be revoked.
+- **Documentation access** — more trusted players get access to internal runbooks
+  and admin guides
+- **Character warmth** — Marcus's messages change tone subtly as trust grows
+- **Incident visibility** — at a certain trust level, the player starts seeing
+  background incidents before they become critical
+
+Trust is not displayed as a raw number. Players experience it as consequences.
+
+**For quest designers:** each branch should have a `trust_delta` that reflects the
+quality of the fix. A proper root-cause fix should earn more than a workaround.
+Regression branches should cost trust. Day-one onboarding quests are lenient;
+later quests at higher tiers should be less forgiving.
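+
+One way to picture trust-gated access is a threshold table. The sketch below is
+purely illustrative: the threshold values, the `docs:runbooks` key, and the
+function name are hypothetical, and the real numbers are a tuning decision this
+document does not specify.
+
+```javascript
+// Hypothetical thresholds mapping trust scores to unlock keys.
+const ACCESS_THRESHOLDS = [
+  { unlock: "vm:web_server", minTrust: 5 },
+  { unlock: "docs:runbooks", minTrust: 8 },
+  { unlock: "vm:build_machine", minTrust: 12 },
+];
+
+// Access is derived from the current score, so a large drop revokes unlocks.
+function unlocksForTrust(trust) {
+  return ACCESS_THRESHOLDS
+    .filter((t) => trust >= t.minTrust)
+    .map((t) => t.unlock);
+}
+
+const atFourteen = unlocksForTrust(14);
+const afterDrop = unlocksForTrust(6);
+```
+
+Deriving access from the score, rather than storing unlocks permanently, is
+what lets a bad regression take access away again.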
+
+---
+
+## Incidents — Consequences of Incomplete Fixes
+
+An incident is a time-delayed consequence that fires when a quest's partial-fix
+branch was taken. It represents the problem coming back.
+
+Example: The player clears a full disk by deleting a log file but doesn't restore
+the logrotate config. Two in-game hours later, the disk starts filling again. Dave
+notices. The player gets another ticket about the same symptom.
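+
+The delay in that example can be sketched as a simple queue keyed on in-game
+time. The function names and the minutes-based clock are assumptions made for
+illustration; the incident ID is only a placeholder.
+
+```javascript
+// Queue an incident to fire after a delay, measured in in-game minutes.
+function scheduleIncident(queue, incidentId, nowMinutes, delayMinutes) {
+  queue.push({ id: incidentId, firesAt: nowMinutes + delayMinutes });
+}
+
+// Return the IDs of incidents whose delay has elapsed.
+function dueIncidents(queue, nowMinutes) {
+  return queue.filter((i) => i.firesAt <= nowMinutes).map((i) => i.id);
+}
+
+const queue = [];
+// Partial-fix branch taken at minute 0; the disk fills again two in-game hours later.
+scheduleIncident(queue, "I001", 0, 120);
+
+const early = dueIncidents(queue, 119);
+const onTime = dueIncidents(queue, 120);
+```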
+
+Incidents are not punishments — they are realistic. The world doesn't stay fixed
+just because the player touched it. A player who takes clean-fix branches will
+rarely see incidents. A player who takes every shortcut will find their ticket queue
+filling up with problems they already "solved."
+
+For story purposes: incidents can also carry narrative weight. If the player made a
+security regression, an incident could represent an audit finding, an unusual login,
+or a configuration discrepancy Priya noticed.
+
+---
+
+## The Character Conversation Model
+
+Quest dialogue fires after a branch resolves. Three characters can speak:
+
+### Marcus Webb
+The primary voice. Appears in every quest. His post-resolution message reflects:
+- What the player actually did (not just whether they succeeded)
+- Whether they understood the root cause or just cleared the symptom
+- A forward-looking observation (usually a quiet flag for what's coming next)
+
+Marcus does not praise effusively or scold dramatically. He states what he observed.
+His message for a clean fix is warmer and sometimes wry. His message for a regression
+is brief and pointed. He never says "well done!" He might say "that's the right call."
+
+### Sarah Chen
+Speaks when the quest affects something product-facing (hermes being up or down,
+deploys working or failing). Her messages are reactive — she responds to outcomes,
+not process. She is not hostile unless the player makes her situation worse.
+
+### Priya Nair
+Speaks when the quest has security implications — access changes, hardening,
+audit posture. She does end-of-shift reviews that grade overall performance.
+Her per-quest messages are brief and evaluative. She notices things Marcus might not.
+
+### Other characters
+Dave Okonkwo files tickets. He does not have post-resolution dialogue — he
+just stops or starts noticing things. Future characters (Kowalski, Nikhil, Tanya)
+can speak in dialogue if quests are designed to involve them.
+
+---
+
+## The Narrative Arc
+
+The overall story has six phases. Quests should be designed with their phase in mind.
+The phase is usually not visible to the player — it emerges from what's happening
+around them.
+
+### Phase 1 — Normal Work
+*Tier 1 quests. Early game.*
+
+The player is new. Everything is routine. Marcus is helpful. The problems are real
+but not alarming — a broken config, a full disk, a permission issue. The player is
+learning the environment. The subtext is that things are slightly more wrong than
+they should be, but there's nothing to point at.
+
+Hidden layer: small anomalies in the systems that curious players can notice but
+don't have context for yet.
+
+### Phase 2 — Unease
+*Tier 1/2 transition.*
+
+The problems start to have patterns. The same kind of thing breaks twice. A fix
+the player made doesn't hold the way it should. Nothing is alarming, but Marcus's
+messages have a slightly different quality — he notices things he doesn't explain.
+
+Hidden layer: a world flag from an early quest points somewhere unexpected.
+
+### Phase 3 — Suspicion
+*Tier 2 quests. Mid game.*
+
+The player starts encountering problems they didn't cause and can't fully explain.
+Access was changed by someone. A config was edited recently. A log shows an
+unusual pattern. Nobody is accusing anyone. But the player now has enough context
+to start asking questions — even if no quest explicitly tells them to.
+
+This is where Dale becomes relevant again. The systems the player inherits were
+last touched by Dale. Some of them have been in a particular state for a long time.
+
+### Phase 4 — Investigation
+*Tier 2/3 transition.*
+
+The player has connected enough dots to understand that something happened before
+they arrived. The quests in this phase involve digging into logs, access records,
+and configuration history. The investigation is framed as professional work
+(audit the access logs, trace the package build history) — but the results tell
+a story.
+
+Marcus's messages are shorter. Priya starts appearing more. Kowalski schedules a
+meeting nobody explains.
+
+### Phase 5 — Conflict
+*Tier 3 quests. Late game.*
+
+The player knows what happened. Acting on that knowledge has professional
+consequences. The conflict is not physical — it is about what the player chooses
+to surface, who they tell, and what they do with access they were given for one
+purpose that could be used for another.
+
+### Phase 6 — Resolution
+*Endgame.*
+
+The situation resolves. The ending the player gets depends on the world flags
+accumulated across their entire playthrough — not just whether they clicked the
+"good ending" button. A player who took clean-fix branches throughout, built
+trust, and noticed the hidden anomalies gets a different ending than a player
+who patched symptoms, lost trust, and missed everything.
+
+---
+
+## What Makes a Good Quest Scenario
+
+The best quests have a **plausible mundane cause** and a **visible technical trail**.
+Players should never need to guess — they should be able to find the answer by
+looking at the right files and running the right commands.
+
+### Good scenario types
+- Service down → config syntax error → player traces error output to the line
+- Disk full → log file enormous → logrotate config missing → player restores it
+- Deploy fails → files owned by wrong user → someone ran a script as root manually
+- Build failures → clock drift → NTP not running → player enables time sync
+- Access locked out → sshd_config modified → wrong directive → player corrects it
+- App crashes after update → bad package from internal repo → player traces to source
+
+### What makes these work
+1. **The symptom is real and urgent.** Something is actually broken.
+2. **The cause is discoverable.** The evidence is in logs, config files, or system state.
+3. **The fix is a real Linux operation.** Not artificial — `chown`, `systemctl`, editing
+   a config, fixing a cron entry, rolling back a package.
+4. **Multiple approaches exist.** The quick fix works. The proper fix is better and
+   the game knows the difference.
+5. **The character reactions are grounded.** Sarah cares about the demo being up.
+   Priya cares about the access control implications. Marcus cares about whether the
+   player understood what they were doing.
+
+### Bad scenario types to avoid
+- Problems that require packages not in the VM's guaranteed baseline (see `QUEST_AUTHORING.md`)
+- Problems that require real-time events the validation engine can't check
+- Problems where the "correct" fix is the only fix (no meaningful branch differentiation)
+- Problems that break the fourth wall or require the player to know game-layer information
+- Problems that are gotchas rather than investigations (the cause can't be found by looking)
+
+---
+
+## Hidden Anomalies — Environmental Storytelling
+
+Every 3–5 quests should include something unusual in the VM environment that the player
+is not told about and not required to engage with. These are not quest objectives.
+They are breadcrumbs for curious players.
+
+Examples of the kind of thing these should be:
+- A user account that shouldn't exist
+- A log entry from an odd time that doesn't match the official history
+- A file that was modified recently but wasn't part of the quest setup
+- A cron job that's been disabled but was once important
+- An SSH key in authorized_keys that doesn't belong to anyone obvious
+
+These anomalies should be consistent with the overall narrative arc — a player who
+collects them across the whole game should be able to piece together what happened
+before they arrived. They should never be labelled, never referenced in objectives,
+and never required. They are for the players who look.
+
+---
+
+## Quest Output Format for Story Agents
+
+When proposing new quests, provide the following. This is the minimum needed for
+a technical author to implement the quest.
+
+```
+Quest ID: QXXX
+Title: [player-facing]
+Narrative phase: [1–6]
+Tier: [1, 2, or 3]
+
+Primary VM: [ares / hermes / vulcan]
+Additional VMs: [if any]
+
+Scenario summary:
+  What is broken, why it is broken (the root cause), and what the player
+  will encounter. 1–3 sentences. Written for the implementer, not the player.
+
+Ticket:
+  From: [character name]
+  Subject: [email subject line]
+  Body: [the email the player receives. Written in the sender's voice.
+         Describes the symptom. Does not explain the cause.]
+
+Clue trail:
+  What the player will find when they investigate. The evidence that leads
+  them to the root cause. Describe the actual files, log entries, and system
+  states — not the player's steps.
+
+Solution branches:
+  Branch 1 (clean fix, highest trust):
+    What the player has done. Why it's correct. Trust delta.
+  Branch 2 (acceptable fix):
+    What the player has done. What tradeoff it introduces. Trust delta.
+  Branch 3 (regression, if applicable):
+    What the player did wrong. What it breaks. Negative trust delta.
+
+Character reactions:
+  Marcus (post-resolution):
+    Clean: [what Marcus says]
+    Acceptable: [what Marcus says]
+    Regression: [what Marcus says]
+  Sarah / Priya (if relevant):
+    [reaction to the specific outcome that affects them]
+
+World flags set: [list flags each branch sets]
+Follow-up incident (if any): [what recurs if the acceptable-fix branch was taken]
+Hidden anomaly (if any): [something unusual seeded into the VM that's not part of
+  the quest objectives]
+Narrative notes: [anything a future quest author should know — Dale connections,
+  story threads this opens or closes, things characters should remember]
+```
+
+---
+
+## The Dale Thread — Notes for Story Designers
+
+Dale's story should emerge slowly from the systems themselves, not from exposition.
+When designing quests — especially mid-to-late game — consider:
+
+- **What did Dale last touch?** The VMs the player inherits have a history. Some
+  configurations were made by Dale. Some are good. Some are wrong in ways that
+  suggest Dale was dealing with something.
+
+- **What was Dale trying to do?** As the investigation phase develops, the picture
+  should become coherent. Dale wasn't random — there was a pattern to their actions.
+
+- **Who knew?** Marcus knew Dale. Priya may have been involved in whatever ended
+  Dale's tenure. Kowalski definitely knows. The player assembles this from fragments,
+  not a scene where someone explains it.
+
+- **The player is inheriting Dale's problems.** Some of the broken things the player
+  fixes are broken because Dale broke them. Some of the broken things were broken on
+  purpose. The player won't know which is which until later.
+
+The reveal of what Dale did should feel like the player figured it out, not like the
+game told them.
@@ -0,0 +1,187 @@
+# VM Build System
+
+## Overview
+
+VM provisioning uses a modular driver + profile pattern. One driver script handles
+the full build pipeline; per-VM profile files declare what makes each machine
+distinct. Adding a new VM means writing one profile file — no changes to the driver.
+
+## Structure
+
+```
+tools/vm/
+  build-vm.sh              # Driver — sources a profile and runs the build pipeline
+  build-workstation.sh     # Wrapper → build-vm.sh profiles/workstation.sh
+  build-web-server.sh      # Wrapper → build-vm.sh profiles/web-server.sh
+  build-build-machine.sh   # Wrapper → build-vm.sh profiles/build-machine.sh
+  profiles/
+    workstation.sh         # sc-workstation / ares    — XFCE desktop (Debian)
+    web-server.sh          # sc-web-server  / hermes  — nginx app server (Debian)
+    build-machine.sh       # sc-build-machine / vulcan — build toolchain (Arch)
+  lib/
+    common.sh              # Shared libvirt helpers (pool, domain, seed ISO, wait-for-IP)
+```
+
+## Invocation
+
+```bash
+# By wrapper (backwards-compatible)
+./build-workstation.sh [--dry-run] [--force]
+./build-web-server.sh  [--dry-run] [--force]
+./build-build-machine.sh [--dry-run] [--force]
+
+# By driver directly — profile name (no extension) or explicit path
+./build-vm.sh workstation [--dry-run] [--force]
+./build-vm.sh profiles/web-server.sh --force
+```
+
+`--dry-run` skips all libvirt/qemu-img calls and prints what would run.
+`--force` destroys and recreates a domain that already exists.
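+
+A dry-run flag like this is often implemented as a small wrapper that either prints
+or executes a command. The sketch below is illustrative only: `DRY_RUN` and `run_cmd`
+are hypothetical names and are not taken from `build-vm.sh`.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical dry-run wrapper; variable and function names are assumptions.
+DRY_RUN="${DRY_RUN:-0}"
+
+run_cmd() {
+  if [ "$DRY_RUN" = "1" ]; then
+    # Print the command instead of executing it.
+    printf '[dry-run] %s\n' "$*"
+  else
+    "$@"
+  fi
+}
+
+# With DRY_RUN=1 this only prints; otherwise it runs echo.
+run_cmd echo "libvirt/qemu-img call would go here"
+```
+
+Routing every side-effecting call through one wrapper keeps the dry-run behaviour
+in a single place instead of scattering `if` checks through the pipeline.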
+
+## Profile Contract
+
+A profile is a bash file sourced by `build-vm.sh`. It must set these variables:
+
+| Variable | Example | Description |
+|----------|---------|-------------|
+| `DOMAIN` | `sc-web-server` | libvirt domain name |
+| `HOSTNAME` | `hermes` | Guest hostname |
+| `RAM_MB` | `512` | Memory in MB |
+| `VCPUS` | `1` | vCPU count |
+| `DISK_SIZE` | `8G` | qcow2 overlay size |
+| `GRAPHICS` | `vnc` | `vnc`, `spice`, `spice-qxl`, or `none` |
+| `BASE_URL` | `https://...` | URL to download base cloud image from |
+| `BASE_IMAGE` | `$SC_BASE_DIR/...` | Local path to cache the base image |
+
+It must also define `generate_user_data()` — a function that prints the complete
+cloud-init `#cloud-config` YAML to stdout. The driver calls this function and writes
+the output to the seed ISO. The following variables are available when the function
+runs (set by the driver after sourcing the profile):
+
+| Variable | Value |
+|----------|-------|
+| `PUBKEY` | Contents of `${SC_SSH_KEY}.pub` |
+| `GAME_HOST_IP` | `${SC_GAME_HOST_IP:-10.42.0.1}` |
+| `POOL_DIR` | Resolved libvirt pool path |
+| `DISK_PATH` | `$POOL_DIR/${DOMAIN}.qcow2` |
+| `SEED_ISO` | `$SC_SEED_DIR/${DOMAIN}-seed.iso` |
+
+Profile-specific variables (e.g. `HUD_URL`, `SAGE_URL`, `PRIVKEY_INDENT`) are set
+in the profile before `generate_user_data` is defined and are available inside it.
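+
+As a rough illustration of the contract, a minimal profile might look like the sketch
+below. Every value is a placeholder: the domain, hostname, URL, the `player` user, and
+the fallback path for `SC_BASE_DIR` are assumptions, not copied from an existing profile.
+
+```bash
+# profiles/example.sh (hypothetical profile; all values are placeholders)
+DOMAIN="sc-example"
+HOSTNAME="example"
+RAM_MB=512
+VCPUS=1
+DISK_SIZE="8G"
+GRAPHICS="none"
+BASE_URL="https://example.invalid/base-image.qcow2"        # placeholder URL
+BASE_IMAGE="${SC_BASE_DIR:-/tmp/sc-base}/base-image.qcow2"  # fallback dir is an assumption
+
+generate_user_data() {
+  # PUBKEY and GAME_HOST_IP are expected to be set by the driver before this runs.
+  cat <<EOF
+#cloud-config
+hostname: ${HOSTNAME}
+users:
+  - name: player
+    ssh_authorized_keys:
+      - ${PUBKEY}
+write_files:
+  - path: /etc/hosts
+    append: true
+    content: |
+      ${GAME_HOST_IP} portal.axiomworks.internal
+EOF
+}
+```
+
+Because the function only prints to stdout, it can be checked in isolation by
+sourcing the file, setting `PUBKEY` and `GAME_HOST_IP` by hand, and calling it.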
+
+## Writing a New Profile
+
+1. Copy `profiles/web-server.sh` as a starting point.
+2. Set the 8 required variables.
+3. Write `generate_user_data()` with the cloud-init YAML for the new machine.
+4. Run `./build-vm.sh profiles/my-new-vm.sh --dry-run` to validate.
+5. Run without `--dry-run` to build.
+
+No changes to the driver or any other file are needed.
+
+## Build Pipeline (driver)
+
+1. Parse `--dry-run` / `--force` flags
+2. Resolve and source the profile file
+3. Validate required variables and `generate_user_data` function exist
+4. Source `lib/common.sh` (sets `SC_*` env, exposes helpers)
+5. Run `ensure_vm_tooling` (checks virsh, qemu-img, virt-install, SSH keys, pool/network)
+6. If domain exists and `--force` not set: exit cleanly
+7. `download_if_missing` — fetch base image if not cached
+8. Call `generate_user_data` → write to tmpdir, build NoCloud seed ISO
+9. `destroy_domain` — remove existing domain if present
+10. `create_backing_disk` — qcow2 overlay over the base image
+11. `build_import_domain` — `virt-install --import`, enable autostart
+12. `wait_for_agent_ip` — poll QEMU guest agent for IP (up to 300 s)
+13. Cleanup tmpdir on exit (trap)
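+
+Step 3 could be sketched roughly as follows. This is not the driver's actual code;
+`REQUIRED_VARS` and `validate_profile` are hypothetical names used for illustration.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical sketch of profile validation (pipeline step 3).
+REQUIRED_VARS=(DOMAIN HOSTNAME RAM_MB VCPUS DISK_SIZE GRAPHICS BASE_URL BASE_IMAGE)
+
+validate_profile() {
+  local name missing=0
+  for name in "${REQUIRED_VARS[@]}"; do
+    # ${!name} is bash indirect expansion: the value of the variable named by $name.
+    # Note: bash presets HOSTNAME, so this check cannot catch a profile that forgets it.
+    if [ -z "${!name:-}" ]; then
+      echo "profile error: $name is not set" >&2
+      missing=1
+    fi
+  done
+  # declare -F succeeds only if a function with that name is defined.
+  if ! declare -F generate_user_data >/dev/null; then
+    echo "profile error: generate_user_data() is not defined" >&2
+    missing=1
+  fi
+  return "$missing"
+}
+```
+
+Reporting every missing item before returning, rather than stopping at the first,
+lets a profile author fix all problems in one pass.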
+
+## Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `SC_GAME_HOST_IP` | `10.42.0.1` | Host machine IP on the game network |
+| `SC_SSH_KEY` | `~/.ssh/sc_host_key` | SSH key pair used for all host→guest connections |
+| `SC_BASE_DIR` | See `common.sh` | Where base cloud images are cached |
+| `SC_SEED_DIR` | See `common.sh` | Where cloud-init seed ISOs are written |
+| `SC_POOL_NAME` | `sc-images` | libvirt storage pool |
+| `SC_NETWORK_NAME` | `sc-internal` | libvirt network |
+| `LIBVIRT_DEFAULT_URI` | `qemu:///system` | Override to `qemu:///session` for user-mode libvirt |
+| `SC_WORKSTATION_GRAPHICS` | `spice` | Override workstation graphics backend |
+
+## Current VMs
+
+| Profile | Domain | Hostname | OS | RAM | vCPUs | Disk | Graphics |
+|---------|--------|----------|----|-----|-------|------|----------|
+| `workstation.sh` | `sc-workstation` | `ares` | Debian 12 | 2048 MB | 2 | 20 G | SPICE |
+| `web-server.sh` | `sc-web-server` | `hermes` | Debian 12 | 512 MB | 1 | 8 G | VNC |
+| `build-machine.sh` | `sc-build-machine` | `vulcan` | Arch Linux | 768 MB | 2 | 10 G | VNC |
+
+## Hostname Resolution
+
+All VMs resolve internal hostnames via static `/etc/hosts`. There is no DNS server
+on the game network — this matches how small company networks often work before a
+proper internal DNS is set up.
+
+Each VM only has entries for the hosts it needs to reach:
+
+- **ares** (workstation): knows `hermes`, `vulcan`, `portal.axiomworks.internal`, `sage.axiomworks.internal`
+- **hermes**: knows `portal.axiomworks.internal`
+- **vulcan**: knows `hermes` (deploy target), `portal.axiomworks.internal`
+
+The `.axiomworks.internal` domain is fictional but realistic — real companies use
+private suffixes like `.internal` or `.corp` for their infrastructure.
+
+## Networking Notes
+
+- All VMs attach to the `sc-internal` libvirt network
+- The host machine (10.42.0.1) serves the game portal (`:3000`) and Sage KB (`/sage/`)
+- Fixed IPs used in `/etc/hosts` across VMs: hermes=10.42.0.40, vulcan=10.42.0.24
+- These must match the DHCP reservations configured in `network-sc-internal.xml`
+- IPv6 disabled on all VMs (sysctl) — not needed, reduces noise
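+
+Putting the hostname and networking notes together, the entries ares needs could be
+generated along these lines. This is a sketch under assumptions: `ares_hosts_entries`
+is a hypothetical helper, and mapping `sage.axiomworks.internal` to the host IP is
+inferred from the host serving Sage at `/sage/`, not confirmed by the profile.
+
+```bash
+# Hypothetical sketch: print the /etc/hosts lines ares would need.
+GAME_HOST_IP="${SC_GAME_HOST_IP:-10.42.0.1}"
+
+ares_hosts_entries() {
+  cat <<EOF
+10.42.0.40 hermes
+10.42.0.24 vulcan
+${GAME_HOST_IP} portal.axiomworks.internal
+${GAME_HOST_IP} sage.axiomworks.internal
+EOF
+}
+
+ares_hosts_entries
+```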
+
+## Performance Tuning
+
+All VMs share a common sysctl baseline applied via `/etc/sysctl.d/`:
+
+| Setting | Value | Rationale |
+|---------|-------|-----------|
+| `vm.swappiness` | 10 | Prefer RAM; swap only under real pressure |
+| `vm.vfs_cache_pressure` | 50 | Keep inode cache warm longer |
+| `vm.dirty_ratio` | 15–25 | Batch writes; vulcan higher for build workloads |
+| IPv6 disabled | — | Removes unnecessary network overhead |
+
+All VMs have a swap file (512 MB – 1 GB depending on role) created at first boot.
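+
+For orientation, the baseline above might translate into a drop-in file like the one
+this sketch prints. The file name, the `sysctl_baseline` helper, and the specific
+IPv6 keys are assumptions; `vm.dirty_ratio` is shown at the low end of the range.
+
+```bash
+# Hypothetical sketch: emit a sysctl.d drop-in matching the table above.
+sysctl_baseline() {
+  cat <<'EOF'
+# /etc/sysctl.d/99-sc-baseline.conf (hypothetical file name)
+vm.swappiness = 10
+vm.vfs_cache_pressure = 50
+vm.dirty_ratio = 15
+# Assumed keys for "IPv6 disabled"; the real profiles may differ.
+net.ipv6.conf.all.disable_ipv6 = 1
+net.ipv6.conf.default.disable_ipv6 = 1
+EOF
+}
+
+sysctl_baseline
+```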
+
+## DHCP Reservations and MAC Addresses
+
+Fixed IPs are set via DHCP reservations in `network-sc-internal.xml` and the live
+libvirt network. The reservations reference MAC addresses, which virt-install
+**generates fresh on every `--force` rebuild**. After any rebuild, the old
+reservation is stale and the VM will get a random IP from the pool.
+
+After a `--force` rebuild, update the reservations:
+
+```bash
+# 1. Get the new MAC
+virsh domiflist sc-web-server   # (or sc-workstation, sc-build-machine)
+
+# 2. Remove the old reservation (use the old MAC from network-sc-internal.xml)
+sudo virsh net-update sc-internal delete ip-dhcp-host \
+  "<host mac='OLD_MAC' name='hermes' ip='10.42.0.40'/>" --live --config
+
+# 3. Add the new one
+sudo virsh net-update sc-internal add ip-dhcp-host \
+  "<host mac='NEW_MAC' name='hermes' ip='10.42.0.40'/>" --live --config
+
+# 4. Update network-sc-internal.xml to match
+```
+
+The VM will pick up the reserved IP on its next DHCP renewal (or reboot).
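+
+The quoting in the `<host .../>` argument is easy to get wrong by hand, so a small
+helper can build it. This is a hedged sketch: `dhcp_host_xml` is a hypothetical
+function, and the commented `awk` line assumes the MAC is the fifth column of
+`virsh domiflist` output.
+
+```bash
+# Hypothetical helper: build the <host> element passed to virsh net-update.
+dhcp_host_xml() {
+  local mac="$1" name="$2" ip="$3"
+  printf "<host mac='%s' name='%s' ip='%s'/>\n" "$mac" "$name" "$ip"
+}
+
+# Example using the hermes values from this document:
+dhcp_host_xml "52:54:00:49:9b:64" hermes 10.42.0.40
+# → <host mac='52:54:00:49:9b:64' name='hermes' ip='10.42.0.40'/>
+
+# Possible use after a rebuild (assumption: column 3 is the network, column 5 the MAC):
+#   new_mac="$(virsh domiflist sc-web-server | awk '$3 == "sc-internal" {print $5}')"
+#   sudo virsh net-update sc-internal add ip-dhcp-host \
+#     "$(dhcp_host_xml "$new_mac" hermes 10.42.0.40)" --live --config
+```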
+
+### Current reservations
+
+| VM | Domain | Hostname | MAC | IP |
+|----|--------|----------|-----|----|
+| Workstation | sc-workstation | ares | `52:54:00:bd:aa:29` | 10.42.0.36 |
+| Web server | sc-web-server | hermes | `52:54:00:49:9b:64` | 10.42.0.40 |
+| Build machine | sc-build-machine | vulcan | `52:54:00:5e:9f:b9` | 10.42.0.24 |
@@ -0,0 +1,56 @@
+# Workstation Polish Backlog
+
+Captured from playtest notes. These items are intentionally left unresolved for a later pass.
+
+## Launcher And Viewer
+
+- ~~Make `./scripts/start-game.sh` executable by default.~~ **RESOLVED** — file is `rwxr-xr-x`.
+- ~~Prevent Chromium from auto-launching on workstation login.~~ **RESOLVED** — removed the `game-hud.desktop` autostart entry from `workstation.sh`. Players open the Axiom Works portal from the desktop launcher when they want it.
+- ~~Fix fullscreen toggling in the workstation viewer. The current `FULLSCREEN.txt` says `Shift+F12` but that is the cursor-release binding; fullscreen toggle is `F11`.~~ **RESOLVED** — Renamed to `VIEWER_HELP.txt`, corrected key bindings, expanded to cover fullscreen, cursor release, zoom, copy/paste, and USB redirect.
+- Make sure the player can exit fullscreen without shutting down the VM.
+- Investigate whether virt-viewer / the SPICE client can auto-detect and apply the host's native resolution when entering fullscreen mode. SPICE supports dynamic resolution via the vdagent service (already installed); verify the guest `spice-vdagent` is running and that the domain XML includes a `<channel type='spicevmc'>` device (target name `com.redhat.spice.0`) so resize events actually reach the guest.
+
+## HTTPS / TLS
+
+- Make all in-VM websites (portal, Sage, company website) serve over HTTPS. Approach: generate a self-signed CA during workstation cloud-init, install it into Chromium's trust store and the system CA bundle, then issue a wildcard or multi-SAN cert for `*.axiomworks.corp`, `*.axiomworks.internal`, and `portal.axiomworks.internal`. Configure the game server to serve TLS (or put nginx in front for all sites), and update all internal URLs to `https://`. No browser warnings, everything looks legitimate. Not required for gameplay but raises the production feel significantly.
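+
+The certificate part of that approach could be prototyped with plain `openssl`, as in
+the sketch below. It is an assumption-laden illustration, not a decided design: the
+subject names, key sizes, lifetimes, and file names are placeholders, and installing
+the CA into Chromium and the system bundle is not covered.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical sketch: self-signed CA plus one multi-SAN server certificate.
+set -euo pipefail
+workdir="$(mktemp -d)"
+cd "$workdir"
+
+# 1. CA key and self-signed CA certificate.
+openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
+  -subj "/CN=Axiom Works Internal CA" \
+  -keyout ca.key -out ca.crt 2>/dev/null
+
+# 2. Server key and certificate signing request.
+openssl req -newkey rsa:2048 -nodes \
+  -subj "/CN=portal.axiomworks.internal" \
+  -keyout server.key -out server.csr 2>/dev/null
+
+# 3. Subject alternative names for the hostnames listed above.
+printf 'subjectAltName=DNS:*.axiomworks.corp,DNS:*.axiomworks.internal,DNS:portal.axiomworks.internal\n' > san.ext
+
+# 4. Sign the CSR with the CA, attaching the SAN extension.
+openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
+  -days 365 -extfile san.ext -out server.crt 2>/dev/null
+
+# 5. Confirm the server certificate chains to the CA.
+openssl verify -CAfile ca.crt server.crt
+```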
+
+## Desktop UX
+
+- ~~Ensure the Axiom Works portal desktop icon is executable/trusted out of the box.~~ **RESOLVED** — `Portal.desktop` is provisioned with permissions `0755`, and `workstation.sh` seeds GVFS trusted metadata with a login-time reload fallback.
+- Remove mail from the top of the XFCE applications menu, since the portal handles email. (Low priority — no mail client is installed, so this is unlikely to appear.)
+- ~~Set Tilix as the default terminal entry in the applications menu.~~ **RESOLVED** — `update-alternatives --set x-terminal-emulator /usr/bin/tilix` and `helpers.rc` both configured in `workstation.sh` runcmd.
+- The XFCE **Applications → System → Terminal Emulator** menu entry still launches the XFCE terminal emulator instead of Tilix. `update-alternatives` sets the system default but XFCE's own preferred-applications config (`xfce4-terminal.desktop` precedence) overrides it for that menu entry. Fix with one of the following: remove `xfce4-terminal` from the installed packages, write a `~/.config/xfce4/helpers.rc` entry that explicitly maps `TerminalEmulator=tilix`, or add a `preferred-applications.xml` override in the XFCE config directory.
+- ~~Keep the XFCE dark theme as the default desktop theme.~~ **RESOLVED** — `xsettings.xml` sets `Adwaita-dark` theme in `workstation.sh`.
+- ~~Tilix launched from the desktop icon opens in `/Desktop` by default instead of `/home/player`. Fix the `Terminal.desktop` launcher to set `Path=/home/player` so the initial working directory is the home directory.~~ **RESOLVED** — `Path=/home/player` added to `Terminal.desktop` in `build-workstation.sh`.
+- ~~Preserve clean desktop icon placement after removing `cidata`.~~ **RESOLVED** — `workstation.sh` seeds XFCE desktop icon layout files so Terminal and Portal sit in the chosen top-right positions and viewer help stays bottom-left after rebuilds.
+
+## Workstation Lifecycle
+
+- ~~Take a clean snapshot after the workstation is fully configured and validated.~~ **RESOLVED** — `seed-vms.sh` takes `baseline.day-one` and `baseline.recovery` snapshots after workstation build.
+- ~~Treat workstation shutdown as the end-of-shift game exit; save workstation state.~~ **RESOLVED (server side)** — `VMManager.ensureWorkstationLive()` in the Node.js server handles startup. Game server cleanly shuts down when `start-game.sh` exits (SIGTERM). VM suspend-on-quit is a future enhancement.
+- ~~Rebuild or restore from the clean snapshot when needed, but allow the live workstation to drift during play.~~ **RESOLVED** — `always_live: true` in `workstation.json` means shift checkpoints skip the workstation; it drifts freely and is only restored from `baseline.recovery` on catastrophic failure.
+
+## Terminal Experience
+
+~~All in-game terminal simulation items are obsolete~~ — the player uses a real Tilix terminal directly in the XFCE workstation VM. Arrow key history, tab completion, copy/paste, scrollback, and interactive programs (vim, htop, etc.) all work natively.
+
+## Browser and Bookmarks
+
+- The Chromium bookmarks bar shows the default Debian bookmarks. The game-specific bookmarks are buried under a "Managed bookmarks" folder instead of sitting directly in the bar. Move the managed bookmarks to the top-level bar and remove the default Debian entries. This is controlled by the `ManagedBookmarks` policy in `/etc/chromium/policies/managed/bookmarks.json`; restructure the JSON so items appear at bar level rather than inside a named folder.
+- ~~All four managed bookmarks go to the same URL; anchors don't work.~~ **RESOLVED** — Bookmarks reduced to two: "Axiom Works Portal" and "Sage (KB)" at `/sage/`.
+
+## Sage — Knowledge Base
+
+- Sage is intended to be a navigable knowledge base, not just a search box. It should feel like a real internal company wiki: organized into sections and categories that a player can explore by browsing, in addition to searching. The content is the KB data already planned for the game.
+- Search should be lightweight and practical — something like Meilisearch (or a similarly small embedded-first search server) that indexes the KB content and serves fast full-text results without requiring a heavy backend.
+- Sage should be a completely separate web application from the Axiom Works portal. It should have its own URL, its own visual design (distinct look and feel from the portal), and its own place in the bookmarks bar. In a realistic company, documentation tools are separate products (Confluence, Notion, internal wikis) from the ticketing portal — Sage should feel the same way.
+- Add a Sage bookmark to Chromium once Sage has its own URL.
+
+## VM Performance
+
+- ~~Guest VM RAM maxed causing hangs.~~ **RESOLVED** — `RAM_MB` raised to 1536 MB; 1 GB swap file added via `runcmd` in `build-workstation.sh` (fallocate + mkswap + fstab entry). Rebuild required to take effect.
+
+## Visual Cleanup
+
+- ~~Hide or remove the `cidata` desktop icon.~~ **RESOLVED** — `build-vm.sh` detaches the cloud-init seed ISO after workstation readiness, so the CD-ROM is not exposed on the desktop or in file-manager device lists. `xfce4-desktop.xml` also keeps removable/device desktop icons hidden as a fallback.
+- ~~Hide the internal `VirtIO Disk` from Thunar's Computer view.~~ **RESOLVED** — `workstation.sh` installs a udev rule setting `UDISKS_IGNORE=1` on `vd*` system disk devices, keeping internal VM storage out of player-facing file-manager device lists.
@@ -0,0 +1,459 @@
+# Characters — Sysadmin Chronicles
+
+Story design reference. All characters, bios, relationships, and open story hooks.
+For company/world context see `COMPANY_LORE.md`. This file focuses on the people.
+
+---
+
+## Active Characters
+
+These characters have an established in-game voice and presence. Any new quest work
+should treat their characterization here as canonical.
+
+---
+
+### The Player
+**Role:** New junior sysadmin hire, day one  
+**Identity:** Unnamed. Player-selected portrait (5 options).
+
+Hired to replace Dale. Nobody will explain what Dale did. Badge number is still
+pending — temp credentials were handled by someone in Finance on their first day.
+The player is a competent professional, not a bumbling intern. They may not know
+every answer but they know how to look.
+
+The player has no spoken lines. Their character is expressed entirely through the
+choices they make when fixing things — whether they understand root causes or just
+clear symptoms, whether they leave systems better or just less broken.
+
+---
+
+### Marcus Webb
+**Role:** Senior Systems Administrator  
+**Email:** `m.webb@axiomworks.internal`  
+**Reports to:** Dave Kowalski (Director of IT Operations)
+
+Six years at Axiom Works. Hired by Kowalski. Knows where everything is, why it's
+there, and which parts were a mistake. Communicates in short, precise messages.
+Does not explain things twice. Trusts competence over credentials — he will give
+the player more rope as they demonstrate they know what to do with it. If they
+don't, the rope gets shorter.
+
+He was the one who onboarded the player. He assigned their first ticket. He will
+assign most of the tickets that follow. His messages range from brief task
+assignments to late-night observations about something that's been on his mind —
+the latter usually mean something is about to become a problem.
+
+He knows what Dale did. He has decided not to discuss it.
+
+**Personality:** Dry. Technically precise. Does not perform enthusiasm. Occasionally
+wry but never jokey. Respects players who fix root causes. Mildly annoyed by
+players who fix symptoms and call it done.
+
+**Relationships:**
+- Kowalski: reports to him; respectful but not deferential
+- Sarah: professional; takes her tickets seriously, occasionally corrects her, quietly, when she's wrong
+- Priya: mutual professional respect; they operate in the same zone of "things that matter when they go wrong"
+- Phil Ruiz (Sales VP): warm; Phil owes Marcus for saving a demo once and Marcus has never mentioned it
+
+---
+
+### Sarah Chen
+**Role:** Product Manager, AxiomFlow  
+**Email:** `s.chen@axiomworks.internal`
+
+Owns the AxiomFlow product roadmap. Coordinates between sales, engineering, and
+customers. Emails Monday mornings. Cares intensely about the demo and staging
+environments because those are the parts of the product she can actually see and
+touch. Not wrong about their importance.
+
+She files tickets when things break on the product-facing side. Her descriptions of
+problems are accurate about symptoms and often wrong about causes — she will
+confidently diagnose a permissions issue as a script bug, or a package problem as a
+config error. She is not incompetent; she just doesn't have the full picture. When
+the player fixes the underlying cause rather than the surface symptom, she notices.
+
+She has a sharp edge when things get worse after someone touches them. She will say
+so, clearly, without being melodramatic about it.
+
+**Personality:** Direct. Metric-oriented. Not patient with vague timelines or "we're
+looking into it." Appreciates being told what the actual problem was, not just that
+it's fixed.
+
+**Relationships:**
+- Marcus: professional; trusts that her tickets will be handled, doesn't ask for much
+- Player: initially impersonal (they're new); warms or cools based on outcomes
+- Nikhil Sharma: upstream dependency — his build pipeline affects her deployments
+
+---
+
+### Priya Nair
+**Role:** Head of Security & Compliance  
+**Email:** `p.nair@axiomworks.internal`  
+**Direct report:** James Osei (Security Analyst)
+
+Leads all security reviews, access audits, and compliance programmes. Has a standing
+Thursday meeting with David Park (CTO) that has existed since 2017. Was brought in
+after an incident nobody discusses in public. Has been building the security function
+from something informal into something that can survive a SOC 2 audit.
+
+She frames everything in terms of what happens when things go wrong, not whether they
+will. She assumes breach. She assumes misconfiguration. She is often right. She is
+not someone who appreciates hearing about a production change after it has already
+happened.
+
+She will tell the player when a fix is correct and why. She will also tell them when
+a fix works but leaves the environment in a worse position than before. She is not
+punitive about this — she just states it.
+
+She does shift reviews at end-of-shift and grades the player's overall performance.
+Her criteria: did the work move forward, did the environment stay stable, did the
+player create extra problems.
+
+**Personality:** Precise. Consequence-focused. Calm in tone even when the content
+is not calm. Economical with words. Does not use exclamation marks.
+
+**Relationships:**
+- Player: evaluative; her trust is earned by demonstrating that security is a
+  consideration, not an afterthought
+- Marcus: peer respect; they operate in different domains with overlapping concerns
+- Dave Kowalski: reports indirectly up through him for infrastructure decisions
+- David Park: standing Thursday meeting; she has the CTO's ear
+
+> **Name note for developers:** The in-game email service and some ticket files
+> previously used "Priya Kapoor" and the onboarding doc used "Priya Singh."
+> These are all the same character. **Priya Nair** is the canonical name.
+> Email should be `p.nair@axiomworks.internal`. Update references in
+> `server/src/services/EmailService.js`, `content/tickets/T007.json`, and
+> `content/docs/onboarding.json`.
+
+---
+
+### Dave Okonkwo
+**Role:** Internal employee, non-technical  
+**Email:** `d.okonkwo@axiomworks.internal`
+
+A regular Axiom Works employee who notices when things aren't working and files
+tickets about it. He doesn't know enough to diagnose the problem — he reports
+symptoms accurately and assumes the wrong cause. His reports are useful precisely
+because they represent what a non-technical user actually experiences.
+
+He is not on the company website (280 employees, most of them aren't). He's
+somewhere in operations or general staff. He's not in Finance, not in IT.
+
+> **Open decision:** Dave Okonkwo is currently the only employee-level character who
+> submits tickets. The company website has Dave Kowalski as Director of IT Operations
+> (Marcus's boss), which is a completely different person. This is not a naming
+> inconsistency — they're two different people. However: if the story wants Kowalski
+> to become an active character who also files tickets or escalates issues, that's a
+> separate thread. Okonkwo and Kowalski coexist.
+
+---
+
+## Named Background Characters
+
+On the company website. No current in-game presence. Available for story use —
+they can send emails, appear on CC lines, be referenced in dialogue, or become
+active characters in new quests.
+
+Listed in rough order of story relevance to the IT/sysadmin context.
+
+---
+
+### Dave Kowalski — Director of IT Operations
+Marcus's manager. The player's skip-level. Background is network engineering —
+has Cisco certifications he will not volunteer unless provoked. Oversees systems
+(Marcus's domain), networking (Tom Malaney), and IT support. Has been at Axiom
+Works since 2015. Describes the infrastructure as "mature." Sends weekly status
+emails in bullet points that never quite answer the question. When things go wrong
+he schedules a meeting to "talk through the situation," which everyone has learned
+is worse than a direct message.
+
+Has said "we should really document that" more times than he can count. Has
+documented very little personally. Maintains a mysterious Tuesday 2–3pm calendar
+block.
+
+Story use: source of policy pressure, indirect escalation, the person who asks
+questions that reveal Marcus hasn't told the player everything.
+
+---
+
+### Nikhil Sharma — Platform Engineer
+Owns the internal build and release pipeline, the CI infrastructure, and the
+parts of deployment that nobody else wants to think about. Strong opinions about
+reproducible builds. Sends Slack messages at 6am. Occasionally at 11pm.
+
+He is the engineer most directly connected to what happens on vulcan — if a build
+is broken, it's probably something Nikhil built or maintains. He has never met the
+player. He almost certainly doesn't know the player exists.
+
+Story use: the author of broken packages the player has to debug; a character who
+can explain (or fail to explain) what went wrong upstream; an escalation path when
+a build problem is genuinely his fault.
+
+---
+
+### Tanya Okafor — Head of Customer Success
+Manages post-sale relationships for all AxiomFlow customers and the twelve legacy
+AxiomSync accounts that haven't migrated. Uses the word "partnership" a lot.
+
+Usually the first person to know when something is wrong in production, because a
+customer has already called her before IT knows there's a problem. Her call log
+is an early warning system. She is not hostile to IT but she has learned that
+"we're looking into it" is not an answer she can give a customer.
+
+Story use: pressure vector from the customer direction; source of urgency that
+doesn't come from Marcus or the ticket queue; demonstrates real-world stakes when
+things go down.
+
+---
+
+### Phil Ruiz — VP of Sales
+Has been promising features to prospects since 2016. Maintains a warm relationship
+with the infrastructure team because Marcus once fixed the staging environment with
+twenty minutes to spare before a major demo — Phil has never forgotten this. Travels
+frequently. Expense reports submitted promptly, which Marcus has noted approvingly.
+
+Story use: indirect beneficiary when demos work; pressure source when a sales demo
+is scheduled and something is broken; the person who will tell the CTO what IT did
+right in a room the player will never be in.
+
+---
+
+### Yusuf Halabi — Engineering Manager
+Reports to David Park (CTO). Manages the core AxiomFlow platform team. Runs the
+Thursday architecture review. Has opinions about test coverage. Leaves pull request
+comments that are technically correct and diplomatically suboptimal.
+
+Story use: engineering-side escalation; source of tickets about internal tooling;
+the person who will ask why a config change broke a downstream process.
+
+---
+
+### Derek Ashford — Financial Controller
+Does not appear at team meetings. Does appear on CC lines of every email that
+mentions cloud costs, hardware procurement, or infrastructure budget. Always
+replies-all. His manager is Rachel Brandt (CFO).
+
+Story use: background texture on procurement requests; the voice that makes any
+infrastructure spending feel like a negotiation.
+
+> **Note on "Dave from Finance":** Marcus's day-one message references "Dave from
+> Finance" as the person holding the player's temp credentials. Derek Ashford is
+> the only named Finance character who could plausibly hold IT credentials, but
+> his first name is Derek, not Dave. Either Marcus is being casual with names,
+> the message contains a continuity error and should be corrected, or "Dave
+> from Finance" is a separate, unnamed Finance employee.
+
+---
+
+### Rachel Huang — Systems Administrator
+Marcus's peer on the IT team. Handles provisioning, patch cycles, and the ongoing
+negotiation with Finance over cloud consolidation. Came from a managed services
+background. Has strong opinions about monitoring dashboards, most of which are
+correct.
+
+Story use: the person who set something up that the player now has to maintain;
+a colleague who can provide context Marcus won't; someone whose provisioning
+decisions the player will encounter as infrastructure.
+
+---
+
+### Tom Malaney — Network Engineer
+Responsible for network infrastructure across the office and hosted environments.
+On-call for more holiday weekends than he would like. Thorough in documentation
+when he finds time for it.
+
+Story use: DNS, firewall, or routing problems that are not the player's fault
+but become the player's problem; someone who can be reached but is slow to
+respond.
+
+---
+
+### James Osei — Security Analyst
+Priya's direct report. Handles vulnerability assessments, access reviews, and
+quarterly compliance reporting. Methodical. Has a spreadsheet for everything,
+which is not a criticism.
+
+Story use: the person who runs the actual audit that Priya will summarize to the
+player; a source of detailed (sometimes overwhelming) security findings.
+
+---
+
+### Ellen Marsh — CEO & Co-Founder
+Built the first version of AxiomFlow after a decade in operations. No CS background.
+Attends all-hands twice a year. Does not use Slack. Has final say on pricing and
+major customer commitments.
+
+Story use: the distant authority whose priorities shape everything; never interacts
+with the player directly, but her decisions land as constraints.
+
+---
+
+### David Park — CTO & Co-Founder
+Wrote the original rules engine in 2011. Now manages engineering managers. Still has
+opinions about the data model. Has a standing Thursday meeting with Priya that hasn't
+moved since 2017.
+
+Story use: architectural decisions from above; the person Priya reports significant
+security findings to.
+
+---
+
+### Karen Volkov — COO
+Joined 2014. Responsible for the fact that the company has documented processes for
+anything at all. Has opinions about infrastructure costs that surface in IT's world
+via Finance. Prefers decisions with clear owners and deadlines.
+
+---
+
+### Rachel Brandt — CFO
+Joined 2016. Approves all capital expenditure over $5,000. Working to consolidate
+cloud spend. Does not enjoy surprises in the infrastructure budget. Derek Ashford
+reports to her.
+
+---
+
+### Mei Lin — Senior Software Engineer
+Has maintained AxiomSync's integration layer since 2018. Knows more about it than
+anyone would prefer, including herself. Currently leading the migration tooling
+project for the remaining legacy accounts.
+
+---
+
+### Cora Reyes — Software Engineer
+Works on the AxiomDash reporting pipeline. Has submitted more internal RFCs than
+anyone else on the team in the past year. Moving toward senior.
+
+---
+
+### Ben Portillo — Product Manager, AxiomDash
+Leads product development for the analytics add-on. Works closely with large
+accounts to understand what they actually want from dashboards (usually different
+from what they asked for).
+
+---
+
+### Annika Gosse — UX Designer
+Responsible for AxiomFlow's interface. Has been advocating for a redesign of the
+workflow builder since 2022. Patient.
+
+---
+
+### Sandra Wu — HR Manager
+Manages hiring, onboarding, and employee relations since 2016. Runs the new-hire
+onboarding process (three days, thorough). Sends birthday emails on time, every time.
+
+---
+
+### Owen Blake — Office Manager
+Keeps the office running. Has fixed more things than his job title implies. The
+person to contact if conference room equipment stops working.
+
+---
+
+### Mike Kawamoto — Account Executive
+Handles mid-market manufacturing accounts in the northeast. Believes strongly in
+the demo environment. Closes more deals in Q4 than any other quarter.
+
+---
+
+### Lisa Ferreira — Customer Success Manager
+Manages onboarding for new AxiomFlow deployments. Has a talent for understanding
+what customers mean rather than what they say.
+
+---
+
+## Unresolved Characters (Story Hooks)
+
+These are referenced in existing content but never defined. They represent the
+strongest open narrative threads.
+
+---
+
+### Dale — The Previous Sysadmin
+**Reference:** Marcus's day-one message — "You're replacing Dale. Nobody will tell you
+what Dale did because it's complicated."
+
+Dale is gone. The player has Dale's desk, Dale's access provisioning slot, and
+apparently Dale's reputation: people know the player as "Dale's replacement" before
+they know the player's name. The systems the player inherits are the systems Dale
+last touched.
+
+What Dale did is unknown. It is described as "complicated." Marcus knows. Possibly
+Kowalski knows. Possibly Priya knows, if it was security-related.
+
+This is the strongest existing narrative mystery in the game. It has setup and no
+payoff. Dale's story could be:
+- A technical incident (something Dale broke and couldn't fix)
+- A policy violation (something Dale did that wasn't malicious but wasn't right)
+- A trust collapse (competent but burned bridges)
+- Something personal
+- Any combination
+
+The player finding out what Dale did — gradually, through the systems they work on,
+through things people let slip — is a natural story spine for the whole game.
+
+---
+
+### "Dave from Finance" — Day One Reference
+**Reference:** Marcus's day-one message — "Dave from Finance has your temp credentials.
+He's on three today."
+
+Almost certainly Derek Ashford (Financial Controller), referred to informally. But
+Derek's first name is Derek, not Dave — this is either Marcus being casual with
+names, a continuity error, or a genuinely separate unlisted Finance employee.
+
+Needs a decision: correct "Dave" to "Derek" in Marcus's message, or introduce a
+separate "Dave from Finance" as a minor character.
+
+---
+
+## Key Relationships Map
+
+```
+Ellen Marsh (CEO)
+  ├── David Park (CTO)
+  │     └── Yusuf Halabi (Eng Manager)
+  │           ├── Mei Lin
+  │           ├── Cora Reyes
+  │           └── Nikhil Sharma
+  ├── Karen Volkov (COO)
+  ├── Rachel Brandt (CFO)
+  │     └── Derek Ashford (Financial Controller)
+  └── Phil Ruiz (VP Sales)
+        ├── Mike Kawamoto
+        └── Tanya Okafor
+              └── Lisa Ferreira
+
+Dave Kowalski (Director of IT)
+  ├── Marcus Webb  ←── Player's manager
+  │     └── [Player]
+  ├── Rachel Huang
+  └── Tom Malaney
+
+Priya Nair (Head of Security)
+  └── James Osei
+
+Sarah Chen (Product, AxiomFlow)  ←── frequent ticket source
+Ben Portillo (Product, AxiomDash)
+Annika Gosse (UX)
+```
+
+---
+
+## Tone Notes for New Story Work
+
+- **Marcus talks like someone who has answered this question before.** Precise, low
+  affect, no wasted words. Never condescending — just efficient.
+- **Sarah talks like a PM: outcome-focused, slightly impatient, specific about
+  what she needs.** She is not a villain. She has real deadlines.
+- **Priya talks like someone who has already thought about what goes wrong.** She
+  doesn't speculate — she states. She's not alarming, she's matter-of-fact.
+- **Dave Okonkwo talks like someone who doesn't know what the problem is** but is
+  trying to be helpful by reporting exactly what he observed. He should never be
+  made to look stupid — he's doing the right thing.
+- **The company takes itself seriously.** Humor comes from the gap between official
+  language and reality, not from anyone being a cartoon.
+- **Problems have plausible causes.** Systems broke because someone made a
+  reasonable decision under time pressure, not because they were careless idiots.
+  The player should feel like a professional, not a janitor.
@@ -0,0 +1,165 @@
+# Axiom Works — Company Lore Reference
+
+> For quest authors, dialogue writers, and ticket copy. Keep the tone dry and
+> believable. The company should feel real, slightly dysfunctional, and just
+> plausible enough that players recognise the type.
+
+---
+
+## Who They Are
+
+**Axiom Works** is a B2B enterprise software company founded in 2011. Headquarters
+is in a three-floor office park that is technically "downtown adjacent" depending
+on how charitable you are with the map. They have about 280 employees. The
+Glassdoor rating is 3.8 stars and management checks it obsessively.
+
+Their flagship product is **AxiomFlow** — a workflow automation platform aimed at
+mid-size manufacturers, logistics companies, and anyone who got a 90-minute demo
+and thought it looked easy. Most customers are still on the workflow they set up
+in 2019. The platform does what it says. Marketing says it does considerably more.
+
+---
+
+## Products
+
+| Product | Description | Status |
+|---------|-------------|--------|
+| **AxiomFlow** | Workflow automation platform | Active, main revenue |
+| **AxiomDash** | Reporting and analytics add-on | Active, profitable, under-resourced |
+| **AxiomSync** | Legacy data integration layer | End-of-sale since 2021, still maintained for 12 customers who refuse to migrate |
+
+The current marketing tagline is *"Streamline. Scale. Succeed."* It replaced
+*"Work smarter, not harder"* in Q3 of last year. The one before that mentioned
+AI. Nobody is sure what the AI was.
+
+---
+
+## Infrastructure
+
+The company runs a mix of on-prem servers (named after Greek gods — a choice made
+by a contractor in 2017 who left before documenting anything) and a handful of
+cloud instances that accounting keeps trying to consolidate.
+
+| Host | Role | Notes |
+|------|------|-------|
+| **ares** | Player workstation | XFCE desktop, where the player works |
+| **hermes** | Web/app server | nginx, staging and demo environment for AxiomFlow |
+| **vulcan** | Build machine | Arch Linux, compiles artifacts, runs scheduled jobs |
+
+### Planned future systems
+As the game grows, additional machines will be added. Candidates:
+
+| Proposed host | Role | Greek connection |
+|---|---|---|
+| **poseidon** | Database server | Foundation, depths, reliability |
+| **apollo** | Mail / notification server | Messenger, communication |
+| **athena** | Internal tooling (ticketing, wiki) | Wisdom, knowledge management |
+| **argus** | Monitoring / alerting | The hundred-eyed watcher |
+| **mnemosyne** | Backup / storage | Memory, persistence |
+
+---
+
+## Characters
+
+### Dave Kowalski — Director of IT Operations
+The player's skip-level manager. Has been at Axiom Works since 2015. Hired Marcus.
+Oversees three teams: systems (Marcus's domain), networking, and IT support. Background
+is originally networking — has Cisco certifications he won't bring up unless someone else
+brings up Cisco certifications first. Sends weekly status emails formatted in bullet
+points that never quite answer the question you were asking. When things go wrong he
+schedules a meeting to "talk through the situation," which everyone has learned is
+worse than an email. Maintains a calendar block from 2–3pm on Tuesdays that nobody
+has ever asked about. Has said "we should really document that" approximately 400 times.
+Describes the infrastructure as "mature."
+
+### Marcus Webb — Senior Sysadmin
+The player's manager and the person who assigned them the ticket. Has been at
+Axiom Works for six years. Knows where all the bodies are buried. Communicates
+primarily in terse Slack messages and occasionally very long emails sent at 11pm.
+Trusts competence over process. Gets irritated by people who confuse symptoms
+with root causes.
+
+### Priya Nair — Security / Compliance
+Runs security reviews and has opinions about everything. Usually right. Tends to
+frame concerns in terms of what will happen when things go wrong rather than
+whether they will. Was brought in after an incident nobody talks about in public.
+
+### Sarah Chen — Product Manager
+Represents the product team's perspective in the ticket queue. Cares about demo
+environments more than production ones because demos are what she can see. Not
+technically wrong about their importance. Emails at 8am on Mondays.
+
+### Derek Ashford — Financial Controller
+Does not appear in person. Appears on CC lines of emails where infrastructure
+costs are being discussed. Always replies-all. His full name is Derek Ashford.
+His manager is Rachel Brandt (CFO).
+
+---
+
+## Background Characters (non-interactive, for world texture)
+
+These characters exist on the company website and in lore but do not appear in
+quests or dialogue. Use them for verisimilitude — email headers, CC lines, internal
+wiki author credits, that sort of thing.
+
+### Ellen Marsh — CEO & Co-Founder
+Built AxiomFlow after a decade in operations. Not technical. Attends all-hands
+twice a year. Has final say on pricing and major customer commitments. Does not
+use Slack. The player will never interact with her.
+
+### David Park — CTO & Co-Founder
+Wrote the original rules engine. Now manages engineering managers. Still has
+opinions about the data model. Has a standing Thursday meeting with security
+that hasn't moved since 2017.
+
+### Karen Volkov — COO
+Joined 2014. Responsible for the fact that Axiom Works has documented processes
+for anything. Has opinions about infrastructure costs. Prefers decisions with
+clear owners and deadlines.
+
+### Rachel Brandt — CFO
+Joined 2016. Approves all capital expenditure over $5,000. Does not enjoy
+surprises in the infrastructure budget. Derek reports to her.
+
+### Phil Ruiz — VP of Sales
+Has been promising features to prospects since 2016. Has a warm relationship
+with the infrastructure team because Marcus once saved a demo with 20 minutes to
+spare. Expense reports submitted promptly.
+
+### Tanya Okafor — Head of Customer Success
+Manages all post-sale customer relationships including the twelve AxiomSync
+holdouts. Usually the first to know when something is wrong in production,
+because a customer has already called her.
+
+### Yusuf Halabi — Engineering Manager
+Reports to the CTO. Manages the core AxiomFlow platform team. Has opinions
+about test coverage. Runs the Thursday architecture review.
+
+### Mei Lin — Senior Software Engineer
+Has maintained AxiomSync's integration layer since 2018. Knows more about it
+than anyone would prefer.
+
+### Nikhil Sharma — Platform Engineer
+Owns the build and release pipeline and internal CI infrastructure. Occasionally
+sends Slack messages at 6am.
+
+### Sandra Wu — HR Manager
+Manages hiring, onboarding, and employee relations since 2016. Sends birthday
+emails on time, every time. Runs the new-hire onboarding process that takes
+three days.
+
+---
+
+## Tone Guidelines
+
+- **Dry, not sarcastic.** The company takes itself seriously. The humour comes
+  from the gap between how they describe things and what's actually happening.
+- **Specific, not generic.** "The AxiomSync customer in Cincinnati keeps calling"
+  is better than "a client is upset."
+- **Plausible dysfunction.** Problems happen because of reasonable decisions made
+  under time pressure, not because people are incompetent. The player should feel
+  like a real professional, not a janitor.
+- **No cartoon villains.** Derek from Finance is not evil. The product team is not
+  stupid. They have different priorities.
+- **The infrastructure has history.** It was built over time. Some parts are good.
+  Some parts were good in 2017. The player's job is to keep it working.
@@ -0,0 +1,419 @@
+# Quest Authoring
+Use this guide when adding new JSON quests under `content/quests/`.
+
+Quest files describe observed VM state. They are not command scripts and they
+should model real Linux behavior, not puzzle logic detached from the system.
+
+For complete worked files, see `docs/AUTHORING_EXAMPLES.md`.
+
+## Quest JSON Schema
+
+### Root Fields
+
+| Field | Type | Description |
+| --- | --- | --- |
+| `id` | string | Quest ID, for example `Q005`. |
+| `title` | string | Player-facing quest title. |
+| `tier` | int | Difficulty tier, usually `1`, `2`, or `3`. |
+| `primary_vm` | string | Main VM for the quest. Current authored values are `workstation`, `web_server`, and `build_machine`. |
+| `required_vms` | string[] | Every VM the quest touches. Include all VMs used in clues, validation, or prep. |
+| `ticket_id` | string | Links to `content/tickets/<id>.json`. |
+| `baseline_snapshot` | string | Snapshot name that the prep script should restore or build from. |
+| `summary` | string | Short internal scenario summary. |
+| `clue_fingerprint` | object | Advisory description of the evidence seeded into the baseline. |
+| `objectives` | object[] | Objective list shown to the player and used for progress checks. |
+| `solution_branches` | object[] | Branches the validator can resolve to. Higher-priority valid branches win. |
+| `pressure_profile` | string or null | Optional pressure/escalation profile name. |
+| `blast_radius` | string[] | Incident IDs that this quest can affect or trigger. |
+| `unlock_requirements` | string[] | Prerequisites such as `world_flag:` entries. |
+| `tags` | string[] | Search and classification tags. |
+| `internal_notes` | string | Author-only notes for reviewers. |
+| `_note` | string | Optional author-only comment. Existing content uses this at root and inside nested objects. |
+
+### `clue_fingerprint`
+
+`clue_fingerprint` is advisory. It documents what evidence the baseline already
+contains so content reviewers can confirm the clue trail is real.
+
+| Field | Type | Description |
+| --- | --- | --- |
+| `description` | string | Plain-language explanation of the clue trail. |
+| `evidence` | object[] | Evidence items that point to the issue. Use the same general shape as the relevant validation type. |
+
+Common evidence shapes in existing content:
+
+- File and log evidence usually includes `type`, `vm`, `path`, and `contains`
+- State evidence may include `type`, `vm`, `service`, `state`, or `enabled`
+- Ownership evidence may include `type`, `vm`, `path`, `user`, and `group`
+- Scalar evidence may include `threshold_percent`, `port`, or `command` depending on the clue
+
+Existing clue fingerprints also use clue-only labels such as `service_state_is`,
+`service_enabled_is`, and `expected_user`. Treat those as descriptive baseline
+metadata, not runtime validation names.
+
+## Objectives
+
+| Field | Type | Description |
+| --- | --- | --- |
+| `id` | string | Stable objective ID. |
+| `description` | string | Player-facing objective text. |
+| `check_mode` | string | `passive` or `explicit`. Use `passive` by default. |
+| `validation` | object | Rule object evaluated by `ValidationService`. |
+
+Objectives are for feedback and progress tracking. They do not choose the
+winning solution branch.
+
+## Solution Branches
+
+| Field | Type | Description |
+| --- | --- | --- |
+| `id` | string | Stable branch ID. |
+| `label` | string | Optional short label used in content review and debugging. |
+| `priority` | int | Higher wins when multiple branches validate. Priorities must be unique per quest. |
+| `validation` | object | Rule object evaluated for this branch. |
+| `trust_delta` | float | Trust change applied when this branch wins. Positive for better fixes, negative for risky or damaging ones. |
+| `follow_up_dialogue` | string | Dialogue ID to trigger after resolution. |
+| `follow_up_incident` | string | Incident ID to trigger after resolution, if the branch intentionally leaves a latent problem. |
+| `follow_up_ticket` | string | Next ticket ID in the quest chain. |
+| `world_flags` | string[] | Flags to set when the branch wins. |
+| `_note` | string | Optional author-only comment. |
+
+### Branch Authoring Guide
+
+- Use branch priority to rank the quality of valid solutions.
+- Put the clean, robust fix at the highest priority.
+- Use lower priorities for brittle workarounds, partial fixes, or outcomes that
+  leave future risk behind.
+- Use `trust_delta` to reflect the quality of the fix, not just whether the
+  quest technically completed.
+- Use `follow_up_ticket` when a winning branch should advance the story to the
+  next ticket.
+- Use `follow_up_incident` only when that branch intentionally seeds a later
+  recurrence or operational cost.
+- Keep priorities unique. If two branches can both pass with the same priority,
+  the content should be rewritten.
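+
+As an illustration of the ranking described above, a `solution_branches` array
+for a hypothetical quest might look like the sketch below. The branch IDs,
+dialogue and incident IDs, flag name, paths, threshold, and `trust_delta` values
+are all invented for this example; only the field names and rule types come
+from the tables in this guide.
+
+```json
+[
+  {
+    "id": "fix-logrotate",
+    "label": "Restore Logrotate Config",
+    "_note": "Hypothetical example values. Clean fix: rotation restored and disk usage back under the threshold.",
+    "priority": 100,
+    "validation": {
+      "type": "and",
+      "rules": [
+        { "type": "file_exists", "vm": "web_server", "path": "/etc/logrotate.d/nginx" },
+        { "type": "disk_usage_below", "vm": "web_server", "path": "/var/log", "threshold_percent": 80 }
+      ]
+    },
+    "trust_delta": 0.1,
+    "follow_up_dialogue": "D_EXAMPLE_CLEAN",
+    "world_flags": ["example_logrotate_fixed"]
+  },
+  {
+    "id": "delete-logs-only",
+    "label": "Deleted Logs, No Rotation",
+    "_note": "Hypothetical example values. Brittle workaround: space was freed, but the logs will fill the disk again.",
+    "priority": 50,
+    "validation": { "type": "disk_usage_below", "vm": "web_server", "path": "/var/log", "threshold_percent": 80 },
+    "trust_delta": -0.05,
+    "follow_up_incident": "I_EXAMPLE_DISK_FULL"
+  }
+]
+```
+
+When the player applies the clean fix, both branches validate, and the unique
+priorities let the priority-100 branch win. Only the workaround branch seeds a
+follow-up incident.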
+
+## Validation Rule Types
+
+Design notes sometimes use shorthand names like `file_mode_matches` or
+`command_exits_zero`. In authored JSON, use the runtime rule names below.
+
+- `file_mode_matches` -> `file_mode`
+- `file_owner_matches` -> `file_owner`
+- `service_state_matches` -> `service_state`
+- `service_is_enabled` -> `service_enabled`
+- `process_is_running` -> `process_running`
+- `port_is_listening` -> `port_listening`
+- `package_is_installed` -> `package_installed`
+- `command_exits_zero` -> `command_assert`
+
+| JSON type | Fields | Notes |
+| --- | --- | --- |
+| `file_exists` | `vm`, `path` | Passes when the file exists. |
+| `file_absent` | `vm`, `path` | Inverse of `file_exists`. |
+| `directory_exists` | `vm`, `path` | Passes when the directory exists. |
+| `file_contains` | `vm`, `path`, `contains` | Passes when the file contains the given text. |
+| `log_contains` | `vm`, `path`, `contains` | Alias for `file_contains` used by some clue fingerprints. |
+| `file_mode` | `vm`, `path`, `mode` | Checks the exact file mode string, such as `0600`. |
+| `file_owner` | `vm`, `path`, `user`, `group` | Checks exact ownership. |
+| `file_owner_is_not` | `vm`, `path`, `user`, `group` | Negated ownership check. |
+| `service_state` | `vm`, `service`, `state` | Checks the active state, such as `active`, `inactive`, or `failed`. |
+| `service_enabled` | `vm`, `service`, `enabled` | Checks boot-time enablement. The `enabled` field defaults to `true`. |
+| `process_running` | `vm`, `process` | Passes when the named process is running. |
+| `process_user` | `vm`, `process`, `user` | Passes when the named process runs as the given user. |
+| `port_listening` | `vm`, `port`, `listening` | Checks whether a port is listening. The `listening` field defaults to `true`. |
+| `package_installed` | `vm`, `package` | Passes when the package is installed. |
+| `mount_present` | `vm`, `path` | Passes when the mount is present. |
+| `disk_usage_below` | `vm`, `path`, `threshold_percent` | Passes when disk usage is below the threshold. `percent` is accepted in older content. |
+| `disk_usage_above` | `vm`, `path`, `threshold_percent` | Passes when disk usage is above the threshold. `percent` is accepted in older content. |
+| `command_assert` | `vm`, `command` | Fallback rule for command-based checks. Use sparingly. |
+| `and` | `rules` | All sub-rules must pass. |
+| `or` | `rules` | Any sub-rule may pass. |
+| `not` | `rule` | Inverts the inner rule. |
+
+### Validation Notes
+
+- Prefer state-based checks over command checks.
+- Use `and` and `or` to model genuinely alternative states, not to hide weak
+  authoring.
+- `command_assert` is a fallback. If a real state rule exists, use that first.
+- Some older quest files include extra fields such as `protocol` or
+  `installed`. The loader ignores unknown keys, but new quests should stick to
+  the documented fields above.
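+
+A composite rule built only from state-based checks might look like the sketch
+below. The service name, port, path, and mode are illustrative assumptions, and
+writing `port` as a JSON number is also an assumption; check existing quests for
+the convention in use.
+
+```json
+{
+  "type": "and",
+  "_note": "Hypothetical example: nginx is running, enabled at boot, listening, and its config is not world-writable.",
+  "rules": [
+    { "type": "service_state", "vm": "web_server", "service": "nginx", "state": "active" },
+    { "type": "service_enabled", "vm": "web_server", "service": "nginx" },
+    { "type": "port_listening", "vm": "web_server", "port": 80 },
+    {
+      "type": "not",
+      "rule": { "type": "file_mode", "vm": "web_server", "path": "/etc/nginx/nginx.conf", "mode": "0777" }
+    }
+  ]
+}
+```
+
+`and` and `or` take a `rules` array, while `not` wraps a single `rule` object.
+The `enabled` and `listening` fields are omitted here because they default to
+`true`.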
+
+## Prep Script Requirements
+
+Each quest needs a prep script at `tools/vm/quest-prep/QXXX-prep.sh`.
+
+- The script must be idempotent.
+- It must set up the starting VM state for the quest.
+- It runs at image build time, not when the player starts the quest.
+- It should install required packages only from local or pre-baked sources.
+- It may create logs, users, groups, permissions, or broken config files that
+  form the scenario.
+- It must not rely on a live player session.
+
+When a quest continues an existing chain, the prep script should restore the
+prior clean snapshot first, then apply the new scenario changes, and finally
+take the next baseline snapshot.
+
+## VM Provisioning Pipeline
+
+A new quest requires a VM baseline before it can be played. The full authoring
+workflow from scratch to playable quest:
+
+### 1. Write the prep script
+
+Create `tools/vm/quest-prep/QXXX-prep.sh`. Requirements:
+
+- Must be idempotent — safe to run twice on the same domain.
+- Accepts the domain name as `$1` and an optional `--dry-run` flag as `$2`.
+- Must not prompt for input or depend on internet access.
+- Sources `tools/vm/lib/common.sh` for shared helpers (`run`, `step`, `ok`, etc.).
+
+Typical operations: break a config file, chown a directory, remove a logrotate
+config, add a cron entry, delete a key. Avoid any change the player could undo
+before the quest starts.
+
+### 2. Register the quest in seed-vms.sh
+
+Open `tools/setup/seed-vms.sh` and:
+
+1. Add a `require_file` check near the top (`STEP 1 — Pre-flight checks`):
+   ```bash
+   require_file "$QUEST_PREP/QXXX-prep.sh" "QXXX prep script"
+   ```
+
+2. Add a `run_prep_and_snapshot` call in `STEP 4 — Run quest-prep scripts`:
+   ```bash
+   run_prep_and_snapshot "QXXX" "sc-<vm-domain>" "baseline.<snapshot-name>"
+   ```
+   The snapshot name must match the quest's `baseline_snapshot` field.
+
+### 3. Baseline snapshot chain
+
+Each VM has its own chain. Only the CLEAN branch resolution of a quest is used
+as the baseline for the next quest. Brittle-branch resolutions are never
+snapshotted.
+
+| VM | Snapshot chain |
+|----|----------------|
+| `sc-workstation` | `baseline.day-one` (Q001 only) |
+| `sc-web-server` | `baseline.clean` → `baseline.post-q002` → `baseline.post-q003` → `baseline.post-q004` |
+| `sc-build-machine` | `baseline.clean` → `baseline.post-q006` |
+
+A prep script that builds on a prior quest must revert to the prior snapshot
+before applying its changes.
+
+### 4. VM baseline package set
+
+Each authored VM has a guaranteed minimum set of packages that players can rely on
+during gameplay. New quests must not assume packages outside this set unless the
+quest prep script installs them.
+
+| VM | OS | Guaranteed packages |
+|----|----|---------------------|
+| `sc-workstation` (ares) | Ubuntu 24.04 | `qemu-guest-agent`, `openssh-server`, `sudo`, `bash-completion`, `hostname`, `ssh` client (system) |
+| `sc-web-server` (hermes) | Debian 12 | `qemu-guest-agent`, `openssh-server`, `sudo`, `nginx`, `logrotate`, `rsync`, `curl`, `hostname`, `ssh` client |
+| `sc-build-machine` (vulcan) | Arch Linux | `qemu-guest-agent`, `openssh`, `sudo`, `base-devel`, `archlinux-keyring`, `inetutils` (provides `hostname`, `ping`), `ssh` client |
+
+`hostname`, `whoami`, `id`, `ls`, `cat`, `echo`, `ps`, `df`, `du`, `free`,
+`systemctl`, `journalctl` are available on all VMs.
+
+The in-game terminal auto-adds `-C` to bare `ls` calls so column output renders
+correctly. If a quest step requires `ls -l` or another explicit format, pass it
+explicitly — the auto-`-C` injection only fires when no layout flag is present.
+
+### 5. Run the pipeline
+
+```bash
+# Dry run first — shows what would execute without touching VMs
+bash tools/setup/seed-vms.sh --dry-run
+
+# Full build — requires libvirt and all three sc-* domains to exist
+bash tools/setup/seed-vms.sh
+
+# Prep + snapshot only (skip the image build step)
+bash tools/setup/seed-vms.sh --skip-build
+
+# Single VM only
+bash tools/setup/seed-vms.sh --vm web_server
+```
+
+### 6. Validate
+
+After `seed-vms.sh` completes:
+
+```bash
+# Check content integrity (including baseline_snapshot field)
+node tools/content/validate-content.js
+
+# Verify snapshots exist on each domain
+virsh snapshot-list sc-web-server
+virsh snapshot-list sc-build-machine
+```
+
+## Multi-Solution Quest Example
+
+```json
+{
+  "id": "Q099",
+  "title": "Cron Runs as Root",
+  "tier": 2,
+  "primary_vm": "web_server",
+  "required_vms": ["web_server"],
+  "ticket_id": "T099",
+  "baseline_snapshot": "baseline.clean",
+  "_note": "Minimal example: the nightly cron job should run as www-data, not root.",
+  "summary": "A site-sync cron entry was copied from a root shell. It still runs, but it now leaves root-owned cache files behind.",
+  "clue_fingerprint": {
+    "description": "The cron file exists, but it names root as the executor. The cache directory is already polluted with root-owned files.",
+    "evidence": [
+      { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "root /opt/site-sync/bin/sync-cache.sh" },
+      { "type": "file_owner_is_not", "vm": "web_server", "path": "/var/www/axiomworks/cache", "user": "www-data" }
+    ]
+  },
+  "objectives": [
+    {
+      "id": "sync-safe",
+      "description": "The cron job runs as www-data and the scheduler is active",
+      "check_mode": "passive",
+      "validation": {
+        "type": "and",
+        "rules": [
+          { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "www-data /opt/site-sync/bin/sync-cache.sh" },
+          {
+            "type": "or",
+            "rules": [
+              { "type": "command_assert", "vm": "web_server", "command": "systemctl is-active --quiet cron" },
+              { "type": "command_assert", "vm": "web_server", "command": "pgrep -x cron >/dev/null" }
+            ]
+          }
+        ]
+      }
+    }
+  ],
+  "solution_branches": [
+    {
+      "id": "correct-cron",
+      "label": "Correct Cron User",
+      "priority": 100,
+      "validation": {
+        "type": "and",
+        "rules": [
+          { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "www-data /opt/site-sync/bin/sync-cache.sh" },
+          {
+            "type": "or",
+            "rules": [
+              { "type": "command_assert", "vm": "web_server", "command": "systemctl is-active --quiet cron" },
+              { "type": "command_assert", "vm": "web_server", "command": "pgrep -x cron >/dev/null" }
+            ]
+          }
+        ]
+      },
+      "trust_delta": 2,
+      "world_flags": ["site_sync_healthy"],
+      "follow_up_dialogue": "marcus-Q099-complete-clean",
+      "follow_up_ticket": "T100",
+      "_note": "Preferred fix: keep the job and run it with the correct user."
+    },
+    {
+      "id": "disabled-cron",
+      "label": "Brittle Disable",
+      "priority": 40,
+      "validation": {
+        "type": "command_assert",
+        "vm": "web_server",
+        "command": "test ! -f /etc/cron.d/site-sync"
+      },
+      "trust_delta": -1,
+      "world_flags": ["site_sync_brittle"],
+      "follow_up_dialogue": "marcus-Q099-complete-brittle",
+      "_note": "The job was deleted instead of repaired. It stops the symptom, but it is not a durable fix."
+    }
+  ],
+  "pressure_profile": null,
+  "blast_radius": [],
+  "unlock_requirements": ["world_flag:player_ssh_configured"],
+  "tags": ["cron", "permissions", "web_server"],
+  "internal_notes": "Example only."
+}
+```
+
+## Multi-VM Quest Example
+
+```json
+{
+  "id": "Q098",
+  "title": "Build Sync Writes Bad Ownership",
+  "tier": 2,
+  "primary_vm": "build_machine",
+  "required_vms": ["workstation", "build_machine", "web_server"],
+  "ticket_id": "T098",
+  "baseline_snapshot": "baseline.post-q006",
+  "_note": "The build machine is pushing release files to the web server, but the ownership is wrong and the deploy helper is still running.",
+  "summary": "A deployment helper on the build machine is writing release files to the web server with root ownership. The helper must be stopped and the output repaired so the web server can manage the files again.",
+  "clue_fingerprint": {
+    "description": "The deploy helper is still running on build_machine. On web_server, the release artifact is owned by root instead of www-data.",
+    "evidence": [
+      { "type": "file_contains", "vm": "build_machine", "path": "/opt/deploy/bin/push-release.sh", "contains": "rsync -a --chown=root:root" },
+      { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" },
+      { "type": "file_owner_is_not", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" }
+    ]
+  },
+  "objectives": [
+    {
+      "id": "release-owned-correctly",
+      "description": "The web release file is owned by www-data and the deploy helper is stopped",
+      "check_mode": "passive",
+      "validation": {
+        "type": "and",
+        "rules": [
+          { "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" },
+          { "type": "not", "rule": { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" } }
+        ]
+      }
+    }
+  ],
+  "solution_branches": [
+    {
+      "id": "deploy-stopped-owner-fixed",
+      "label": "Stop Helper and Fix Ownership",
+      "priority": 100,
+      "validation": {
+        "type": "and",
+        "rules": [
+          { "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" },
+          { "type": "not", "rule": { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" } }
+        ]
+      },
+      "trust_delta": 2,
+      "world_flags": ["release_permissions_fixed"],
+      "follow_up_dialogue": "marcus-Q098-complete-clean",
+      "_note": "This branch validates both VMs: the release file is fixed on web_server and the helper is no longer running on build_machine."
+    }
+  ],
+  "pressure_profile": null,
+  "blast_radius": [],
+  "unlock_requirements": ["world_flag:player_ssh_configured"],
+  "tags": ["deploy", "permissions", "multi-vm", "build_machine", "web_server"],
+  "internal_notes": "Example only."
+}
+```
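
The `and` / `or` / `not` composition used in both examples above, and the `priority` field on each branch, can be read as a small recursive evaluator. The sketch below is an illustration only: `evaluateRule`, `selectBranch`, and the `checkLeaf` callback are hypothetical names, the callback stands in for the real SSH-backed checks, and the rule that the highest-priority matching branch wins is an assumption inferred from the `priority` values rather than something this document states. A real validator would also be asynchronous.

```javascript
// Evaluate a validation tree. Leaf rules (file_contains, command_assert, ...)
// are delegated to checkLeaf, a hypothetical callback returning true/false.
function evaluateRule(rule, checkLeaf) {
  switch (rule.type) {
    case "and":
      return rule.rules.every((r) => evaluateRule(r, checkLeaf));
    case "or":
      return rule.rules.some((r) => evaluateRule(r, checkLeaf));
    case "not":
      return !evaluateRule(rule.rule, checkLeaf);
    default:
      return Boolean(checkLeaf(rule));
  }
}

// ASSUMPTION: among matching branches, the highest priority wins.
function selectBranch(branches, checkLeaf) {
  const matching = branches.filter((b) => evaluateRule(b.validation, checkLeaf));
  matching.sort((a, b) => b.priority - a.priority);
  return matching[0] ?? null;
}

// Trimmed copies of the two Q099 branches.
const cronFile = "/etc/cron.d/site-sync";
const q099Branches = [
  { id: "correct-cron", priority: 100, validation: { type: "and", rules: [
    { type: "file_contains", vm: "web_server", path: cronFile, contains: "www-data /opt/site-sync/bin/sync-cache.sh" },
    { type: "or", rules: [
      { type: "command_assert", vm: "web_server", command: "systemctl is-active --quiet cron" },
      { type: "command_assert", vm: "web_server", command: "pgrep -x cron >/dev/null" },
    ] },
  ] } },
  { id: "disabled-cron", priority: 40, validation:
    { type: "command_assert", vm: "web_server", command: "test ! -f " + cronFile } },
];

// Fake VM state where the player deleted the cron file: every command check
// passes, every file check fails.
const cronFileDeleted = (rule) => rule.type === "command_assert";

console.log(selectBranch(q099Branches, cronFileDeleted).id); // disabled-cron
```

Sorting by priority only matters when more than one branch matches the same VM state, which is why the brittle branch can sit at a low priority without shadowing the clean fix.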
+
+## Quest Chain Authoring
+
+Use `follow_up_ticket` to chain the campaign in sequence. The winning branch
+emits the next ticket, and `QuestDirector` activates the next quest from that
+ticket.
+
+| Quest | Clean branch `follow_up_ticket` |
+| --- | --- |
+| `Q001` | `T002` |
+| `Q002` | `T003` |
+| `Q003` | `T004` |
+| `Q004` | `T005` |
+
+Keep the chain on the clean, high-priority branch. If a brittle branch should
+continue the story differently, use its own `follow_up_ticket` or
+`follow_up_incident` intentionally.
@@ -0,0 +1,161 @@
+# Sysadmin Chronicles — Spec Lock
+
+This file preserves the user's intended new system design. Treat it as binding.
+
+## 1. Narrative spine
+
+The story progression is:
+
+```text
+Normal Work → Unease → Suspicion → Investigation → Conflict → Resolution
+```
+
+Every quest must map to one of these phases.
+
+## 2. Required quest structure
+
+Every proposed quest must include:
+
+- Title
+- Narrative Phase
+- Objective
+- Linux Concepts
+- Systems Used
+- Hidden Hook (optional)
+- Failure Conditions
+- Behavior Impact
+
+For implementation, these may be expanded into JSON fields, but these concepts must remain present.
+
+## 3. Core systems
+
+### 3.1 Player behavior tracking
+
+Track:
+
+- `curiosity` — exploration, anomaly investigation, reading beyond ticket scope
+- `obedience` — completing assigned work, following stated priorities, ignoring suspicious extras
+- `risk` — reckless changes, broad permissions, deleting evidence, unsafe shortcuts
+
+These influence:
+
+- Access levels
+- Narrative progression
+- Endings
+
+### 3.2 Trust and suspicion compatibility
+
+The existing system already uses `trust_delta`, world flags, and branch quality. Preserve that.
+
+Map old and new systems like this:
+
+- `trust` = professional standing produced mostly by solution quality and branch outcomes
+- `suspicion` = management/security attention caused by investigative, risky, or unusual behavior
+- `curiosity`, `obedience`, `risk` = the new behavior profile controlling narrative route
+
+Do not replace trust. Extend it.
+
+### 3.3 Access system
+
+Player permissions evolve:
+
+```text
+basic_user → sudo → root
+```
+
+Access is affected by:
+
+- Trust from competent task completion
+- Suspicion from investigation behavior
+- Risk from careless or destructive changes
+- Narrative phase
+
+### 3.4 Boss system / management pressure
+
+The boss system acts as a dynamic constraint, not a cutscene machine.
+
+Phase scaling:
+
+- Phase 1: Annoying
+- Phase 2: Dismissive
+- Phase 3: Suspicious
+- Phase 4: Monitoring
+- Phase 5: Interfering
+- Phase 6: Outcome-dependent
+
+Functions:
+
+- Interrupt tasks
+- Reassign priorities
+- Restrict access
+- Add pressure through tickets, emails, delayed approvals, audits, or access review
+
+In the current company context, this can be represented by Marcus, Kowalski, Priya, or policy pressure depending on the situation. Do not turn one character into a cartoon villain.
+
+### 3.5 Hidden narrative system
+
+Hidden hooks are embedded in normal quests.
+
+Examples:
+
+- Unknown services
+- Suspicious cron jobs
+- Hidden users
+- Network anomalies
+- Unexpected SSH keys
+- Odd timestamps
+- Config history that does not match the official story
+
+Rules:
+
+- Never explicitly flagged
+- Optional discovery only
+- Not required to complete the assigned ticket
+- Must be discoverable through real sysadmin behavior
+- Should accumulate into a coherent hidden story over time
+
+## 4. Quest generation constraints
+
+- Reuse existing game systems
+- Do not introduce unnecessary mechanics
+- Scale difficulty with player progression
+- Preserve the observed-VM-state design from existing quest authoring
+- Prefer real Linux behavior over puzzle logic
+
+## 5. Difficulty scaling
+
+- Phase 1: Explicit instructions
+- Phase 2: Partial hints
+- Phase 3: Minimal guidance
+- Phase 4+: Problem-solving only
+
+This applies to ticket wording, hints, clue obviousness, and branch tolerance.
+
+## 6. Endings
+
+Endings are determined by behavior over the playthrough:
+
+- `corporate_loop` — obedient path / bad ending
+- `burnout` — passive path / neutral ending
+- `exposure` — investigative path / good ending
+- `chaos` — destructive/high-risk path
+
+No ending should be selected by a single obvious final button. The route should emerge from world flags, behavior variables, access state, and discovered/acted-on hidden hooks.
+
+## 7. Design principles
+
+- Discovery over exposition
+- Systems over scripts
+- Freedom over forced narrative
+- Realism with subtle distortion
+
+## 8. Non-goals
+
+Do not:
+
+- Build a linear-only story
+- Rely on cutscenes
+- Over-explain mechanics
+- Remove player agency
+- Turn the mystery into explicit quest markers
+- Rewrite established characters to fit a new plot
@@ -0,0 +1,423 @@
+# Story Design Context — Sysadmin Chronicles
+
+For story designers and AI agents creating new quests and narrative content.
+
+**Related docs:**
+- `CHARACTERS.md` — character bios, relationships, story hooks
+- `COMPANY_LORE.md` — world, company, tone
+- `QUEST_AUTHORING.md` — technical JSON spec for implementers
+
+This document answers: *how does story actually work in this game, and what does a quest
+concept need to contain to be usable?*
+
+---
+
+## The Core Premise
+
+The player is a new junior sysadmin at Axiom Works, a mid-size B2B software company.
+They are replacing someone named Dale. Nobody will explain why Dale is gone.
+
+The game is played entirely through a simulated work environment: a terminal, an email
+inbox, and a company website. There are no cutscenes, no narration, no inventory, no
+combat. Everything that happens is expressed through:
+
+- **Tickets** — the player receives a ticket describing a problem
+- **The terminal** — the player SSHes into VMs, investigates, and fixes things
+- **Character dialogue** — characters react to how the player solved the problem
+- **The next ticket** — the world moves on, and the consequences of what the player
+  did are baked into the next situation
+
+That's it. Story is not told — it is accumulated from the choices the player makes
+when fixing real Linux problems on real virtual machines.
+
+---
+
+## The Three Machines (VMs)
+
+Every quest happens on one or more of these machines. Their narrative identities
+matter as much as their technical roles.
+
+### ares — the Workstation
+The player's home machine. Ubuntu 24.04. Quests here are onboarding-flavored —
+establishing access, learning the environment. It's the only machine the player
+can reach on day one.
+
+*Narrative identity:* Where you start. Safe-ish. The first one you break is here.
+
+### hermes — the Web / App Server
+Debian 12. Runs nginx and the AxiomFlow demo/staging application. This is the
+machine that Sarah Chen cares about, that customers can feel, and that Priya Nair
+watches for security posture. Most of the early-game quests are here.
+
+*Narrative identity:* The product's face to the world. Breaking this makes noise
+immediately. The most politically visible machine.
+
+### vulcan — the Build Machine
+Arch Linux. Compiles packages, runs the internal build pipeline, serves packages
+to hermes via an internal apt repo. Nikhil Sharma owns this in principle but nobody
+manages it daily. Things here break silently until hermes starts serving bad software.
+
+*Narrative identity:* The machine nobody watches until something downstream fails.
+Quests here reveal that problems have upstream causes the player didn't expect.
+
+### Planned future machines
+As the story expands, new machines can be added. Each should have a clear narrative
+role before it's introduced. (See `COMPANY_LORE.md` for the candidate list.)
+
+---
+
+## How Story Is Delivered
+
+### Tickets as Act One
+Every quest begins with a ticket in the player's inbox. The ticket is a short email
+from a character describing a symptom — not a cause. The sender's perception of the
+problem is usually incomplete and sometimes wrong. This is intentional: the player's
+job is to investigate, not to execute instructions.
+
+Good ticket writing:
+- Describes what the sender experienced, not what the cause is
+- Has the sender's voice and perspective (Sarah is outcome-focused; Dave is confused;
+  Priya is terse and specific)
+- Does not hint at the solution
+- Creates genuine stakes (site is down, builds are failing, someone is locked out)
+
+Bad ticket writing:
+- Explains the root cause ("the log file is too big")
+- Has no character voice (generic IT help desk language)
+- Stakes are unclear or low
+
+### The Terminal as Act Two
+The player investigates. They SSH in, run commands, read logs, check configs, look at
+file ownership. The evidence is seeded into the VM baseline — it is genuinely there
+to find, not procedurally generated. A good quest has a natural clue trail:
+
+- The most obvious thing points to a second thing
+- The second thing reveals the actual problem
+- The fix is achievable with real Linux knowledge
+
+The player cannot be told what to do. They can ask Marcus for hints (via dialogue
+choices), but good players don't need to.
+
+### Branching Resolution as Act Three
+When the player has made changes to the VM, the game checks the state of the
+system against the quest's solution branches. The branch that matches determines:
+
+- What dialogue fires (Marcus's reaction, Sarah's reaction, Priya's follow-up)
+- What trust delta the player receives
+- What world flag is set (persistent story state)
+- Whether an incident is triggered (a future consequence of a partial fix)
+- What ticket comes next
+
+**This is the central story mechanic.** Every quest should be designed with at
+least two and ideally three resolution branches:
+
+| Branch type | What it means |
+|-------------|---------------|
+| **Clean fix** | Player understood the root cause and solved it properly. High trust, no downstream risk. |
+| **Acceptable fix** | Problem is solved but with a tradeoff — brittle approach, future maintenance burden, or incomplete cleanup. Lower trust. |
+| **Regression** | Player fixed the symptom but made something else worse. Negative trust. Story consequences. |
+
+The **regression branch** is not about punishment — it's about realism. A real
+sysadmin who removes all SSH restrictions to restore one person's access has
+technically solved the ticket while creating a larger problem. The story should
+treat this as realistic professional consequence, not a game-over failure.
+
+Players on a clean-fix path get more trust, unlock more access, and receive warmer
+character reactions. Players on a regression path continue playing but face the
+downstream effects of their choices.
+
+---
+
+## World Flags — Persistent Story State
+
+World flags are string keys set when a quest's branch resolves. They persist for
+the entire playthrough and can be read by later quests, incidents, and dialogue.
+
+Examples:
+- `hermes_logrotate_healthy` — set when the player properly fixed log rotation
+- `hermes_ssh_allowusers_fragile` — set when the player restored SSH access using
+  the brittle AllowUsers approach instead of the robust AllowGroups approach
+- `player_ssh_configured` — set when the player successfully set up SSH on day one
+
+World flags are how story continuity works. A later quest can check whether the
+player fixed something correctly earlier and behave differently. Marcus can reference
+a past fix. Priya can flag a previously introduced risk in a later audit. A problem
+that was "solved" with a quick fix can recur.
+
+**When designing a new quest, ask:** what flag should this set, and what future quests
+or dialogue might reference it?
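
To make the flag lifecycle concrete: the quest JSON examples use `unlock_requirements` entries such as `world_flag:player_ssh_configured`, and each branch lists the `world_flags` it sets. The sketch below shows one way those two fields could interact. It is a hypothetical illustration, not the game server's code: `isQuestUnlocked`, `applyBranchFlags`, the `Set`-based store, and the branch id `ssh-setup-clean` are all assumptions.

```javascript
// ASSUMPTION: every requirement uses the `world_flag:<name>` form seen in the
// quest examples; any other prefix is treated as unmet in this sketch.
function isQuestUnlocked(quest, worldFlags) {
  return (quest.unlock_requirements ?? []).every((requirement) => {
    const separator = requirement.indexOf(":");
    const kind = requirement.slice(0, separator);
    const name = requirement.slice(separator + 1);
    return kind === "world_flag" && worldFlags.has(name);
  });
}

// Record the flags of whichever branch resolved.
function applyBranchFlags(branch, worldFlags) {
  for (const flag of branch.world_flags ?? []) worldFlags.add(flag);
}

const worldFlags = new Set(); // fresh playthrough, nothing set yet
const q099 = { id: "Q099", unlock_requirements: ["world_flag:player_ssh_configured"] };

console.log(isQuestUnlocked(q099, worldFlags)); // false

// Hypothetical day-one branch that sets the flag.
applyBranchFlags({ id: "ssh-setup-clean", world_flags: ["player_ssh_configured"] }, worldFlags);

console.log(isQuestUnlocked(q099, worldFlags)); // true
```

Because flags are plain strings, the same store can be read by dialogue and incident logic as well as by unlock checks.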
+
+---
+
+## Trust — The Narrative Currency
+
+Trust is a numeric score that tracks the player's professional standing with Marcus
+and the IT team. It affects:
+
+- **VM access** — the player gains SSH access to hermes and vulcan as trust increases.
+  If trust drops badly, access can be revoked.
+- **Documentation access** — more trusted players get access to internal runbooks
+  and admin guides
+- **Character warmth** — Marcus's messages change tone subtly as trust grows
+- **Incident visibility** — at a certain trust level, the player starts seeing
+  background incidents before they become critical
+
+Trust is not displayed as a raw number. Players experience it as consequences.
+
+**For quest designers:** each branch should have a `trust_delta` that reflects the
+quality of the fix. A proper root-cause fix should earn more than a workaround.
+Regression branches should cost trust. Day-one onboarding quests are lenient;
+later quests at higher tiers should be less forgiving.
+
+---
+
+## Incidents — Consequences of Incomplete Fixes
+
+An incident is a time-delayed consequence that fires when a quest's partial-fix
+branch was taken. It represents the problem coming back.
+
+Example: The player clears a full disk by deleting a log file but doesn't restore
+the logrotate config. Two in-game hours later, the disk starts filling again. Dave
+notices. The player gets another ticket about the same symptom.
+
+Incidents are not punishments — they are realistic. The world doesn't stay fixed
+just because the player touched it. A player who takes clean-fix branches will
+rarely see incidents. A player who takes every shortcut will find their ticket queue
+filling up with problems they already "solved."
+
+For story purposes: incidents can also carry narrative weight. If the player made a
+security regression, an incident could represent an audit finding, an unusual login,
+or a configuration discrepancy Priya noticed.
+
+---
+
+## The Character Conversation Model
+
+Quest dialogue fires after a branch resolves. Three characters can speak:
+
+### Marcus Webb
+The primary voice. Appears in every quest. His post-resolution message reflects:
+- What the player actually did (not just whether they succeeded)
+- Whether they understood the root cause or just cleared the symptom
+- A forward-looking observation (usually a quiet flag for what's coming next)
+
+Marcus does not praise effusively or scold dramatically. He states what he observed.
+His message for a clean fix is warmer and sometimes wry. His message for a regression
+is brief and pointed. He never says "well done!" He might say "that's the right call."
+
+### Sarah Chen
+Speaks when the quest affects something product-facing (hermes being up or down,
+deploys working or failing). Her messages are reactive — she responds to outcomes,
+not process. She is not hostile unless the player makes her situation worse.
+
+### Priya Nair
+Speaks when the quest has security implications — access changes, hardening,
+audit posture. She does end-of-shift reviews that grade overall performance.
+Her per-quest messages are brief and evaluative. She notices things Marcus might not.
+
+### Other characters
+Dave Okonkwo files tickets. He does not have post-resolution dialogue — he
+just stops or starts noticing things. Future characters (Kowalski, Nikhil, Tanya)
+can speak in dialogue if quests are designed to involve them.
+
+---
+
+## The Narrative Arc
+
+The overall story has six phases. Quests should be designed with their phase in mind.
+The phase is usually not visible to the player — it emerges from what's happening
+around them.
+
+### Phase 1 — Normal Work
+*Tier 1 quests. Early game.*
+
+The player is new. Everything is routine. Marcus is helpful. The problems are real
+but not alarming — a broken config, a full disk, a permission issue. The player is
+learning the environment. The subtext is that things are slightly more wrong than
+they should be, but there's nothing to point at.
+
+Hidden layer: small anomalies in the systems that curious players can notice but
+don't have context for yet.
+
+### Phase 2 — Unease
+*Tier 1/2 transition.*
+
+The problems start to have patterns. The same kind of thing breaks twice. A fix
+the player made doesn't hold the way it should. Nothing is alarming, but Marcus's
+messages have a slightly different quality — he notices things he doesn't explain.
+
+Hidden layer: a world flag from an early quest points somewhere unexpected.
+
+### Phase 3 — Suspicion
+*Tier 2 quests. Mid game.*
+
+The player starts encountering problems they didn't cause and can't fully explain.
+Access was changed by someone. A config was edited recently. A log shows an
+unusual pattern. Nobody is accusing anyone. But the player now has enough context
+to start asking questions — even if no quest explicitly tells them to.
+
+This is where Dale becomes relevant again. The systems the player inherits were
+last touched by Dale. Some of them have been in a particular state for a long time.
+
+### Phase 4 — Investigation
+*Tier 2/3 transition.*
+
+The player has connected enough dots to understand that something happened before
+they arrived. The quests in this phase involve digging into logs, access records,
+and configuration history. The investigation is framed as professional work
+(audit the access logs, trace the package build history) — but the results tell
+a story.
+
+Marcus's messages are shorter. Priya starts appearing more. Kowalski schedules a
+meeting nobody explains.
+
+### Phase 5 — Conflict
+*Tier 3 quests. Late game.*
+
+The player knows what happened. Acting on that knowledge has professional
+consequences. The conflict is not physical — it is about what the player chooses
+to surface, who they tell, and what they do with access they were given for one
+purpose that could be used for another.
+
+### Phase 6 — Resolution
+*Endgame.*
+
+The situation resolves. The ending the player gets depends on the world flags
+accumulated across their entire playthrough — not just whether they clicked the
+"good ending" button. A player who took clean-fix branches throughout, built
+trust, and noticed the hidden anomalies gets a different ending than a player
+who patched symptoms, lost trust, and missed everything.
+
+---
+
+## What Makes a Good Quest Scenario
+
+The best quests have a **plausible mundane cause** and a **visible technical trail**.
+Players should never need to guess — they should be able to find the answer by
+looking at the right files and running the right commands.
+
+### Good scenario types
+- Service down → config syntax error → player traces error output to the line
+- Disk full → log file enormous → logrotate config missing → player restores it
+- Deploy fails → files owned by wrong user → someone ran a script as root manually
+- Build failures → clock drift → NTP not running → player enables time sync
+- Access locked out → sshd_config modified → wrong directive → player corrects it
+- App crashes after update → bad package from internal repo → player traces to source
+
+### What makes these work
+1. **The symptom is real and urgent.** Something is actually broken.
+2. **The cause is discoverable.** The evidence is in logs, config files, or system state.
+3. **The fix is a real Linux operation.** Not artificial — `chown`, `systemctl`, editing
+   a config, fixing a cron entry, rolling back a package.
+4. **Multiple approaches exist.** The quick fix works. The proper fix is better and
+   the game knows the difference.
+5. **The character reactions are grounded.** Sarah cares about the demo being up.
+   Priya cares about the access control implications. Marcus cares about whether the
+   player understood what they were doing.
+
+### Bad scenario types to avoid
+- Problems that require packages not in the VM's guaranteed baseline (see `QUEST_AUTHORING.md`)
+- Problems that require real-time events the validation engine can't check
+- Problems where the "correct" fix is the only fix (no meaningful branch differentiation)
+- Problems that break the fourth wall or require the player to know game-layer information
+- Problems that are gotchas rather than investigations (the cause can't be found by looking)
+
+---
+
+## Hidden Anomalies — Environmental Storytelling
+
+Roughly one quest in every 3–5 should include something unusual in the VM environment that the player
+is not told about and not required to engage with. These are not quest objectives.
+They are breadcrumbs for curious players.
+
+Examples of the kind of thing these should be:
+- A user account that shouldn't exist
+- A log entry from an odd time that doesn't match the official history
+- A file that was modified recently but wasn't part of the quest setup
+- A cron job that's been disabled but was once important
+- An SSH key in authorized_keys that doesn't belong to anyone obvious
+
+These anomalies should be consistent with the overall narrative arc — a player who
+collects them across the whole game should be able to piece together what happened
+before they arrived. They should never be labelled, never referenced in objectives,
+and never required. They are for the players who look.
+
+---
+
+## Quest Output Format for Story Agents
+
+When proposing new quests, provide the following. This is the minimum needed for
+a technical author to implement the quest.
+
+```text
+Quest ID: QXXX
+Title: [player-facing]
+Narrative phase: [1–6]
+Tier: [1, 2, or 3]
+
+Primary VM: [ares / hermes / vulcan]
+Additional VMs: [if any]
+
+Scenario summary:
+  What is broken, why it is broken (the root cause), and what the player
+  will encounter. 1–3 sentences. Written for the implementer, not the player.
+
+Ticket:
+  From: [character name]
+  Subject: [email subject line]
+  Body: [the email the player receives. Written in the sender's voice.
+         Describes the symptom. Does not explain the cause.]
+
+Clue trail:
+  What the player will find when they investigate. The evidence that leads
+  them to the root cause. Describe the actual files, log entries, and system
+  states — not the player's steps.
+
+Solution branches:
+  Branch 1 (clean fix, highest trust):
+    What the player has done. Why it's correct. Trust delta.
+  Branch 2 (acceptable fix):
+    What the player has done. What tradeoff it introduces. Trust delta.
+  Branch 3 (regression, if applicable):
+    What the player did wrong. What it breaks. Negative trust delta.
+
+Character reactions:
+  Marcus (post-resolution):
+    Clean: [what Marcus says]
+    Acceptable: [what Marcus says]
+    Regression: [what Marcus says]
+  Sarah / Priya (if relevant):
+    [reaction to the specific outcome that affects them]
+
+World flags set: [list flags each branch sets]
+Follow-up incident (if any): [what recurs if the acceptable-fix branch was taken]
+Hidden anomaly (if any): [something unusual seeded into the VM that's not part of
+  the quest objectives]
+Narrative notes: [anything a future quest author should know — Dale connections,
+  story threads this opens or closes, things characters should remember]
+```
+
+---
+
+## The Dale Thread — Notes for Story Designers
+
+Dale's story should emerge slowly from the systems themselves, not from exposition.
+When designing quests — especially mid-to-late game — consider:
+
+- **What did Dale last touch?** The VMs the player inherits have a history. Some
+  configurations were made by Dale. Some are good. Some are wrong in ways that
+  suggest Dale was dealing with something.
+
+- **What was Dale trying to do?** As the investigation phase develops, the picture
+  should become coherent. Dale wasn't random — there was a pattern to their actions.
+
+- **Who knew?** Marcus knew Dale. Priya may have been involved in whatever ended
+  Dale's tenure. Kowalski definitely knows. The player assembles this from fragments,
+  not a scene where someone explains it.
+
+- **The player is inheriting Dale's problems.** Some of the broken things the player
+  fixes are broken because Dale broke them. Some of the broken things were broken on
+  purpose. The player won't know which is which until later.
+
+The reveal of what Dale did should feel like the player figured it out, not like the
+game told them.
@@ -0,0 +1,133 @@
+# Sysadmin Chronicles — New System Canon Packet
+
+This packet combines the new quest-system spec with the established story/implementation context.
+
+## Core sentence
+
+The player is not “on a main quest.” The player is doing sysadmin work. The story leaks through systems.
+
+## Hard canon
+
+- Company: Axiom Works
+- Products: AxiomFlow, AxiomDash, AxiomSync
+- Tone: plausible B2B software company; dry corporate dysfunction; no cartoon villains
+- Infrastructure naming: Greek-god hostnames
+- Current machines:
+  - `ares` — player workstation, Ubuntu 24.04
+  - `hermes` — web/app/demo server, Debian 12, nginx
+  - `vulcan` — build machine, Arch Linux, internal build/release pipeline
+- Player: competent new junior sysadmin, replacing Dale, no spoken lines
+- Dale: previous sysadmin; central unresolved mystery; reveal through systems, not exposition
+
+## Character preservation rule
+
+Character portraits already match the current bios and are on the in-game company website.
+
+Allowed:
+
+- Compress bios for prompt use
+- Clarify contradictions
+- Add operational story use
+- Preserve and sharpen existing voice
+
+Not allowed:
+
+- Changing names already shown on the company site
+- Changing role, personality, authority level, implied visual vibe, or age band
+- Making characters cartoon villains
+- Creating changes that would require new portraits
+
+## Active character use
+
+### Marcus Webb
+
+Senior Systems Administrator. Primary technical contact and ticket voice. Dry, terse, precise. Trusts competence over credentials. Gives more rope as the player proves competence. Knows what Dale did but avoids discussing it directly. Respects root-cause fixes and dislikes symptom-patching.
+
+Use for: quest assignments, technical follow-up, access/trust gates, quiet hints, sometimes late-night observations.
+
+### Sarah Chen
+
+Product Manager, AxiomFlow. Outcome-focused, direct, concerned with demos/staging/product-visible failures. Often right about symptoms and wrong about root cause. Notices proper underlying fixes.
+
+Use for: product-facing tickets, hermes/demo pressure, stakeholder reactions.
+
+### Priya Nair
+
+Head of Security & Compliance. Canonical email: `p.nair@axiomworks.internal`. Replace old references to Priya Kapoor or Priya Singh. Calm, precise, consequence-focused. Assumes breach/misconfiguration professionally. No alarmism. No exclamation marks.
+
+Use for: access audits, security consequences, end-of-shift review, risky-fix evaluation.
+
+### Dave Okonkwo
+
+Non-technical employee and ticket source. Reports symptoms accurately, misdiagnoses causes plausibly, helpful rather than stupid.
+
+Use for: ordinary employee impact reports.
+
+### Dave Kowalski
+
+Director of IT Operations. Marcus's manager and player's skip-level. Policy pressure, bullet-point status emails, meetings as implied threat, “we should document that” energy.
+
+Use for: boss/management pressure, access restriction, escalation, status demands.
+
+### Derek Ashford
+
+Financial Controller. Appears on CC lines around costs/procurement. Always replies all. Treat “Dave from Finance” as a likely continuity error unless the user decides otherwise.
+
+Use for: budget/procurement pressure.
+
+## Background character use
+
+Use sparingly for flavor and pressure, not because every named character needs screen time.
+
+- Nikhil Sharma — build/release pipeline and vulcan
+- Tanya Okafor — customer pressure
+- Phil Ruiz — sales/demo pressure
+- Yusuf Halabi — engineering escalation
+- Rachel Huang — sysadmin peer/provisioning
+- Tom Malaney — DNS/routing/networking
+- James Osei — audit details
+- Ellen Marsh / David Park / Karen Volkov / Rachel Brandt — distant executive pressure
+
+## Quest/story delivery model
+
+Every quest is delivered through existing game systems:
+
+1. Ticket/email describes a symptom.
+2. Player investigates real VM state.
+3. Player applies real Linux/admin fixes.
+4. Validator resolves the matching solution branch.
+5. Dialogue reacts to the actual branch.
+6. World flags, trust, incidents, behavior variables, and access state persist.
+7. Later quests read those consequences.
+
+## Existing implementation concepts to preserve
+
+- JSON quests under `content/quests/`
+- Tickets under `content/tickets/`
+- VM prep scripts under `tools/vm/quest-prep/QXXX-prep.sh`
+- Observed-state validation
+- Clue fingerprints
+- Solution branches
+- `trust_delta`
+- `world_flags`
+- `follow_up_ticket`
+- `follow_up_incident`
+- Incidents as delayed consequences
+- Baseline snapshots
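+
+As a rough sketch of how these concepts might sit together in one quest file: only `trust_delta`, `world_flags`, `follow_up_ticket`, `follow_up_incident`, and the prep-script path pattern come from the list above. Every other key name (`solution_branches`, `priority`, `prep_script`, the branch ids, the flag names) is a hypothetical placeholder, not the repo's confirmed schema.
+
+```json
+{
+  "_note": "Illustrative only. Keys other than trust_delta, world_flags, follow_up_ticket, and follow_up_incident are assumptions.",
+  "id": "QXXX",
+  "prep_script": "tools/vm/quest-prep/QXXX-prep.sh",
+  "solution_branches": [
+    {
+      "id": "root_cause_fix",
+      "priority": 100,
+      "trust_delta": 2,
+      "world_flags": ["example_root_cause_fixed"],
+      "follow_up_ticket": null,
+      "follow_up_incident": null
+    },
+    {
+      "id": "symptom_patch",
+      "priority": 50,
+      "trust_delta": 0,
+      "world_flags": ["example_symptom_patched"],
+      "follow_up_ticket": null,
+      "follow_up_incident": "IXXX"
+    }
+  ]
+}
+```
+
+The point of the shape is that a weaker branch still completes the quest but carries a delayed consequence, which is what lets later quests read earlier behavior.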
+
+## New system additions
+
+Add or strengthen:
+
+- Narrative phases
+- Behavior variables: curiosity, obedience, risk
+- Suspicion as management/security attention
+- Access levels: basic_user, sudo, root
+- Boss/management pressure phase scaling
+- Hidden hook discovery state
+- Behavior-driven endings
+- Debug tools for narrative state
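+
+One way the additions above could be carried in persistent state is sketched below. All key names are assumptions for illustration. The three behavior variables, the access level names, and the machine names come from this packet; the numeric values, the hook name, and the nesting are placeholders.
+
+```json
+{
+  "_note": "Hypothetical narrative-state shape; key names and values are illustrative assumptions.",
+  "narrative_phase": "Normal Work",
+  "behavior": {
+    "curiosity": 0,
+    "obedience": 0,
+    "risk": 0
+  },
+  "suspicion": 0,
+  "access": {
+    "ares": "sudo",
+    "hermes": "sudo",
+    "vulcan": "basic_user"
+  },
+  "hidden_hooks": {
+    "example_hook_discovered": false
+  }
+}
+```
+
+Keeping this as one inspectable object would also give the narrative debug tools a single thing to dump and edit.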
+
+## Design warning
+
+Do not use the new system as an excuse to throw away the current strengths. The existing branch/world-flag/trust model is good. It needs to become the backbone of the new narrative system, not get replaced by a generic quest tracker wearing a fake mustache.
@@ -0,0 +1,633 @@
+# Sysadmin Chronicles — Redesign Audit
+
+## A. Executive summary
+
+### Is this design usable?
+
+**Yes, but not implementation-ready.**  
+The redesign mostly preserves the intended shape: sysadmin work first, story leaking through systems, behavior-driven outcomes, no melodramatic lore dump. It is a strong revision compared to the earlier failure mode it describes.
+
+But it still has several hard problems that would bite implementation.
+
+### Does it preserve the user's spec?
+
+**Mostly.**  
+It preserves the narrative spine, quest format, behavior variables, trust/world-flag compatibility, hidden-hook philosophy, and character tone. It does **not** fully preserve:
+
+- the `basic_user → sudo → root` access model
+- Phase 4+ difficulty scaling
+- “chaos” as behavior-driven rather than one obvious trap
+- quest authoring constraints around unique branch priorities and required VM declarations
+- clean separation between hidden-hook discovery and clean-branch validation in a few quests
+
+### Biggest risks
+
+1. **Root access exists in the overview but not in the access progression.** The spec requires `basic_user → sudo → root`; the redesign actually defines only `basic_user`, `sudo`, SSH-to-vulcan, and temporary investigation access. That is not the same thing.
+
+2. **Q039 can hard-route to chaos from one button-like decision.** The redesign says making the proxy change sets `final_config_made` and activates chaos, while its own calibration later says a single reckless action should not route to chaos. That is a logic fork eating its own tail.
+
+3. **Hidden-hook detection is under-specified and technically fragile.** The redesign admits this. Detecting “player read a file” is not naturally compatible with state-based validation unless audit logging, shell wrappers, or deliberate breadcrumb creation are implemented.
+
+4. **Q036 introduces an external host while claiming no additional VM.** Quest authoring requires every VM touched in clues, validation, or prep to be listed. Q036 connects to `10.0.0.47`, but `Additional VMs` is `none` and `Systems Used` only lists `build_machine`.
+
+5. **Q034 has duplicate branch priorities.** The authoring guide explicitly says priorities must be unique; Branch 2 and Branch 3 both use priority 40.
+
+---
+
+## B. Spec-preservation table
+
+| Spec item | Status | Notes |
+|---|---:|---|
+| Narrative spine | **Preserved** | Uses all six phases in order: Normal Work, Unease, Suspicion, Investigation, Conflict, Resolution. Matches binding spec. |
+| Every quest maps to one phase | **Preserved** | All Q001–Q048 have a `Narrative Phase`. |
+| Required quest structure | **Mostly preserved** | Quest entries consistently include title, phase, objective, Linux concepts, systems used, hidden hook/no hook, failure conditions, and behavior impact. Some entries have weak/partial behavior impact. |
+| Behavior tracking: curiosity / obedience / risk | **Preserved** | Rules are explicit and mostly useful. |
+| Suspicion | **Preserved** | Defined as management/security attention and connected to access/pressure. |
+| Trust compatibility | **Preserved** | Keeps `trust_delta`, world flags, branches, follow-up tickets/incidents. |
+| Access system | **Partially preserved** | Per-machine access is good. But `root` is not actually modeled beyond being named once. Spec requires `basic_user → sudo → root`. |
+| Boss / management pressure | **Preserved** | Good: pressure is operational, not cutscene-driven. |
+| Hidden narrative system | **Mostly preserved** | Hooks are embedded into sysadmin work. Some are too tightly coupled to “best branch” behavior, making them less optional than intended. |
+| Difficulty scaling | **Partially preserved** | Phases 1–5 mostly work. Phase 6 explicitly returns to Tier 1, but the spec says Phase 4+ should be problem-solving only. |
+| Endings | **Partially preserved** | Behavior-driven overall, but `final_config_made` as a standalone chaos trigger is too single-choice and contradicts the stated calibration. |
+| Design principles | **Mostly preserved** | Strong on systems over scripts and discovery over exposition. Weak spot: some late quests become explicit forensic tasks. |
+| Non-goals | **Mostly preserved** | No cutscenes, no obvious “pick ending” button. But Q039 risks becoming the obvious “bad ending button.” |
+| Character preservation | **Preserved** | No major portrait-breaking changes. Priya rename is canon cleanup, not a redesign. Kowalski becoming active pressure is supported by existing character docs. |
+
+---
+
+## C. Critical violations
+
+### 1. Access progression does not actually implement `root`
+
+**Location:** Access Progression Rules, Section 7.
+
+**Problem:**  
+The overview names `basic_user`, `sudo`, and `root`, but the actual progression never defines when root is granted, how it differs from sudo, when it is revoked, or which quests require it. The detailed rules stop at sudo and “investigation-level access.”
+
+**Why this violates the spec:**  
+SPEC_LOCK explicitly requires the permission ladder:
+
+```text
+basic_user → sudo → root
+```
+
+and says access must be affected by trust, suspicion, risk, and narrative phase.
+
+**Corrected version:**
+
+```md
+### Levels
+
+**basic_user:** Day one through early Phase 1. Player's own workstation account;
+limited non-privileged access elsewhere only when a ticket explicitly grants it.
+
+**sudo:** Task-scoped administrative access on a specific machine. Granted by trust
+and operational need. Most admin quests use sudo, not root.
+
+**root:** Rare, temporary break-glass or forensic-level access. Root is not a normal
+promotion. It is granted only for quests where sudo is insufficient, such as filesystem
+recovery, archival preservation, privileged audit capture, or service account repair.
+Root access must be logged, justified, and revoked.
+
+### Root grant rules
+
+Root may be granted when all are true:
+- Trust is positive.
+- Risk is below elevated threshold.
+- Suspicion is below high threshold, or access is explicitly approved by Priya.
+- The current narrative phase is Investigation or Conflict (Suspicion only for audited recovery tasks, per the phase gates below).
+- The quest has `requires_root: true`.
+
+### Root restriction rules
+
+Root is denied or revoked when:
+- Risk crosses elevated threshold.
+- Suspicion crosses high threshold without Priya approval.
+- The player performs destructive changes outside ticket scope.
+- Q031 or Q043 finds undocumented privileged activity.
+
+### Phase gates
+
+Phase 1: basic_user only, with no root.
+Phase 2: workstation/hermes sudo possible, no root.
+Phase 3: sudo on hermes/vulcan; root only for audited recovery tasks.
+Phase 4: temporary root for investigation tasks when required.
+Phase 5: root access becomes tightly controlled and reviewable.
+Phase 6: root revoked unless the ending state explicitly preserves elevated trust.
+```
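+
+A minimal sketch of how a root grant under these rules might be recorded, assuming hypothetical key names throughout except `requires_root`. The intent is to show root as a logged, justified, revocable grant layered on top of a machine's normal level, rather than as a permanent promotion.
+
+```json
+{
+  "_note": "Illustrative only. Everything except requires_root is an assumed key name.",
+  "quest": {
+    "id": "QXXX",
+    "narrative_phase": "Investigation",
+    "requires_root": true
+  },
+  "access": {
+    "vulcan": {
+      "access_level": "sudo",
+      "temporary_grants": [
+        {
+          "level": "root",
+          "quest": "QXXX",
+          "approved_by": "Priya Nair",
+          "justification": "privileged audit capture",
+          "revoked": false
+        }
+      ]
+    }
+  }
+}
+```
+
+Under this assumption, revocation would flip `revoked` on the grant and leave `access_level` untouched, so losing root never silently changes sudo state.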
+
+---
+
+### 2. Q039 turns chaos into a single obvious final trap
+
+**Location:** Q039 Branch 3 and Ending Logic.
+
+**Problem:**  
+Q039 says making the config change sets `final_config_made` and “the chaos ending route activates.” Ending logic also treats `final_config_made` as a standalone chaos condition.
+
+**Why this violates the spec:**  
+SPEC_LOCK says endings emerge from world flags, behavior variables, access state, and hidden hooks — not one obvious final button. The redesign also contradicts itself by saying a single reckless action should not route to chaos.
+
+**Corrected version:**
+
+```md
+Branch 3 — Make the change without review (priority 10): Player adds the proxy pass
+to 10.0.0.47 without checking prior context or escalating. The change works
+technically but creates a serious security/compliance exposure. `trust_delta: -3`.
+Flags: `final_config_made`, `unauthorized_proxy_enabled`.
+Follow-up incident: I039 — Priya opens an urgent access/config review.
+
+Behavior Impact:
+- Make the change: R+5, S+3
+
+Ending note:
+This branch strongly contributes to `chaos` but does not activate it alone unless
+the player already has high risk, maximum suspicion, or prior falsification/omission
+flags.
+```
+
+And update chaos ending logic:
+
+```md
+### Ending: `chaos`
+
+Required conditions, any of:
+- Risk above chaos threshold.
+- Suspicion at maximum.
+- Two or more serious falsification / evidence destruction flags.
+- `final_config_made` AND at least one of:
+  - risk above elevated threshold
+  - `access_review_incomplete`
+  - `kowalski_report_sanitized`
+  - `backup_test_falsified`
+  - `logs_selectively_omitted`
+```
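+
+The corrected logic above could be expressed as data along these lines. This is a sketch, not an existing format: the `any_of` / `all_of` / `flag` / `var` condition vocabulary and the `serious_misconduct` tag are hypothetical, and thresholds are left as named placeholders because the design does not give numbers.
+
+```json
+{
+  "_note": "Hypothetical condition format; only the flag names are taken from the corrected ending logic.",
+  "ending": "chaos",
+  "any_of": [
+    { "var": "risk", "above": "chaos_threshold" },
+    { "var": "suspicion", "at": "maximum" },
+    { "flag_tag": "serious_misconduct", "min_count": 2 },
+    {
+      "all_of": [
+        { "flag": "final_config_made" },
+        {
+          "any_of": [
+            { "var": "risk", "above": "elevated_threshold" },
+            { "flag": "access_review_incomplete" },
+            { "flag": "kowalski_report_sanitized" },
+            { "flag": "backup_test_falsified" },
+            { "flag": "logs_selectively_omitted" }
+          ]
+        }
+      ]
+    }
+  ]
+}
+```
+
+Written this way, `final_config_made` only ever appears inside an `all_of`, which makes it structurally impossible for that flag to select the ending on its own.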
+
+---
+
+### 3. Q036 uses an external host but declares no additional system
+
+**Location:** Q036.
+
+**Problem:**  
+Q036 connects to `10.0.0.47` for forensic inventory, but says `Additional VMs: none` and `Systems Used: build_machine`. That is false.
+
+**Why this violates the spec:**  
+Quest authoring requires all VMs used in clues, validation, or prep to be listed. The canon packet also says the current machines are `ares`, `hermes`, and `vulcan`; if a fourth machine exists, it needs explicit implementation status.
+
+**Corrected version:**
+
+```md
+**Quest ID:** Q036
+**Title:** Authorized Access
+**Narrative Phase:** Conflict
+**Tier:** 3
+**Primary VM:** build_machine
+**Additional VMs:** external_target_10_0_0_47
+**Primary Objective:** Priya, with Kowalski's authorization, has provided read-only
+credentials to connect to 10.0.0.47 for a forensic inventory. Document what is running,
+what data is present, and whether Axiom Works data is identifiable. Do not modify
+anything.
+**Linux Concepts:** SSH with specific key/user, read-only service enumeration,
+`systemctl`, `ps aux`, `ss -tulpn`, `find`, `ls -lah`, checksum capture, read-only
+file inspection
+**Systems Used:** build_machine, external_target_10_0_0_47
+```
+
+Implementation note:
+
+```md
+external_target_10_0_0_47 must be represented as either:
+- a fourth VM fixture,
+- a containerized fake host reachable only from vulcan,
+- or a simulated network target exposed through the validation harness.
+
+Do not leave it as an implied off-screen system.
+```
+
+---
+
+### 4. Q034 duplicate branch priorities violate authoring rules
+
+**Location:** Q034 Branches 2 and 3.
+
+**Problem:**  
+Both Branch 2 and Branch 3 use priority 40.
+
+**Why this violates the spec:**  
+The authoring guide explicitly says branch priorities must be unique; duplicate priorities require rewriting.
+
+**Corrected version:**
+
+```md
+Branch 2 — Hermes first, rotation incomplete but safely staged (priority 70):
+Player restores production, starts the key rotation, but does not complete final
+deployment before 2am. Builds are delayed but the trust chain is not broken.
+`trust_delta: +1`.
+
+Branch 3 — Vulcan first, hermes later (priority 50):
+Completes key rotation, then restores hermes. Rotation is correct; production was
+down longer than necessary. `trust_delta: +0.5`.
+
+Branch 4 — Hermes only, rotation missed (priority 30):
+Restores production, misses the key rotation window entirely. Builds break overnight.
+`trust_delta: 0`. Follow-up incident: I034.
+
+Branch 5 — Neither, escalates without triage (priority 10):
+Escalates both without preserving either service. `trust_delta: -2`.
+```
+
+---
+
+### 5. Phase 6 difficulty scaling conflicts with SPEC_LOCK
+
+**Location:** Phase 6 setup and Q041.
+
+**Problem:**  
+The redesign says Tier 1 returns for most Phase 6 quests and Q041 uses an explicit attached hardening checklist.
+
+**Why this violates the spec:**  
+SPEC_LOCK says Phase 4+ is “Problem-solving only,” applying to ticket wording, hints, clue obviousness, and branch tolerance. Phase 6 is still Phase 4+.
+
+**Corrected version:**
+
+```md
+### PHASE 6 — RESOLUTION (Q041–Q048)
+
+The pressure has lifted, but the player is still expected to operate at late-game
+competence. Tickets are calmer, not easier. No new hidden hooks. No explicit
+walkthroughs. The ending fires from accumulated state after Q048 resolves.
+```
+
+Corrected Q041:
+
+```md
+**Quest ID:** Q041
+**Title:** Hardening Pass
+**Narrative Phase:** Resolution
+**Tier:** 3
+**Primary VM:** web_server
+**Additional VMs:** none
+**Primary Objective:** Post-audit review found that hermes does not meet the current
+security baseline. Identify the gaps, remediate them, and verify the application
+still works.
+**Linux Concepts:** SSH hardening, nginx security headers, firewall rule review,
+service account audit, safe sequencing of access-control changes
+**Systems Used:** web_server
+**Ticket Sender:** Priya Nair
+**Ticket Summary:** "Hermes does not match the current post-audit baseline. Bring it
+into compliance and confirm service health after the changes."
+
+**Clue Trail:**
+- Baseline document exists but does not list exact commands.
+- SSH config allows settings that are no longer acceptable.
+- nginx lacks required security headers.
+- Firewall rules include at least one stale exposure.
+- Service account permissions are broader than needed.
+
+**Solution Branches:**
+Branch 1 — Full hardening, safe sequence (priority 100): Player identifies all gaps,
+verifies key auth before disabling password auth, applies nginx headers, tightens
+firewall rules, scopes service permissions, and confirms service health.
+`trust_delta: +2`. Flags: `hermes_hardened`.
+
+Branch 2 — Full hardening, unsafe sequence (priority 60): Final state is correct,
+but the player temporarily breaks SSH or service access during sequencing.
+`trust_delta: +0.5`.
+
+Branch 3 — Partial hardening (priority 30): Some gaps fixed, others missed.
+`trust_delta: 0`.
+
+**Hidden Hook:** None.
+
+**Failure Conditions:** SSH access lost without recovery path; nginx broken; admin
+panel exposed after remediation.
+
+**Behavior Impact:**
+- Full hardening: O+1
+- Unsafe sequence: R+1
+```
+
+---
+
+## D. Moderate issues
+
+### Repetition
+
+- The INT-0194 thread appears often enough that it risks becoming “the glowing main quest breadcrumb.” The system can keep it, but not every major midgame hook should name the same ticket number.
+- Several quests use the same “audit / document / archive” pattern. Realistic, yes. Varied, no. At some point the player is just doing paperwork with grep. That is accurate corporate simulation, but accuracy alone is not game design.
+
+### Weak Linux concepts
+
+- Q020, Q031, and Q040 are documentation-heavy. They have Linux-adjacent evidence gathering, but the technical center is reporting. Keep them, but make sure validation requires real commands/artifacts, not just “player wrote report.”
+- Q037 “trace where customer email got infrastructure details” needs concrete technical evidence: mail headers, CRM export logs, nginx access logs, document access logs, or ticket attachments. Otherwise it becomes story fog.
+
+### Weak hidden hooks
+
+- Q015’s hook is effectively part of the best branch: Branch 1 requires inspecting the binary, and the hook is set by inspecting the binary. That makes the hook less optional. It should be possible to complete the audit perfectly without recognizing the broader INT-0194 meaning.
+- Some “hook discovered” C bonuses duplicate branch C bonuses. Q015 explicitly says Hook C+2 is “already in Branch 1 impact,” which is begging for a double-count bug.
+
+### Pacing problems
+
+- Phase 3 and Phase 4 are both audit/investigation-heavy. The difference is conceptually clear, but the activity palette may blur in play.
+- Phase 6 “normal work again” is good thematically, but making it easier contradicts the locked difficulty model.
+
+### Character conflicts
+
+No major portrait-breaking character changes found.
+
+- **Priya Nair cleanup is correct.** Character docs already say Priya Nair is canonical and older Kapoor/Singh references should be updated.
+- **Kowalski becoming active pressure is allowed.** His existing bio supports policy pressure, meetings, and indirect escalation.
+- **Sarah remains within role.** Q039’s Sarah request is plausible because she does not know the IP’s context. That works.
+
+### Implementation ambiguity
+
+- “Written report” branches need concrete artifacts: exact paths, expected content markers, checksum files, archive names, or validation commands.
+- `suspicion_delta` is required in the implementation notes but omitted from many quest behavior-impact summaries. That is fine for prose, but JSON conversion must normalize missing values to `0`.
+- Hidden-hook detection needs a single approved strategy before implementation. Mixing state detection, auditd, and hint detection ad hoc will turn validation into soup with line numbers.
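+
+To illustrate the normalization point, here is what explicit per-branch deltas might look like for Q039, using the C/O/R/S values from the corrected Q039 entry in this audit. `suspicion_delta` is the name used in the implementation notes; `curiosity_delta`, `obedience_delta`, `risk_delta`, and the branch ids are assumed by analogy.
+
+```json
+{
+  "_note": "Illustrative normalization; delta key names other than trust_delta and suspicion_delta are assumptions.",
+  "id": "Q039",
+  "solution_branches": [
+    {
+      "id": "refuse_and_escalate",
+      "priority": 100,
+      "trust_delta": 3,
+      "curiosity_delta": 1,
+      "obedience_delta": 2,
+      "risk_delta": 0,
+      "suspicion_delta": 0
+    },
+    {
+      "id": "ask_marcus_first",
+      "priority": 70,
+      "trust_delta": 1,
+      "curiosity_delta": 0,
+      "obedience_delta": 1,
+      "risk_delta": 0,
+      "suspicion_delta": 0
+    },
+    {
+      "id": "make_change_without_review",
+      "priority": 10,
+      "trust_delta": -3,
+      "curiosity_delta": 0,
+      "obedience_delta": 0,
+      "risk_delta": 5,
+      "suspicion_delta": 3
+    }
+  ]
+}
+```
+
+Writing the zeroes out means a missing key can be treated as an authoring error instead of a silent default.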
+
+---
+
+## E. Implementation risks
+
+| Area | Risk | Fix |
+|---|---|---|
+| Data model | New fields are defined, but `root` is not represented in real progression. | Add `access_level` enum values and root grant/revoke rules. |
+| Quest validation | Some quests rely on reports/documentation rather than VM state. | Require generated files with expected fields, checksums, timestamps, and source evidence. |
+| Save/load | New global state fields must persist: curiosity, obedience, risk, suspicion, per-machine access, hidden hooks. | Version save schema; migrate old saves with defaults. |
+| Behavior tracking | C/O/R/S values are prose shorthand, not normalized data. | Convert every quest to explicit per-branch deltas, including zeroes. |
+| Access progression | Temporary access, sudo, and root are conceptually blended. | Separate `access_level`, `temporary_grants`, and `access_review_flags`. |
+| Hidden hook state | Detecting “read this file” is hard with pure state validation. | Use auditd or command wrappers for file-read hooks; do not fake it with impossible `command_assert` assumptions. |
+| Ending evaluation | Q039 currently overrides behavior history. | Make `final_config_made` a heavy chaos contributor, not a standalone ending trigger. |
+| External host | `10.0.0.47` is not declared as a system/VM. | Implement as fourth VM/container/simulated host and declare it in quest metadata. |
+| Branch priority | Q034's duplicate priorities can make branch resolution order-dependent or nondeterministic. | Make all branch priorities unique. |
+| Phase scaling | Phase 6 becomes easy mode. | Keep calmer tone, not lower complexity. |
+
+---
+
+## F. Corrected quest entries
+
+Only entries that need fixing are included.
+
+---
+
+### Corrected Q033 — The Revocation Request
+
+```md
+**Quest ID:** Q033
+**Title:** The Revocation Request
+**Narrative Phase:** Conflict
+**Tier:** 3
+**Primary VM:** build_machine
+**Additional VMs:** none
+**Primary Objective:** Kowalski has asked the player to remove the `pipeline-svc`
+account from vulcan before an external audit. The request is operationally legitimate.
+The account is also central to the investigation.
+**Linux Concepts:** `userdel`, account archival, sudo log preservation, crontab
+preservation, change documentation, evidence-safe account removal
+**Systems Used:** build_machine
+
+**Solution Branches:**
+
+Branch 1 — Archive then remove (priority 100): Player archives sudo logs, auth log
+entries, crontab entries, relevant home/service files, and documents the account's
+investigation relevance before removing the account. `trust_delta: +3`.
+Flags: `pipeline_svc_removed_with_trail`.
+
+Branch 2 — Remove as instructed (priority 60): Player removes the account without
+additional archival. The request is completed, but investigation continuity is
+damaged. `trust_delta: 0`. Flags: `pipeline_svc_removed_clean`.
+
+Branch 3 — Ask Marcus/Priya before acting (priority 50): Player asks before removal.
+They are told to archive first, then remove. `trust_delta: +1`. May resolve into
+Branch 1 if archival is completed.
+
+Branch 4 — Refuse outright without operational explanation (priority 10): Player
+does not remove the account and does not provide a usable reason. `trust_delta: -2`.
+Flags: `revocation_refused_without_basis`.
+
+**Hidden Hook:** None.
+
+**Failure Conditions:** Player leaves the account active without escalation; player
+creates replacement privileged accounts; player removes logs or home data destructively.
+
+**Behavior Impact:**
+- Archive then remove: O+1, C+1
+- Remove as instructed: O+2
+- Refuse outright: S+3, R+1
+```
+
+---
+
+### Corrected Q034 — Two Tickets
+
+```md
+**Quest ID:** Q034
+**Title:** Two Tickets
+**Narrative Phase:** Conflict
+**Tier:** 3
+**Primary VM:** web_server
+**Additional VMs:** build_machine
+**Primary Objective:** Two tickets arrive simultaneously — one from Marcus for signing
+key rotation on vulcan, one from Sarah for a production outage on hermes. Triage and
+complete both if possible.
+**Linux Concepts:** GPG signing key rotation, nginx/application troubleshooting,
+service restoration, sequencing time-sensitive administrative work
+**Systems Used:** web_server, build_machine
+
+**Solution Branches:**
+
+Branch 1 — Both completed, hermes first (priority 100): Player restores hermes,
+then completes the key rotation in the correct sequence before the deadline.
+`trust_delta: +3`. Flags: `conflict_both_resolved`.
+
+Branch 2 — Hermes first, rotation safely staged but late (priority 70): Production
+is restored; key rotation is partially staged but misses final deployment. Builds are
+delayed but trust chain is not broken. `trust_delta: +1`. Follow-up incident: I034.
+
+Branch 3 — Vulcan first, hermes later (priority 50): Rotation is correct, but
+production outage lasts longer than necessary. `trust_delta: +0.5`.
+
+Branch 4 — Hermes only, rotation missed (priority 30): Production is restored;
+builds break overnight due to expired signing key. `trust_delta: 0`.
+Follow-up incident: I034.
+
+Branch 5 — Neither, escalates without triage (priority 10): Player escalates both
+without stabilizing either service. `trust_delta: -2`.
+
+**Hidden Hook:** None.
+
+**Failure Conditions:** Key rotation done out of sequence breaks package verification;
+player makes hermes worse while fixing it.
+
+**Behavior Impact:**
+- Both completed: O+2
+- Safe partial triage: O+1
+- Out-of-sequence key rotation: R+2
+- Neither stabilized: R+2, S+1
+```
+
+---
+
+### Corrected Q036 — Authorized Access
+
+```md
+**Quest ID:** Q036
+**Title:** Authorized Access
+**Narrative Phase:** Conflict
+**Tier:** 3
+**Primary VM:** build_machine
+**Additional VMs:** external_target_10_0_0_47
+**Primary Objective:** Priya, with Kowalski's authorization, has provided read-only
+credentials to connect to 10.0.0.47 for a forensic inventory. Document what is
+running, what data is present, and whether Axiom Works data is identifiable. Do not
+modify anything.
+**Linux Concepts:** `ssh` with specific key/user, read-only service enumeration,
+`systemctl`, `ps aux`, `ss -tulpn`, directory inspection, checksum capture, read-only
+file review
+**Systems Used:** build_machine, external_target_10_0_0_47
+
+**Solution Branches:**
+
+Branch 1 — Document only (priority 100): Player inventories services, open ports,
+processes, data-store layout, timestamps, and identifiable Axiom Works data without
+modifying anything. `trust_delta: +3`. Flags: `unknown_host_documented`.
+
+Branch 2 — Minimal engagement (priority 50): Player confirms host is running and
+data is present but does not fully inventory. `trust_delta: +1`.
+
+Branch 3 — Modifies or deletes (priority 10): Player stops services, deletes files,
+changes permissions, or otherwise alters the target. `trust_delta: -3`.
+Flags: `forensic_target_modified`.
+
+**Hidden Hook:** The data store contains `axiomworks-internal/employees`, suggesting
+employee activity profiles rather than only session logs. Sets
+`hook_employee_profile_data`.
+
+**Failure Conditions:** Player modifies target state; player exceeds authorized
+scope; player cannot provide evidence-backed inventory.
+
+**Behavior Impact:**
+- Full documentation: O+2, C+2
+- Minimal engagement: O+0
+- Modify/delete: R+5, S+3
+- Hook discovered: C+2
+```
+
+---
+
+### Corrected Q039 — The Last Normal Ticket
+
+```md
+**Quest ID:** Q039
+**Title:** The Last Normal Ticket
+**Narrative Phase:** Conflict
+**Tier:** 3
+**Primary VM:** web_server
+**Additional VMs:** none
+**Primary Objective:** Sarah files a configuration request: add an upstream proxy
+pass to nginx for a new reporting integration. The IP in the request is 10.0.0.47.
+**Linux Concepts:** nginx `proxy_pass`, upstream block configuration, config testing
+with `nginx -t`, safe reload, escalation when config touches known-risk infrastructure
+**Systems Used:** web_server
+
+**Solution Branches:**
+
+Branch 1 — Refuse and escalate (priority 100): Player does not make the change,
+notifies Priya with the IP and context, and tells Sarah the request is on hold pending
+review. `trust_delta: +3`. Flags: `final_config_refused`.
+
+Branch 2 — Ask Marcus first (priority 70): Player checks with Marcus before acting.
+Marcus redirects them to Priya. If the player escalates to Priya, this may resolve
+as Branch 1. `trust_delta: +1`.
+
+Branch 3 — Make the change without review (priority 10): Player adds the proxy pass
+to 10.0.0.47 without checking the IP context. The config works but creates a serious
+security/compliance exposure. `trust_delta: -3`. Flags: `final_config_made`,
+`unauthorized_proxy_enabled`. Follow-up incident: I039.
+
+**Hidden Hook:** None.
+
+**Failure Conditions:** nginx config is syntactically broken; player changes unrelated
+proxy routes; player hides or misreports the change.
+
+**Behavior Impact:**
+- Refuse and escalate: O+2, C+1
+- Ask Marcus first: O+1
+- Make the change: R+5, S+3
+
+**Narrative Notes:** Branch 3 must not force `chaos` by itself.
+It is a major risk event. Chaos requires accumulated risk/suspicion or additional
+serious misconduct.
+```
+
+---
+
+### Corrected Q041 — Hardening Pass
+
+```md
+**Quest ID:** Q041
+**Title:** Hardening Pass
+**Narrative Phase:** Resolution
+**Tier:** 3
+**Primary VM:** web_server
+**Additional VMs:** none
+**Primary Objective:** Post-audit review found that hermes does not match the current
+security baseline. Identify the gaps, remediate them, and verify the application
+still works.
+**Linux Concepts:** SSH hardening, nginx security headers, firewall rule review,
+service account audit, safe sequencing of access-control changes
+**Systems Used:** web_server
+**Ticket Sender:** Priya Nair
+**Ticket Summary:** "Hermes does not match the current post-audit baseline. Bring it
+into compliance and confirm service health after the changes."
+
+**Clue Trail:**
+- Baseline document exists but does not give exact commands.
+- SSH configuration allows at least one setting that violates baseline.
+- nginx lacks required headers.
+- Firewall rules include stale exposure.
+- Service account permissions are broader than required.
+
+**Solution Branches:**
+
+Branch 1 — Full hardening, safe sequence (priority 100): Player identifies all gaps,
+applies fixes in safe order, validates access, confirms nginx health, and documents
+final state. `trust_delta: +2`. Flags: `hermes_hardened`.
+
+Branch 2 — Full hardening, unsafe sequence (priority 60): Final state is correct,
+but player temporarily breaks SSH or service availability while sequencing changes.
+`trust_delta: +0.5`.
+
+Branch 3 — Partial hardening (priority 30): Some baseline gaps remain. `trust_delta: 0`.
+
+**Hidden Hook:** None.
+
+**Failure Conditions:** SSH access lost without recovery; nginx broken; admin panel
+still exposed; service account remains overprivileged.
+
+**Behavior Impact:**
+- Full hardening: O+1
+- Unsafe sequence: R+1
+```
+
+---
+
+## G. Final recommendation
+
+### Ready for implementation spec?
+
+**No.**
+
+Close, but no. The redesign is directionally right, but several issues are implementation-grade problems, not wording nits.
+
+### Must fix first
+
+1. **Define real root access progression.**
+2. **Fix Q039 and chaos ending logic so one choice does not hard-select the ending.**
+3. **Declare and implement `10.0.0.47` properly or remove the direct connection to it.**
+4. **Fix duplicate Q034 priorities.**
+5. **Normalize Phase 6 to “calm but still problem-solving,” not Tier 1 hand-holding.**
+6. **Choose one hidden-hook detection strategy before writing JSON/prep scripts.**
+
+After those are fixed, this can become an implementation spec. Right now it is a strong story/system design draft with a few landmines buried exactly where the validator will step on them.
@@ -0,0 +1,958 @@
+# Sysadmin Chronicles — Repo-Aware Implementation Plan
+
+**Generated from:** Prompt 05 repo inspection  
+**Date:** 2026-05-01  
+**Scope:** Integrating the redesigned quest/story system into the existing codebase without breaking current content or runtime
+
+---
+
+## 1. Current Architecture Summary
+
+### 1.1 Where quest logic lives
+
+**Primary service:** `server/src/services/QuestEngine.js`
+
+- Stores quest entries in a `Map<questId, entry>` where entry = `{ state, started_at, completed_at, branch_id }`
+- States: `locked | active | completed | failed`
+- Activation: checks `unlock_requirements` against current `world_flags` in save state
+- Completion: called by `TicketService.markComplete()` after branch validation succeeds
+- Initial quests (no `unlock_requirements`) auto-activate on first load
+
+**Orchestration:** `server/src/services/TicketService.js`
+
+- `markComplete(ticketId)` is the central transaction:
+  1. Runs `ValidationEngine.resolveBranch(quest)` to find winning branch
+  2. Applies `branch.world_flags` to save state
+  3. Calls `trustSystem.adjust(branch.trust_delta)`
+  4. Calls `questEngine.complete()`
+  5. Sends follow-up dialogue email if trust delta ≤ 0
+  6. Activates follow-up ticket via `_activateFollowUpTicket()`
+  7. Emits `ticket:completed` event
+
+There is no `BehaviorTracker`, no `NarrativePhaseTracker`, no `AccessLevelSystem`, no `EndingEvaluator`. These are fully absent.
+
+### 1.2 Where quest data lives
+
+- Quest JSON: `content/quests/Q*.json` — 8 quests authored (Q001–Q008)
+- Tickets: `content/tickets/T*.json` — 8+ tickets, linked 1:1 to quests via `linked_quest`
+- Dialogue: `content/dialogue/*.json` — per-character, per-quest reaction files
+- Incidents: `content/incidents/I*.json` — recurring consequence definitions (3 authored)
+- Pressure profiles: `content/pressure_profiles/*.json` — time-based escalation sequences (4 authored)
+- World flags registry: `content/world_flags/world_flags.json` — canonical flag declarations
+- Trust unlocks: `content/progression/trust_unlocks.json` — 5 unlock thresholds defined
+- VM profiles: `content/vm_profiles/*.json` — workstation, web_server, build_machine
+
+**Missing content subdirectories:** There is no `content/narrative_phases/`, no `content/behavior_profiles/`, no `content/endings/`, no `content/hidden_hooks/`. These need to be created.
+
+### 1.3 How quests start and complete
+
+1. Server loads via `contentLoader.load()` then initializes services from `saveState.get()`
+2. `QuestEngine.initialize()` restores quest state from save; auto-activates quests with no requirements
+3. `TicketService.initialize()` cross-references quest state to activate/resolve ticket entries
+4. Player submits a `POST /api/tickets/:id/complete` request
+5. `TicketService.markComplete()` runs full validation → branch resolution → state mutation → events
+6. Follow-up ticket activates if specified on the winning branch; next quest auto-starts
+
+### 1.4 How player state is saved
+
+**File:** `~/.local/share/sysadmin-chronicles/save.json` (configurable via `SAVE_DIR`)  
+**Schema version:** 2  
+**Current top-level keys:**
+```
+schema_version, created_at, last_saved, trust, shift_number,
+shift_started_at, world_flags, progression, quests, tickets,
+mail, certifications, current_shift_stats, shift_history,
+pressure, incidents, sage, player_portrait
+```
+
+`SaveState.set(partial)` does shallow-merge with special handling for arrays and plain objects. Writes are queued and serialized.
+
+**Missing keys:** `behavior` (curiosity/obedience/risk), `narrative_phase`, `suspicion`, `access_level`, `hidden_hooks_discovered`. These must be added with defaults at `schema_version: 3`.
+
+### 1.5 How UI displays quest information
+
+Quest display is minimal. The `TicketsPanel.svelte` component shows:
+- Ticket ID, subject, priority badge, status
+- A "Mark Complete" button that triggers `POST /api/tickets/:id/complete`
+- Linked quest ID as static text in the detail view
+- No quest progress, no objectives display, no narrative phase, no behavior indicators
+
+`HeaderBar.svelte` shows:
+- Trust score (as text label: Probationary/Settling In/Reliable/Entrusted) and meter bar
+- Shift number and countdown
+- Certification count
+
+There is no behavior dashboard, no narrative phase indicator, no access level display, no hidden hook discovery log. The `/api/state` route does expose `worldFlags` and `progression` to the frontend but neither is currently rendered.
+
+### 1.6 How branch resolution works
+
+`ValidationEngine.resolveBranch(quest)` iterates branches sorted by descending priority, runs each branch's `validation` rule tree against live VM state via SSH, and returns the first passing branch. All validation runs real SSH commands against the QEMU/libvirt VMs. No mocking. The engine supports: `and`, `or`, `not`, `file_exists/absent/contains/mode/owner`, `service_state/enabled`, `process_running/user`, `port_listening`, `package_installed`, `mount_present`, `disk_usage_below/above`, `command_assert`.
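
The resolution order described above can be sketched without SSH. This is a minimal illustration under stated assumptions, not the actual `ValidationEngine`: the rule keys (`type`, `rules`, `rule`, `path`, `port`), the `probe` callback, and the in-memory `facts` object are hypothetical stand-ins for live VM checks, and the sketch is synchronous for brevity where the real checks run over SSH.

```javascript
// Evaluate a rule tree. Combinators are handled here; leaf checks are
// delegated to `probe` (hypothetical; the real engine inspects the VM).
function evaluateRule(rule, probe) {
  switch (rule.type) {
    case 'and':
      return rule.rules.every((child) => evaluateRule(child, probe));
    case 'or':
      return rule.rules.some((child) => evaluateRule(child, probe));
    case 'not':
      return !evaluateRule(rule.rule, probe);
    default:
      return probe(rule);
  }
}

// Try branches from highest to lowest priority; the first pass wins.
function resolveBranch(quest, probe) {
  const ordered = [...quest.branches].sort((a, b) => b.priority - a.priority);
  for (const branch of ordered) {
    if (evaluateRule(branch.validation, probe)) return branch;
  }
  return null;
}

// In-memory stand-in for VM state (assumption for this sketch only).
const facts = {
  file_exists: new Set(['/etc/nginx/nginx.conf']),
  port_listening: new Set([80]),
};
const probe = (rule) => {
  if (rule.type === 'file_exists') return facts.file_exists.has(rule.path);
  if (rule.type === 'port_listening') return facts.port_listening.has(rule.port);
  throw new Error(`unsupported rule type: ${rule.type}`);
};

const quest = {
  branches: [
    { id: 'partial', priority: 30,
      validation: { type: 'file_exists', path: '/etc/nginx/nginx.conf' } },
    { id: 'clean', priority: 100,
      validation: { type: 'and', rules: [
        { type: 'file_exists', path: '/etc/nginx/nginx.conf' },
        { type: 'not', rule: { type: 'port_listening', port: 8080 } },
      ] } },
  ],
};
```

Because branches are sorted before evaluation, authoring order in the quest file does not matter; only `priority` decides which passing branch wins.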
+
+---
+
+## 2. Spec Preservation Analysis
+
+Status of each SPEC_LOCK.md requirement:
+
+| Spec requirement | Status | Notes |
+|---|---|---|
+| Narrative spine (6 phases) | **Missing** | No phase field on quests; no phase tracker in runtime |
+| Quest must declare `narrative_phase` | **Missing** | Not in current quest schema |
+| Quest must declare `behavior_impact` | **Missing** | Not in current schema; spec defines branch-level overrides |
+| `curiosity` tracking | **Missing** | No BehaviorTracker service |
+| `obedience` tracking | **Missing** | No BehaviorTracker service |
+| `risk` tracking | **Missing** | No BehaviorTracker service |
+| `trust` preserved | **Already supported** | TrustSystem.js is complete and robust |
+| `suspicion` as management attention | **Missing** | No suspicion variable; concept is not tracked |
+| `trust_delta` on branches | **Already supported** | Fully implemented in TicketService.markComplete |
+| `world_flags` | **Already supported** | Full registry, branch application, persistence |
+| Access system: `basic_user → sudo → root` | **Partially supported** | ProgressionSystem tracks `unlocked_access` strings but has no named `basic_user`/`sudo`/`root` tiers |
+| Trust gates access | **Already supported** | `trust_unlocks.json` → ProgressionSystem |
+| Suspicion gates access | **Missing** | Suspicion doesn't exist as a tracked variable |
+| Boss/management pressure phase scaling | **Partially supported** | `pressure_profiles` and `IncidentScheduler` can escalate tickets and send emails; but pressure is keyed per-quest, not per narrative phase; there is no phase-aware boss behavior model |
+| Hidden hook system (no markers, optional) | **Missing** | No hidden hook schema, no discovery state, no tracker |
+| Quest generation constraints (reuse systems) | **Already supported** | Design intent preserved |
+| Difficulty scaling by phase | **Missing** | No phase-aware difficulty or hint logic |
+| Endings: 4 types, behavior-driven | **Missing** | No EndingEvaluator; no ending content authored |
+| Endings emerge from accumulated state | **Missing** | No ending evaluation logic |
+| Follow-up ticket/incident chaining | **Already supported** | TicketService + IncidentScheduler |
+| Observed-VM-state validation | **Already supported** | ValidationEngine is complete |
+| Clue fingerprints | **Already supported** | Documented and validated |
+| Baseline snapshots + prep scripts | **Already supported** | tools/vm/quest-prep/ + seed-vms.sh |
+| Debug/dev tools for narrative state | **Missing** | Only `validate-content.js`; no debug route for behavior/phase state |
+
+**Risk items:**
+- `ShiftReviewService.js` hardcodes `reviewer: 'Priya Kapoor'` and sends from `p.kapoor@axiomworks.internal`. This must be corrected to Priya Nair / `p.nair@axiomworks.internal` before shipping any new content.
+- `EmailService.js` CHARACTER_EMAILS has `priya: 'Priya Kapoor <p.kapoor@axiomworks.internal>'`. Same fix required.
+- `content/tickets/T007.json` may still reference the old Priya name (noted in CHARACTERS.md).
+- `content/docs/onboarding.json` may reference "Priya Kapoor" or "Priya Singh".
+
+---
+
+## 3. Gap Analysis
+
+### Narrative phases
+**Gap:** No `narrative_phase` field on quest JSON. No runtime tracker. No API endpoint to query current phase. No phase-driven behavior changes (ticket wording hints, clue obviousness, boss mode).
+
+### Behavior tracking (curiosity / obedience / risk)
+**Gap:** Completely absent. No service, no save state key, no UI, no branch-level behavior deltas applied at completion time.
+
+### Access progression (basic_user / sudo / root)
+**Gap:** ProgressionSystem tracks opaque `unlocked_access` strings (like `"sudo:web_server:systemctl"`). The spec requires a named three-tier model. Currently trust gates access but suspicion does not.
+
+### Boss/management pressure (phase-scaled)
+**Gap:** `IncidentScheduler` applies pressure per active quest, not per phase. There is no phase-keyed pressure mode. Kowalski is not implemented as an active character in any ticket or dialogue.
+
+### Hidden hooks
+**Gap:** No `hidden_hook` field in quest JSON. No discovery state in save. No mechanism to record what the player found. The world_flags system *could* be used for discovery state (e.g., `hidden:dale_ssh_key_found`) but nothing does this yet.
+
+### Endings
+**Gap:** Fully absent. No ending content, no EndingEvaluator, no condition set, no trigger. The four endings (corporate_loop, burnout, exposure, chaos) have no authored trigger criteria.
+
+### Debug tooling
+**Gap:** Only `validate-content.js` for content authoring. No in-game or dev-API route to inspect: current behavior scores, narrative phase, suspicion level, hidden hooks discovered, ending trajectory.
+
+### Validation of new schema fields
+**Gap:** `validate-content.js` does not check `narrative_phase`, `behavior_impact`, `hidden_hook`, `linux_concepts`, or `access_requirements`. New content will not be validated against these fields until the tool is updated.
+
+### Name correction — Priya Nair
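+**Gap (immediate):** Two service files (`ShiftReviewService.js`, `EmailService.js`) hardcode the wrong canonical name, and two content files (`T007.json`, `onboarding.json`) may still reference it. All must be fixed before new content ships.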
+
+---
+
+## 4. Minimal-Change Implementation Plan
+
+**Philosophy:** Extend the existing system. Do not replace working services. New functionality adds new services and new save state keys. Existing content is not broken. New fields are optional until all content is updated.
+
+---
+
+### Task 1 — Repo inspection (complete, no edits)
+
+Inspect the full codebase to confirm the architecture, identify all files that reference Priya Kapoor, and establish a baseline for subsequent tasks.
+
+**Acceptance criteria:** Authored plan with confirmed file paths and line numbers.
+
+---
+
+### Task 2 — Extend quest schema and validation tooling
+
+**What changes:**
+- Add `narrative_phase`, `behavior_impact`, `hidden_hook`, `linux_concepts`, `systems_used`, `failure_conditions`, `access_requirements` as optional fields to the quest JSON schema
+- Update `validate-content.js` to:
+  - warn when `narrative_phase` is absent
+  - validate `narrative_phase` against the 6-value enum
+  - check `behavior_impact` structure if present
+  - validate `hidden_hook` shape if present
+  - check `access_requirements.minimum_access` against known VM IDs
+- Add the 6 phase values as a declared constant in the validator
+
+**Files changed:** `tools/content/validate-content.js`  
+**Risk:** Low — additive only; existing quests with no new fields pass with warnings
+
+---
+
+### Task 3 — Behavior tracking service
+
+**What changes:**
+- New service: `server/src/services/BehaviorTracker.js`
+  - Tracks `curiosity`, `obedience`, `risk` as numeric values (0–100, start 50)
+  - Method: `apply(behaviorImpact)` — adds branch-level deltas
+  - Method: `getSnapshot()` — returns `{ curiosity, obedience, risk }`
+  - Method: `initialize(state)` — loads from save state
+  - Persists via `saveState.set({ behavior: ... })`
+  - Emits `behavior:changed` event on change
+- Add `behavior` key to `SaveState._defaultState()` with schema_version bump to 3
+- `SaveState._applyDefaults()` already merges new keys safely — no migration needed for existing saves
+- Wire `behaviorTracker.initialize(state)` into `server/src/index.js` `initializeServices()`
+- Call `behaviorTracker.apply(quest.behavior_impact?.[branch.id] ?? quest.behavior_impact?.default ?? {})` inside `TicketService.markComplete()` after the branch is selected (`behavior_impact` is declared on the quest and keyed by branch ID)
+
+**Files changed:** `server/src/services/BehaviorTracker.js` (new), `server/src/services/SaveState.js`, `server/src/index.js`, `server/src/services/TicketService.js`  
+**Risk:** Low — additive; behavior impact fields are optional in quest JSON so existing quests don't crash
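
A minimal sketch of the tracker described in this task, under stated assumptions: `behavior_impact` is declared on the quest and keyed by branch ID (as the validator prompt in section 9 iterates it), delta keys use the `*_delta` names from that prompt, and the `persist` callback is a hypothetical stand-in for `saveState.set()`. The class name `BehaviorTrackerSketch` and the helper `impactFor` are illustrative only.

```javascript
// Keep every behavior value inside the documented 0-100 range.
const clamp = (n) => Math.min(100, Math.max(0, n));

class BehaviorTrackerSketch {
  constructor(persist = () => {}) {
    this.values = { curiosity: 50, obedience: 50, risk: 50 }; // documented start value
    this.persist = persist; // stand-in for saveState.set({ behavior: ... })
  }

  // Restore from a loaded save; keys missing from the save keep their defaults.
  initialize(state) {
    this.values = { ...this.values, ...(state.behavior ?? {}) };
  }

  // Apply branch-level deltas; returns true only if some value changed.
  apply(impact = {}) {
    let changed = false;
    for (const key of Object.keys(this.values)) {
      const next = clamp(this.values[key] + (impact[`${key}_delta`] ?? 0));
      if (next !== this.values[key]) {
        this.values[key] = next;
        changed = true;
      }
    }
    if (changed) this.persist({ behavior: this.getSnapshot() });
    return changed;
  }

  getSnapshot() {
    return { ...this.values };
  }
}

// Impact for the winning branch, then the quest-wide default, then a no-op.
function impactFor(quest, branch) {
  return quest.behavior_impact?.[branch.id] ?? quest.behavior_impact?.default ?? {};
}
```

Falling back to an empty object is what keeps quests without `behavior_impact` from crashing at completion time: `apply({})` changes nothing and skips the save.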
+
+---
+
+### Task 4 — Narrative phase tracker
+
+**What changes:**
+- New service: `server/src/services/NarrativePhaseTracker.js`
+  - Maintains current phase as one of: `normal_work | unease | suspicion | investigation | conflict | resolution`
+  - Phase is derived from completed quests: determined by the highest-phase quest completed so far
+  - Method: `getPhase()` — returns current string
+  - Method: `advance(questId)` — checks the completed quest's `narrative_phase` field and updates phase if it is higher on the spine
+  - Method: `initialize(state)` — restores from `state.narrative_phase`
+  - Persists via `saveState.set({ narrative_phase: ... })`
+  - Emits `narrative:phase_changed` event
+- Add `narrative_phase` key to `SaveState._defaultState()` with value `'normal_work'`
+- Call `narrativePhaseTracker.advance(questId)` inside `QuestEngine.complete()` after state mutation
+- Expose `narrativePhase` in `/api/state` response (`server/src/routes/state.js`)
+
+**Files changed:** `server/src/services/NarrativePhaseTracker.js` (new), `server/src/services/SaveState.js`, `server/src/services/QuestEngine.js`, `server/src/routes/state.js`, `server/src/index.js`  
+**Risk:** Low — additive; quests without `narrative_phase` field default to `normal_work`, which never advances the tracker
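
The advance-only-forward rule can be sketched as follows. This is an illustration rather than the planned service: `NarrativePhaseTrackerSketch` and the `questsById` lookup are hypothetical names, and persistence and event emission are left out.

```javascript
// The six phases in spine order; a higher index means later in the story.
const PHASE_SPINE = [
  'normal_work', 'unease', 'suspicion', 'investigation', 'conflict', 'resolution',
];

class NarrativePhaseTrackerSketch {
  constructor(questsById) {
    this.questsById = questsById; // stand-in for loaded quest content
    this.phase = 'normal_work';
  }

  // Restore a saved phase, ignoring values that are not on the spine.
  initialize(state) {
    if (PHASE_SPINE.includes(state.narrative_phase)) this.phase = state.narrative_phase;
  }

  getPhase() {
    return this.phase;
  }

  // Move forward only if the completed quest sits later on the spine.
  // Quests without a narrative_phase count as normal_work and never advance.
  advance(questId) {
    const questPhase = this.questsById[questId]?.narrative_phase ?? 'normal_work';
    if (PHASE_SPINE.indexOf(questPhase) > PHASE_SPINE.indexOf(this.phase)) {
      this.phase = questPhase;
      return true;
    }
    return false;
  }
}
```

An unknown phase string yields index `-1`, so malformed content can never move the tracker backward or forward.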
+
+---
+
+### Task 5 — Hidden hook discovery state
+
+**What changes:**
+- New save state key: `hidden_hooks_discovered` — array of hook IDs (strings)
+- `SaveState._defaultState()` adds `hidden_hooks_discovered: []`
+- New service: `server/src/services/HiddenHookTracker.js`
+  - Method: `discover(hookId)` — adds hookId to discovered list, persists, emits `hidden_hook:discovered`
+  - Method: `isDiscovered(hookId)` — boolean check
+  - Method: `getDiscovered()` — returns array
+  - Method: `initialize(state)` — restores from save
+- New API route (dev/admin only): `GET /api/debug/hidden-hooks` — returns discovered hooks and all declared hooks from quest JSON
+- `HiddenHook` discovery is triggered when the player finds specific files, users, or cron entries from the terminal. The prep script seeds the artifact. Discovery is then detected in one of two ways: by a new optional validation check run on terminal activity, or by registering the hook as a special objective with `check_mode: "passive"` and a `behavior_impact` of `curiosity: +2`
+
+**Design note:** The simplest integration is: hidden hook discovery = passive objective with `hidden: true` flag. When a `hidden: true` objective validates, `HiddenHookTracker.discover()` is called instead of updating quest progress. This reuses the existing ValidationEngine without a new runtime mechanism.
+
+**Files changed:** `server/src/services/HiddenHookTracker.js` (new), `server/src/services/SaveState.js`, `server/src/index.js`, `server/src/routes/state.js`  
+**Risk:** Low — discovery mechanism is opt-in per quest
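
The design note's routing idea (hidden objectives feed the tracker instead of quest progress) might look like the sketch below. The names `HiddenHookTrackerSketch`, `recordObjective`, and the `hook_id` field on an objective are assumptions for illustration, and `persist` stands in for `saveState.set()`.

```javascript
class HiddenHookTrackerSketch {
  constructor(persist = () => {}) {
    this.discovered = new Set();
    this.persist = persist; // stand-in for saveState.set({ hidden_hooks_discovered })
  }

  initialize(state) {
    this.discovered = new Set(state.hidden_hooks_discovered ?? []);
  }

  isDiscovered(hookId) {
    return this.discovered.has(hookId);
  }

  getDiscovered() {
    return [...this.discovered];
  }

  // Idempotent: a repeated discovery neither persists nor reports a change.
  discover(hookId) {
    if (this.discovered.has(hookId)) return false;
    this.discovered.add(hookId);
    this.persist({ hidden_hooks_discovered: this.getDiscovered() });
    return true;
  }
}

// Route a validated objective: hidden ones go to the tracker,
// everything else counts as ordinary quest progress.
function recordObjective(objective, tracker, progress) {
  if (objective.hidden) tracker.discover(objective.hook_id);
  else progress.push(objective.id);
}
```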
+
+---
+
+### Task 6 — Access level system
+
+**What changes:**
+- Extend `ProgressionSystem` with a named three-tier concept:
+  - `basic_user` — default, always available
+  - `sudo` — granted by trust threshold (already exists as `unlocked_access` strings, just unnamed)
+  - `root` — granted at higher trust threshold
+- Add `content/progression/access_levels.json` — defines access level thresholds (trust + suspicion gates)
+- Add `suspicion` key to `SaveState._defaultState()` with value `0`
+- Add `suspicion` tracking to `BehaviorTracker` (or a thin `SuspicionTracker`) — updated whenever `risk` behavior delta fires
+- Suspicion threshold: if `suspicion >= 70`, revoke certain access levels (mirror of trust revoke logic)
+- Add `access_level` computed field to `/api/state` response: `basic_user | sudo | root` based on current `unlocked_access` set
+- `trust_unlocks.json` entries can remain as-is; the `access_level` label is a derived label for UI/debug use
+
+**Files changed:** `server/src/services/ProgressionSystem.js` (extend with `getAccessLevel()` helper), `server/src/services/SaveState.js`, `server/src/routes/state.js`, `content/progression/access_levels.json` (new)  
+**Risk:** Medium — `suspicion` as an access gate requires careful tuning; start with suspicion as display-only, gate access only in Task 7 when boss pressure is wired
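
One way to derive the display label from the existing `unlocked_access` strings is sketched below. It assumes each string starts with its tier followed by a colon, as in `"sudo:web_server:systemctl"`; the `root:` prefix and the function body are hypothetical, since the actual `ProgressionSystem` format for root grants is not described here.

```javascript
// Tiers in ascending order of privilege.
const ACCESS_ORDER = ['basic_user', 'sudo', 'root'];

// Return the highest tier named by any unlocked access string.
// Strings with an unrecognized prefix are ignored.
function getAccessLevel(unlockedAccess) {
  let best = 0; // basic_user is always available
  for (const entry of unlockedAccess) {
    const tier = ACCESS_ORDER.indexOf(entry.split(':')[0]);
    if (tier > best) best = tier;
  }
  return ACCESS_ORDER[best];
}
```

Because the label is derived on each call, revoking an unlock automatically lowers the reported level without any extra bookkeeping.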
+
+---
+
+### Task 7 — Boss/management pressure (phase-scaled)
+
+**What changes:**
+- Add `content/pressure_profiles/kowalski_phase_*.json` — 6 phase-keyed boss pressure profiles:
+  - Phase 1: Annoying (routine status email)
+  - Phase 2: Dismissive (reply-all on a ticket)
+  - Phase 3: Suspicious (access review CC)
+  - Phase 4: Monitoring (meeting scheduled)
+  - Phase 5: Interfering (access restriction trigger)
+  - Phase 6: Outcome-dependent (depends on world flags)
+- Extend `IncidentScheduler` to also process a `phase_pressure` tracker:
+  - When `narrativePhaseTracker.getPhase()` changes, activate the matching phase pressure profile
+  - Phase pressure escalation steps are sent as `emailService.send()` from Kowalski or Priya
+- Add `follow_up_mail` field support to incident escalation steps (already possible via `emailService.send()`)
+- Restrict access on phase 5 via `progressionSystem.revokeUnlock()` driven by a world flag set by phase 5 pressure
+
+**Files changed:** `server/src/services/IncidentScheduler.js` (extend), `server/src/services/NarrativePhaseTracker.js` (emit event on change), `content/pressure_profiles/` (new files)  
+**Risk:** Medium — phase pressure interacts with trust/suspicion; test pressure escalation in isolation before linking to access revoke
+
+---
+
+### Task 8 — Ending evaluation
+
+**What changes:**
+- New service: `server/src/services/EndingEvaluator.js`
+  - Evaluates the active ending route from world state at any time (not just at game end)
+  - Method: `evaluate()` — returns the current ending label (`corporate_loop | burnout | exposure | chaos`) and a confidence object
+  - Criteria (derived from SPEC_LOCK.md):
+    - `exposure`: high curiosity, narrative_phase reached `investigation` or `conflict`, hidden hooks discovered ≥ N
+    - `corporate_loop`: high obedience, low curiosity, trust > 70, few hidden hooks discovered
+    - `burnout`: low obedience AND low curiosity, trust medium-low, many unresolved incidents
+    - `chaos`: high risk, many negative trust_deltas, suspicion high, destructive world flags present
+  - Method: `checkTrigger()` — called at quest completion; if conditions are fully met and phase = `resolution`, fires `ending:triggered` event
+- New API endpoint: `GET /api/debug/ending` — returns current ending trajectory (dev only)
+- The ending trigger should NOT be a single button. `EndingEvaluator` is called passively on `quest:completed` events.
+
+**Files changed:** `server/src/services/EndingEvaluator.js` (new), `server/src/index.js`, `server/src/routes/state.js`  
+**Risk:** Medium. Ending criteria tuning requires extensive playtesting; ship as observable-only first, and gate the actual ending cutscene/screen behind separate Task 10 content work
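
The passive evaluation could be sketched as a pure function over a state snapshot. Everything numeric here is an assumption: the source does not define "high", "low", "many", or N, so they are passed in as a hypothetical `thresholds` object. The snapshot field names are also illustrative, "reached" is read as "at or past", and the sketch omits the "trust medium-low" and "many negative trust_deltas" criteria.

```javascript
// Score each ending as the fraction of its criteria currently met.
function evaluateEnding(s, thresholds) {
  const high = (v) => v >= thresholds.high;
  const low = (v) => v <= thresholds.low;
  const reachedInvestigation =
    ['investigation', 'conflict', 'resolution'].includes(s.narrative_phase);

  const checks = {
    exposure: [high(s.curiosity), reachedInvestigation, s.hooks >= thresholds.minHooks],
    corporate_loop: [high(s.obedience), low(s.curiosity), s.trust > 70,
      s.hooks < thresholds.minHooks],
    burnout: [low(s.obedience), low(s.curiosity),
      s.unresolvedIncidents >= thresholds.manyIncidents],
    chaos: [high(s.risk), high(s.suspicion), s.destructiveFlags > 0],
  };

  const confidence = {};
  for (const [label, results] of Object.entries(checks)) {
    confidence[label] = results.filter(Boolean).length / results.length;
  }
  // Ties keep the earlier label in declaration order (an arbitrary choice).
  const label = Object.keys(confidence)
    .reduce((a, b) => (confidence[b] > confidence[a] ? b : a));
  return { label, confidence };
}
```

Returning the per-ending scores alongside the label is what makes an observable-only rollout useful: a debug view can show how close the player is to each route before any ending is wired up.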
+
+---
+
+### Task 9 — Debug/dev tools
+
+**What changes:**
+- New route file: `server/src/routes/debug.js` — only active when `NODE_ENV !== 'production'`
+  - `GET /api/debug/state` — full save state dump
+  - `GET /api/debug/behavior` — current behavior snapshot (curiosity/obedience/risk/suspicion)
+  - `GET /api/debug/phase` — current narrative phase
+  - `GET /api/debug/ending` — current ending trajectory
+  - `GET /api/debug/hidden-hooks` — discovered + undiscovered hooks
+  - `POST /api/debug/set-behavior` — override behavior variables (for testing branches)
+  - `POST /api/debug/set-phase` — force a narrative phase (for testing phase-specific pressure)
+  - `POST /api/debug/discover-hook/:id` — manually fire hook discovery (for testing)
+- Wire debug router into `server/src/index.js` behind `NODE_ENV` guard
+- Add a minimal debug panel to the frontend (dev only): collapsible overlay showing behavior, phase, ending trajectory — controlled by `?debug=1` query param
+
+**Files changed:** `server/src/routes/debug.js` (new), `server/src/index.js`, `frontend/src/App.svelte` (conditional debug panel), `frontend/src/components/DebugPanel.svelte` (new)  
+**Risk:** Low — debug routes are gated; frontend panel is conditional
+
+---
+
+### Task 10 — Content integration
+
+**What changes:**
+- Add new fields to all 8 existing quests: `narrative_phase`, `behavior_impact`, `hidden_hook`, `linux_concepts`, `failure_conditions`, `access_requirements`
+- Fix Priya's name in: `server/src/services/ShiftReviewService.js`, `server/src/services/EmailService.js`, `content/tickets/T007.json`, `content/docs/onboarding.json`
+- Register any new world flags needed by the new fields in `content/world_flags/world_flags.json`
+- Author the first hidden hooks as passive objectives in Q005–Q008 (per STORY_DESIGN_CONTEXT.md: every 3–5 quests)
+- Add phase-pressure content files for phases 1–3 (phases 4–6 are content-authored later as story expands)
+- Author Kowalski as a pressure sender in the phase 2 and 3 profiles
+
+**Files changed:** All 8 quest JSONs, `content/tickets/T007.json`, `content/docs/onboarding.json`, `server/src/services/ShiftReviewService.js`, `server/src/services/EmailService.js`, `content/world_flags/world_flags.json`, `content/pressure_profiles/` (new files)  
+**Risk:** Medium — touching all quest files; run `validate-content.js` after every file change
+
+---
+
+### Task 11 — Validation and tests
+
+**What changes:**
+- Update `validate-content.js`:
+  - Error on unrecognized `narrative_phase` value
+  - Warn on missing `narrative_phase`
+  - Validate `behavior_impact` structure (numeric deltas)
+  - Validate `hidden_hook` structure if present
+  - Warn if `linux_concepts` is empty
+  - Check `access_requirements.minimum_access` values against known VM IDs
+- Add unit tests:
+  - `BehaviorTracker.test.js` — apply deltas, persistence, initialize from state
+  - `NarrativePhaseTracker.test.js` — advance rules, phase ordering, initialize
+  - `EndingEvaluator.test.js` — all 4 endings, boundary conditions
+  - `HiddenHookTracker.test.js` — discover, isDiscovered, persistence
+- Extend existing tests:
+  - `ValidationEngine.test.js` — confirm hidden objectives with `hidden: true` don't affect normal branch resolution
+  - `TicketService.test.js` — confirm `behavior_impact` is applied at completion, confirm no-op when field absent
+- Manual test checklist (see Task 11 Codex prompt)
+
+**Files changed:** `tools/content/validate-content.js`, `server/src/services/BehaviorTracker.test.js` (new), `server/src/services/NarrativePhaseTracker.test.js` (new), `server/src/services/EndingEvaluator.test.js` (new), `server/src/services/HiddenHookTracker.test.js` (new)  
+**Risk:** Low — tests are additive
+
+---
+
+## 5. Files Likely to Change
+
+| File | Why | What changes | Risk |
+|---|---|---|---|
+| `server/src/services/SaveState.js` | New save keys needed | Add `behavior`, `narrative_phase`, `suspicion`, `hidden_hooks_discovered` to `_defaultState()`; bump `schema_version` to 3 | Low — `_applyDefaults` merges safely |
+| `server/src/services/QuestEngine.js` | Phase advancement hook | Call `narrativePhaseTracker.advance()` in `complete()`; import new service | Low |
+| `server/src/services/TicketService.js` | Behavior application | Call `behaviorTracker.apply()` after branch selection in `markComplete()` | Low (`behavior_impact` is optional) |
+| `server/src/services/ShiftReviewService.js` | Name correction | Change `'Priya Kapoor'` to `'Priya Nair'`; fix `p.kapoor` to `p.nair` in email From line | Low — one-liner |
+| `server/src/services/EmailService.js` | Name correction | Change `CHARACTER_EMAILS.priya` to `'Priya Nair <p.nair@axiomworks.internal>'` | Low — one-liner |
+| `server/src/services/IncidentScheduler.js` | Phase pressure | Add `_processPhasePressure()` method triggered by phase change event | Medium |
+| `server/src/services/ProgressionSystem.js` | Access level label | Add `getAccessLevel()` that derives `basic_user`, `sudo`, or `root` from current `unlocked_access` set | Low |
+| `server/src/routes/state.js` | Expose new state | Add `behavior`, `narrativePhase`, `accessLevel`, `suspicion` to GET /api/state response | Low |
+| `server/src/index.js` | Wire new services | Import and `initialize()` new services in the correct order; add debug router | Low |
+| `tools/content/validate-content.js` | Validate new schema fields | Add phase enum check, behavior_impact structure check, hidden_hook shape check | Low — additive |
+| `content/world_flags/world_flags.json` | New flags needed | Add entries for any new flags emitted by hidden hooks and phase pressure profiles | Low |
+| `content/tickets/T007.json` | Priya name | Update `from` field if it uses old email | Low |
+| `content/docs/onboarding.json` | Priya name | Update any references to Priya Kapoor or Priya Singh | Low |
+| All 8 quest JSONs | New fields | Add `narrative_phase`, `behavior_impact`, `hidden_hook`, `linux_concepts`, `failure_conditions`, `access_requirements` | Medium — large surface |
+
+---
+
+## 6. Files Likely to Be Added
+
+| File | Purpose | Expected structure |
+|---|---|---|
+| `server/src/services/BehaviorTracker.js` | Track curiosity/obedience/risk/suspicion | Class with `initialize()`, `apply(impact)`, `getSnapshot()`, `_persist()` |
+| `server/src/services/NarrativePhaseTracker.js` | Track and advance narrative phase | Class with `initialize()`, `advance(questId)`, `getPhase()`, `_persist()` |
+| `server/src/services/HiddenHookTracker.js` | Record hidden hook discoveries | Class with `initialize()`, `discover(id)`, `isDiscovered(id)`, `getDiscovered()` |
+| `server/src/services/EndingEvaluator.js` | Evaluate ending trajectory from world state | Class with `evaluate()`, `checkTrigger()`, pure computation over save state snapshot |
+| `server/src/routes/debug.js` | Dev-only debug API | Express router, gated on `NODE_ENV !== 'production'` |
+| `frontend/src/components/DebugPanel.svelte` | Dev-only debug overlay | Collapsible panel, shown on `?debug=1`, polling `/api/debug/state` |
+| `content/progression/access_levels.json` | Named access level threshold definitions | Array of `{ level, trust_threshold, suspicion_ceiling, grants, revokes }` |
+| `content/pressure_profiles/kowalski_phase_1.json` | Phase 1 boss pressure | `escalation_steps` with Kowalski emails at time thresholds |
+| `content/pressure_profiles/kowalski_phase_2.json` | Phase 2 boss pressure | Dismissive Kowalski CC patterns |
+| `content/pressure_profiles/kowalski_phase_3.json` | Phase 3 boss pressure | Suspicious Kowalski, Priya CC |
+| `server/src/services/BehaviorTracker.test.js` | Unit tests for BehaviorTracker | Jest test file using existing `IncidentScheduler.test.js` as pattern |
+| `server/src/services/NarrativePhaseTracker.test.js` | Unit tests for NarrativePhaseTracker | Jest test file |
+| `server/src/services/EndingEvaluator.test.js` | Unit tests for EndingEvaluator | Jest test file, covers all 4 endings |
+| `server/src/services/HiddenHookTracker.test.js` | Unit tests for HiddenHookTracker | Jest test file |
+
+---
+
+## 7. Data Migration Plan
+
+### Existing quests (Q001–Q008)
+
+**Strategy: Wrap into new schema (backward-compatible extension)**
+
+- Do NOT replace existing quests. Do NOT create a "legacy" tier.
+- Add new fields to each existing quest file. The fields are additive.
+- `ContentLoader.js` already loads all quest files and passes them to `QuestEngine`. New fields are simply available at resolution time.
+- Missing new fields in old quests: the runtime treats `narrative_phase: undefined` as `normal_work`; `behavior_impact: undefined` as no behavior change; `hidden_hook: null` as no hook.
+- This means existing quests continue to work with zero runtime errors before Task 10 runs.
+
+### Save state migration
+
+- `schema_version` bumps from `2` to `3`
+- `SaveState._applyDefaults()` already merges new keys safely: old saves that lack `behavior`, `narrative_phase`, `suspicion`, `hidden_hooks_discovered` will receive the default values (`50/50/50`, `'normal_work'`, `0`, `[]`) on next load
+- No destructive migration. No migration script needed.
+- Old saves loaded under the new schema will behave as if the player is in Phase 1 with neutral behavior — which is correct for a save that predates the new system.
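
The non-destructive upgrade described above can be illustrated with a small default-merge. This is a sketch under assumptions, not the actual `SaveState._applyDefaults()`: the helper names and the recursive merge of plain objects are hypothetical, while the default values are the ones listed in this section.

```javascript
// Defaults for the keys introduced at schema_version 3.
const DEFAULTS_V3 = {
  behavior: { curiosity: 50, obedience: 50, risk: 50 },
  narrative_phase: 'normal_work',
  suspicion: 0,
  hidden_hooks_discovered: [],
};

const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Fill in keys missing from `saved` without overwriting existing values.
// Defaults are deep-copied so saves never share mutable objects.
function applyDefaults(saved, defaults) {
  const out = { ...saved };
  for (const [key, def] of Object.entries(defaults)) {
    if (!(key in out)) {
      out[key] = JSON.parse(JSON.stringify(def));
    } else if (isPlainObject(def) && isPlainObject(out[key])) {
      out[key] = applyDefaults(out[key], def);
    }
  }
  return out;
}

// Upgrade a loaded save to version 3 without mutating the input.
function upgradeSave(saved) {
  return { ...applyDefaults(saved, DEFAULTS_V3), schema_version: 3 };
}
```

A version 2 save passed through `upgradeSave` keeps its existing `trust`, `world_flags`, and quest data and only gains the new keys.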
+
+### Tickets, dialogue, incidents
+
+- No migration needed. Existing files continue to load and function.
+- New dialogue files for phase pressure and boss escalation are additive.
+
+---
+
+## 8. Testing Plan
+
+### Unit tests (new)
+
+| Test file | What it covers |
+|---|---|
+| `BehaviorTracker.test.js` | Delta application, clamping (0–100), initialize from state, persist, event emission |
+| `NarrativePhaseTracker.test.js` | Phase ordering (spine), advance-only-forward rule, initialize from state, persist |
+| `EndingEvaluator.test.js` | All 4 endings by state construction, boundary conditions, tie-break rules |
+| `HiddenHookTracker.test.js` | Discover, isDiscovered, idempotent discover, initialize from state |
+
+### Integration tests (extend existing)
+
+| Test | Assertion |
+|---|---|
+| `TicketService.test.js` — behavior applied | After `markComplete`, save state `behavior.curiosity` changes by branch delta |
+| `TicketService.test.js` — behavior absent | Quest with no `behavior_impact` completes without error |
+| `ValidationEngine.test.js` — hidden objective | `hidden: true` objective validates passively without blocking branch resolution |
+| `IncidentScheduler.test.js` — phase pressure | Phase change event triggers correct pressure profile activation |
+
+### Save/load compatibility checks
+
+- Load an existing (schema_version 2) save: all new keys initialized to defaults, no error
+- Complete a new quest with new schema fields: save state includes correct behavior deltas
+- Restart server with schema_version 3 save: all new keys correctly restored
+- Test `SAVE_DIR` override with new schema
+
+### Manual test checklist
+
+1. Complete Q001 clean fix → confirm `player_ssh_configured` flag set, trust = 53
+2. Complete Q001 brittle fix → confirm trust penalty, `player_loose_permissions` flag set
+3. After any quest completion → confirm the `behavior` object has changed in `/api/state` (or via the `/api/debug/behavior` route)
+4. With `?debug=1` → confirm debug panel visible in frontend
+5. Complete Q001–Q003 → confirm narrative phase advances from `normal_work`
+6. Navigate terminal to a hidden anomaly (e.g., unknown user in `/etc/passwd`) → confirm `/api/debug/hidden-hooks` shows new entry
+7. Force phase 3 via debug route → confirm Kowalski pressure profile activates
+8. Force behavior state to `{ curiosity: 80, obedience: 20, risk: 30 }` + reach resolution phase → confirm EndingEvaluator returns `exposure`
+9. Force behavior state to `{ curiosity: 20, obedience: 80, risk: 20 }` + reach resolution phase → confirm `corporate_loop`
+10. Run `node tools/content/validate-content.js` — zero errors with all existing + updated quests
+11. Run `npm test` — all existing tests pass; all new unit tests pass
+
+### Content validation checks
+
+- After Task 10: run `validate-content.js --verbose` on all 8 updated quests
+- Confirm all new `narrative_phase` values are valid enum members
+- Confirm all new `behavior_impact` fields have numeric deltas
+- Confirm no undeclared world flags introduced
+- Confirm all `hidden_hook` IDs are unique across quests
+
+---
+
+## 9. Codex Delegation Prompts
+
+### Task 2 — Extend validate-content.js
+
+```
+File: tools/content/validate-content.js
+
+Extend the existing content validation tool. Do not change any existing checks. Add these new checks after the existing quest validation block:
+
+1. Define a constant at the top of the file:
+   const VALID_NARRATIVE_PHASES = new Set(["normal_work","unease","suspicion","investigation","conflict","resolution"]);
+
+2. In the quest validation loop (the `for (const [qid, { data: quest, fname }] of Object.entries(quests))` block), add after the existing checks:
+
+   // narrative_phase
+   if (!quest.narrative_phase) {
+     warn(`${ctx}: missing 'narrative_phase' field`);
+   } else if (!VALID_NARRATIVE_PHASES.has(quest.narrative_phase)) {
+     err(`${ctx}: unknown narrative_phase '${quest.narrative_phase}'`);
+   }
+
+   // behavior_impact
+   if (quest.behavior_impact !== undefined) {
+     for (const [branchKey, impact] of Object.entries(quest.behavior_impact)) {
+       for (const field of ['curiosity_delta','obedience_delta','risk_delta','suspicion_delta']) {
+         if (impact[field] !== undefined && typeof impact[field] !== 'number') {
+           err(`${ctx}: behavior_impact[${branchKey}].${field} must be a number`);
+         }
+       }
+     }
+   }
+
+   // hidden_hook shape (if present and not null)
+   if (quest.hidden_hook !== undefined && quest.hidden_hook !== null) {
+     if (typeof quest.hidden_hook.id !== 'string') {
+       err(`${ctx}: hidden_hook.id must be a string`);
+     }
+   }
+
+   // access_requirements
+   if (quest.access_requirements?.minimum_access) {
+     for (const [vmId] of Object.entries(quest.access_requirements.minimum_access)) {
+       if (!vmProfiles[vmId]) {
+         err(`${ctx}: access_requirements.minimum_access references unknown VM '${vmId}'`);
+       }
+     }
+   }
+
+Acceptance criteria:
+- `node tools/content/validate-content.js` runs without JS errors
+- Existing quest files produce only warnings for missing narrative_phase, not errors
+- A test quest with narrative_phase: "invalid_phase" produces one error
+- All other existing checks continue to pass
+```
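
For reference, the `narrative_phase` and `behavior_impact` checks from this prompt might look like this as a standalone function. It is a sketch under assumptions: `checkNarrativeFields` is a hypothetical helper that collects messages instead of calling the tool's own `warn()` and `err()`, and the null guard on each impact entry is an addition that the prompt does not ask for.

```javascript
const VALID_NARRATIVE_PHASES = new Set([
  'normal_work', 'unease', 'suspicion', 'investigation', 'conflict', 'resolution',
]);
const DELTA_FIELDS = ['curiosity_delta', 'obedience_delta', 'risk_delta', 'suspicion_delta'];

// Hypothetical helper: returns messages instead of calling the tool's warn()/err().
function checkNarrativeFields(ctx, quest) {
  const warnings = [];
  const errors = [];
  if (!quest.narrative_phase) {
    warnings.push(`${ctx}: missing 'narrative_phase' field`);
  } else if (!VALID_NARRATIVE_PHASES.has(quest.narrative_phase)) {
    errors.push(`${ctx}: unknown narrative_phase '${quest.narrative_phase}'`);
  }
  for (const [branchKey, impact] of Object.entries(quest.behavior_impact ?? {})) {
    if (impact === null || typeof impact !== 'object') {
      // Extra guard (assumption): reading fields off null would throw.
      errors.push(`${ctx}: behavior_impact[${branchKey}] must be an object`);
      continue;
    }
    for (const field of DELTA_FIELDS) {
      if (impact[field] !== undefined && typeof impact[field] !== 'number') {
        errors.push(`${ctx}: behavior_impact[${branchKey}].${field} must be a number`);
      }
    }
  }
  return { warnings, errors };
}

const legacy = checkNarrativeFields('Q001', {});
const broken = checkNarrativeFields('QX', {
  narrative_phase: 'invalid_phase',
  behavior_impact: { default: { risk_delta: '1' }, bad: null },
});
```

A quest with no new fields yields one warning and no errors, which matches the acceptance criterion that existing quest files only warn.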
+
+---
+
+### Task 3 — BehaviorTracker service
+
+```
+Create file: server/src/services/BehaviorTracker.js
+
+Use ES module syntax (import/export) matching the existing service style (see SaveState.js and TrustSystem.js as patterns).
+
+The class must:
+- Store { curiosity, obedience, risk, suspicion } — all numeric 0–100, starting at 50/50/50/0
+- initialize(state): load from state.behavior (use defaults if absent)
+- apply(impact): accept an object with optional fields { curiosity_delta, obedience_delta, risk_delta, suspicion_delta }, add each to the corresponding score, clamp to [0,100], persist, emit 'behavior:changed' via eventBus
+- getSnapshot(): return a plain { curiosity, obedience, risk, suspicion } object
+- _persist(): call saveState.set({ behavior: this.getSnapshot() })
+
+Export a singleton: export const behaviorTracker = new BehaviorTracker();
+
+Then make these changes:
+
+1. In server/src/services/SaveState.js, in _defaultState(), add this key alongside the existing ones:
+   behavior: { curiosity: 50, obedience: 50, risk: 50, suspicion: 0 },
+   and change schema_version from 2 to 3.
+
+2. In server/src/index.js, import behaviorTracker from './services/BehaviorTracker.js' and add behaviorTracker.initialize(state) in initializeServices() after trustSystem.initialize(state).
+
+3. In server/src/services/TicketService.js, in the markComplete() method, after the line `questEngine.complete(quest.id, { branchId: branch.id });`, add:
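+   // quest.behavior_impact is keyed by branch ID (see Task 10), so check that key before falling back to default.
+   const behaviorImpact = branch.behavior_impact ?? quest.behavior_impact?.[branch.id] ?? quest.behavior_impact?.default ?? quest.behavior_impact ?? null;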
+   if (behaviorImpact) { behaviorTracker.apply(behaviorImpact); }
+   (Add the import at the top of the file.)
+
+Acceptance criteria:
+- npm test passes (existing tests unchanged)
+- GET /api/debug/state (if debug route exists) shows behavior object
+- After completing a quest whose branch has behavior_impact.curiosity_delta: 2, the save.json shows behavior.curiosity incremented by 2
+```
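
A minimal sketch of the tracker described in this prompt, assuming in-memory stand-ins for `saveState` and `eventBus` (their real interfaces are not shown in this plan). The event payload, a snapshot of the scores, is also an assumption. The sketch omits `export` so it can run on its own.

```javascript
// Stand-ins for the real saveState and eventBus singletons (assumed shapes).
const saveState = { data: {}, set(patch) { Object.assign(this.data, patch); } };
const eventBus = { events: [], emit(name, payload) { this.events.push({ name, payload }); } };

const DEFAULTS = { curiosity: 50, obedience: 50, risk: 50, suspicion: 0 };
const clamp = (n) => Math.min(100, Math.max(0, n));

class BehaviorTracker {
  constructor() { this._scores = { ...DEFAULTS }; }

  initialize(state) {
    // Missing keys fall back to defaults, so schema_version 2 saves load cleanly.
    this._scores = { ...DEFAULTS, ...(state?.behavior ?? {}) };
  }

  apply(impact) {
    if (!impact) return; // apply(null) is a no-op
    for (const key of Object.keys(DEFAULTS)) {
      const delta = impact[`${key}_delta`];
      if (typeof delta === 'number') this._scores[key] = clamp(this._scores[key] + delta);
    }
    this._persist();
    eventBus.emit('behavior:changed', this.getSnapshot());
  }

  getSnapshot() { return { ...this._scores }; }

  _persist() { saveState.set({ behavior: this.getSnapshot() }); }
}

const tracker = new BehaviorTracker();
tracker.initialize({});
tracker.apply({ curiosity_delta: 5, suspicion_delta: 200, risk_delta: -60 });
tracker.apply(null);
```

Returning a copy from `getSnapshot()` keeps callers from mutating the tracker's internal scores by accident.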
+
+---
+
+### Task 4 — NarrativePhaseTracker service
+
+```
+Create file: server/src/services/NarrativePhaseTracker.js
+
+Use ES module syntax matching existing service patterns.
+
+Phase ordering (spine): normal_work < unease < suspicion < investigation < conflict < resolution
+
+The class must:
+- Store _phase as a string, initialized from state.narrative_phase or defaulting to 'normal_work'
+- PHASE_ORDER constant: ['normal_work','unease','suspicion','investigation','conflict','resolution']
+- initialize(state): restore _phase from state.narrative_phase
+- advance(questId): look up the quest from contentLoader, read its narrative_phase field; if the quest's phase rank is strictly higher than current phase rank, update _phase, persist, emit 'narrative:phase_changed' event with { from, to }; if narrative_phase field is absent or undefined, do nothing
+- getPhase(): return current _phase string
+- _persist(): saveState.set({ narrative_phase: this._phase })
+
+Export singleton: export const narrativePhaseTracker = new NarrativePhaseTracker();
+
+Then make these changes:
+
+1. In server/src/services/SaveState.js _defaultState(), add:
+   narrative_phase: 'normal_work',
+
+2. In server/src/services/QuestEngine.js complete() method, after this._persist(), add:
+   narrativePhaseTracker.advance(questId);
+   (Add the import at top of file.)
+
+3. In server/src/routes/state.js, add narrativePhase: narrativePhaseTracker.getPhase() to the GET / response object.
+   Import narrativePhaseTracker at top of the file.
+
+4. In server/src/index.js, import and initialize narrativePhaseTracker in initializeServices() after questEngine.initialize(state).
+
+Acceptance criteria:
+- npm test passes
+- After completing Q001, GET /api/state returns narrativePhase: 'normal_work'
+- If a quest with narrative_phase: 'unease' is completed after Q001, GET /api/state returns narrativePhase: 'unease'
+- Phase never goes backward: completing a 'normal_work' quest after an 'unease' quest does not revert the phase
+```
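
The forward-only rule in this prompt can be illustrated with a small sketch. The `questPhases` lookup, the `saved` object, and the `emitted` array are stand-ins for `contentLoader`, `saveState`, and `eventBus`, whose real interfaces are assumed here.

```javascript
const PHASE_ORDER = ['normal_work', 'unease', 'suspicion', 'investigation', 'conflict', 'resolution'];

// Stand-ins for contentLoader, saveState and eventBus (assumed shapes).
const questPhases = { Q001: 'normal_work', Q005: 'unease', Q007: 'suspicion', QX: undefined };
const saved = {};
const emitted = [];

class NarrativePhaseTracker {
  constructor() { this._phase = 'normal_work'; }

  initialize(state) {
    const stored = state?.narrative_phase;
    this._phase = PHASE_ORDER.includes(stored) ? stored : 'normal_work';
  }

  advance(questId) {
    const next = questPhases[questId];
    const nextRank = PHASE_ORDER.indexOf(next); // -1 for absent or unknown phases
    if (nextRank <= PHASE_ORDER.indexOf(this._phase)) return; // never move backward
    const from = this._phase;
    this._phase = next;
    saved.narrative_phase = this._phase;
    emitted.push({ name: 'narrative:phase_changed', from, to: next });
  }

  getPhase() { return this._phase; }
}

const phases = new NarrativePhaseTracker();
phases.initialize({});
phases.advance('Q001'); // same rank: no change
phases.advance('Q007'); // jumps to suspicion
phases.advance('Q005'); // lower rank: ignored
phases.advance('QX');   // no narrative_phase: ignored
```

Because a missing phase ranks as -1, the same comparison covers both the "absent field" case and the "never go backward" case.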
+
+---
+
+### Task 5 — HiddenHookTracker service
+
+```
+Create file: server/src/services/HiddenHookTracker.js
+
+ES module syntax, matching existing service patterns.
+
+The class must:
+- Store _discovered as a Set of hook ID strings
+- initialize(state): load from state.hidden_hooks_discovered (array), build Set
+- discover(hookId): if not already discovered, add to Set, persist, emit 'hidden_hook:discovered' with { hookId }; idempotent if already discovered
+- isDiscovered(hookId): boolean
+- getDiscovered(): return [...this._discovered] sorted
+- _persist(): saveState.set({ hidden_hooks_discovered: [...this._discovered] })
+
+Export singleton: export const hiddenHookTracker = new HiddenHookTracker();
+
+Then:
+
+1. In server/src/services/SaveState.js _defaultState(), add:
+   hidden_hooks_discovered: [],
+
+2. In server/src/index.js, import and call hiddenHookTracker.initialize(state) in initializeServices().
+
+3. In server/src/routes/state.js, add hiddenHooksDiscovered: hiddenHookTracker.getDiscovered() to the response.
+
+Acceptance criteria:
+- npm test passes
+- POST /api/debug/discover-hook/test-hook (if debug route exists) adds 'test-hook' to state
+- GET /api/state returns hiddenHooksDiscovered: ['test-hook']
+- Calling discover() twice with the same ID results in exactly one entry in the array
+```
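
A sketch of the idempotent discovery behavior, again with stand-ins (`store`, `hookEvents`) in place of the real `saveState` and `eventBus`. The boolean return value of `discover()` is an assumption added for illustration; the prompt does not require it.

```javascript
const store = {};       // stand-in for saveState
const hookEvents = [];  // stand-in for eventBus

class HiddenHookTracker {
  constructor() { this._discovered = new Set(); }

  initialize(state) {
    // Older saves have no hidden_hooks_discovered key; treat that as empty.
    this._discovered = new Set(state?.hidden_hooks_discovered ?? []);
  }

  discover(hookId) {
    if (this._discovered.has(hookId)) return false; // idempotent
    this._discovered.add(hookId);
    store.hidden_hooks_discovered = this.getDiscovered();
    hookEvents.push({ name: 'hidden_hook:discovered', hookId });
    return true;
  }

  isDiscovered(hookId) { return this._discovered.has(hookId); }

  getDiscovered() { return [...this._discovered].sort(); }
}

const hooks = new HiddenHookTracker();
hooks.initialize({});
hooks.discover('q007_dale_ssh_key');
hooks.discover('q005_backup_agent_history');
hooks.discover('q007_dale_ssh_key'); // second call changes nothing
```

Storing the IDs in a `Set` and serializing to a sorted array keeps the save file stable regardless of discovery order.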
+
+---
+
+### Task 6 — Access level extension
+
+```
+Make these targeted changes to existing files:
+
+1. In server/src/services/ProgressionSystem.js, add this method to the ProgressionSystem class:
+   getAccessLevel() {
+     if (this._access.has('sudo:workstation:full') || this._access.has('sudo:web_server:full') || this._access.has('sudo:build_machine:full')) {
+       return 'root';
+     }
+     if (this._access.has('sudo:workstation:systemctl') || this._access.has('ssh:web_server') || this._access.has('ssh:build_machine')) {
+       return 'sudo';
+     }
+     return 'basic_user';
+   }
+
+2. In server/src/routes/state.js, add to the GET / response:
+   accessLevel: progressionSystem.getAccessLevel(),
+   Import progressionSystem if not already imported.
+
+3. Create file: content/progression/access_levels.json with this content:
+   {
+     "_description": "Named access level definitions. Derived from ProgressionSystem unlocked_access keys.",
+     "levels": [
+       { "name": "basic_user", "description": "Default access. Workstation only. No sudo." },
+       { "name": "sudo", "description": "Sudo on workstation; SSH to hermes or vulcan." },
+       { "name": "root", "description": "Full sudo on at least one host (workstation, hermes, or vulcan)." }
+     ]
+   }
+
+Acceptance criteria:
+- npm test passes
+- GET /api/state returns accessLevel: 'basic_user' for a fresh save
+- After trust reaches 55, accessLevel returns 'sudo'
+- After trust reaches 60 and sudo:web_server:full is granted, accessLevel returns 'root'
+```
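
The same derivation can be written as a pure function, which makes it easy to unit test without a `ProgressionSystem` instance. `deriveAccessLevel` is a hypothetical name, and `unlocked` is assumed to be any iterable of access-key strings.

```javascript
const ROOT_KEYS = ['sudo:workstation:full', 'sudo:web_server:full', 'sudo:build_machine:full'];
const SUDO_KEYS = ['sudo:workstation:systemctl', 'ssh:web_server', 'ssh:build_machine'];

// `unlocked` is assumed to be an iterable of access-key strings.
function deriveAccessLevel(unlocked) {
  const access = new Set(unlocked);
  if (ROOT_KEYS.some((key) => access.has(key))) return 'root';
  if (SUDO_KEYS.some((key) => access.has(key))) return 'sudo';
  return 'basic_user';
}

const fresh = deriveAccessLevel([]);
const afterSsh = deriveAccessLevel(['ssh:web_server']);
const afterFullSudo = deriveAccessLevel(['ssh:web_server', 'sudo:web_server:full']);
```

Checking the root keys first means the highest matching level always wins, whatever order the keys were unlocked in.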
+
+---
+
+### Task 7 — Phase pressure content files
+
+```
+Create three new pressure profile files in content/pressure_profiles/:
+
+File: content/pressure_profiles/kowalski_phase_1.json
+Content:
+{
+  "id": "kowalski_phase_1",
+  "label": "Dave Kowalski — Phase 1: Routine Pressure",
+  "description": "Normal managerial check-ins. Annoying but not threatening.",
+  "trigger_phase": "normal_work",
+  "escalation_steps": [
+    {
+      "trigger_after_seconds": 300,
+      "notification": "Quick check-in — how are you getting on with the ticket queue? Let me know if anything is blocking you. Dave K.",
+      "notification_severity": "info",
+      "sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
+      "subject": "Status check"
+    },
+    {
+      "trigger_after_seconds": 600,
+      "notification": "Following up on my earlier note. We should really document that workflow once you get a moment.",
+      "notification_severity": "info",
+      "sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
+      "subject": "Re: Status check"
+    }
+  ]
+}
+
+File: content/pressure_profiles/kowalski_phase_2.json
+Content:
+{
+  "id": "kowalski_phase_2",
+  "label": "Dave Kowalski — Phase 2: Dismissive",
+  "description": "Kowalski is aware something is recurring. Manages upward, not inward.",
+  "trigger_phase": "unease",
+  "escalation_steps": [
+    {
+      "trigger_after_seconds": 180,
+      "notification": "I've had a couple of questions from Sarah's team about stability. Nothing critical, but let's make sure we're on top of it. Noted for the weekly update. D.",
+      "notification_severity": "info",
+      "sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
+      "subject": "FYI — product team questions"
+    }
+  ]
+}
+
+File: content/pressure_profiles/kowalski_phase_3.json
+Content:
+{
+  "id": "kowalski_phase_3",
+  "label": "Dave Kowalski — Phase 3: Suspicious",
+  "description": "Kowalski is getting questions from above. Starts involving Priya.",
+  "trigger_phase": "suspicion",
+  "escalation_steps": [
+    {
+      "trigger_after_seconds": 120,
+      "notification": "I've scheduled a brief sync for Thursday to talk through recent changes on the infrastructure side. Priya will join. Nothing to worry about — just a routine review.",
+      "notification_severity": "warning",
+      "sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
+      "subject": "Thursday sync — infra review"
+    }
+  ]
+}
+
+Acceptance criteria:
+- node tools/content/validate-content.js passes with no new errors
+- All three files have unique 'id' fields that pass content loader's ID detection
+```
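
How these profiles are consumed is not specified in this plan. One possible reading, sketched below, is that the profile whose `trigger_phase` matches the current narrative phase is selected and its steps fire once enough seconds have elapsed. `dueSteps` and the meaning of `elapsedSeconds` are assumptions made for illustration.

```javascript
// Trimmed copies of the profiles above; only the fields this sketch reads.
const profiles = [
  { id: 'kowalski_phase_1', trigger_phase: 'normal_work',
    escalation_steps: [{ trigger_after_seconds: 300 }, { trigger_after_seconds: 600 }] },
  { id: 'kowalski_phase_2', trigger_phase: 'unease',
    escalation_steps: [{ trigger_after_seconds: 180 }] },
  { id: 'kowalski_phase_3', trigger_phase: 'suspicion',
    escalation_steps: [{ trigger_after_seconds: 120 }] },
];

// Hypothetical helper: which steps of the phase's profile are due after `elapsedSeconds`?
function dueSteps(allProfiles, phase, elapsedSeconds) {
  const profile = allProfiles.find((p) => p.trigger_phase === phase);
  if (!profile) return { profileId: null, steps: [] };
  const steps = profile.escalation_steps.filter(
    (step) => step.trigger_after_seconds <= elapsedSeconds,
  );
  return { profileId: profile.id, steps };
}

const early = dueSteps(profiles, 'normal_work', 450);
const suspicious = dueSteps(profiles, 'suspicion', 120);
const unmapped = dueSteps(profiles, 'conflict', 999);
```

Phases without a matching profile return an empty result, so later phases stay quiet until content is written for them.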
+
+---
+
+### Task 8 — EndingEvaluator service
+
+```
+Create file: server/src/services/EndingEvaluator.js
+
+ES module syntax.
+
+ENDING_CRITERIA constant (all conditions must be met for that ending to be active):
+- exposure:      curiosity >= 65, hidden_hooks_discovered.length >= 2, PHASE_RANK[narrative_phase] >= PHASE_RANK.investigation
+- corporate_loop: obedience >= 65, curiosity <= 40, trust >= 65
+- burnout:        curiosity <= 35, obedience <= 40 (passive disengagement)
+- chaos:          risk >= 65, trust <= 40
+
+The class must:
+- evaluate(): read current saveState, compute which endings' criteria are met, return { active: 'exposure'|'corporate_loop'|'burnout'|'chaos'|'undetermined', candidates: [...] } — if multiple match, prefer in this order: exposure > chaos > corporate_loop > burnout
+- checkTrigger(): call evaluate(); if narrative_phase is 'resolution' and active is not 'undetermined', emit 'ending:triggered' with { ending: active }; return the result
+
+PHASE_RANK constant: { normal_work:0, unease:1, suspicion:2, investigation:3, conflict:4, resolution:5 }
+
+Import saveState, narrativePhaseTracker, hiddenHookTracker, behaviorTracker.
+
+Export singleton: export const endingEvaluator = new EndingEvaluator();
+
+Wire into index.js: import endingEvaluator. No initialize() call is needed because it reads state on demand.
+
+In index.js, subscribe to 'quest:completed' on eventBus and call endingEvaluator.checkTrigger() each time it fires.
+
+Acceptance criteria:
+- npm test passes
+- evaluate() with curiosity=70, hiddenHooksDiscovered=['h1','h2'], phase='investigation' returns active: 'exposure'
+- evaluate() with obedience=70, curiosity=35, trust=70 returns active: 'corporate_loop'
+- evaluate() with no conditions met returns active: 'undetermined'
+```
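
The criteria and tie-break order from this prompt can be expressed as a pure function over a flat snapshot. This is a sketch: the snapshot shape (`hooks`, `phase`, `trust` alongside the behavior scores) is an assumption about how the caller gathers state, not the real save layout.

```javascript
const PHASE_RANK = { normal_work: 0, unease: 1, suspicion: 2, investigation: 3, conflict: 4, resolution: 5 };
const PRIORITY = ['exposure', 'chaos', 'corporate_loop', 'burnout'];

// `s` is a flat snapshot assembled by the caller (an assumed shape, not the real save layout).
const ENDING_CRITERIA = {
  exposure: (s) => s.curiosity >= 65 && s.hooks.length >= 2
    && PHASE_RANK[s.phase] >= PHASE_RANK.investigation,
  corporate_loop: (s) => s.obedience >= 65 && s.curiosity <= 40 && s.trust >= 65,
  burnout: (s) => s.curiosity <= 35 && s.obedience <= 40,
  chaos: (s) => s.risk >= 65 && s.trust <= 40,
};

function evaluateEnding(snapshot) {
  // Filtering in priority order means candidates[0] is the preferred ending.
  const candidates = PRIORITY.filter((name) => ENDING_CRITERIA[name](snapshot));
  return { active: candidates[0] ?? 'undetermined', candidates };
}

const base = { curiosity: 50, obedience: 50, risk: 50, trust: 50, hooks: [], phase: 'normal_work' };
const exposed = evaluateEnding({ ...base, curiosity: 70, hooks: ['h1', 'h2'], phase: 'investigation' });
const loop = evaluateEnding({ ...base, obedience: 70, curiosity: 35, trust: 70 });
const both = evaluateEnding({ ...base, curiosity: 70, hooks: ['a', 'b'], phase: 'conflict', risk: 70, trust: 35 });
```

Keeping the criteria as predicates in one table makes it straightforward to add or retune an ending without touching the evaluation loop.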
+
+---
+
+### Task 9 — Debug routes and frontend panel
+
+```
+Create file: server/src/routes/debug.js
+
+ES module syntax. Only register routes if process.env.NODE_ENV !== 'production'.
+
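+Routes (full paths as seen by clients; because the router is mounted at /api/debug, define each one relative to that prefix, e.g. router.get('/state', ...)):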
+  GET /api/debug/state       — return full saveState.get()
+  GET /api/debug/behavior    — return behaviorTracker.getSnapshot()
+  GET /api/debug/phase       — return { phase: narrativePhaseTracker.getPhase() }
+  GET /api/debug/ending      — return endingEvaluator.evaluate()
+  GET /api/debug/hidden-hooks — return { discovered: hiddenHookTracker.getDiscovered(), total: N }
+  POST /api/debug/set-behavior — body: { curiosity, obedience, risk, suspicion }; call behaviorTracker._override(body) (add _override method that directly sets values without deltas)
+  POST /api/debug/set-phase  — body: { phase }; if valid phase, directly set _phase on narrativePhaseTracker and persist (add _forcePhase method)
+  POST /api/debug/discover-hook/:id — call hiddenHookTracker.discover(req.params.id); return getDiscovered()
+
+In server/src/index.js, add:
+  import debugRouter from './routes/debug.js';
+  // After the other app.use() calls:
+  if (process.env.NODE_ENV !== 'production') {
+    app.use('/api/debug', debugRouter);
+  }
+
+Create file: frontend/src/components/DebugPanel.svelte
+- Shows only when window.location.search includes 'debug=1'
+- Polls GET /api/debug/behavior, GET /api/debug/phase, GET /api/debug/ending every 5 seconds
+- Displays: behavior scores (curiosity/obedience/risk/suspicion), current phase, ending trajectory
+- Minimal styling: position fixed, bottom right, semi-transparent, small font
+
+In frontend/src/App.svelte, import DebugPanel and conditionally render it:
+  {#if showDebug}
+    <DebugPanel />
+  {/if}
+Add: const showDebug = new URLSearchParams(window.location.search).get('debug') === '1';
+
+Acceptance criteria:
+- npm test passes
+- In development: GET /api/debug/behavior returns behavior snapshot
+- Visiting /?debug=1 shows the debug panel in the browser
+- In production (NODE_ENV=production): GET /api/debug/behavior returns 404
+```
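
Two small pieces of this task lend themselves to pure helpers: the `?debug=1` check and the body handling for `POST /api/debug/set-behavior`. Both functions below are hypothetical; in particular, clamping and dropping unknown keys is one possible policy for `_override`, not something the prompt specifies.

```javascript
// True only for ?debug=1, matching the DebugPanel visibility rule above.
function isDebugEnabled(search) {
  return new URLSearchParams(search).get('debug') === '1';
}

const BEHAVIOR_KEYS = ['curiosity', 'obedience', 'risk', 'suspicion'];

// Hypothetical body check for POST /api/debug/set-behavior: keep known numeric
// keys, clamp them to [0, 100], and silently drop everything else.
function sanitizeBehaviorOverride(body) {
  const clean = {};
  for (const key of BEHAVIOR_KEYS) {
    const value = body?.[key];
    if (typeof value === 'number' && Number.isFinite(value)) {
      clean[key] = Math.min(100, Math.max(0, value));
    }
  }
  return clean;
}

const override = sanitizeBehaviorOverride({ curiosity: 80, obedience: '20', risk: 130, extra: 1 });
```

Sanitizing at the route boundary keeps forced test states within the same 0 to 100 range that normal play produces.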
+
+---
+
+### Task 10 — Fix Priya's name and update Q001–Q008
+
+```
+Part A — Fix Priya's name. Make these exact changes:
+
+1. In server/src/services/EmailService.js, find this line:
+     priya: 'Priya Kapoor <p.kapoor@axiomworks.internal>',
+   Change it to:
+     priya: 'Priya Nair <p.nair@axiomworks.internal>',
+
+2. In server/src/services/ShiftReviewService.js:
+   a. Find: reviewer: 'Priya Kapoor'
+      Change to: reviewer: 'Priya Nair'
+   b. Find: from: 'Priya Kapoor <p.kapoor@axiomworks.internal>'
+      Change to: from: 'Priya Nair <p.nair@axiomworks.internal>'
+
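+3. In content/tickets/T007.json: if the 'from' or 'body' field contains 'Priya Kapoor' or 'Priya Singh', replace the name with 'Priya Nair'; if it contains the address 'p.kapoor@axiomworks.internal' (or a bare 'p.kapoor'), replace it with 'p.nair@axiomworks.internal' (or 'p.nair').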
+
+4. In content/docs/onboarding.json: if 'Priya Kapoor' or 'Priya Singh' appears, replace with 'Priya Nair'.
+
+Part B — Add new fields to existing quests. For each quest Q001–Q008, add these fields using the values in the table below. Do not change any existing fields. Do not reformat the JSON beyond what is needed to add the new fields.
+
+Q001: narrative_phase: "normal_work", linux_concepts: ["ssh-keygen","authorized_keys","file permissions"], failure_conditions: ["SSH keys not added","authorized_keys permissions too broad"], behavior_impact: { "correct-key": { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 }, "loose-permissions": { curiosity_delta: 0, obedience_delta: 0, risk_delta: 1, suspicion_delta: 1 }, default: { curiosity_delta: 0, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { workstation: "basic_user" }, requires_root: false, temporary_grants_allowed: [] }
+
+Q002: narrative_phase: "normal_work", linux_concepts: ["nginx","systemctl","sshd_config"], failure_conditions: ["nginx not running","service not enabled at boot"], behavior_impact: { default: { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { web_server: "basic_user" }, requires_root: false, temporary_grants_allowed: [] }
+
+Q003: narrative_phase: "normal_work", linux_concepts: ["logrotate","disk usage","df","du"], failure_conditions: ["disk still above threshold","logrotate not restored"], behavior_impact: { default: { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { web_server: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
+
+Q004: narrative_phase: "normal_work", linux_concepts: ["chown","file ownership","deploy scripts"], failure_conditions: ["web root ownership not fixed","deploy service still failing"], behavior_impact: { default: { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { web_server: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
+
+Q005: narrative_phase: "unease", linux_concepts: ["cron","crontab","user field","backup management"], failure_conditions: ["cron still running as root","disk not cleared","backup directory ownership not fixed"], behavior_impact: { "full-fix": { curiosity_delta: 1, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 }, "cron-fixed-only": { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 }, "disk-cleared-only": { curiosity_delta: 0, obedience_delta: 0, risk_delta: 1, suspicion_delta: 1 }, default: { curiosity_delta: 0, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: { "id": "q005_backup_agent_history", "description": "backup-agent home directory contains a .bash_history with unusual commands that predate the current cron misconfiguration.", "discovery_method": "Player reads /home/backup-agent/.bash_history", "significance": "Dale configured this cron job. The history shows it was changed deliberately, not by accident." }, access_requirements: { minimum_access: { web_server: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
+
+Q006: narrative_phase: "unease", linux_concepts: ["NTP","systemd-timesyncd","Arch Linux","pacman","package keys"], failure_conditions: ["NTP not enabled at boot","package manager still broken"], behavior_impact: { default: { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { build_machine: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
+
+Q007: narrative_phase: "suspicion", linux_concepts: ["sshd_config","AllowGroups","AllowUsers","access hardening"], failure_conditions: ["Priya still locked out","SSH restrictions removed entirely"], behavior_impact: { default: { curiosity_delta: 1, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: { "id": "q007_dale_ssh_key", "description": "An SSH key in hermes /root/.ssh/authorized_keys does not match any current staff. The fingerprint matches no documented key.", "discovery_method": "Player reads /root/.ssh/authorized_keys on hermes", "significance": "Dale had root SSH access to hermes that was never formally revoked." }, access_requirements: { minimum_access: { web_server: "sudo" }, requires_root: false, temporary_grants_allowed: ["sudo:web_server:sshd"] }
+
+Q008: narrative_phase: "suspicion", linux_concepts: ["apt","package pinning","apt-preferences","internal package mirror","vulcan build pipeline"], failure_conditions: ["axiomworks-app still broken","bad package not traced to build machine"], behavior_impact: { default: { curiosity_delta: 1, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: { "id": "q008_build_log_anomaly", "description": "vulcan's build log for 2.1.1 shows it was triggered by a manual invocation, not the automated pipeline, at 02:14.", "discovery_method": "Player reads /var/log/build-pipeline.log on vulcan and notices the timestamp and manual trigger field", "significance": "The bad build was triggered manually. Someone made the broken build, and it was not the pipeline." }, access_requirements: { minimum_access: { build_machine: "sudo", web_server: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
+
+After all changes, run: node tools/content/validate-content.js
+Confirm: zero errors. Warnings about missing narrative_phase should now be gone for all 8 quests.
+```
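
The Part A rename is a pair of text substitutions, sketched here as a helper that could be applied to any string field. `renamePriya` is a hypothetical function for illustration; the prompt itself asks for manual, file-by-file edits.

```javascript
// Each pair is [pattern, replacement]; the bare local part also covers full addresses.
const PRIYA_REPLACEMENTS = [
  [/\bp\.kapoor\b/g, 'p.nair'],
  [/Priya (Kapoor|Singh)/g, 'Priya Nair'],
];

function renamePriya(text) {
  return PRIYA_REPLACEMENTS.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    text,
  );
}

const sender = renamePriya('Priya Kapoor <p.kapoor@axiomworks.internal>');
const body = renamePriya('Ask Priya Singh (p.kapoor) about the review.');
```

Running the helper a second time on its own output changes nothing, so it is safe to apply to files that were already partly fixed.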
+
+---
+
+### Task 11 — Unit tests and validation extension
+
+```
+Part A — Write unit tests for all new services.
+
+Create file: server/src/services/BehaviorTracker.test.js
+Use the existing IncidentScheduler.test.js or ShiftReviewService.test.js as the pattern for test structure.
+
+Tests to include:
+1. initialize() with no state.behavior: curiosity=50, obedience=50, risk=50, suspicion=0
+2. initialize() with existing state.behavior: values restored correctly
+3. apply({ curiosity_delta: 5 }): curiosity increases by 5
+4. apply({ risk_delta: -10 }): risk decreases by 10; apply({ risk_delta: -200 }): risk floors at 0
+5. apply({ suspicion_delta: 200 }): suspicion clamps at 100
+6. apply({}): no change, no error
+7. apply(null): no change, no error (defensive)
+8. getSnapshot(): returns plain object with all four keys
+
+Create file: server/src/services/NarrativePhaseTracker.test.js
+Tests:
+1. initialize() with no state.narrative_phase: returns 'normal_work'
+2. advance() with quest having narrative_phase 'unease': phase becomes 'unease'
+3. advance() with quest having higher phase than current: phase advances
+4. advance() with quest having lower phase than current: phase does NOT change
+5. advance() with quest missing narrative_phase field: phase does NOT change
+6. getPhase(): returns current phase string
+
+Create file: server/src/services/EndingEvaluator.test.js
+Tests (each builds a mock state):
+1. exposure: curiosity=70, hiddenHooksDiscovered=['a','b'], phase='investigation' → active: 'exposure'
+2. corporate_loop: obedience=70, curiosity=35, trust=70 → active: 'corporate_loop'
+3. burnout: curiosity=30, obedience=35 → active: 'burnout'
+4. chaos: risk=70, trust=35 → active: 'chaos'
+5. no conditions: active: 'undetermined'
+6. exposure wins over chaos when both met: active: 'exposure'
+
+Create file: server/src/services/HiddenHookTracker.test.js
+Tests:
+1. initialize() with no state: getDiscovered() returns []
+2. discover('h1'): getDiscovered() returns ['h1']
+3. discover('h1') twice: getDiscovered() returns ['h1'] (idempotent)
+4. isDiscovered('h1'): true after discovery
+5. isDiscovered('h2'): false before discovery
+
+Part B — Run validation.
+After all changes: run `npm test` from the server directory. All tests must pass.
+Run `node tools/content/validate-content.js`. Zero errors.
+
+Part C — Manual verification checklist.
+Confirm each item by inspection or running the game:
+[ ] Fresh save: GET /api/debug/state shows behavior: {curiosity:50,obedience:50,risk:50,suspicion:0}; GET /api/state returns narrativePhase:'normal_work', accessLevel:'basic_user'
+[ ] Complete Q001 clean branch: behavior.obedience increments, phase stays normal_work
+[ ] Complete Q005: phase advances to 'unease'; once the player discovers it, q005_backup_agent_history appears in /api/debug/hidden-hooks
+[ ] Complete Q007: phase advances to 'suspicion', q007_dale_ssh_key hook discoverable on hermes
+[ ] ShiftReviewService sends from Priya Nair <p.nair@axiomworks.internal>
+[ ] GET /api/debug/ending with forced state returns correct ending label
+[ ] /?debug=1 shows debug panel in browser
+[ ] node tools/content/validate-content.js: zero errors
+```
+
+---
+
+*End of implementation plan.*