chore: bootstrap lean sysadmin-chronicles repo

Import the runnable game code, content, docs, scripts, and repo guidance while leaving local agent state, dependency installs, build output, and backup copies out of the published tree.
2026-05-02 11:49:07 -04:00
commit 0265afa054
252 changed files with 37574 additions and 0 deletions
@@ -0,0 +1,702 @@
# SYSADMIN CHRONICLES — ARCHITECTURE DOCUMENT
> Version 5.0 | Status: Active development
>
> Changelog:
> v5.0 — GDScript/Godot codebase removed. Node.js + Svelte is the only codebase.
> v4.0 — Full architecture pivot to Node.js game server + Svelte web HUD.
> v3.x — Save system, world flags, trust, incidents, pressure system (GDScript era).
> v2.0 — Native Godot 4 + libvirt design (superseded).
> v1.0 — Browser/v86 prototype (superseded).
---
## 1. PROJECT OVERVIEW
**Sysadmin Chronicles** is a native Linux-only game where the player works as a
junior sysadmin at Axiom Works, handling tickets inside **real Linux virtual
machines** managed by **QEMU/KVM via libvirt**.
The runtime stack (as of v4.0):
- **Game server** — Node.js / Express + WebSocket (`server/`). Owns all game
logic: quest state, trust, validation, VM lifecycle, incidents, save state.
- **Web HUD** — Svelte single-page app (`frontend/`). Tickets, mail, Sage, docs,
trust bar. Served from the game server at `http://192.168.100.1:3000`.
- **Workstation VM** — XFCE desktop (Debian 12, sc-workstation). Player's desk.
Chromium auto-opens the HUD. Tilix provides a real terminal for SSH to target VMs.
- **Target VMs** — Headless Debian (hermes) and Arch (vulcan). Quest objectives
live here. Player investigates and fixes via SSH from the workstation terminal.
The player experience:
- Sits at the workstation VM (via SPICE/remote-viewer fullscreen on the host)
- Reads tickets and mail in the Chromium HUD
- Opens Tilix, SSHes to hermes or vulcan, fixes real problems
- Clicks "Mark Complete" in the HUD — game server SSHes in and validates VM state
- World reacts, trust shifts, new mail arrives via WebSocket push
No simulated terminal. No fake SSH sessions.
---
## 2. CORE DESIGN PRINCIPLES
- Realism over simulation
- Native Linux execution only
- CLI-first development and asset wiring
- Minimal, stable scenes; behavior lives in scripts
- Data-driven content for quests, tickets, incidents, and dialogue
- State-based validation only; never command-sequence checking
- Multiple valid solutions where possible
- Pressure comes from evolving systems, not arbitrary timers
- Progression unlocks access, tools, and scope, not RPG stats
- Deterministic systems so content is testable and agent-friendly
- The dirty VM state is the game — preserve it, do not erase it
---
## 3. HIGH-LEVEL ARCHITECTURE
```
HOST MACHINE
├── game-server/ Node.js/Express + WebSocket (server/src/)
│ ├── ContentLoader loads content/ JSON at startup
│ ├── QuestEngine quest state machine
│ ├── TicketService ticket state, mark-complete handler
│ ├── ValidationEngine SSH into VMs, evaluates rules
│ ├── VMManager virsh start/stop/snapshot wrappers
│ ├── TrustSystem score, unlock evaluation, revocation
│ ├── ProgressionSystem unlocked docs, VMs, access
│ ├── EmailService inbox, follow-up emails, reply options
│ ├── SageService rule-based knowledge base / dialogue
│ ├── ShiftTimer shift clock, pressure tick schedule
│ ├── IncidentScheduler incident injection
│ └── SaveState ~/.local/share/sysadmin-chronicles/save.json
├── frontend/ Svelte web HUD (frontend/src/)
│ ├── TicketsPanel ticket list, detail, "Mark Complete" button
│ ├── MailPanel inbox, message view, reply buttons
│ ├── DocsPanel trust-gated internal docs
│ ├── SagePanel chat / knowledge base search
│ └── HeaderBar trust indicator, shift timer, unread count
└── content/ JSON content — quests, tickets, dialogue, etc.
NETWORK: sc-internal (libvirt bridge 192.168.100.0/24)
192.168.100.1 host (game server port 3000)
VMs on sc-internal
├── sc-workstation (ares) Debian 12 XFCE — player's desk
│ ├── Chromium → http://192.168.100.1:3000 (HUD, always open)
│ └── Tilix → SSH to hermes/vulcan (real terminal)
├── sc-web-server (hermes) headless Debian (Q002–Q005, Q007)
└── sc-build-machine (vulcan) headless Arch (Q006, Q008)
PLAYER FLOW:
Host starts game server → boots sc-workstation via SPICE
Player sees XFCE desktop → Chromium with HUD auto-open
Reads ticket → opens Tilix → SSH hermes → fixes problem
Clicks "Mark Complete" → server SSHes hermes → validates
Trust updates → WebSocket pushes to browser → new mail arrives
```
---
## 4. RUNTIME MODEL
### 4.1 Game Server — Node.js
The game server (`server/src/index.js`) is a Node.js/Express application:
- Serves `frontend/dist/` as static files at `/`
- WebSocket server on the same port (real-time event push to HUD)
- On startup: loads all content JSON, hydrates services from save file,
ensures workstation VM is live via VMManager
The server is responsible for:
- All game logic (quest state, trust, progression, incidents)
- VM lifecycle management (virsh via child_process)
- Validation — SSH into target VMs and evaluate rules
- Save/load (single JSON file at `~/.local/share/sysadmin-chronicles/save.json`)
- WebSocket broadcast of trust changes, new mail, shift ticks, incident alerts
### 4.2 Frontend — Svelte
The web HUD (`frontend/src/`) is a Svelte single-page app:
- Built with Vite; output lands in `frontend/dist/` and is served by the game server
- All data fetched from the game server API; no local state beyond UI
- WebSocket client for real-time updates
- Does not run validation — only displays results
### 4.3 Target Platform
- Host OS: Linux
- Supported deployment model: start game server on host, view workstation via SPICE
- Required host: KVM, libvirt, virsh, Node.js 18+, virt-viewer
- Required install model: one-time host setup with clean uninstall path
No Windows, macOS, or browser target is planned for the host. The HUD is a web
app served locally — it is never exposed to the internet.
---
## 5. VIRTUAL MACHINE SYSTEM
### 5.1 Required Stack
- `qemu-system-*`
- `KVM`
- `libvirtd`
- `virsh`
- libvirt virtual networks
- qcow2-backed VM images
Runtime policy:
- The shipped game should not require broad `sudo` usage during normal play
- One-time host setup may require admin approval
- Ongoing gameplay should run as a regular user against a prepared VM runtime
### 5.2 Core Behavior
The game controls VMs through libvirt, not by emulating them internally.
Responsibilities:
- Ensure required domains and networks exist
- Start the active VM
- Stop or suspend inactive VMs
- Revert to known snapshots for resets
- Query runtime state for evaluation
- Attach the player to the appropriate VM workflow
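Below is a minimal sketch of how `VMManager` and `lib/virsh.js` might map these responsibilities onto `virsh`. It assumes two things: the game connects to `qemu:///system`, and it enforces the `sc-` prefix from §13 before every call. The names `virshArgs`, `vmCommands`, and `runVirsh` are hypothetical. The subcommands themselves (`domstate`, `start`, `suspend`, `resume`, `snapshot-create-as`, `snapshot-revert`) are standard virsh.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Assumption: every game-managed domain carries the sc- prefix (see §13).
const PREFIX = "sc-";
// Assumption: the game talks to the system libvirt instance.
const LIBVIRT_URI = "qemu:///system";

// Build argv for one virsh subcommand, refusing to touch non-game domains.
function virshArgs(subcommand, domain, ...rest) {
  if (!domain.startsWith(PREFIX)) {
    throw new Error(`refusing to operate on non-game domain: ${domain}`);
  }
  return ["--connect", LIBVIRT_URI, subcommand, domain, ...rest];
}

// Hypothetical VMManager surface: each method maps to a single virsh call.
const vmCommands = {
  state: (d) => virshArgs("domstate", d),
  start: (d) => virshArgs("start", d),
  suspend: (d) => virshArgs("suspend", d),
  resume: (d) => virshArgs("resume", d),
  snapshot: (d, name) => virshArgs("snapshot-create-as", d, name),
  revert: (d, name) => virshArgs("snapshot-revert", d, name),
};

// Run virsh without a shell, so domain and snapshot names are never shell-parsed.
async function runVirsh(args) {
  const { stdout } = await execFileAsync("virsh", args);
  return stdout.trim();
}
```

Calling `runVirsh(vmCommands.state("sc-web-server"))` would return the domain state string that virsh prints. Using `execFile` instead of `exec` keeps user-influenced names out of any shell.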
The workstation and at least one target VM must be able to run at the same
time. This is required for real SSH-based play and for background incidents to
continue evolving while the player works elsewhere.
Operational guidance:
- `workstation` stays live during normal play
- At least one target VM stays live with it
- Later phases may keep all major quest VMs active simultaneously
- Resource budgets should be documented and enforced conservatively
Lab finding:
- Small headless target VMs were inexpensive on the test host
- The workstation became materially heavier once a real graphical session and
browser were added
- Budget the workstation separately from server-style quest VMs
### 5.3 Initial VM Roles
| ID | Role | Distro | Hostname | Purpose |
|----|------|--------|----------|---------|
| `workstation` | Player desktop | Debian 12 | `ares` | XFCE + Chromium HUD + Tilix terminal |
| `web_server` | Service host | Debian 12 | `hermes` | Web/service quests (Q002–Q005, Q007) |
| `build_machine` | Build box | Arch | `vulcan` | Package/build/update quests (Q006, Q008) |
### 5.3.1 Workstation Profile
The workstation is a full XFCE desktop (Debian 12, 768–1536 MB RAM):
- **Chromium** — opens `http://192.168.100.1:3000` on login (game HUD)
- **Tilix** — split-pane terminal, set as default; player SSHes to hermes/vulcan from here
- **Full sysadmin CLI toolkit** pre-installed (vim, htop, tmux, curl, nmap, tcpdump, etc.)
- SPICE display with QXL video — dynamic resolution via vdagent; fullscreen via `remote-viewer`
- `always_live: true` — stays running between shifts; suspended on game quit, resumed on next launch
Player never needs to interact with the workstation VM's internal file system for
game objectives — all quest work happens on the target VMs via SSH.
### 5.3.2 Why XFCE + Chromium (not terminal-only)
Earlier iterations used a terminal-only workstation. The game was redesigned
because a terminal-only approach would require building a fake terminal and fake SSH.
The XFCE + real browser approach is simpler, more realistic, and requires no
terminal simulation at all:
- Player uses a real Tilix terminal — no simulation
- Player SSHes with real SSH — no protocol emulation
- The HUD is a real web app — no custom UI framework needed for game chrome
- Downside: workstation VM costs ~480–768 MB RAM; budget accordingly
### 5.4 Snapshot Strategy
Snapshots are the reset primitive and the save primitive.
Named snapshot tiers per VM:
| Name | Purpose |
|------|---------|
| `baseline.clean` | Authored starting state for a fresh quest arc |
| `baseline.recovery` | Fallback if live state is unrecoverable |
| `checkpoint.shift-{N}` | Auto-saved at start of each in-game shift |
Rules:
- Snapshot names are deterministic
- Quest scripts may declare required baseline snapshots
- Validation never depends on snapshot history; only current observed state
- The game retains a maximum of 5 shift checkpoints per VM; older ones are pruned
- `baseline.clean` and `baseline.recovery` are never pruned by the game
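The sketch below shows one way to keep checkpoint names deterministic and enforce the retention rule. It assumes pruning sees the full list of snapshot names for a single VM, for example from `virsh snapshot-list --name <domain>`. `checkpointName` and `checkpointsToPrune` are hypothetical helpers, and `keep` stands in for the limit configured in the VM profile.

```javascript
// Authored snapshots the game must never delete.
const PROTECTED = new Set(["baseline.clean", "baseline.recovery"]);
const CHECKPOINT_RE = /^checkpoint\.shift-(\d+)$/;

// Deterministic checkpoint name for shift N.
function checkpointName(shift) {
  if (!Number.isInteger(shift) || shift < 0) throw new Error(`bad shift: ${shift}`);
  return `checkpoint.shift-${shift}`;
}

// Given one VM's snapshot names, return the checkpoints to delete so that only
// the newest `keep` shifts remain. Baselines and unrecognized names are untouched.
function checkpointsToPrune(snapshotNames, keep = 5) {
  return snapshotNames
    .filter((name) => !PROTECTED.has(name))
    .map((name) => ({ name, match: CHECKPOINT_RE.exec(name) }))
    .filter(({ match }) => match !== null)
    .map(({ name, match }) => ({ name, shift: Number(match[1]) }))
    .sort((a, b) => b.shift - a.shift) // newest first, by numeric shift
    .slice(keep)
    .map(({ name }) => name);
}
```

The sort compares the parsed shift number rather than the name. A plain string sort would rank `checkpoint.shift-9` as newer than `checkpoint.shift-10`.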
### 5.5 Networking Model
Networking is host-controlled through libvirt.
Supported modes:
- `quest`: constrained, deterministic virtual networks and fixtures
- `sandbox`: broader connectivity for experimentation
Examples:
- Internal-only network between workstation and target VM
- Broken DNS as part of a quest
- Deliberately degraded service reachability
- Optional outbound package mirror access for selected scenarios
### 5.6 VM Provisioning Hooks
Quest-specific VM state — broken configs, missing files, log histories — is
authored into the VM baseline before the snapshot is taken. This is done via
idempotent provisioning scripts:
```
tools/vm/quest-prep/Q0XX-prep.sh
```
These scripts run against the target VM before the quest's `baseline.clean`
snapshot is taken. They are never run at quest activation time. See
QUEST_AUTHORING.md for the full provisioning workflow.
---
## 6. OBSERVATION AND VALIDATION
### 6.1 Validation Philosophy
Quest completion is based on **system state**, not on how the player got there.
Allowed evidence includes:
- Files and directory contents
- Ownership and permissions
- Service state
- Process state
- Open ports
- Package state
- Mount state
- Disk utilization
- System configuration values
Disallowed as primary success conditions:
- Specific commands typed
- Specific files opened
- UI click history
### 6.2 Observation Sources
Primary sources:
- `virsh domstate`, `domifaddr`, and domain metadata
- Host-driven inspection tooling such as libguestfs where practical
- SSH-based read-only checks initiated by the host when needed
- Quest-specific host probe scripts for higher-level state summaries
Authoritative rule:
- Quest validation must use host-authoritative checks only
- In-guest helpers may improve responsiveness, but cannot decide success
In-guest helpers should use neutral names (examples: `atlas-index`, `yardd`,
`ops-telemetry-cache`) and must not be trusted as a security boundary.
Operational note:
- Routine package operations inside guests may emit maintenance or virtualization
notices that break immersion
- Base images should suppress or tune guest maintenance messaging where safe
for the authored environment
- Validation and incident design should not rely on noisy package-manager side
effects being visible to the player
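This is a sketch of host-initiated, read-only probes for a few rule families. The project uses node-ssh in `lib/ssh.js`. To stay dependency-free, this example swaps in the OpenSSH client via `execFile`, with `BatchMode=yes` so a missing key fails instead of prompting. Several names are assumptions: `probeCommand`, `runProbe`, the rule fields (`path`, `unit`, `port`), and the validation login. The guest-side commands (`test`, GNU `stat`, `systemctl`, `ss`) are standard tools normally present on both Debian and Arch.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// POSIX single-quote escaping; the remote command runs through the guest's shell.
function shellQuote(s) {
  return `'${String(s).replace(/'/g, `'\\''`)}'`;
}

// Map a rule to a command that only reads guest state.
function probeCommand(rule) {
  switch (rule.type) {
    case "file_exists":
      return `test -e ${shellQuote(rule.path)}`;
    case "file_mode":
      return `stat -c %a ${shellQuote(rule.path)}`;
    case "service_state":
      return `systemctl is-active ${shellQuote(rule.unit)}`;
    case "port_listening":
      return `ss -ltnH ${shellQuote(`sport = :${rule.port}`)}`;
    default:
      throw new Error(`no probe for rule type: ${rule.type}`);
  }
}

// Run a probe over OpenSSH. Assumption: key-based login as a validation user.
async function runProbe(host, user, command) {
  try {
    const { stdout } = await execFileAsync("ssh", [
      "-o", "BatchMode=yes",
      "-o", "ConnectTimeout=5",
      `${user}@${host}`,
      command,
    ]);
    return { ok: true, stdout: stdout.trim() };
  } catch (err) {
    // A non-zero exit (e.g. `test -e` on a missing file) is still an observation.
    return { ok: false, code: err.code, stdout: (err.stdout || "").trim() };
  }
}
```

`test -e` and `systemctl is-active` report through their exit status, which is why a failed exit is returned rather than thrown. `ss` exits 0 whether or not anything is listening, so its caller checks for non-empty output instead.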
### 6.3 Validation Rule Model
Core rule families:
- `file_exists` / `file_contains` / `file_mode` / `file_owner`
- `directory_exists`
- `service_state` / `service_enabled`
- `process_running` / `process_user`
- `port_listening`
- `package_installed`
- `mount_present`
- `disk_usage_below` / `disk_usage_above`
- `command_assert` — fallback only, must verify state not behavior
- `and` / `or` / `not`
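These rule families compose into a small recursive evaluator. The sketch assumes the host has already gathered an observation snapshot: file contents and modes, unit states, listening ports, and disk usage per mount. It evaluates rules against that snapshot only. The rule and observation field names are hypothetical, and only a subset of leaf families is shown.

```javascript
// Leaf evaluators. Each reads only the observation snapshot gathered by
// host-side probes, never anything about how the player reached that state.
const leafEvaluators = {
  file_exists: (r, obs) => Object.hasOwn(obs.files, r.path),
  file_contains: (r, obs) => (obs.files[r.path]?.content ?? "").includes(r.text),
  file_mode: (r, obs) => obs.files[r.path]?.mode === r.mode,
  service_state: (r, obs) => obs.services[r.unit] === r.state,
  port_listening: (r, obs) => obs.ports.includes(r.port),
  // A mount with no reading counts as a failure, not a pass.
  disk_usage_below: (r, obs) => (obs.disk[r.mount] ?? Infinity) < r.percent,
};

// Recursively evaluate a rule tree built from leaves and and/or/not.
function evaluate(rule, obs) {
  switch (rule.type) {
    case "and":
      return rule.rules.every((r) => evaluate(r, obs));
    case "or":
      return rule.rules.some((r) => evaluate(r, obs));
    case "not":
      return !evaluate(rule.rule, obs);
    default: {
      const leaf = Object.hasOwn(leafEvaluators, rule.type) ? leafEvaluators[rule.type] : undefined;
      if (!leaf) throw new Error(`unknown rule type: ${rule.type}`);
      return leaf(rule, obs);
    }
  }
}
```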
### 6.4 Trust Boundary
The player may gain root access on some machines. The guest is not trusted. The
host validation layer is trusted. Anti-cheat is achieved through external
validation, not secrecy.
---
## 7. GAMEPLAY SYSTEMS
### 7.1 Core Loop
1. Ticket arrives with incomplete context
2. Player evaluates urgency against other active problems
3. Player enters or connects into the relevant VM
4. Player investigates using real Linux tools
5. Player applies a fix
6. Game validates resulting state
7. World reacts
8. Trust shifts
9. Future conditions reflect earlier choices
### 7.2 System Pressure
Pressure is systemic, not a countdown bar. Examples:
- Disk usage keeps climbing
- A log fills with worsening symptoms
- A degraded service starts affecting another team
- A quick fix suppresses one symptom while creating later instability
Pressure is authored as state transitions and event chains via incident files.
### 7.3 Trust / Reputation
Trust measures how much the organization relies on the player.
Trust affects:
- sudo scope
- accessible machines
- diagnostic tooling
- ticket sensitivity
- documentation visibility
**Trust increases** when the player resolves problems cleanly, finds root causes,
and avoids collateral damage.
**Trust decreases** when the player breaks unrelated systems, applies fragile
fixes, ignores urgent incidents, or resolves symptoms but not causes.
**Trust revocation**: if trust falls below a declared threshold in the trust
unlock table, specific access strings are revoked. A subsequent trust increase
does not automatically restore revoked access — the player must re-earn the
unlock tier. Revocation rules must be explicitly declared per unlock tier.
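Below is a sketch of revocation with hysteresis. It assumes each tier in `progression/` declares an `unlockAt` threshold and an optional, lower `revokeBelow` threshold. Reading "re-earn the unlock tier" as "reach `unlockAt` again" is an assumption. So are the field names, `updateTiers`/`grantedAccess`, and the access-string format used in the usage test. A tier without `revokeBelow` is never revoked, which matches the requirement that revocation be declared explicitly.

```javascript
// Recompute which tiers are active after a trust change.
// state = { active: [tierId...], revoked: [tierId...] } (hypothetical shape).
function updateTiers(tiers, state, trust) {
  const active = new Set(state.active);
  const revoked = new Set(state.revoked);
  for (const tier of tiers) {
    if (active.has(tier.id) && trust < tier.revokeBelow) {
      // Falling below the declared revocation threshold removes the tier.
      active.delete(tier.id);
      revoked.add(tier.id);
    } else if (!active.has(tier.id) && trust >= tier.unlockAt) {
      // Earning or re-earning requires the full unlock threshold, so climbing
      // back just above revokeBelow does not restore anything.
      active.add(tier.id);
      revoked.delete(tier.id);
    }
  }
  return { active: [...active], revoked: [...revoked] };
}

// Access strings granted by the currently active tiers.
function grantedAccess(tiers, state) {
  return tiers.filter((t) => state.active.includes(t.id)).flatMap((t) => t.grants);
}
```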
### 7.4 Multiple Valid Solutions
Quests support realistic alternatives where possible:
- quick workaround
- operationally acceptable fix
- proper long-term fix
Branch resolution rule:
- multiple branches may match the same final state
- each branch must declare a numeric `priority`
- the highest matching priority wins
- ties are a content error and fail validation during authoring checks
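The branch rule splits into an authoring-time check and a runtime pick. In this sketch, `checkBranchPriorities` runs over every branch of a quest so ties fail before content ships. `resolveBranch` takes a `matches` callback that stands in for rule evaluation. Both function names and the branch shape are hypothetical.

```javascript
// Authoring-time check: every branch needs a unique numeric priority.
function checkBranchPriorities(branches) {
  const seen = new Map();
  for (const b of branches) {
    if (!Number.isFinite(b.priority)) {
      throw new Error(`branch ${b.id} lacks a numeric priority`);
    }
    if (seen.has(b.priority)) {
      throw new Error(`branches ${seen.get(b.priority)} and ${b.id} share priority ${b.priority}`);
    }
    seen.set(b.priority, b.id);
  }
}

// Runtime pick: among branches whose rules match, the highest priority wins.
// Returns null when no branch matches the observed state.
function resolveBranch(branches, matches) {
  const matched = branches.filter((b) => matches(b));
  if (matched.length === 0) return null;
  return matched.reduce((best, b) => (b.priority > best.priority ? b : best));
}
```

Because ties are rejected at authoring time, `resolveBranch` never has to break one.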
### 7.5 Dynamic Events
Dynamic events inject prioritization pressure and are authored in incident files.
Events are selected from authored pools and activated by progression, trust,
current system state, and world flags.
Each incident declares a `blast_radius_quests` list so the incident scheduler
can avoid activating an incident that would corrupt active quest evidence or
simultaneously interfere with an in-progress objective.
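This sketches the scheduler's eligibility filter. It assumes incidents carry `blast_radius_quests` as described, plus an optional, hypothetical `requires_flags` map of world-flag preconditions. Progression and trust gating are left out for brevity, and the incident IDs are made up.

```javascript
// Return incidents that may fire now: their world-flag preconditions hold and
// none of their blast_radius_quests is currently active.
function eligibleIncidents(incidents, activeQuestIds, worldFlags) {
  const active = new Set(activeQuestIds);
  return incidents.filter((inc) => {
    const collides = (inc.blast_radius_quests ?? []).some((q) => active.has(q));
    const flagsOk = Object.entries(inc.requires_flags ?? {}).every(
      ([flag, value]) => worldFlags[flag] === value,
    );
    return !collides && flagsOk;
  });
}
```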
### 7.6 Investigation Quality
Clues must be legible and grounded. Every quest declares a `clue_fingerprint`
documenting what evidence exists in the VM baseline. Content validation checks
that the fingerprint is plausible. The player should feel rewarded for competent
debugging rather than guessing.
### 7.7 Progression
Progression unlocks:
- broader sudo access
- new servers
- more dangerous responsibilities
- better internal docs
- helper scripts and diagnostics
This is institutional progression, not character stats.
### 7.8 Mentor Thread
Marcus is the primary mentor character. His dialogue runs across the full game
as a `series_id: marcus-main` thread. Each dialogue file that belongs to an
ongoing character relationship declares `series_id` and `series_position`.
The dialogue system tracks series state so Marcus remembers what happened in
earlier quests and can reference it in later ones. This is the primary vehicle
for institutional memory and character continuity.
### 7.9 Tone and Humor
The tone is dry, realistic, and slightly dysfunctional. Examples:
- contradictory runbooks
- tickets that misidentify the problem
- passive-aggressive internal notes
- perfect urgency attached to trivial formatting requests
Humor must support immersion, not break it.
---
## 8. COMMAND AND ACCESS MODEL
Access is controlled realistically through:
- user accounts and group membership
- sudoers configuration
- reachable hosts
- available packages and tooling
If a player cannot run `systemctl`, the reason is that the VM account lacks the
required privileges, not that the game disabled the verb.
---
## 9. PRESENTATION LAYER
The player's view is the workstation VM desktop, viewed fullscreen via SPICE:
```bash
scripts/start-game.sh
# → starts game server
# → virsh start sc-workstation (if not already running)
# → remote-viewer --full-screen spice://127.0.0.1:<port>
```
The player sees an XFCE desktop with Chromium pre-opened to the HUD.
### 9.1 VM Display
- **Protocol**: SPICE with QXL video driver
- **Client**: `remote-viewer` (from `virt-viewer` package) in fullscreen mode
- **Resolution**: dynamic — guest vdagent resizes to match host display
- **Cursor release**: `Ctrl+Alt`; fullscreen toggle: `F11`
- **Clipboard sharing**: via spice-vdagent in the guest
No VNC, no custom viewer widget. The host runs `remote-viewer` and the player
works inside the workstation VM.
### 9.2 HUD (Svelte Web App)
The game HUD is a Svelte single-page app served at `http://192.168.100.1:3000`:
- **TicketsPanel** — ticket list, detail view, "Mark Complete" button
- **MailPanel** — inbox, message body, reply buttons (where applicable)
- **DocsPanel** — trust-gated internal docs, rendered from content/docs/
- **SagePanel** — chat interface to SageService knowledge base
- **HeaderBar** — trust indicator (no number, behavior only), shift timer, unread badge
The HUD is a company intranet portal in look and feel — dark, monospace, minimal.
### 9.3 One-Time Setup and Uninstall
Host-side setup is unavoidable (KVM, libvirt, VM images). It must be simple.
Principles:
- one-time setup only (`tools/setup/first-run-setup.sh`)
- plain-language explanation of what will be installed
- managed resources use the `sc-` prefix (never touch other libvirt domains)
- full uninstall removes all game-owned domains, networks, storage, helper files
- normal gameplay does not require broad `sudo`
---
## 10. DATA MODEL
Authoring formats:
- JSON for quests, tickets, incidents, dialogue, documentation metadata
- Shell helper scripts where CLI integration is necessary
Top-level content domains:
| Domain | Purpose |
|--------|---------|
| `quests/` | Objective chains and validation rules |
| `tickets/` | Player-facing problem statements |
| `incidents/` | Dynamic system pressure events |
| `dialogue/` | Workplace messages, hints, follow-ups |
| `docs/` | Internal documentation metadata/content |
| `progression/` | Trust thresholds, unlocks, access tiers |
| `vm_profiles/` | Domain names, snapshots, networks, probe config |
| `helpers/` | Non-obvious guest helper naming/config data |
| `world_flags/` | Central registry of all world state flags |
Each authored scenario must declare:
- `required_vms` — all VMs the quest touches
- `baseline_snapshot` — starting snapshot for this quest
- `clue_fingerprint` — evidence declared in the VM baseline
- validation rules and branch priorities
- escalation behavior
- trust impact
- `blast_radius` — incident IDs the quest may interact with
- follow-on world effects
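Here is a sketch of an authoring-time lint for these declarations. `required_vms`, `baseline_snapshot`, `clue_fingerprint`, and `blast_radius` are the key names used above. `branches`, `escalation`, `trust_impact`, and `world_effects` are assumed names for the remaining items. The known VM IDs come from the table in §5.3.

```javascript
// Keys named in this document, followed by assumed names for the rest.
const REQUIRED_KEYS = [
  "required_vms", "baseline_snapshot", "clue_fingerprint", "blast_radius",
  "branches", "escalation", "trust_impact", "world_effects",
];
const KNOWN_VMS = new Set(["workstation", "web_server", "build_machine"]);

// Return human-readable problems; an empty list means the scenario passes.
function lintScenario(scenario) {
  const problems = REQUIRED_KEYS
    .filter((key) => !(key in scenario))
    .map((key) => `missing ${key}`);
  for (const vm of scenario.required_vms ?? []) {
    if (!KNOWN_VMS.has(vm)) problems.push(`unknown VM ${vm}`);
  }
  return problems;
}
```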
---
## 11. SAVE MODEL
### 11.1 Dirty State Model
The game uses a **dirty state model**. VM disk state is preserved across
sessions as-is. The game does not revert to a clean baseline on load — it
resumes from whatever state the VMs are currently in.
This is intentional. The player's history of changes is part of the game. A
machine they fixed stays fixed. A machine they damaged stays damaged until they
repair it or request reimage.
Two persistence layers:
**Game State Layer** — saved as JSON:
- Trust score and history
- Unlocked access, sudo scopes, docs, tools
- Active/completed quest and ticket state
- World flags (current values and change history)
- Incident scheduler state
- In-world clock and shift counter
**VM State Layer** — saved as libvirt snapshot references:
- Per-VM reference to current snapshot tier or live disk
- Per-VM managed recovery checkpoint list
- Reimage history per VM
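This sketch shows how `SaveState` might persist the game-state layer at the path given in §4.1. The write-to-temp-then-rename pattern is a suggested technique, not something this document specifies. `rename` within one filesystem is atomic on POSIX, so a reader sees either the old save or the new one, never a half-written file. Durability across power loss would also need an fsync before the rename. `writeSave` and `readSave` are hypothetical names.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Default location from this document; overridable for tests.
const DEFAULT_SAVE = path.join(os.homedir(), ".local", "share", "sysadmin-chronicles", "save.json");

// Write to a temp file beside the save, then rename it over the old one.
function writeSave(state, file = DEFAULT_SAVE) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
  fs.renameSync(tmp, file);
}

// Return the parsed save, or null on a fresh install with no save yet.
function readSave(file = DEFAULT_SAVE) {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return null;
    throw err;
  }
}
```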
### 11.2 Shift Checkpoints
At the start of each in-game shift:
1. Game state JSON is saved
2. A named snapshot is created per active VM: `checkpoint.shift-{N}`
3. The checkpoint reference is recorded in the save file
4. Shift checkpoints beyond the retention limit (default: 5) are pruned
Shift checkpoint rollback is an explicit player action ("start this shift
over") with a confirmation prompt. It does not undo trust changes or dialogue
already delivered.
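The four checkpoint steps can be sketched as one async function with its side effects injected, so the ordering can be tested without libvirt. Every member of `deps` (`saveGameState`, `createSnapshot`, `recordCheckpoints`, `pruneCheckpoints`) is a hypothetical hook onto the services listed in §12.

```javascript
// Run the shift-start checkpoint sequence and return the checkpoint name.
async function startShift(shift, activeVms, deps, keep = 5) {
  const name = `checkpoint.shift-${shift}`;

  // 1. Persist the game-state JSON.
  await deps.saveGameState();

  // 2. Snapshot every active VM under the same deterministic name.
  for (const vm of activeVms) {
    await deps.createSnapshot(vm, name);
  }

  // 3. Record the checkpoint references in the save file.
  await deps.recordCheckpoints(activeVms.map((vm) => ({ vm, name })));

  // 4. Prune checkpoints beyond the retention limit.
  for (const vm of activeVms) {
    await deps.pruneCheckpoints(vm, keep);
  }

  return name;
}
```

Snapshots are taken one VM at a time rather than with `Promise.all`, which keeps host disk I/O bounded. That is a judgment call, not a documented requirement.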
### 11.3 Load-Time Reconciliation
On load, the observation service validates current VM state against saved world
flags. Minor drift is handled silently. Major drift — missing snapshots,
unbootable VMs — triggers the recovery flow.
If a referenced snapshot is missing:
- If `baseline.recovery` exists, offer resume from recovery
- If `baseline.recovery` is also gone, the VM is treated as unrecoverable
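This sketches the per-VM decision. It assumes the save file stores `snapshot` as either a snapshot name or `null` for the live disk. It also assumes the caller has already listed the domain's snapshots and checked whether the domain boots. Minor-drift handling is out of scope, and `reconcileVm` and its return shape are hypothetical.

```javascript
// Decide how to load one VM. `snapshots` is a Set of names libvirt reports.
function reconcileVm(saved, snapshots, bootable) {
  const referenced = saved.snapshot; // snapshot name, or null for the live disk
  if (bootable && (referenced === null || snapshots.has(referenced))) {
    return { action: "resume" };
  }
  // Major drift: missing snapshot or unbootable VM.
  if (snapshots.has("baseline.recovery")) {
    return { action: "offer_recovery", snapshot: "baseline.recovery" };
  }
  return { action: "unrecoverable" }; // hands off to the reimage flow (§11.4)
}
```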
### 11.4 Recovery / Reimage Flow
When a VM is unrecoverable, the player can report it for reimage through an
in-world mechanic:
1. Player submits a reimage request (ticket to management)
2. In-world delay is imposed (one in-game shift)
3. Machine is restored from `baseline.recovery` or `baseline.clean`
4. Trust penalty is applied based on severity
5. In-progress quests on that VM are reset
6. Evidence from before the reimage is gone — acknowledged in-world
This is the designed escape valve. It has visible consequences but allows
forward progress.
### 11.5 Host Storage Management
qcow2 images with many snapshots can balloon. The game enforces:
- Maximum of 5 shift checkpoints per VM (configurable in vm_profile)
- Authored baseline and recovery snapshots are never pruned by the game
- `resource_budget` in vm_profile declares expected disk footprint
### 11.6 Developer Reset
Not available in the shipped game. CLI only:
```bash
bash tools/vm/snapshot-all.sh --revert-to baseline.clean
```
Completely resets all VMs to authored baseline. Used during content authoring
and automated test runs.
---
## 12. MODULE BREAKDOWN
### Server (`server/src/`)
| Module | Responsibility |
|--------|----------------|
| `index.js` | Express + WebSocket entry point; service wiring; static file serving |
| `ContentLoader` | Loads all content/ JSON at startup; never writes |
| `QuestEngine` | Quest state machine (pending → active → resolved) |
| `TicketService` | Ticket state, mark-complete handler, branch resolution |
| `ValidationEngine` | SSH into VMs, evaluates all rule types against real state |
| `VMManager` | virsh start/stop/snapshot/getIP wrappers |
| `TrustSystem` | Score tracking, unlock evaluation, revocation |
| `ProgressionSystem` | Unlocked docs, VMs, access strings |
| `EmailService` | Inbox, follow-up emails, reply options, WebSocket push |
| `SageService` | Rule-based dialogue / knowledge base |
| `ShiftTimer` | Shift clock, broadcasts shift:tick via WebSocket |
| `IncidentScheduler` | Pressure tick loop, incident injection |
| `ShiftReviewService` | End-of-shift performance review email generation |
| `CertificationService` | Awards internal certs after quest chain completion |
| `SaveState` | Read/write `~/.local/share/sysadmin-chronicles/save.json` |
| `lib/ssh.js` | Promisified SSH command execution (node-ssh) |
| `lib/virsh.js` | virsh command wrappers |
| `lib/eventBus.js` | Internal Node.js EventEmitter for service coordination |
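Below is a sketch of how `lib/eventBus.js` and the WebSocket push might fit together using Node's built-in `EventEmitter`. `relayToHud` is hypothetical. `clients` is any iterable of objects with a `send(string)` method. With the `ws` package that would be the server's `clients` set, ideally skipping sockets whose `readyState` is not `OPEN`. `shift:tick` appears in the table above; the other event names are assumptions.

```javascript
const { EventEmitter } = require("node:events");

// Shared in-process bus: services emit, other services and the relay subscribe.
const eventBus = new EventEmitter();

// Forward the named internal events to every connected HUD client as JSON.
function relayToHud(bus, clients, eventNames) {
  for (const name of eventNames) {
    bus.on(name, (payload) => {
      const message = JSON.stringify({ type: name, payload });
      for (const client of clients) client.send(message);
    });
  }
}
```

Only whitelisted events reach the browser. Internal coordination events stay on the server.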
### Frontend (`frontend/src/`)
| Component | Responsibility |
|-----------|----------------|
| `App.svelte` | Root component; WebSocket connection; panel routing |
| `TicketsPanel` | Ticket list, detail, mark-complete flow |
| `MailPanel` | Inbox, message body, reply buttons |
| `DocsPanel` | Trust-gated doc list and content viewer |
| `SagePanel` | Chat interface, follow-up prompts |
| `VmsPanel` | Live VM status indicators |
| `HeaderBar` | Trust display, shift timer, mail unread count |
| `lib/api.js` | Fetch wrapper for all REST API calls |
---
## 13. SECURITY AND SAFETY
Requirements:
- Scope libvirt resources to dedicated game domains/networks/storage pools
- Never operate on arbitrary host VMs by default
- Use explicit naming/prefixing for all game-managed resources (`sc-` prefix)
- Separate quest-mode constrained networks from broader sandbox networks
- Prefer least-privilege host integration
- Provide a dry-run and diagnostic mode for development scripts
The game manages only the resources it created or was explicitly pointed at
during setup.
---
## 14. TECHNOLOGY DECISIONS
| Technology | Role | Reason |
|-----------|------|--------|
| Node.js / Express | Game server | Async I/O, virsh via child_process, SSH via node-ssh, easy JSON |
| Svelte / Vite | Web HUD | Lightweight, no virtual DOM overhead, fast build |
| WebSocket (`ws`) | Real-time push | Trust changes, mail, incidents without polling |
| QEMU/KVM | Virtualization backend | Real Linux environments |
| libvirt / virsh | VM lifecycle control | Standard Linux automation surface |
| SPICE + QXL | Workstation display | Dynamic resolution, clipboard sharing, fullscreen |
| `remote-viewer` | Host-side SPICE client | Ships with virt-viewer; fullscreen with F11 |
| JSON | Content authoring | Data-driven, easy to diff, unchanged from prior design |
| node-ssh | SSH execution in validation | Clean Promise API; BatchMode, key-based auth |
Not in scope: v86, WebAssembly, browser-only runtime, service-worker networking.
---
## 15. DEVELOPMENT PRIORITIES
1. Native architecture consistency
2. VM control integration
3. Observation and validation
4. Core gameplay loop
5. Pressure, trust, and dynamic event systems
6. Presentation polish
If a design choice improves presentation but weakens VM realism or maintainable
automation, reject it.
@@ -0,0 +1,459 @@
# Characters — Sysadmin Chronicles
Story design reference. All characters, bios, relationships, and open story hooks.
For company/world context see `COMPANY_LORE.md`. This file focuses on the people.
---
## Active Characters
These characters have an established in-game voice and presence. Any new quest work
should treat their characterization here as canonical.
---
### The Player
**Role:** New junior sysadmin hire, day one
**Identity:** Unnamed. Player-selected portrait (5 options).
Hired to replace Dale. Nobody will explain what Dale did. Badge number is still
pending — temp credentials were handled by someone in Finance on their first day.
The player is a competent professional, not a bumbling intern. They may not know
every answer but they know how to look.
The player has no spoken lines. Their character is expressed entirely through the
choices they make when fixing things — whether they understand root causes or just
clear symptoms, whether they leave systems better or just less broken.
---
### Marcus Webb
**Role:** Senior Systems Administrator
**Email:** `m.webb@axiomworks.internal`
**Reports to:** Dave Kowalski (Director of IT)
Six years at Axiom Works. Hired by Kowalski. Knows where everything is, why it's
there, and which parts were a mistake. Communicates in short, precise messages.
Does not explain things twice. Trusts competence over credentials — he will give
the player more rope as they demonstrate they know what to do with it. If they
don't, the rope gets shorter.
He was the one who onboarded the player. He assigned their first ticket. He will
assign most of the tickets that follow. His messages range from brief task
assignments to late-night observations about something that's been on his mind —
the latter usually mean something is about to become a problem.
He knows what Dale did. He has decided not to discuss it.
**Personality:** Dry. Technically precise. Does not perform enthusiasm. Occasionally
wry but never jokey. Respects players who fix root causes. Mildly annoyed by
players who fix symptoms and call it done.
**Relationships:**
- Kowalski: reports to him; respectful but not deferential
- Sarah: professional; takes her tickets seriously, occasionally says quiet things when she's wrong
- Priya: mutual professional respect; they operate in the same zone of "things that matter when they go wrong"
- Phil Ruiz (Sales VP): warm; Phil owes Marcus for saving a demo once and Marcus has never mentioned it
---
### Sarah Chen
**Role:** Product Manager, AxiomFlow
**Email:** `s.chen@axiomworks.internal`
Owns the AxiomFlow product roadmap. Coordinates between sales, engineering, and
customers. Emails Monday mornings. Cares intensely about the demo and staging
environments because those are the product she can actually see and touch. Not wrong
about their importance.
She files tickets when things break on the product-facing side. Her descriptions of
problems are accurate about symptoms and often wrong about causes — she will
confidently diagnose a permissions issue as a script bug, or a package problem as a
config error. She is not incompetent; she just doesn't have the full picture. When
the player fixes the underlying cause rather than the surface symptom, she notices.
She has a sharp edge when things get worse after someone touches them. She will say
so, clearly, without being melodramatic about it.
**Personality:** Direct. Metric-oriented. Not patient with vague timelines or "we're
looking into it." Appreciates being told what the actual problem was, not just that
it's fixed.
**Relationships:**
- Marcus: professional; trusts that her tickets will be handled, doesn't ask for much
- Player: initially impersonal (they're new); warms or cools based on outcomes
- Nikhil Sharma: upstream dependency — his build pipeline affects her deployments
---
### Priya Nair
**Role:** Head of Security & Compliance
**Email:** `p.nair@axiomworks.internal`
**Direct report:** James Osei (Security Analyst)
Leads all security reviews, access audits, and compliance programmes. Has a standing
Thursday meeting with David Park (CTO) that has existed since 2017. Was brought in
after an incident nobody discusses in public. Has been building the security function
from something informal into something that can survive a SOC 2 audit.
She frames everything in terms of what happens when things go wrong, not whether they
will. She assumes breach. She assumes misconfiguration. She is often right. She is
not someone who appreciates hearing about a production change after it has already
happened.
She will tell the player when a fix is correct and why. She will also tell them when
a fix works but leaves the environment in a worse position than before. She is not
punitive about this — she just states it.
She does shift reviews at end-of-shift and grades the player's overall performance.
Her criteria: did the work move forward, did the environment stay stable, did the
player create extra problems.
**Personality:** Precise. Consequence-focused. Calm in tone even when the content
is not calm. Economical with words. Does not use exclamation marks.
**Relationships:**
- Player: evaluative; her trust is earned by demonstrating that security is a
consideration, not an afterthought
- Marcus: peer respect; they operate in different domains with overlapping concerns
- Dave Kowalski: reports indirectly up through him for infrastructure decisions
- David Park: standing Thursday meeting; she has the CTO's ear
> **Name note for developers:** The in-game email service and some ticket files
> previously used "Priya Kapoor" and the onboarding doc used "Priya Singh."
> These are all the same character. **Priya Nair** is the canonical name.
> Email should be `p.nair@axiomworks.internal`. Update references in
> `server/src/services/EmailService.js`, `content/tickets/T007.json`, and
> `content/docs/onboarding.json`.
---
### Dave Okonkwo
**Role:** Internal employee, non-technical
**Email:** `d.okonkwo@axiomworks.internal`
A regular Axiom Works employee who notices when things aren't working and files
tickets about it. He doesn't know enough to diagnose the problem — he reports
symptoms accurately and assumes the wrong cause. His reports are useful precisely
because they represent what a non-technical user actually experiences.
He is not listed on the company website; most of Axiom Works' 280 employees
aren't. He's somewhere in operations or general staff: not in Finance, not in IT.
> **Open decision:** Dave Okonkwo is currently the only employee-level character who
> submits tickets. The company website has Dave Kowalski as Director of IT Operations
> (Marcus's boss), which is a completely different person. This is not a naming
> inconsistency — they're two different people. However: if the story wants Kowalski
> to become an active character who also files tickets or escalates issues, that's a
> separate thread. Okonkwo and Kowalski coexist.
---
## Named Background Characters
On the company website. No current in-game presence. Available for story use —
they can send emails, appear on CC lines, be referenced in dialogue, or become
active characters in new quests.
Listed in rough order of story relevance to the IT/sysadmin context.
---
### Dave Kowalski — Director of IT Operations
Marcus's manager. The player's skip-level. Background is network engineering —
has Cisco certifications he will not volunteer unless provoked. Oversees systems
(Marcus's domain), networking (Tom Malaney), and IT support. Has been at Axiom
Works since 2015. Describes the infrastructure as "mature." Sends weekly status
emails in bullet points that never quite answer the question. When things go wrong
he schedules a meeting to "talk through the situation," which everyone has learned
is worse than a direct message.
Has said "we should really document that" more times than he can count. Has
documented very little personally. Maintains a mysterious Tuesday 11pm calendar
block.
Story use: source of policy pressure, indirect escalation, the person who asks
questions that reveal Marcus hasn't told the player everything.
---
### Nikhil Sharma — Platform Engineer
Owns the internal build and release pipeline, the CI infrastructure, and the
parts of deployment that nobody else wants to think about. Strong opinions about
reproducible builds. Sends Slack messages at 6am. Occasionally at 11pm.
He is the engineer most directly connected to what happens on vulcan — if a build
is broken, it's probably something Nikhil built or maintains. He has never met the
player. He almost certainly doesn't know the player exists.
Story use: the author of broken packages the player has to debug; a character who
can explain (or fail to explain) what went wrong upstream; an escalation path when
a build problem is genuinely his fault.
---
### Tanya Okafor — Head of Customer Success
Manages post-sale relationships for all AxiomFlow customers and the twelve legacy
AxiomSync accounts that haven't migrated. Uses the word "partnership" a lot.
Usually the first person to know when something is wrong in production, because a
customer has already called her before IT knows there's a problem. Her call log
is an early warning system. She is not hostile to IT but she has learned that
"we're looking into it" is not an answer she can give a customer.
Story use: pressure vector from the customer direction; source of urgency that
doesn't come from Marcus or the ticket queue; demonstrates real-world stakes when
things go down.
---
### Phil Ruiz — VP of Sales
Has been promising features to prospects since 2016. Maintains a warm relationship
with the infrastructure team because Marcus once fixed the staging environment with
twenty minutes to spare before a major demo — Phil has never forgotten this. Travels
frequently. Expense reports submitted promptly, which Marcus has noted approvingly.
Story use: indirect beneficiary when demos work; pressure source when a sales demo
is scheduled and something is broken; the person who will tell the CTO what IT did
right in a room the player will never be in.
---
### Yusuf Halabi — Engineering Manager
Reports to David Park (CTO). Manages the core AxiomFlow platform team. Runs the
Thursday architecture review. Has opinions about test coverage. Leaves pull request
comments that are technically correct and diplomatically suboptimal.
Story use: engineering-side escalation; source of tickets about internal tooling;
the person who will ask why a config change broke a downstream process.
---
### Derek Ashford — Financial Controller
Does not appear at team meetings. Does appear on CC lines of every email that
mentions cloud costs, hardware procurement, or infrastructure budget. Always
replies-all. His manager is Rachel Brandt (CFO).
Story use: background texture on procurement requests; the voice that makes any
infrastructure spending feel like a negotiation.
> **Note on "Dave from Finance":** Marcus's day-one message references "Dave from
> Finance" as the person holding the player's temp credentials. This is most likely
> Derek Ashford, the only Finance character plausibly holding IT credentials. But his
> first name is Derek, and "Dave" is not a short form of it, so this is either Marcus
> getting the name wrong or a continuity error. Either the message should be corrected,
> or "Dave from Finance" is a third, unnamed Finance employee.
---
### Rachel Huang — Systems Administrator
Marcus's peer on the IT team. Handles provisioning, patch cycles, and the ongoing
negotiation with Finance over cloud consolidation. Came from a managed services
background. Has strong opinions about monitoring dashboards, most of which are
correct.
Story use: the person who set something up that the player now has to maintain;
a colleague who can provide context Marcus won't; someone whose provisioning
decisions the player will encounter as infrastructure.
---
### Tom Malaney — Network Engineer
Responsible for network infrastructure across the office and hosted environments.
On-call for more holiday weekends than he would like. Thorough in documentation
when he finds time for it.
Story use: DNS, firewall, or routing problems that are not the player's fault
but become the player's problem; someone who can be reached but is slow to
respond.
---
### James Osei — Security Analyst
Priya's direct report. Handles vulnerability assessments, access reviews, and
quarterly compliance reporting. Methodical. Has a spreadsheet for everything,
which is not a criticism.
Story use: the person who runs the actual audit that Priya will summarize to the
player; a source of detailed (sometimes overwhelming) security findings.
---
### Ellen Marsh — CEO & Co-Founder
Built the first version of AxiomFlow after a decade in operations. No CS background.
Attends all-hands twice a year. Does not use Slack. Has final say on pricing and
major customer commitments.
Story use: the distant authority whose priorities shape everything; never interacts
with the player directly, but her decisions land as constraints.
---
### David Park — CTO & Co-Founder
Wrote the original rules engine in 2011. Now manages engineering managers. Still has
opinions about the data model. Has a standing Thursday meeting with Priya that hasn't
moved since 2017.
Story use: architectural decisions from above; the person Priya reports significant
security findings to.
---
### Karen Volkov — COO
Joined 2014. Responsible for the fact that the company has documented processes for
anything at all. Has opinions about infrastructure costs that surface in IT's world
via Finance. Prefers decisions with clear owners and deadlines.
---
### Rachel Brandt — CFO
Joined 2016. Approves all capital expenditure over $5,000. Working to consolidate
cloud spend. Does not enjoy surprises in the infrastructure budget. Derek Ashford
reports to her.
---
### Mei Lin — Senior Software Engineer
Has maintained AxiomSync's integration layer since 2018. Knows more about it than
anyone would prefer, including herself. Currently leading the migration tooling
project for the remaining legacy accounts.
---
### Cora Reyes — Software Engineer
Works on the AxiomDash reporting pipeline. Has submitted more internal RFCs than
anyone else on the team in the past year. Moving toward senior.
---
### Ben Portillo — Product Manager, AxiomDash
Leads product development for the analytics add-on. Works closely with large
accounts to understand what they actually want from dashboards (usually different
from what they asked for).
---
### Annika Gosse — UX Designer
Responsible for AxiomFlow's interface. Has been advocating for a redesign of the
workflow builder since 2022. Patient.
---
### Sandra Wu — HR Manager
Manages hiring, onboarding, and employee relations since 2016. Runs the new-hire
onboarding process (three days, thorough). Sends birthday emails on time, every time.
---
### Owen Blake — Office Manager
Keeps the office running. Has fixed more things than his job title implies. The
person to contact if conference room equipment stops working.
---
### Mike Kawamoto — Account Executive
Handles mid-market manufacturing accounts in the northeast. Believes strongly in
the demo environment. Closes more deals in Q4 than any other quarter.
---
### Lisa Ferreira — Customer Success Manager
Manages onboarding for new AxiomFlow deployments. Has a talent for understanding
what customers mean rather than what they say.
---
## Unresolved Characters (Story Hooks)
These are referenced in existing content but never defined. They represent the
strongest open narrative threads.
---
### Dale — The Previous Sysadmin
**Reference:** Marcus's day-one message — "You're replacing Dale. Nobody will tell you
what Dale did because it's complicated."
Dale is gone. The player has inherited Dale's desk, access provisioning slot, and
apparently reputation: people know the player is "Dale's replacement" before
they know the player's name. The systems the player now maintains are the systems
Dale last touched.
What Dale did is unknown. It is described as "complicated." Marcus knows. Possibly
Kowalski knows. Possibly Priya knows, if it was security-related.
This is the strongest existing narrative mystery in the game. It has setup and no
payoff. Dale's story could be:
- A technical incident (something Dale broke and couldn't fix)
- A policy violation (something Dale did that wasn't malicious but wasn't right)
- A trust collapse (competent but burned bridges)
- Something personal
- Any combination
The player finding out what Dale did — gradually, through the systems they work on,
through things people let slip — is a natural story spine for the whole game.
---
### "Dave from Finance" — Day One Reference
**Reference:** Marcus's day-one message — "Dave from Finance has your temp credentials.
He's on three today."
Almost certainly Derek Ashford (Financial Controller), referred to informally. But
Derek's first name is Derek, not Dave — this is either Marcus being casual with
names, a continuity error, or a genuinely separate unlisted Finance employee.
Needs a decision: correct "Dave" to "Derek" in Marcus's message, or introduce a
separate "Dave from Finance" as a minor character.
---
## Key Relationships Map
```
Ellen Marsh (CEO)
└── David Park (CTO)
└── Yusuf Halabi (Eng Manager)
├── Mei Lin
├── Cora Reyes
└── Nikhil Sharma
└── Karen Volkov (COO)
└── Rachel Brandt (CFO)
└── Derek Ashford (Financial Controller)
└── Phil Ruiz (VP Sales)
├── Mike Kawamoto
└── Tanya Okafor
└── Lisa Ferreira
Dave Kowalski (Director of IT)
├── Marcus Webb ←── Player's manager
│ └── [Player]
├── Rachel Huang
└── Tom Malaney
Priya Nair (Head of Security)
└── James Osei
Sarah Chen (Product, AxiomFlow) ←── frequent ticket source
Ben Portillo (Product, AxiomDash)
Annika Gosse (UX)
```
---
## Tone Notes for New Story Work
- **Marcus talks like someone who has answered this question before.** Precise, low
affect, no wasted words. Never condescending — just efficient.
- **Sarah talks like a PM: outcome-focused, slightly impatient, specific about
what she needs.** She is not a villain. She has real deadlines.
- **Priya talks like someone who has already thought about what goes wrong.** She
doesn't speculate — she states. She's not alarming, she's matter-of-fact.
- **Dave Okonkwo talks like someone who doesn't know what the problem is** but is
trying to be helpful by reporting exactly what he observed. He should never be
made to look stupid — he's doing the right thing.
- **The company takes itself seriously.** Humor comes from the gap between official
language and reality, not from anyone being a cartoon.
- **Problems have plausible causes.** Systems broke because someone made a
reasonable decision under time pressure, not because they were careless idiots.
The player should feel like a professional, not a janitor.
# Axiom Works — Company Lore Reference
> For quest authors, dialogue writers, and ticket copy. Keep the tone dry and
> believable. The company should feel real, slightly dysfunctional, and just
> plausible enough that players recognise the type.
---
## Who They Are
**Axiom Works** is a B2B enterprise software company founded in 2011. Headquarters
is in a three-floor office park that is technically "downtown adjacent" depending
on how charitable you are with the map. They have about 280 employees. The
Glassdoor rating is 3.8 stars and management checks it obsessively.
Their flagship product is **AxiomFlow** — a workflow automation platform aimed at
mid-size manufacturers, logistics companies, and anyone who got a 90-minute demo
and thought it looked easy. Most customers are still on the workflow they set up
in 2019. The platform does what it says. Marketing says it does considerably more.
---
## Products
| Product | Description | Status |
|---------|-------------|--------|
| **AxiomFlow** | Workflow automation platform | Active, main revenue |
| **AxiomDash** | Reporting and analytics add-on | Active, profitable, under-resourced |
| **AxiomSync** | Legacy data integration layer | End-of-sale since 2021, still maintained for 12 customers who refuse to migrate |
The current marketing tagline is *"Streamline. Scale. Succeed."* It replaced
*"Work smarter, not harder"* in Q3 of last year. The one before that mentioned
AI. Nobody is sure what the AI was.
---
## Infrastructure
The company runs a mix of on-prem servers (named after Greek gods — a choice made
by a contractor in 2017 who left before documenting anything) and a handful of
cloud instances that accounting keeps trying to consolidate.
| Host | Role | Notes |
|------|------|-------|
| **ares** | Player workstation | XFCE desktop, where the player works |
| **hermes** | Web/app server | nginx, staging and demo environment for AxiomFlow |
| **vulcan** | Build machine | Arch Linux, compiles artifacts, runs scheduled jobs |
### Planned future systems
As the game grows, additional machines will be added. Candidates:
| Proposed host | Role | Greek connection |
|---|---|---|
| **poseidon** | Database server | Foundation, depths, reliability |
| **apollo** | Mail / notification server | Prophecy, proclamations |
| **athena** | Internal tooling (ticketing, wiki) | Wisdom, knowledge management |
| **argus** | Monitoring / alerting | The hundred-eyed watcher |
| **mnemosyne** | Backup / storage | Memory, persistence |
---
## Characters
### Dave Kowalski — Director of IT Operations
The player's skip-level manager. Has been at Axiom Works since 2015. Hired Marcus.
Oversees three teams: systems (Marcus's domain), networking, and IT support. His background
is in networking; he has Cisco certifications he won't bring up unless someone else
brings up Cisco certifications first. Sends weekly status emails formatted in bullet
points that never quite answer the question you were asking. When things go wrong he
schedules a meeting to "talk through the situation," which everyone has learned is
worse than an email. Maintains a calendar block from 11pm on Tuesdays that nobody
has ever asked about. Has said "we should really document that" approximately 400 times.
Describes the infrastructure as "mature."
### Marcus Webb — Senior Sysadmin
The player's manager and the person who assigned them the ticket. Has been at
Axiom Works for six years. Knows where all the bodies are buried. Communicates
primarily in terse Slack messages and occasionally very long emails sent at 11pm.
Trusts competence over process. Gets irritated by people who confuse symptoms
with root causes.
### Priya Nair — Security / Compliance
Runs security reviews and has opinions about everything. Usually right. Tends to
frame concerns in terms of what will happen when things go wrong rather than
whether they will. Was brought in after an incident nobody talks about in public.
### Sarah Chen — Product Manager
Represents the product team's perspective in the ticket queue. Cares about demo
environments more than production ones because demos are what she can see. Not
technically wrong about their importance. Emails at 8am on Mondays.
### Derek Ashford — Financial Controller
Does not appear in person. Appears on CC lines of emails where infrastructure
costs are being discussed. Always replies-all. His full name is Derek Ashford.
His manager is Rachel Brandt (CFO).
---
## Background Characters (non-interactive, for world texture)
These characters exist on the company website and in lore but do not appear in
quests or dialogue. Use them for verisimilitude — email headers, CC lines, internal
wiki author credits, that sort of thing.
### Ellen Marsh — CEO & Co-Founder
Built AxiomFlow after a decade in operations. Not technical. Attends all-hands
twice a year. Has final say on pricing and major customer commitments. Does not
use Slack. The player will never interact with her.
### David Park — CTO & Co-Founder
Wrote the original rules engine. Now manages engineering managers. Still has
opinions about the data model. Has a standing Thursday meeting with security
that hasn't moved since 2017.
### Karen Volkov — COO
Joined 2014. Responsible for the fact that Axiom Works has documented processes
for anything. Has opinions about infrastructure costs. Prefers decisions with
clear owners and deadlines.
### Rachel Brandt — CFO
Joined 2016. Approves all capital expenditure over $5,000. Does not enjoy
surprises in the infrastructure budget. Derek reports to her.
### Phil Ruiz — VP of Sales
Has been promising features to prospects since 2016. Has a warm relationship
with the infrastructure team because Marcus once saved a demo with 20 minutes to
spare. Expense reports submitted promptly.
### Tanya Okafor — Head of Customer Success
Manages all post-sale customer relationships including the twelve AxiomSync
holdouts. Usually the first to know when something is wrong in production,
because a customer has already called her.
### Yusuf Halabi — Engineering Manager
Reports to the CTO. Manages the core AxiomFlow platform team. Has opinions
about test coverage. Runs the Thursday architecture review.
### Mei Lin — Senior Software Engineer
Has maintained AxiomSync's integration layer since 2018. Knows more about it
than anyone would prefer.
### Nikhil Sharma — Platform Engineer
Owns the build and release pipeline and internal CI infrastructure. Occasionally
sends Slack messages at 6am.
### Sandra Wu — HR Manager
Manages hiring, onboarding, and employee relations since 2016. Sends birthday
emails on time, every time. Runs the new-hire onboarding process that takes
three days.
---
## Tone Guidelines
- **Dry, not sarcastic.** The company takes itself seriously. The humour comes
from the gap between how they describe things and what's actually happening.
- **Specific, not generic.** "The AxiomSync customer in Cincinnati keeps calling"
is better than "a client is upset."
- **Plausible dysfunction.** Problems happen because of reasonable decisions made
under time pressure, not because people are incompetent. The player should feel
like a real professional, not a janitor.
- **No cartoon villains.** Derek from Finance is not evil. The product team is not
stupid. They have different priorities.
- **The infrastructure has history.** It was built over time. Some parts are good.
Some parts were good in 2017. The player's job is to keep it working.
# Installer & Distribution Plan
> Status: Planning — not yet implemented.
> Covers: installer, uninstaller, VM rebuild, save management, modular script architecture.
---
## Goals
- Download zip from GitHub/Gitea, run `install.sh`, done.
- Friendly tone throughout — this is a game, not a server deployment.
- No jargon (libvirt, pool, domain, NAT) in any user-facing output.
- Power users can follow the Manual Install section in README instead.
- VM images live wherever the user puts the game (portable, large-drive friendly).
- Full uninstall with explicit choices about what gets removed.
- Users can rebuild individual VMs if something goes wrong.
- Save data is resettable; save slots available for experimenting.
---
## `start-game.sh` Fixes
The current launcher works but has two real bugs, several fragile assumptions, and
no user-friendly output. Fix this in the same pass as the rest of the scripts since
it will share `lib/ui.sh` and `lib/config.sh`.
### Bugs to fix
**Orphaned server process**
The script ends with `exec remote-viewer`, which replaces the shell. The `trap`
that was set to kill the server on EXIT disappears with the shell — so when the
player closes the SPICE window, the game server keeps running silently.
Fix: don't `exec`. Run `remote-viewer` normally, capture its PID, wait for it to
exit, then kill the server cleanly.
```bash
# instead of:
exec remote-viewer "$spice_uri"
# do:
remote-viewer "$spice_uri" &
VIEWER_PID=$!
trap 'kill "$SERVER_PID" "$VIEWER_PID" 2>/dev/null || true' EXIT INT TERM
wait "$VIEWER_PID"
```
**`sleep 1` server readiness check**
One second is a race. On a slow machine or if npm install just ran, the server
may not be up. On a fast machine it's wasted time.
Fix: poll in a tight loop with a timeout.
```bash
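wait_for_server() {
  local port="$1" timeout=15 i=0
  # Match ":<port> " with a trailing space so 3000 doesn't also match 30000.
  while ! ss -tlnp 2>/dev/null | grep -q ":${port} "; do
    sleep 0.25
    i=$((i + 1))   # not ((i++)): that returns status 1 when i is 0 and trips `set -e`
    [ "$i" -ge $((timeout * 4)) ] && return 1
  done
  return 0         # without this, the loop's status is that of the failed `[` test
}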
```
### Fragile assumptions to fix
- **`lsof` for port check** — not universal. Replace with `ss -tlnp` (iproute2,
present on all modern Linux).
- **No network check** — if the `sc-internal` libvirt network is inactive, the VM
starts but has no network. The HUD loads but shows nothing. Check the network is
active (and start it if not) before starting the VM.
- **No images-dir check** — once portable installs land, `SC_IMAGES_DIR` might be
on an unmounted game drive. Check it exists before trying virsh ops.
- **Frontend build at launch** — `"Building frontend..."` at game launch is odd UX.
Move this guard to install time. The launcher should only verify `dist/index.html`
exists and fail clearly if it doesn't (don't silently trigger a build).
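A minimal sketch of the three missing checks as bash functions the launcher could call before any domain operations. The function names and messages are hypothetical; `virsh net-info` and `virsh net-start` are real subcommands, and the `Active:` line parsed here is part of `net-info`'s normal output. Defaulting to `qemu:///system` mirrors the example config and is an assumption.

```bash
# Fail if the images directory is missing (e.g. the game drive isn't mounted).
check_images_dir() {
  if [ ! -d "$1" ]; then
    echo "Your game files at $1 can't be found. Is that drive connected?" >&2
    return 1
  fi
}

# Fail clearly if the frontend was never built; the launcher never builds it.
check_frontend_dist() {
  if [ ! -f "$1/index.html" ]; then
    echo "The game interface hasn't been built yet. Run install.sh to finish setup." >&2
    return 1
  fi
}

# Make sure the game network is active, starting it if needed.
ensure_network_active() {
  local net="${1:-sc-internal}" uri="${SC_LIBVIRT_URI:-qemu:///system}"
  # net-info prints an "Active:  yes|no" line; it fails if the network isn't defined.
  if virsh -c "$uri" net-info "$net" 2>/dev/null | grep -Eq '^Active:[[:space:]]+yes'; then
    return 0
  fi
  virsh -c "$uri" net-start "$net" >/dev/null
}
```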
### UX improvements
- Source `lib/ui.sh` and `lib/config.sh` once they exist.
- Replace raw `echo "ERROR: ..."` with friendly messages. Examples:
| Current | Replacement |
|---|---|
| `ERROR: virsh is required.` | `Your system is missing the virtual machine tools.\nRun install.sh to set up the game.` |
| `ERROR: missing workstation domain: sc-workstation` | `Your game world hasn't been built yet.\nRun install.sh to finish setup.` |
| `ERROR: node is required. Install Node.js 18+.` | `Node.js is required but wasn't found.\nRun install.sh to set up the game.` |
- Show brief startup status so the player isn't staring at a blank terminal:
```
Starting Sysadmin Chronicles...
✓ Game server running
✓ Workstation online
Opening your desk...
```
- Add `--manage-saves` and `--reset-save` flags (forward to `tools/save/manage-saves.sh`).
### New flag: `--stop`
Even after the fix above, a launcher killed with SIGKILL (which no trap can catch)
leaves the server running. Add `start-game.sh --stop` that kills any running game
server process. Useful if something gets stuck.
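One way `--stop` could work is a PID file written when the launcher backgrounds the server. This is a sketch under assumptions: the PID file location and both helper names are hypothetical, and a stale PID file could in principle point at an unrelated process if the PID has since been reused.

```bash
# Hypothetical PID file location; the plan does not specify one.
SC_PID_FILE="${SC_PID_FILE:-${XDG_RUNTIME_DIR:-/tmp}/sysadmin-chronicles-server.pid}"

# Called right after the launcher backgrounds the server: record_server_pid "$SERVER_PID"
record_server_pid() {
  printf '%s\n' "$1" > "$SC_PID_FILE"
}

# start-game.sh --stop: ask the server to exit, then force it after ~5 seconds.
stop_server() {
  local pid _
  if [ ! -f "$SC_PID_FILE" ]; then
    echo "No game server is running."
    return 0
  fi
  pid="$(cat "$SC_PID_FILE")"
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid" 2>/dev/null || true
    for _ in 1 2 3 4 5 6 7 8 9 10; do
      kill -0 "$pid" 2>/dev/null || break
      sleep 0.5
    done
    # Still alive after the grace period: force it.
    if kill -0 "$pid" 2>/dev/null; then
      kill -9 "$pid" 2>/dev/null || true
    fi
  fi
  rm -f "$SC_PID_FILE"
}
```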
### Summary of changes to `start-game.sh`
| Area | Change |
|---|---|
| Server shutdown | `exec` → normal run + `wait`, trap covers both server and viewer |
| Server readiness | `sleep 1` → poll loop with 15s timeout |
| Port check | `lsof` → `ss -tlnp` |
| Network check | Add: verify `sc-internal` active, start if not |
| Images dir check | Add: verify `SC_IMAGES_DIR` exists before virsh ops |
| Frontend build | Remove from launcher; fail clearly if dist missing |
| Error messages | Replace all with plain-English + fix instructions |
| Startup output | Add three-line status before opening SPICE |
| New flags | `--manage-saves`, `--reset-save`, `--stop` |
---
## Script Architecture
All user-facing scripts share a common library layer. No logic is duplicated.
```
install.sh              # project root: the entry point for new users
uninstall.sh            # project root: removal with options
start-game.sh           # project root: launcher (checks env, starts server, opens SPICE)
tools/
  lib/
    ui.sh               # colored output, prompts, spinners, progress bars
    deps.sh             # distro detection, package name map, dep check/install
    libvirt.sh          # virsh wrappers: network, pool, domain, snapshot ops
    vm.sh               # build, rebuild, snapshot, revert per VM
    config.sh           # read/write install config (~/.config/sysadmin-chronicles/config)
    save.sh             # save slot management, reset helpers
  setup/
    check-host.sh       # kept, improved UX, used internally by install.sh
    first-run-setup.sh  # kept as internal lib target or merged into install.sh
    seed-vms.sh         # kept as internal lib target, called by install.sh and rebuild
  vm/
    rebuild-vms.sh      # new: rebuild all or specific VMs
  save/
    manage-saves.sh     # new: list/switch/reset save slots
```
### `lib/ui.sh`
- `sc_step "label"` — numbered step header
- `sc_ok "msg"`, `sc_warn "msg"`, `sc_fail "msg"` — status lines
- `sc_prompt "question" "default"` — interactive prompt, returns answer
- `sc_confirm "question"` — yes/no, returns 0/1
- `sc_spinner "label"` / `sc_spinner_stop` — background spinner for long ops
- `sc_progress "label" current total` — simple fraction display
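A minimal sketch of a few `lib/ui.sh` helpers. The function names come from the list above; the bodies, status symbols, and colors are illustrative assumptions. Prompts go to stderr (bash's `read -p` behavior), so `answer="$(sc_prompt ...)"` captures only the answer.

```bash
# Colors only when stdout is a terminal, so piped or logged output stays plain.
if [ -t 1 ]; then
  SC_GREEN=$'\e[32m' SC_YELLOW=$'\e[33m' SC_RED=$'\e[31m' SC_RESET=$'\e[0m'
else
  SC_GREEN='' SC_YELLOW='' SC_RED='' SC_RESET=''
fi

sc_ok()   { printf '%s✓%s %s\n' "$SC_GREEN" "$SC_RESET" "$1"; }
sc_warn() { printf '%s!%s %s\n' "$SC_YELLOW" "$SC_RESET" "$1"; }
sc_fail() { printf '%s✗%s %s\n' "$SC_RED" "$SC_RESET" "$1" >&2; }

# sc_prompt "question" "default": prints the answer, or the default if it's empty.
# read -p only shows the prompt when input comes from a terminal.
sc_prompt() {
  local answer=""
  read -r -p "$1 [default: $2] > " answer || true
  printf '%s\n' "${answer:-$2}"
}

# sc_confirm "question": 0 for yes, 1 for no. An empty answer counts as yes,
# matching the [Y/n] prompts in the installer mockups; EOF counts as no.
sc_confirm() {
  local answer
  read -r -p "$1 [Y/n] " answer || return 1
  case "$answer" in
    ""|[Yy]|[Yy][Ee][Ss]) return 0 ;;
    *) return 1 ;;
  esac
}
```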
### `lib/deps.sh`
- `detect_distro` — sets `$SC_DISTRO` (arch, debian, ubuntu, fedora, opensuse)
- `map_packages` — translates canonical dep names to distro package names
- `check_deps` — returns list of missing deps
- `install_deps "pkg1 pkg2 ..."` — runs the right package manager with sudo, logs what was installed
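A sketch of `detect_distro` and `map_packages`, assuming detection via `/etc/os-release`, whose `ID` and `ID_LIKE` fields are standard. `SC_OS_RELEASE` is a hypothetical override for testing. Only the Debian/Ubuntu package names are filled in; every other distro falls back to the canonical name, which is a placeholder still to be checked against each distro's repositories.

```bash
detect_distro() {
  local file="${SC_OS_RELEASE:-/etc/os-release}" id like candidate
  [ -r "$file" ] || { SC_DISTRO="unknown"; return 1; }
  # Command substitution runs in a subshell, so sourcing os-release there
  # does not leak its variables (NAME, VERSION, ...) into the caller.
  id="$(. "$file" && printf '%s' "${ID:-}")"
  like="$(. "$file" && printf '%s' "${ID_LIKE:-}")"
  # Unquoted on purpose: ID_LIKE is space-separated. ID is checked first,
  # then each ID_LIKE entry, so derivatives (Manjaro, Mint) resolve to their base.
  for candidate in $id $like; do
    case "$candidate" in
      arch|debian|ubuntu|fedora) SC_DISTRO="$candidate"; return 0 ;;
      opensuse*|suse)            SC_DISTRO="opensuse";   return 0 ;;
    esac
  done
  SC_DISTRO="unknown"
  return 1
}

# map_packages dep...: prints one distro package name per canonical dep.
map_packages() {
  local dep
  for dep in "$@"; do
    case "$SC_DISTRO:$dep" in
      debian:qemu|ubuntu:qemu)                 echo qemu-system-x86 ;;
      debian:libvirt|ubuntu:libvirt)           echo libvirt-daemon-system ;;
      debian:virt-install|ubuntu:virt-install) echo virtinst ;;
      *)                                       echo "$dep" ;;  # placeholder fallback
    esac
  done
}
```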
### `lib/libvirt.sh`
- `ensure_network name xml_path`
- `ensure_pool name path`
- `pool_path name` — returns the pool's target directory
- `domain_exists name`, `domain_state name`
- `snapshot_exists domain name`
- `snapshot_create domain name description`
- `snapshot_revert domain name`
- `snapshot_delete domain name`
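A sketch of some `lib/libvirt.sh` wrappers. The `virsh` subcommands and flags used (`dominfo`, `domstate`, `snapshot-list --name`, `snapshot-create-as`, `snapshot-revert`, `snapshot-delete`) are real; routing every call through `SC_LIBVIRT_URI` with a `qemu:///system` default mirrors the example config but is otherwise an assumption.

```bash
# All calls go through one URI; the default mirrors SC_LIBVIRT_URI in the example config.
_virsh() { virsh -c "${SC_LIBVIRT_URI:-qemu:///system}" "$@"; }

domain_exists() { _virsh dominfo "$1" >/dev/null 2>&1; }
domain_state()  { _virsh domstate "$1" 2>/dev/null; }   # e.g. "running", "shut off"

# snapshot-list --name prints one name per line; -Fx matches a whole line literally,
# so "baseline" does not match "baseline.clean".
snapshot_exists() {
  _virsh snapshot-list "$1" --name 2>/dev/null | grep -Fxq -- "$2"
}

snapshot_create() {
  _virsh snapshot-create-as --domain "$1" --name "$2" --description "$3" >/dev/null
}

snapshot_revert() { _virsh snapshot-revert --domain "$1" --snapshotname "$2"; }
snapshot_delete() { _virsh snapshot-delete --domain "$1" --snapshotname "$2"; }
```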
### `lib/vm.sh`
- `vm_build profile [--dry-run] [--force]` — wraps `build-vm.sh`
- `vm_rebuild profile [--dry-run]` — destroy + rebuild from cloud image
- `vm_revert vm_id snapshot_name` — revert to named snapshot
- `vm_status vm_id` — running / stopped / missing
- `vm_start vm_id`, `vm_stop vm_id`
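A sketch of `vm_status`, assuming (hypothetically) that each `vm_id` maps to a libvirt domain named `sc-<vm_id>`, matching the existing `sc-workstation` domain. In this sketch anything other than `running` (for example `shut off`, `paused`, or `crashed`) is reported as `stopped`.

```bash
# Hypothetical mapping: vm_id "hermes" -> libvirt domain "sc-hermes".
vm_domain() { printf 'sc-%s\n' "$1"; }

# Prints running, stopped, or missing.
vm_status() {
  local state
  if ! state="$(virsh -c "${SC_LIBVIRT_URI:-qemu:///system}" domstate "$(vm_domain "$1")" 2>/dev/null)"; then
    echo missing   # domstate fails when the domain isn't defined (or libvirt is unreachable)
    return 0
  fi
  case "$state" in
    running) echo running ;;
    *)       echo stopped ;;   # "shut off", "paused", "crashed", ...
  esac
}
```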
### `lib/config.sh`
Config file lives at `~/.config/sysadmin-chronicles/config` (survives game dir moves).
Variables stored:
```bash
SC_GAME_DIR=/home/user/Games/sysadmin-chronicles
SC_IMAGES_DIR=/home/user/Games/sysadmin-chronicles/images
SC_LIBVIRT_URI=qemu:///system
SC_INSTALL_DATE=2026-04-27
SC_INSTALLED_DEPS="libvirt qemu-system-x86 ..." # what we added, for the log
```
- `config_read` — sources the config file
- `config_write key value`
- `config_show` — pretty-prints current config
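A minimal sketch of `config_read` and `config_write`, assuming the file is a plain `KEY=value` shell fragment like the example above. `SC_CONFIG_FILE` is a hypothetical override. `printf %q` is bash-specific; it quotes values so entries with spaces, such as `SC_INSTALLED_DEPS`, survive being sourced back.

```bash
# Hypothetical override for testing; defaults to the path named in this plan.
SC_CONFIG_FILE="${SC_CONFIG_FILE:-$HOME/.config/sysadmin-chronicles/config}"

# Load every saved KEY=value into the current shell (no-op before first install).
config_read() {
  if [ -f "$SC_CONFIG_FILE" ]; then
    . "$SC_CONFIG_FILE"
  fi
}

# config_write KEY VALUE: drop any existing line for KEY, then append the new one.
config_write() {
  local key="$1" value="$2" tmp
  mkdir -p "$(dirname "$SC_CONFIG_FILE")"
  tmp="$(mktemp "${SC_CONFIG_FILE}.XXXXXX")"
  if [ -f "$SC_CONFIG_FILE" ]; then
    grep -v "^${key}=" "$SC_CONFIG_FILE" > "$tmp" || true   # grep exits 1 if no lines remain
  fi
  printf '%s=%q\n' "$key" "$value" >> "$tmp"
  # Rename over the old file so readers never see a half-written config.
  mv "$tmp" "$SC_CONFIG_FILE"
}
```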
### `lib/save.sh`
- `save_list` — lists all save slots with name, date, trust score, quest progress
- `save_switch slot_name` — switch active save
- `save_new slot_name` — create a new empty save slot
- `save_reset [slot_name]` — wipe a slot back to new-game state
- `save_export slot_name path` — export save JSON for backup
- `save_import path slot_name` — import a save JSON
---
## Installer Design (`install.sh`)
### Phase 1 — Welcome
```
╔══════════════════════════════════════════╗
║ SYSADMIN CHRONICLES — SETUP ║
╚══════════════════════════════════════════╝
Welcome! This installer will:
• Install a few system tools (KVM, QEMU, libvirt)
• Set up a private virtual network for the game
• Build three virtual machines (~30 minutes, once only)
Where would you like to install the game?
[default: ~/Games/sysadmin-chronicles] >
```
### Phase 2 — System check (silent)
Internally calls `check_deps`. If all present, skip to Phase 4 silently.
### Phase 3 — Dependency install (only if needed)
```
Your system is missing the following tools:
• KVM virtualization support (qemu-system-x86)
• Virtual machine manager (libvirt, virt-install)
• SPICE display viewer (virt-viewer)
• Cloud image tools (cloud-image-utils, genisoimage)
Install them now? You'll be asked for your password. [Y/n]
```
After install:
- Log installed packages to `~/.local/share/sysadmin-chronicles/install.log`
- Format: timestamp, package name, version, distro. Human-readable.
- Note at end: "This log is kept so you know exactly what was added. See it at: ..."
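A sketch of the log writer, assuming one tab-separated line per package. The version queries use the standard per-distro tools (`dpkg-query`, `pacman -Q`, `rpm -q --qf`); the helper names and the `SC_INSTALL_LOG` override are hypothetical.

```bash
# Path from the plan; SC_INSTALL_LOG as an override is hypothetical.
SC_INSTALL_LOG="${SC_INSTALL_LOG:-$HOME/.local/share/sysadmin-chronicles/install.log}"

# Print a package's installed version using the distro's own query tool.
package_version() {
  case "$SC_DISTRO" in
    debian|ubuntu)   dpkg-query -W -f='${Version}' "$1" 2>/dev/null ;;
    arch)            pacman -Q "$1" 2>/dev/null | cut -d' ' -f2 ;;
    fedora|opensuse) rpm -q --qf '%{VERSION}-%{RELEASE}' "$1" 2>/dev/null ;;
  esac
}

# Append one line: timestamp, package, version, distro (tab-separated).
log_installed() {
  local pkg="$1" version="${2:-unknown}"
  mkdir -p "$(dirname "$SC_INSTALL_LOG")"
  printf '%s\t%s\t%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$pkg" "$version" "${SC_DISTRO:-unknown}" \
    >> "$SC_INSTALL_LOG"
}
```

After each install, something like `log_installed virt-viewer "$(package_version virt-viewer)"` would record the package; an empty version query falls back to `unknown`.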
### Phase 4 — One-time network and storage setup
```
── Setting up game network ──────────────────
✓ Private game network created
✓ VM image storage configured at ~/Games/sysadmin-chronicles/images
✓ Game access keys generated
```
User never sees "libvirt", "storage pool", "sc-internal", "sc-images".
### Phase 5 — VM build
```
── Building your game world ─────────────────
This happens once and takes about 30 minutes.
You can leave this running in the background.
Building workstation (1/3) ........... ✓ 8m 14s
Building web server (2/3) ........... ✓ 4m 02s
Building build server (3/3) ........... ✓ 5m 31s
Setting up quest scenarios ........... ✓ 1m 48s
```
### Phase 6 — Desktop entry
```
Create a desktop launcher so the game appears in your app menu? [Y/n]
```
Creates `~/.local/share/applications/sysadmin-chronicles.desktop` if yes.
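A sketch of the launcher step, using keys from the freedesktop.org Desktop Entry Specification. `XDG_DATA_HOME` defaults to `~/.local/share`, so the file lands at the path above; the function name and the `Name`, `Comment`, and `Terminal=false` values are assumptions.

```bash
# create_desktop_entry GAME_DIR: write the app-menu launcher for start-game.sh.
create_desktop_entry() {
  local game_dir="$1"
  local apps_dir="${XDG_DATA_HOME:-$HOME/.local/share}/applications"
  mkdir -p "$apps_dir"
  # Exec quotes the path so install directories containing spaces still work.
  cat > "$apps_dir/sysadmin-chronicles.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=Sysadmin Chronicles
Comment=Work the ticket queue at Axiom Works
Exec=bash "$game_dir/start-game.sh"
Terminal=false
Categories=Game;
EOF
}
```

Where `desktop-file-validate` (from desktop-file-utils) is installed, running it on the generated file is a cheap sanity check.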
### Phase 7 — Done
```
╔══════════════════════════════════════════╗
║ SETUP COMPLETE! ║
╚══════════════════════════════════════════╝
Start the game:
bash ~/Games/sysadmin-chronicles/start-game.sh
(or from your app menu if you created a launcher)
If you ever need to rebuild the virtual machines:
bash ~/Games/sysadmin-chronicles/tools/vm/rebuild-vms.sh
Install log saved at:
~/.local/share/sysadmin-chronicles/install.log
```
---
## Uninstaller Design (`uninstall.sh`)
Improved from current: shows sizes, explains consequences, three-tier removal.
### Menu approach
```
╔══════════════════════════════════════════╗
║ SYSADMIN CHRONICLES — UNINSTALL ║
╚══════════════════════════════════════════╝
What would you like to remove?
1) Everything — full uninstall (recommended)
2) Game world only — remove VMs, keep game files
3) Save data only — reset to new game
4) Custom — choose what to remove
q) Cancel
>
```
### "Everything" breakdown (shows before confirming)
```
This will remove:
Game virtual machines (3 VMs + all snapshots) ~38 GB
VM image files on disk ~38 GB ← ask separately
Game network and storage configuration <1 MB
Game access keys (~/.ssh/sc_host_key) <1 KB
Desktop launcher (if created) <1 KB
System packages (libvirt, QEMU, etc.) NOT removed
↑ These were installed by your package manager.
See ~/.local/share/sysadmin-chronicles/install.log
if you want to remove them manually.
Keep VM image files? If you ever reinstall, keeping them
saves the 30-minute rebuild. [Y/n — default: keep]
Type REMOVE to confirm: >
```
### What is never auto-removed
- System packages (libvirt, qemu, virt-viewer, etc.)
- Anything not prefixed with `sc-` in libvirt
- Any other libvirt VMs or networks not owned by this game
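A sketch of how the uninstaller could enforce the `sc-` rule: it only ever lists libvirt objects whose names start with `sc-`, so nothing else on the host is offered for removal. `virsh list --all --name`, `net-list --all --name`, and `pool-list --all --name` are real; the helper names are hypothetical.

```bash
_sc_virsh() { virsh -c "${SC_LIBVIRT_URI:-qemu:///system}" "$@"; }

# Each helper prints only sc- names; `|| true` keeps an empty result from failing.
sc_owned_domains()  { _sc_virsh list --all --name 2>/dev/null | grep '^sc-' || true; }
sc_owned_networks() { _sc_virsh net-list --all --name 2>/dev/null | grep '^sc-' || true; }
sc_owned_pools()    { _sc_virsh pool-list --all --name 2>/dev/null | grep '^sc-' || true; }
```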
---
## VM Rebuild Tool (`tools/vm/rebuild-vms.sh`)
For when something goes wrong with a VM or the user wants a clean reset.
```
Usage:
rebuild-vms.sh Rebuild all VMs from scratch
rebuild-vms.sh --vm workstation Rebuild a single VM
rebuild-vms.sh --revert Revert all VMs to baseline snapshot (fast, ~30s)
rebuild-vms.sh --revert --vm workstation
Menu (interactive):
1) Revert all to last known good (fast — restores baseline snapshot)
2) Rebuild workstation (~8 min — rebuilds from cloud image)
3) Rebuild web server (~4 min)
4) Rebuild build server (~5 min)
5) Rebuild everything (~20 min)
q) Cancel
```
Key behavior:
- Always confirm before destroying a VM
- Show what quest progress will be affected
- Offer to back up save data before proceeding
- After a rebuild, re-run the appropriate quest-prep scripts and re-take the baseline snapshot
---
## User Snapshots
Players can take their own named snapshots of any VM — useful before attempting
something risky, or to bookmark a state they want to return to.
These are distinct from the game's automatic shift checkpoints and baseline
snapshots. User snapshots are never pruned automatically.
### Via `manage-saves.sh` (recommended)
The save management menu will include a **VM Snapshots** section:
```
VM Snapshots
workstation (ares)
1) before-ssh-experiment 2026-05-01 19:14
2) checkpoint.shift-3 2026-05-01 22:00 [auto]
3) baseline.day-one [protected]
web server (hermes)
1) my-nginx-fix 2026-05-02 11:30
2) checkpoint.shift-3 2026-05-01 22:00 [auto]
3) baseline.clean [protected]
Actions: [t]ake snapshot [r]evert [d]elete [q]uit
```
Taking a snapshot prompts for a name (letters, numbers, hyphens only).
Reverting shows a confirmation with the snapshot date.
Protected snapshots (baseline.*, checkpoint.*) cannot be deleted from this menu.
### Via `tools/vm/rebuild-vms.sh --snapshot`
For scripting or quick one-liners:
```bash
rebuild-vms.sh --snapshot --vm workstation --name before-risky-thing
rebuild-vms.sh --snapshot --all --name pre-shift-4
rebuild-vms.sh --revert --vm workstation --name before-risky-thing
```
### Storage note
Each VM snapshot is an internal qcow2 differential — typically 100 MB2 GB
depending on how much disk has changed since the baseline. The uninstaller shows
the total size of user snapshots separately so the user can decide whether to
keep them.
### `lib/vm.sh` additions needed
- `vm_snapshot_create vm_id name` — with name validation
- `vm_snapshot_list vm_id` — returns name, date, size, protection flag
- `vm_snapshot_revert vm_id name`
- `vm_snapshot_delete vm_id name` — refuses if name matches `baseline.*` or `checkpoint.*`
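The naming and protection rules these functions must enforce can be sketched as follows. It is written in JavaScript to match the game server; `lib/vm.sh` would implement the same checks in bash. The helper names are hypothetical. The rules come from the menu spec above: user names are letters, numbers, and hyphens only, and `baseline.*` / `checkpoint.*` snapshots are protected from deletion.

```javascript
// Hypothetical helpers mirroring the rules for the lib/vm.sh snapshot functions.

// User-chosen names: letters, numbers, and hyphens only (per the menu spec).
// Dots are not allowed, so a user name can never collide with baseline.* or checkpoint.*.
const USER_SNAPSHOT_NAME = /^[A-Za-z0-9-]+$/;

// Snapshots managed by the game; never deletable from the menu.
const PROTECTED_PREFIXES = ["baseline.", "checkpoint."];

function isValidUserSnapshotName(name) {
  return typeof name === "string" && USER_SNAPSHOT_NAME.test(name);
}

function isProtectedSnapshot(name) {
  return PROTECTED_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// Returns null when the delete may proceed, or a plain-English reason when it must be refused.
function checkSnapshotDelete(name) {
  if (isProtectedSnapshot(name)) {
    return `"${name}" is managed by the game and cannot be deleted here.`;
  }
  return null;
}

console.log(isValidUserSnapshotName("before-risky-thing")); // true
console.log(isValidUserSnapshotName("my fix!"));            // false
console.log(checkSnapshotDelete("baseline.clean"));          // refusal message
console.log(checkSnapshotDelete("my-nginx-fix"));            // null
```

Keeping dots out of user names means a single prefix check is enough to tell game-managed snapshots from player-created ones.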
---
## Save Management
### Save file layout
```
~/.local/share/sysadmin-chronicles/
saves/
autosave.json ← always-present auto save (current session)
slot-1.json
slot-2.json
slot-3.json
install.log
```
### Save slot semantics
Save slots store JSON state only:
- Trust score and history
- Quest and ticket state
- World flags
- Inbox
- In-world clock
**VM state is not per-slot.** The shift checkpoint snapshots (checkpoint.shift-N) are the VM save mechanism and are independent of JSON slots. This is a known limitation, but it keeps disk usage manageable.
When switching slots: if the VM state doesn't match the JSON slot's expected state, warn the user. They may need to revert VMs manually.
### `tools/save/manage-saves.sh`
```
Usage:
manage-saves.sh Show save slot menu
manage-saves.sh --reset Reset current save to new game
manage-saves.sh --reset slot-1 Reset a specific slot
manage-saves.sh --list List all slots
Interactive menu:
Current save: autosave (Day 3, Trust: 67, 4/8 quests)
1) autosave Day 3 Trust 67 Q4/8 [active]
2) slot-1 Day 1 Trust 50 Q1/8
3) slot-2 —empty—
4) slot-3 —empty—
Actions: [s]witch [n]ew [r]eset [e]xport [i]mport [q]uit
```
### Reset save (standalone, accessible from start-game.sh)
The launcher `start-game.sh` should have an escape hatch:
```
start-game.sh --manage-saves → opens save management menu
start-game.sh --reset-save → confirms and resets to new game
```
---
## Launcher Improvements (`start-game.sh`)
Current issues to fix:
- Silently fails if images drive not mounted
- No check that the libvirt network is up before starting
- `sleep 1` to wait for server is fragile
Improvements:
- `config_read` to get `SC_IMAGES_DIR`, check it exists and is writable
- Check libvirt network is active, start it if not (with clear message)
- Poll server readiness on `/healthz` instead of sleeping
- Show a brief status before launching SPICE: "Starting your workstation..."
- On failure, show a plain-English error and the fix
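The launcher itself is a shell script, where `curl` in a retry loop would do the job. The sketch below shows the same poll-with-deadline logic in JavaScript to match the server codebase (the `open-portal` wrapper has the same need). It assumes the server answers `GET /healthz` with a 2xx status once it is ready. The function name, per-attempt timeout, and polling interval are illustrative, not taken from the codebase. It relies on the global `fetch` available in Node.js 18 and later.

```javascript
// Poll an HTTP health endpoint until it answers 2xx or a deadline passes.
// Requires Node.js 18+ for the global fetch and AbortSignal.timeout.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForHealthy(url, { timeoutMs = 15000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let lastError = "no response";
  while (Date.now() < deadline) {
    try {
      // Bound each attempt so one hung connection cannot consume the whole budget.
      const res = await fetch(url, { signal: AbortSignal.timeout(1000) });
      await res.arrayBuffer(); // drain the body so the connection is released
      if (res.ok) return true;
      lastError = `HTTP ${res.status}`;
    } catch (err) {
      // Typically "connection refused" while the server is still starting; keep polling.
      lastError = err.message;
    }
    await sleep(intervalMs);
  }
  throw new Error(`server not ready after ${timeoutMs} ms (${lastError})`);
}

// Usage (hypothetical; port 3000 is the server's documented default):
// await waitForHealthy("http://127.0.0.1:3000/healthz");
```

Unlike a fixed `sleep 1`, this returns as soon as the server is up and fails with a concrete reason that the launcher can turn into the plain-English error described above.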
---
## Portable Installation Notes
The `sc-images` libvirt pool target can be any path the host OS can write to. The installer configures it to `$SC_IMAGES_DIR` (inside the game dir by default).
If the user puts the game on a game drive (`/mnt/gamesdrive/sysadmin-chronicles/`):
- `SC_IMAGES_DIR=/mnt/gamesdrive/sysadmin-chronicles/images`
- The libvirt pool points there
- All qcow2 files live on the game drive
- The launcher checks the drive is mounted before starting
If the drive is unmounted:
```
✗ Can't find your game world.
The VM images are stored at /mnt/gamesdrive/sysadmin-chronicles/images
but that location isn't available right now.
Is your game drive plugged in and mounted?
Once it's mounted, run start-game.sh again.
```
---
## Dependency Log Format
`~/.local/share/sysadmin-chronicles/install.log`
```
# Sysadmin Chronicles — Install Log
# Created: 2026-04-27 14:32:01
# Distro: arch (6.19.12-arch1-1)
# Game dir: /home/aaron/Games/sysadmin-chronicles
# Images: /home/aaron/Games/sysadmin-chronicles/images
[INSTALLED] libvirt 12.2.0 via pacman
[INSTALLED] qemu-system-x86 11.0.0 via pacman
[INSTALLED] qemu-hw-display-qxl 11.0.0 via pacman
[INSTALLED] qemu-hw-display-virtio-gpu 11.0.0 via pacman
[INSTALLED] qemu-ui-spice-core 11.0.0 via pacman
[INSTALLED] qemu-chardev-spice 11.0.0 via pacman
[INSTALLED] qemu-audio-spice 11.0.0 via pacman
[INSTALLED] virt-install 5.1.0 via pacman
[INSTALLED] virt-viewer 11.0 via pacman
[INSTALLED] cloud-image-utils 0.33 via pacman
[INSTALLED] cdrtools 3.02a09 via pacman
[INSTALLED] libisoburn 1.5.8 via pacman
[SKIPPED] nodejs already installed
# To remove manually:
# sudo pacman -Rns libvirt qemu-system-x86 qemu-hw-display-qxl ...
```
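Because the uninstaller never removes system packages itself, the log only has to be structured enough to print the manual removal hint. Below is a minimal parser sketch. It assumes exactly the line shapes in the sample above: `#` comments, `[INSTALLED] name version via manager`, and `[SKIPPED] name reason`. `parseInstallLog` and `removalCommand` are hypothetical names, and only the `pacman` removal form shown in the sample is handled.

```javascript
// Parse install.log lines of the form shown in the sample log.
function parseInstallLog(text) {
  const entries = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // header and comment lines
    const installed = line.match(/^\[INSTALLED\]\s+(\S+)\s+(\S+)\s+via\s+(\S+)$/);
    if (installed) {
      const [, name, version, manager] = installed;
      entries.push({ status: "installed", name, version, manager });
      continue;
    }
    const skipped = line.match(/^\[SKIPPED\]\s+(\S+)\s+(.*)$/);
    if (skipped) {
      entries.push({ status: "skipped", name: skipped[1], reason: skipped[2] });
    }
  }
  return entries;
}

// Build the manual removal hint. Skipped packages were already on the system, so they are excluded.
function removalCommand(entries) {
  const names = entries
    .filter((e) => e.status === "installed" && e.manager === "pacman")
    .map((e) => e.name);
  return names.length ? `sudo pacman -Rns ${names.join(" ")}` : null;
}
```

Excluding `[SKIPPED]` entries matters. A package the user already had, such as `nodejs` in the sample, should never appear in a suggested removal command.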
---
## File Layout After Install
```
~/Games/sysadmin-chronicles/ ← SC_GAME_DIR
install.sh
uninstall.sh
start-game.sh
content/
server/
frontend/
docs/
tools/
lib/
ui.sh
deps.sh
libvirt.sh
vm.sh
config.sh
save.sh
setup/
check-host.sh
first-run-setup.sh
seed-vms.sh
vm/
rebuild-vms.sh
build-vm.sh
...
save/
manage-saves.sh
images/ ← SC_IMAGES_DIR (libvirt pool points here)
sc-workstation.qcow2 (~20 GB)
sc-web-server.qcow2 (~8 GB)
sc-build-machine.qcow2 (~10 GB)
~/.config/sysadmin-chronicles/config ← install config (survives game dir moves)
~/.local/share/sysadmin-chronicles/
saves/
autosave.json
slot-1.json ...
install.log
```
---
## Implementation Order
1. `tools/lib/ui.sh` — all other scripts depend on this
2. `tools/lib/config.sh` — needed by installer and launcher
3. `tools/lib/deps.sh` — needed by installer
4. `tools/lib/libvirt.sh` — needed by installer and rebuild tool
5. `tools/lib/vm.sh` — needed by installer and rebuild tool
6. `tools/lib/save.sh` — needed by save manager
7. `install.sh` — assembles libs 15
8. `tools/vm/rebuild-vms.sh` — assembles libs 1, 4, 5
9. `tools/save/manage-saves.sh` — assembles libs 1, 2, 6
10. `uninstall.sh` — assembles libs 1, 2, 4
11. `start-game.sh` (improved) — assembles libs 1, 2
12. Update `check-host.sh` UX
13. README — manual install section, quick start
---
## README Structure
```markdown
## Quick Install
curl -fsSL .../install.sh | bash
# or
bash install.sh # from downloaded zip
## Manual Install
<details>
<summary>For users who want full control or are troubleshooting</summary>
...per-distro dep tables, step-by-step...
</details>
```
+76
@@ -0,0 +1,76 @@
# SYSADMIN CHRONICLES — PRESSURE PROFILES
> Version 1.1
>
> Pressure profiles define how an unresolved situation degrades over time.
> They are referenced by name from quest files and live in
> `content/pressure_profiles/`.
>
> A pressure profile is NOT an incident. An incident is a discrete event with
> a trigger, escalation chain, and resolution. A pressure profile describes the
> passive degradation behavior of the environment while a quest is active and
> unresolved. Incidents may be spawned by pressure profiles, but are separate.
---
## SCHEMA
```json
{
"id": "web_outage_escalation",
"label": "Web Service Outage",
"description": "Gentle escalation for Tier 1 web outage quests. Creates narrative urgency without punishing new players.",
"intensity": 2,
"escalation_steps": [
{
"trigger_after_seconds": 900,
"notification": "Hermes is still showing errors. Is someone on this?",
"notification_severity": "warning"
},
{
"trigger_after_seconds": 1800,
"notification": "Site has been down thirty minutes. Ticket priority is going up.",
"notification_severity": "warning",
"escalate_linked_ticket": "high"
},
{
"trigger_after_seconds": 3600,
"notification": "Hour down. Priya has been copied in.",
"notification_severity": "error",
"escalate_linked_ticket": "critical"
}
]
}
```
---
## FIELD REFERENCE
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique identifier. Must match the string used in the quest's `pressure_profile` field. |
| `label` | string | Short human-readable name for tooling and authoring. |
| `description` | string | Internal description for authors. |
| `intensity` | int | Relative urgency / pressure level. |
| `escalation_steps` | array | Ordered list of timed escalation notices or ticket priority changes. |
### Stage Fields (`escalation_steps` entries)
| Field | Required | Description |
|-------|----------|-------------|
| `trigger_after_seconds` | Yes | Seconds after activation before the stage fires. |
| `notification` | Yes | Player-facing escalation message. |
| `notification_severity` | Yes | Severity label used by the UI and notifier. |
| `escalate_linked_ticket` | No | Optional linked-ticket priority escalation. |
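The following sketch shows how a scheduler tick could decide which stages are due. It assumes the scheduler knows the in-game seconds elapsed since quest activation and how many stages of the profile have already fired. The source does not say how that progress is stored, so `dueStages` and the `firedCount` bookkeeping are hypothetical.

```javascript
// Return the stages that should fire on this tick, in order.
// elapsedSeconds: in-game seconds since the quest was activated.
// firedCount: how many stages of this profile have already fired.
function dueStages(profile, elapsedSeconds, firedCount) {
  const due = [];
  for (let i = firedCount; i < profile.escalation_steps.length; i++) {
    const step = profile.escalation_steps[i];
    // Stages are ordered ascending, so the first future stage ends the scan.
    if (step.trigger_after_seconds > elapsedSeconds) break;
    due.push(step);
  }
  return due;
}
```

Because the scan resumes from `firedCount`, a late tick (for example, right after a save is loaded) fires every overdue stage in order rather than skipping any.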
---
## AUTHORING NOTES
- `trigger_after_seconds` is relative to quest activation time, not real wall time.
In-game time compression applies.
- Stages must be ordered by `trigger_after_seconds` ascending. Authoring tools will
warn on out-of-order stages.
- Pressure profiles should create urgency, not guaranteed punishment.
- If a pressure profile escalates a linked ticket, it should do so in a way that
matches the authored ticket priority curve.
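The ordering rule above is easy to lint. Below is a sketch of the kind of check an authoring tool could run. It warns rather than fails, in line with the note above, and it also flags the three required stage fields from the table. The function name and message wording are hypothetical.

```javascript
const REQUIRED_STAGE_FIELDS = ["trigger_after_seconds", "notification", "notification_severity"];

// Return human-readable warnings for one pressure profile; an empty array means it looks fine.
function lintPressureProfile(profile) {
  if (!Array.isArray(profile.escalation_steps)) {
    return [`${profile.id}: "escalation_steps" must be an array`];
  }
  const steps = profile.escalation_steps;
  const warnings = [];
  steps.forEach((step, i) => {
    for (const field of REQUIRED_STAGE_FIELDS) {
      if (step[field] === undefined) {
        warnings.push(`${profile.id}: stage ${i} is missing "${field}"`);
      }
    }
    // Equal trigger times are allowed; a stage earlier than its predecessor is not.
    if (i > 0 && step.trigger_after_seconds < steps[i - 1].trigger_after_seconds) {
      warnings.push(`${profile.id}: stage ${i} fires before stage ${i - 1} (out of order)`);
    }
  });
  return warnings;
}
```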
+377
@@ -0,0 +1,377 @@
# SYSADMIN CHRONICLES — PROJECT MAP
> Version 5.1 | Living document. Update when files are added, moved, or removed, or when the architecture changes.
---
## ROOT STRUCTURE
```
sysadmin-chronicles/
├── server/ ← NEW: Node.js game server
│ ├── src/
│ │ ├── index.js Entry point — Express + WebSocket
│ │ ├── routes/ auth, state, tickets, mail, docs, sage, vms
│ │ ├── services/ ContentLoader, QuestEngine, TicketService,
│ │ │ ValidationEngine, VMManager, TrustSystem,
│ │ │ ProgressionSystem, EmailService, SageService,
│ │ │ ShiftTimer, IncidentScheduler, ShiftReviewService,
│ │ │ CertificationService, SaveState
│ │ └── lib/ ssh.js, virsh.js, command.js, eventBus.js, session.js
│ └── package.json
├── frontend/ ← NEW: Svelte web HUD
│ ├── src/
│ │ ├── App.svelte Root component, WebSocket, panel routing
│ │ ├── components/ TicketsPanel, MailPanel, DocsPanel, SagePanel,
│ │ │ VmsPanel, ProfilePanel, HeaderBar, SidebarTabs
│ │ ├── lib/api.js REST API fetch wrapper
│ │ └── main.js
│ ├── dist/ Built output (served by game server)
│ └── package.json
├── scripts/
│ └── start-game.sh One-shot: start server + open SPICE workstation viewer
├── docs/
│ ├── ARCHITECTURE.md System architecture
│ ├── CHARACTERS.md All characters — bios, relationships, story hooks
│ ├── COMPANY_LORE.md World, company, products, tone guidelines
│ ├── INSTALLER_PLAN.md Installer design and packaging
│ ├── PRESSURE_PROFILES.md Time-pressure escalation schema and authoring guide
│ ├── PROJECT_MAP.md ← this file
│ ├── ROADMAP.md Development phases and content status
│ ├── RUNTIME_DEPENDENCIES.md Host dependencies and version requirements
│ ├── SAVE_SYSTEM.md Save model, VM persistence policy, recovery flows
│ ├── SNAPSHOT_CHAIN.md VM snapshot chain and baseline management
│ ├── STORY_DESIGN_CONTEXT.md How story works — narrative arc, quest model, design constraints
│ ├── VM_BUILD_SYSTEM.md VM build and provisioning system
│ ├── WORKSTATION_POLISH_BACKLOG.md Outstanding UX polish items
│ └── codex-specs/
├── content/ ← data-driven content loaded by Node.js server
│ ├── quests/ quest JSON files (being reworked — see STORY_DESIGN_CONTEXT.md)
│ ├── tickets/ ticket JSON files (being reworked)
│ ├── incidents/ incident JSON files (being reworked)
│ ├── pressure_profiles/ escalation profiles (schema in PRESSURE_PROFILES.md)
│ ├── dialogue/ character dialogue JSON files (being reworked)
│ ├── world_flags/ world_flags.json (central registry)
│ ├── docs/ onboarding, sage_content, internal_docs, etc.
│ ├── progression/ trust_unlocks.json, access_tiers.json
│ └── vm_profiles/ workstation.json, web_server.json, build_machine.json
├── tools/
│ ├── setup/ check-host.sh, seed-vms.sh, first-run-setup.sh, uninstall.sh
│ ├── vm/ build-vm.sh, build-*.sh, snapshot-all.sh, suppress-maintenance-noise.sh
│ │ ├── profiles/ workstation.sh, web-server.sh, build-machine.sh
│ │ └── quest-prep/ Q001Q008 prep/post scripts
│ └── content/ validate-content.js (zero-error gate), verify-clue-fingerprints.js
├── company-website/ Axiom Works public website (static HTML/CSS)
│ ├── index.html Home — hero, product highlights, stats
│ ├── about.html Company story, values, contact
│ ├── people.html Team page — Dave, Marcus, Priya, Sarah + filler staff
│ ├── products.html AxiomFlow, AxiomDash, AxiomSync product pages
│ ├── style.css Shared corporate CSS (navy/blue scheme)
│ └── assets/ logo.png, portrait photos for each NPC
├── vm/ images/, snapshots/, cloud-init/, probes/
├── package.json
└── README.md
```
---
## COMPANY WEBSITE
Static HTML/CSS site serving as the public-facing Axiom Works company website, accessible from the workstation VM.
**URL inside the VM:** `http://www.axiomworks.corp/` (no port)
**How it works:**
- The game server serves `company-website/` at `/company/` (port 3000)
- nginx is installed in the workstation VM and proxies `axiomworks.corp` and `www.axiomworks.corp` (port 80) → game server port 3000 at `/company/`
- `/etc/hosts` in the workstation maps both hostnames to `127.0.0.1` (localhost → nginx)
- Result: the player sees a clean `http://www.axiomworks.corp/` URL in Chromium with no port number
**Pages:** Home (`index.html`), About (`about.html`), Our Team (`people.html`), Products (`products.html`)
**Team page portraits:** NPC photos live in `company-website/assets/`. The player is not featured on the website.
**Domain note:** `axiomworks.corp` uses the `.corp` TLD. ICANN has deferred delegation of `.corp` indefinitely (alongside `.home` and `.mail`) over name-collision concerns, so it is not delegated in the public DNS root. No registration is needed, and the name does not resolve on the public internet. The in-VM `/etc/hosts` + nginx approach is sufficient for any build.
**Player portraits** (for the HUD profile panel) are separate from the website portraits. They live in `server/public/portraits/` and are served at `/public/portraits/`. The player selects one via the Profile panel; the choice persists in `save.json` as `player_portrait`.
---
## BOOT FLOW (Node.js Server)
```
bash scripts/start-game.sh
node server/src/index.js
1. ContentLoader.load() — reads all content/**/*.json into memory
2. SaveState.load() — reads ~/.local/share/sysadmin-chronicles/save.json
or creates fresh save
3. TrustSystem.initialize() — hydrates trust score + unlock state
4. ProgressionSystem.initialize()
5. QuestEngine.initialize() — restores quest states from save
6. TicketService.initialize()
7. EmailService.initialize() — restores inbox, seeds T001 email on fresh save
8. ShiftTimer.start() — starts shift clock
9. IncidentScheduler.start() — begins pressure tick loop (every 30s)
10. VMManager.ensureWorkstationLive() — virsh start sc-workstation if needed
Express + WebSocket listening on PORT (default 3000)
remote-viewer opens SPICE connection to sc-workstation
Player sees XFCE desktop → Chromium opens HUD → game is live
```
---
## TICKET COMPLETION FLOW
```
Player clicks "Mark Complete" on ticket in HUD
POST /api/tickets/:id/complete
TicketService.markComplete(ticketId)
→ load ticket + linked quest JSON
→ for each solution_branch (sorted by priority DESC):
ValidationEngine.check(vmId, branch.validation.rules)
→ VMManager.getIP(vmId)
→ SSH as opsbridge using sc_host_key
→ run each rule check (file_exists, service_state, etc.)
if all rules pass → winning branch found
→ TrustSystem.adjust(branch.trust_delta)
→ WorldFlags.set(branch.world_flags)
→ QuestEngine.completeQuest(questId)
→ EmailService.send(follow-up NPC email if negative branch)
→ SaveState.write()
→ broadcast trust:changed, mail:new via WebSocket
Response: { passed, branch, trust_delta, failures }
HUD shows success toast or failure details
```
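The branch-selection loop in the middle of that flow can be sketched as follows. This is an illustration, not the actual `TicketService` code. The rule checker is injected as a function standing in for `ValidationEngine.check`, and the branch shape (`id`, `priority`, `validation.rules`) is assumed from the names used in the flow. The first branch, in descending priority, whose rules all pass wins. If none pass, this sketch returns the failures from every branch.

```javascript
// checkRules(vmId, rules) is assumed to resolve to { passed: boolean, failures: string[] }.
async function resolveBranch(vmId, branches, checkRules) {
  // Highest priority first; content validation requires priorities to be unique per quest.
  const ordered = [...branches].sort((a, b) => b.priority - a.priority);
  const failures = [];
  for (const branch of ordered) {
    const result = await checkRules(vmId, branch.validation.rules);
    if (result.passed) {
      return { passed: true, branch: branch.id, failures: [] };
    }
    // Prefix each failure with its branch so the HUD can show which fix was attempted.
    failures.push(...result.failures.map((f) => `${branch.id}: ${f}`));
  }
  return { passed: false, branch: null, failures };
}
```

Checking in priority order means a "better fix" branch that also satisfies the "obvious fix" rules is credited as the better fix.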
---
## VM IDENTITY TABLE
| vm_id | SC constant | libvirt domain | hostname | distro | ssh_user | mgmt_user | always_live | Quests |
|-------|-------------|----------------|----------|--------|----------|-----------|-------------|--------|
| `workstation` | `SC.VM_WORKSTATION` | `sc-workstation` | `ares` | Debian 12 | `player` | `opsbridge` | yes | Q001 |
| `web_server` | `SC.VM_WEB_SERVER` | `sc-web-server` | `hermes` | Debian 12 | `player` | — | no | Q002Q005, Q007 |
| `build_machine` | `SC.VM_BUILD_MACHINE` | `sc-build-machine` | `vulcan` | Arch Linux | `player` | — | no | Q006, Q008 |
See `docs/VM_BUILD_SYSTEM.md` for full build system documentation and profile authoring guide.
**SSH key**: all host→guest connections use `~/.ssh/sc_host_key` (BatchMode, no password).
**Baseline snapshots**:
- workstation: `baseline.day-one`
- web_server, build_machine: `baseline.clean`
---
## TERMINAL ARCHITECTURE
The player uses a real **Tilix** terminal inside the workstation VM (sc-workstation / ares).
No terminal simulation. SSH to target VMs is real SSH. There is no in-game terminal widget.
```
Player opens Tilix on the workstation XFCE desktop
→ types: ssh hermes
→ real SSH to sc-web-server using player's authorized_keys
→ works directly on the target VM
Host-side validation (triggered by "Mark Complete" in HUD):
ValidationEngine.js SSHes as 'opsbridge' → sudo -H -i -u player
Runs rule checks (file_exists, service_state, etc.)
Returns pass/fail to game server
```
Host SSH options (used by ValidationEngine.js and VMManager.js):
```
-o StrictHostKeyChecking=no
-o BatchMode=yes
-o ConnectTimeout=5
-o LogLevel=ERROR
-i ~/.ssh/sc_host_key
```
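Below is a minimal sketch of how `lib/ssh.js` might assemble those options and run one remote command with Node's built-in `child_process.execFile`. No local shell is involved, so the host side does not re-parse the arguments. The function names, the default timeout, and the `user@ip` form are assumptions for illustration; the project map only describes the module as "Promisified SSH execution".

```javascript
const { execFile } = require("node:child_process");
const os = require("node:os");
const path = require("node:path");

// Host SSH options from the list above, as an argument vector.
function buildSshArgs(user, host, remoteCommand) {
  return [
    "-o", "StrictHostKeyChecking=no",
    "-o", "BatchMode=yes",
    "-o", "ConnectTimeout=5",
    "-o", "LogLevel=ERROR",
    "-i", path.join(os.homedir(), ".ssh", "sc_host_key"),
    `${user}@${host}`,
    remoteCommand, // still interpreted by the remote shell, so quote anything dynamic
  ];
}

// Run one command on a guest and resolve with its exit code and output.
function runRemote(user, host, remoteCommand, { timeoutMs = 15000 } = {}) {
  return new Promise((resolve) => {
    execFile("ssh", buildSshArgs(user, host, remoteCommand), { timeout: timeoutMs },
      (err, stdout, stderr) => {
        // A non-zero remote exit is a normal "rule failed" outcome, not an exception.
        // -1 covers spawn errors (e.g. ssh not installed) and timeouts.
        const code = err ? (typeof err.code === "number" ? err.code : -1) : 0;
        resolve({ code, stdout, stderr });
      });
  });
}
```

`ssh` exits with the remote command's status, or 255 if ssh itself hit an error, so a caller can tell "VM unreachable" (255) apart from "check failed" (any other non-zero code).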
---
## SERVICE DEPENDENCY GRAPH (Node.js server)
```
eventBus.js (Node.js EventEmitter — no deps)
└─ consumed by: all services
ContentLoader
└─ consumed by: QuestEngine, TicketService, ValidationEngine, TrustSystem,
ProgressionSystem, IncidentScheduler, EmailService, SageService
VMManager
← wraps virsh.js + ssh.js
← called by QuestEngine (start required VMs on quest activation)
← called by ValidationEngine (get VM IP for SSH)
ValidationEngine
← calls VMManager.getIP(vmId)
← SSHes as opsbridge → runs rule checks (file_exists, service_state, etc.)
← called by TicketService on mark-complete
QuestEngine
← calls VMManager to start required VMs
← calls ValidationEngine via TicketService
← calls TrustSystem, WorldFlags, EmailService on resolution
→ emits via eventBus: quest:activated, quest:resolved, ticket:received
IncidentScheduler
← reads WorldFlags for trigger conditions
← tick drives escalation step advancement
→ emits via eventBus: incident:activated, incident:escalated, incident:resolved
TrustSystem
← called by QuestEngine on branch resolution
← called by IncidentScheduler for ignored incident penalties
→ emits via eventBus: trust:changed
SaveState
← called by QuestEngine, TrustSystem, ProgressionSystem
← reads/writes ~/.local/share/sysadmin-chronicles/save.json
```
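The graph implies a simple pattern: services emit on the shared bus, and `index.js` relays selected events to WebSocket clients. Here is a sketch using Node's built-in `EventEmitter`. The payload shapes, the `broadcast` callback, and the helper names are assumptions. Only the event names `trust:changed` and `quest:resolved` come from the graph.

```javascript
const { EventEmitter } = require("node:events");

// One shared bus (lib/eventBus.js in the project map); every service imports the same instance.
const eventBus = new EventEmitter();

// Hypothetical: relay selected bus events to every connected HUD client.
// broadcast(message) stands in for "send this JSON to each open WebSocket".
function wireBroadcasts(bus, broadcast, eventNames) {
  for (const name of eventNames) {
    bus.on(name, (payload) => broadcast({ type: name, payload }));
  }
}

// A service emits without knowing who listens (illustrative trust update).
function adjustTrust(bus, state, delta) {
  state.trust += delta;
  bus.emit("trust:changed", { trust: state.trust, delta });
}
```

`EventEmitter.emit` calls listeners synchronously, so the broadcast happens before `adjustTrust` returns. That keeps HUD updates in step with state changes, but a slow listener would block the emitting service.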
---
## KEY MODULES
### Server (`server/src/`)
| Module | File | Responsibility |
|--------|------|----------------|
| Entry point | index.js | Express + WS, service wiring, static serving |
| ContentLoader | services/ContentLoader.js | Load all content/ JSON at startup |
| QuestEngine | services/QuestEngine.js | Quest state machine |
| TicketService | services/TicketService.js | Ticket state, mark-complete, branch resolution |
| ValidationEngine | services/ValidationEngine.js | SSH rule evaluation (all rule types) |
| VMManager | services/VMManager.js | virsh wrappers, IP resolution |
| TrustSystem | services/TrustSystem.js | Score, unlocks, revocation |
| ProgressionSystem | services/ProgressionSystem.js | Unlocked VMs, docs, access |
| EmailService | services/EmailService.js | Inbox, follow-ups, reply options |
| SageService | services/SageService.js | Rule-based dialogue / KB |
| ShiftTimer | services/ShiftTimer.js | Shift clock, 30s tick broadcasts |
| IncidentScheduler | services/IncidentScheduler.js | Pressure tick, incident injection |
| ShiftReviewService | services/ShiftReviewService.js | End-of-shift review email |
| CertificationService | services/CertificationService.js | Cert awards after quest chains |
| SaveState | services/SaveState.js | Read/write save.json |
| ssh.js | lib/ssh.js | Promisified SSH execution |
| virsh.js | lib/virsh.js | virsh command wrappers |
| eventBus.js | lib/eventBus.js | Node.js EventEmitter for service coordination |
### Frontend (`frontend/src/`)
| Component | File | Responsibility |
|-----------|------|----------------|
| Root | App.svelte | Panel routing, WebSocket connection |
| Tickets | TicketsPanel.svelte | List, detail, mark-complete |
| Mail | MailPanel.svelte | Inbox, message, reply buttons |
| Docs | DocsPanel.svelte | Trust-gated doc viewer |
| Sage | SagePanel.svelte | Chat / KB search |
| VMs | VmsPanel.svelte | Live VM status indicators |
| Header | HeaderBar.svelte | Trust, shift timer, mail badge |
| API | lib/api.js | REST fetch wrapper |
---
## CONTENT DOMAINS
| Domain | Purpose |
|--------|---------|
| `quests/` | Objective chains, clue fingerprints, validation rules, branch priorities |
| `tickets/` | Player-facing problem statements with initial/current priority |
| `incidents/` | Dynamic pressure events with blast_radius and escalation steps |
| `dialogue/` | Workplace messages, hints, follow-ups, series threads |
| `pressure_profiles/` | Reusable escalation templates referenced by quest branches |
| `world_flags/` | Central registry — all world state flags declared here |
| `docs/` | Internal documentation + Sage/help content (trust-gated) |
| `progression/` | Trust thresholds, unlocks, revocation rules, access tiers |
| `vm_profiles/` | Domain names, hostnames, snapshots, networks, resource budgets |
---
## FILE NAMING CONVENTIONS
- Quest files: `Q{NNN}-{kebab-case-title}.json`
- Ticket files: `T{NNN}.json`
- Incident files: `I{NNN}-{kebab-case-title}.json`
- Dialogue files: `{character}-Q{NNN}.json` or `{character}-Q{NNN}-{variant}.json`
- Quest prep scripts: `Q{NNN}-prep.sh`
- VM profiles: `{snake_case}.json`
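These conventions are regular enough to check mechanically. The sketch below shows patterns a validator could use. It assumes that `{NNN}` means exactly three digits, that kebab-case means lowercase letters and digits separated by single hyphens, and that `{character}` and `{variant}` follow the same kebab rule. None of this is specified beyond the templates above, and the rule names are hypothetical.

```javascript
// Lowercase words separated by single hyphens (assumed meaning of kebab-case).
const KEBAB = "[a-z0-9]+(?:-[a-z0-9]+)*";

const NAMING_RULES = {
  quest: new RegExp(`^Q\\d{3}-${KEBAB}\\.json$`),
  ticket: /^T\d{3}\.json$/,
  incident: new RegExp(`^I\\d{3}-${KEBAB}\\.json$`),
  // {character}-Q{NNN}.json or {character}-Q{NNN}-{variant}.json
  dialogue: new RegExp(`^${KEBAB}-Q\\d{3}(?:-${KEBAB})?\\.json$`),
  questPrep: /^Q\d{3}-prep\.sh$/,
  vmProfile: /^[a-z0-9]+(?:_[a-z0-9]+)*\.json$/,
};

function matchesConvention(kind, fileName) {
  const rule = NAMING_RULES[kind];
  if (!rule) throw new Error(`unknown file kind: ${kind}`);
  return rule.test(fileName);
}
```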
---
## CONTENT VALIDATION CHECKS
Run: `node tools/content/validate-content.js` — must exit 0 (zero errors).
| Check | Rule |
|-------|------|
| JSON well-formed | All content files parse without error |
| No duplicate IDs | Unique across quests, tickets, incidents, pressure profiles, dialogue |
| World flags | Every referenced flag exists in `world_flags/world_flags.json` |
| required_vms | Every entry maps to a valid VM profile |
| blast_radius | Every entry maps to an existing incident file |
| linked_quest | Every ticket's linked_quest maps to an existing quest |
| ticket_id | Every quest's ticket_id maps to an existing ticket |
| Branch priority | Priorities unique per quest (no ties) |
| follow_up_incident | Maps to an existing incident file |
| pressure_profile | Maps to an existing pressure profile file |
| series_id | Every series_id has at least two dialogue members |
| revokes | Trust unlock revoke entries reference valid unlock strings |
| clue_fingerprint | Evidence rule types are valid |
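Two of these checks are sketched below. The sketch assumes content is already parsed into arrays of objects with an `id` field, and that each quest lists its branches under `solution_branches` with a numeric `priority`. The ticket completion flow names `solution_branch` and priority ordering, but the exact field layout here is an assumption. Function names are hypothetical. A real run would gather errors from every check and set a non-zero exit code if any were found, to keep the zero-error gate.

```javascript
// IDs must be unique across quests, tickets, incidents, pressure profiles, and dialogue.
function findDuplicateIds(collections) {
  const seen = new Map(); // id -> domain where it first appeared
  const errors = [];
  for (const [domain, items] of Object.entries(collections)) {
    for (const item of items) {
      if (seen.has(item.id)) {
        errors.push(`duplicate id "${item.id}" in ${domain} (first seen in ${seen.get(item.id)})`);
      } else {
        seen.set(item.id, domain);
      }
    }
  }
  return errors;
}

// Branch priorities must be unique per quest so resolution order is deterministic.
function findPriorityTies(quests) {
  const errors = [];
  for (const quest of quests) {
    const byPriority = new Map(); // priority -> first branch id using it
    for (const branch of quest.solution_branches ?? []) {
      if (byPriority.has(branch.priority)) {
        errors.push(`${quest.id}: branches "${byPriority.get(branch.priority)}" and "${branch.id}" share priority ${branch.priority}`);
      } else {
        byPriority.set(branch.priority, branch.id);
      }
    }
  }
  return errors;
}
```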
---
## KNOWN GAPS (Post-Redesign)
These are gaps in the v4.0 Node.js + Svelte implementation.
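The existing content is authored and validator-clean and was reused unchanged in the redesign; it is still slated for replacement by the quest rework (see `ROADMAP.md`).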
### P0 — Blocking for first playable shift
| Gap | Notes |
|-----|-------|
| Phase 7 workstation VM verification | Confirm that the SPICE display, Chromium autostart, and Tilix as the default terminal all work end-to-end on a freshly seeded VM |
| Phase 10 full playtest | Boot all VMs, play Q001→Q002, validate full server→SSH→HUD loop |
### P1 — Required before broader testing
| Gap | Notes |
|-----|-------|
| Clue quality as system degrades | Evidence should remain legible as incidents escalate (I001/I002/I003 escalation pass) |
| Viewer smoothness | The `remote-viewer` SPICE path is functional but not yet smooth enough for the final UX; lower priority now that the workstation runs a real XFCE desktop |
### P2 — Polish / completeness
| Gap | Notes |
|-----|-------|
| WORKSTATION_POLISH_BACKLOG.md items | See that file for outstanding desktop UX polish |
---
## GENERATED / LARGE ASSETS
Created by CLI tooling, not hand-managed:
- `vm/images/*.qcow2`
- Imported libvirt domain XML
- Baseline snapshot exports or manifests
- Shift checkpoint snapshots
- Packaged Linux build artifacts
+58
@@ -0,0 +1,58 @@
# SYSADMIN CHRONICLES — DEVELOPMENT ROADMAP
> Version 5.0 | Status: Active development
>
> Changelog:
> v5.0 — GDScript/Godot removed. Node.js + Svelte is the only codebase.
> v4.0 — Full architecture pivot to Node.js + Svelte.
> v3.x — GDScript/Godot era (superseded).
---
## IMPLEMENTATION PHASES (Node.js + Svelte)
| Phase | Description | Status |
|-------|-------------|--------|
| 1 | Game server skeleton — Express, ContentLoader, SaveState, GET /api/state | [x] done |
| 2 | TrustSystem, ProgressionSystem, QuestEngine, TicketService, ticket routes | [x] done |
| 3 | ValidationEngine — SSH into VMs, all rule types | [x] done |
| 4 | EmailService — inbox, follow-up emails, reply options, mail routes | [x] done |
| 5 | WebSocket broadcasts — trust:changed, mail:new, shift:tick, incident:alert | [x] done |
| 6 | Svelte frontend — all panels built, dist/ served by game server | [x] done |
| 7 | XFCE workstation VM — cloud-init, SPICE/QXL, Chromium, Tilix, autostart | [x] done |
| 8 | SageService + docs routes + SagePanel + DocViewer | [x] done |
| 9 | IncidentScheduler + ShiftTimer + pressure tick loop | [x] done |
| 10 | Full playtest — boot all VMs, play Q001→Q002 end to end | [ ] pending |
**Phase 7 details:** `workstation.sh` profile provisions the full XFCE desktop via
cloud-init: SPICE+virtio display with spicevmc channel for vdagent resize, Chromium
autostart via `open-portal` wrapper (waits for game server before launching), Tilix
as default terminal (`update-alternatives` + `helpers.rc`), dark theme, screensaver
off, desktop icons executable. Snapshot chain: `baseline.day-one`, `baseline.recovery`
taken by `seed-vms.sh`.
---
## CONTENT STATUS
The quest system and story are being completely reworked. All existing quest,
ticket, dialogue, and incident content (Q001Q008, T001T008, I001I003) is
considered legacy and will be replaced.
### Story Design Assets
| File | Purpose |
|------|---------|
| `docs/CHARACTERS.md` | All characters — bios, relationships, story hooks, unresolved threads |
| `docs/STORY_DESIGN_CONTEXT.md` | How story works in this game — narrative arc, quest structure, character model, design constraints |
| `docs/COMPANY_LORE.md` | World, company, products, tone guidelines |
---
## QUEST TIER DEFINITIONS
| Tier | Label | Characteristics |
|------|-------|-----------------|
| 1 | Tutorial Arc | Single VM, clear symptoms, one obvious fix, one better fix, no time pressure |
| 2 | Workday Arc | Multi-symptom, one quest affects another, trust pressure, incidents active |
| 3 | Stretch | Multi-VM, ambiguous root cause, political pressure, real prioritization stakes |
+54
@@ -0,0 +1,54 @@
# Runtime Dependencies
This file tracks host and guest dependency expectations for Sysadmin Chronicles.
Keep it updated when provisioning scripts, VM display backends, or installer
requirements change.
## Host Packages
| Capability | Arch package / command | Minimum tested version | Notes |
| --- | --- | --- | --- |
| Godot runtime | `godot` | 4.6.2 | Legacy: no longer required now that the GDScript/Godot codebase has been removed (v5.0). |
| Libvirt CLI | `libvirt` / `virsh` | 12.2.0 | Use `qemu:///system` for game VMs. Socket activation is supported. |
| QEMU system emulator | `qemu-system-x86` / `qemu-system-x86_64` | 11.0.0 | Must match the split QEMU module package versions. |
| QEMU disk tools | `qemu-img` | 11.0.0 | Used by VM builders for qcow2 images. |
| QXL display module | `qemu-hw-display-qxl` | 11.0.0 | Required for `virt-install --video qxl`. |
| Virtio GPU modules | `qemu-hw-display-virtio-gpu`, `qemu-hw-display-virtio-gpu-pci`, `qemu-hw-display-virtio-vga` | 11.0.0 | Required for the default SPICE + virtio workstation display path. |
| SPICE UI module | `qemu-ui-spice-core` | 11.0.0 | Required for SPICE graphics in libvirt domain capabilities. |
| SPICE channel module | `qemu-chardev-spice` | 11.0.0 | Required for SPICE agent channels. |
| SPICE audio module | `qemu-audio-spice` | 11.0.0 | Required for SPICE-backed guest audio. |
| VM installer | `virt-install` | 5.1.0 | Creates imported cloud-image domains. |
| SPICE viewer | `remote-viewer` / `virt-viewer` | 11.0 | Used for desktop workstation display. |
| Cloud image tools | `cloud-image-utils`, `cdrtools`, `libisoburn` | cloud-image-utils 0.33, cdrtools 3.02a09, libisoburn 1.5.8 | Used to generate seed ISOs. |
| SSH client | `ssh` | OpenSSH 10.3p1 | Used by the game and setup scripts to reach guests. |
| Node.js | `node` | 22.22.2 | Required by the redesigned browser HUD/server path. |
## Libvirt Resources
| Resource | Required shape | Notes |
| --- | --- | --- |
| Network | `sc-internal`, bridge `sc-br0`, subnet `10.42.0.0/24`, NAT forwarding | NAT is required during VM image provisioning so Debian cloud-init can install packages. The network remains private to libvirt guests for inbound access. |
| Storage pool | `sc-images` | For `qemu:///system`, defaults to `/var/lib/libvirt/images/sysadmin-chronicles`. |
| SSH key | `~/.ssh/sc_host_key` | Injected into guests for game automation and bridge access. |
## Workstation Guest Packages
The workstation image currently targets Debian 12 Bookworm and installs:
- Desktop/display: `xfce4`, `xfce4-goodies`, `lightdm`, `lightdm-gtk-greeter`, `spice-vdagent`, `qemu-guest-agent`, `accountsservice`, `linux-image-amd64`
- Desktop metadata: `gvfs`, `gvfs-daemons`, `libglib2.0-bin` for trusted desktop launchers and GVFS metadata writes
- User tools: `tilix`, `chromium`, `thunar`, `geany`, `meld`, `vim`, `nano`, `tmux`, `htop`
- Sysadmin tools: `openssh-server`, `openssh-client`, `sudo`, `curl`, `wget`, `rsync`, `git`, `jq`, `python3`, `nmap`, `netcat-openbsd`, `dnsutils`, `traceroute`, `mtr`, `tcpdump`, `strace`, `lsof`, `openssl`, `whois`, `iperf3`, `logwatch`
- Fonts/completion: `fonts-hack`, `fonts-firacode`, `bash-completion`
## Version Capture
Before cutting an installer or release, capture current versions with:
```bash
tools/setup/check-host.sh
virsh --connect qemu:///system version
qemu-system-x86_64 --version
virt-install --version
pacman -Q libvirt qemu-system-x86 qemu-img qemu-hw-display-qxl qemu-hw-display-virtio-gpu qemu-hw-display-virtio-gpu-pci qemu-hw-display-virtio-vga qemu-ui-spice-core qemu-chardev-spice qemu-audio-spice virt-install virt-viewer spice-gtk cloud-image-utils cdrtools libisoburn openssh nodejs
```
+330
@@ -0,0 +1,330 @@
# SYSADMIN CHRONICLES — SAVE SYSTEM DESIGN
> Version 1.3 | Status: Active development
>
> Changelog:
> v1.3 — Defined `persists: false` flag semantics (shift boundary reset).
> Added world flag persistence rules section.
>
> This document covers the save model, VM persistence policy, dirty state
> handling, recovery flows, and the design decisions behind them.
---
## THE CORE TENSION
The game wants real VMs. Real VMs have real state. That state changes as the
player works. The question is: what do we save, when, and what happens when
things go wrong?
Two broad approaches exist:
**Approach A — Replay Model**
Save authored flags and game state only. On load, restore a baseline snapshot
and replay authored events to reconstruct the world. Simple, cheap, predictable.
**Approach B — Dirty State Model**
Preserve actual VM disk state as-is. Save references to the current snapshot or
live qcow2 state. On load, the VM resumes exactly where it was.
This game uses **Approach B**, with structured recovery fallbacks. Here is why,
and what that means in practice.
---
## WHY DIRTY STATE
The replay model breaks the design contract. If the player spent forty minutes
debugging a broken service, leaving behind log entries, partial edits, and
useful breadcrumbs, restoring a clean baseline erases all of that. The world
forgets. That is not how real systems work.
The dirty state model means:
- The player's workstation remembers what they did
- Target VMs remember fixes applied and mistakes made
- Evidence persists — good and bad
- A machine the player damaged stays damaged until they fix it or request reimage
- A machine they set up correctly stays correct
Operational note:
- The workstation should be treated as a curated terminal-first appliance image
whose shell history, local config, and jump-box state persist like any other VM state
- Desktop-like company tools live in the game state layer, not inside a VM browser session
- Rebuilding the workstation runtime on every reset would create slow, noisy,
and inconsistent recovery behavior
This is more expensive. It is also the point of the game.
---
## WHAT GETS SAVED
### Game State Layer
Saved as structured JSON. Cheap, fast, always consistent.
- Player trust score and history
- Unlocked VMs, sudo scopes, internal docs, tools
- Active and completed ticket/quest state
- World flags (current values and change history)
- Incident scheduler state (active incidents, escalation timers)
- Per-quest authored consequence records
- Shift timestamp and in-world clock
### VM State Layer
Saved as libvirt snapshot references or qcow2 state references. Expensive but
necessary.
- Per-VM: reference to current named snapshot or live disk state
- Per-VM: list of managed recovery checkpoints
- Per-VM: reimage eligibility and reimage history
- Per-VM: last-known observation data (advisory, not authoritative)
The game does not store VM disk images in the save file. It stores references to
named snapshots managed by libvirt. The actual disk data lives where libvirt
puts it.
---
## WORLD FLAG PERSISTENCE RULES
Every world flag in `world_flags/world_flags.json` declares a `persists` field.
This controls how the flag behaves across shift boundaries and game loads.
### `persists: true`
The flag is written to the save file and survives indefinitely. It is cleared
only when a quest or incident explicitly sets it to false, or when the VM is
reimaged. Most flags are persistent — they represent stable facts about the
world (nginx is configured correctly, logrotate is healthy, etc.).
### `persists: false`
The flag is **reset at the start of each new shift**, regardless of its current
value. It is NOT reset on game load within the same shift.
Non-persistent flags represent transient pressure states that should not carry
forward into the next working session:
- `hermes_disk_healthy` — disk state that may change overnight without the player's intervention
- `web_disk_pressure_active` — active disk pressure event currently escalating
**On shift boundary**: all `persists: false` flags are cleared before the new
shift's checkpoint is taken. Their cleared state is what gets saved.
**On game load mid-shift**: `persists: false` flags are loaded from the save
file as-is. They are not reset on load, only on shift boundary.
**Implementation note for `SaveSystem`**: When writing the shift checkpoint,
iterate all world flags and zero out any with `persists: false` before
serializing. Do not zero them in the live `WorldFlagRegistry` until the
checkpoint write is complete, to avoid mid-write state corruption.
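The implementation note above can be sketched in TypeScript, in keeping with the Node.js game server. This is a minimal illustration under assumptions: `FlagDef`, `FlagValues`, `checkpointFlags`, `commitShiftReset`, `shiftBoundary`, and the `writeCheckpoint` callback are hypothetical names, and the real `SaveSystem` / `WorldFlagRegistry` may be shaped differently. What it shows is the ordering: zero non-persistent flags in a copy, write the checkpoint, and only then reset the live values.

```typescript
// Hypothetical shapes: a flag definition from world_flags.json and live flag values.
interface FlagDef {
  id: string;
  persists: boolean;
}

type FlagValues = { [id: string]: boolean };

// Build the flag map for the shift checkpoint. The live registry is NOT
// mutated here; non-persistent flags are zeroed only in the copy.
function checkpointFlags(defs: FlagDef[], live: FlagValues): FlagValues {
  const out: FlagValues = {};
  for (const id of Object.keys(live)) {
    out[id] = live[id];
  }
  for (const def of defs) {
    if (!def.persists) {
      out[def.id] = false;
    }
  }
  return out;
}

// Run only after the checkpoint write succeeds: bring the live registry in
// line with what was saved.
function commitShiftReset(defs: FlagDef[], live: FlagValues): void {
  for (const def of defs) {
    if (!def.persists) {
      live[def.id] = false;
    }
  }
}

// Shift-boundary ordering. `writeCheckpoint` stands in for whatever persists
// the save file (hypothetical) and reports success.
function shiftBoundary(
  defs: FlagDef[],
  live: FlagValues,
  writeCheckpoint: (flags: FlagValues) => boolean
): boolean {
  const snapshot = checkpointFlags(defs, live);
  const ok = writeCheckpoint(snapshot);
  if (ok) {
    commitShiftReset(defs, live);
  }
  return ok;
}
```

If the write fails, the live registry is untouched, so a retry starts from exactly the same state.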
---
## SNAPSHOT STRATEGY FOR SAVE/LOAD
### Named Snapshot Tiers
Each VM maintains three tiers of named snapshots, plus its live disk state:
```
baseline.clean — Authored starting state for a fresh quest arc
baseline.recovery — Fallback if live state is unrecoverable
checkpoint.shift-{N} — Auto-saved at start of each in-game shift
live — Current working state (no snapshot, just disk)
```
On save: the game records which snapshot tier is current per VM and any
divergence from it (live state is implicitly the disk, not a snapshot).
On load: the game checks that referenced snapshots still exist and are
consistent with the saved game state flags. If they are, it resumes from live
disk state and continues normally.
### What "Resume" Means
The game does not revert to a snapshot on load. It resumes from whatever state
the VMs are currently in. The save file describes what the game *thinks* the
world looks like. On load, the observation service validates current VM state
against saved world flags and reconciles any drift.
Minor drift (service restarted, log rotated by the OS) is handled silently.
Major drift (a VM that should be running is gone, a snapshot reference is
missing) triggers the recovery flow.
---
## DIRTY STATE RISKS AND MITIGATIONS
### Risk 1: Snapshot Reference Goes Stale
A named snapshot the game references is deleted or corrupted outside the game.
Mitigation: On load, the save system checks all referenced snapshots exist
before resuming. If a checkpoint snapshot is missing but baseline.clean exists,
offer to resume from baseline with authored-flag reconstruction where possible.
If baseline.clean is also gone, the VM is treated as unrecoverable and the
reimage flow is offered.
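A sketch of the Risk 1 decision for a single VM, assuming the game lists snapshots with something like `virsh snapshot-list <domain> --name` (which prints one snapshot name per line). `parseSnapshotNames`, `decideOnLoad`, and the `LoadDecision` values are illustrative names, not the save system's actual API.

```typescript
type LoadDecision = "resume" | "offer-baseline-resume" | "offer-reimage";

// Parse output shaped like `virsh snapshot-list <domain> --name`:
// one snapshot name per line, possibly with trailing blank lines.
function parseSnapshotNames(virshOutput: string): string[] {
  return virshOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Risk 1 mitigation for one VM:
// checkpoint present -> resume from live disk;
// checkpoint gone, baseline.clean present -> offer baseline with flag reconstruction;
// both gone -> treat as unrecoverable and offer reimage.
function decideOnLoad(lastCheckpoint: string, existing: string[]): LoadDecision {
  if (existing.indexOf(lastCheckpoint) !== -1) {
    return "resume";
  }
  if (existing.indexOf("baseline.clean") !== -1) {
    return "offer-baseline-resume";
  }
  return "offer-reimage";
}
```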
### Risk 2: Live Disk State is Unbootable
The player damaged the VM beyond booting — corrupted bootloader, deleted
critical system files, broke networking in a way that prevents observation.
Mitigation: The game detects unbootable VMs through libvirt domain state and
failed SSH probes. The player is notified in-world ("hermes is not responding")
and the reimage flow is offered. The game does not attempt to force-boot or
auto-repair.
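The Risk 2 detection might look like the following, assuming the domain state string comes from `virsh domstate <domain>` and SSH probes are recorded as pass/fail booleans. The function names, the `expectedRunning` input, and the consecutive-failure threshold are hypothetical. The function only reports status, matching the rule that the game never force-boots or auto-repairs.

```typescript
type Health = "ok" | "not-responding";

// domState: text printed by `virsh domstate <domain>`, e.g. "running", "shut off", "crashed".
// probeResults: recent SSH probe outcomes, oldest first, newest last.
function assessVm(
  domState: string,
  probeResults: boolean[],
  expectedRunning: boolean,
  failThreshold: number
): Health {
  if (!expectedRunning) {
    return "ok"; // a VM the game shut down on purpose is not a failure
  }
  const state = domState.trim();
  if (state === "crashed" || state === "shut off") {
    return "not-responding";
  }
  // Count consecutive failed probes at the end of the history.
  let consecutiveFailures = 0;
  for (let i = probeResults.length - 1; i >= 0 && !probeResults[i]; i--) {
    consecutiveFailures++;
  }
  return consecutiveFailures >= failThreshold ? "not-responding" : "ok";
}

// In-world wording for the notification.
function inWorldNotice(vm: string): string {
  return `${vm} is not responding`;
}
```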
### Risk 3: Multiple VMs Diverge from Each Other
The player fixed hermes but their notes reference a service that is now
configured differently. Cross-VM state is inconsistent with authored
expectations.
Mitigation: World flags are the source of truth for cross-VM consequences, not
raw VM state. If the flags say `nginx_stable` but nginx on hermes is currently
failed, the validation service surfaces this on the next observation pass and raises
an in-world event. The player is not penalized for drift that happens while they
are offline — but they are informed.
### Risk 4: Disk Space on Host
qcow2 images with many snapshots can balloon. Long save histories consume real
host storage.
Mitigation: Managed checkpoint retention policy. The game keeps a maximum of N
shift checkpoints per VM (default: 5) and prunes the oldest on new checkpoint
creation. Authored baseline and recovery snapshots are never pruned by the game.
A storage budget field in vm_profiles allows per-VM tuning.
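A possible shape for the retention policy, as a sketch: `checkpointsToPrune` and its `keep` parameter are assumptions (the per-VM value would come from the `vm_profiles` storage budget, defaulting to 5). It only ever selects `checkpoint.shift-{N}` names, so authored baseline and recovery snapshots can never be returned.

```typescript
// Return the shift checkpoints to delete so that at most `keep` remain.
// Only names of the form checkpoint.shift-{N} are eligible; baseline.* and
// any other authored snapshot are ignored.
function checkpointsToPrune(names: string[], keep: number): string[] {
  const pattern = /^checkpoint\.shift-(\d+)$/;
  const shifts: { name: string; shift: number }[] = [];
  for (const name of names) {
    const m = pattern.exec(name);
    if (m) {
      shifts.push({ name: name, shift: parseInt(m[1], 10) });
    }
  }
  // Sort numerically so shift-10 counts as newer than shift-9
  // (a plain string sort would get this wrong).
  shifts.sort((a, b) => a.shift - b.shift);
  const excess = shifts.length - keep;
  return excess > 0 ? shifts.slice(0, excess).map((s) => s.name) : [];
}
```

Each returned name could then be removed with `virsh snapshot-delete <domain> <name>`. Sorting on the parsed shift number rather than the raw string matters as soon as the game passes shift 9.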
Resource budget note:
- Budget the workstation separately from server VMs, even when it uses a modest profile
- Save/recovery tooling should assume workstation snapshots are the most
storage-expensive routine snapshots in the fleet
- Earlier lab builds showed that browser-capable workstation images can exceed
small cloud-image defaults quickly; the terminal-first plan avoids much of
that pressure, but disk budgets still need to be explicit
---
## THE REIMAGE FLOW
When a VM is unrecoverable, the player can report it for reimage through an
in-world mechanic (ticket to management or ops channel).
Flow:
1. Player submits a reimage request for the affected machine
2. An in-world delay is imposed (e.g., 1 in-game shift)
3. The machine is restored from baseline.recovery or baseline.clean
4. Trust penalty is applied based on severity
5. Any in-progress quests on that VM are reset to their baseline state
6. Evidence from before the reimage is gone — acknowledged in-world as "we
had to wipe the machine"
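Steps 3 to 5 of the flow, applied to the save-file entries from the draft schema, could be sketched as below. `applyReimage`, `ReimageInput`, and the flat `trustPenalty` are hypothetical; the document only says the penalty depends on severity, and the in-world delay in step 2 is assumed to be enforced before this runs.

```typescript
// Per-VM save entry, field names taken from the draft save schema.
interface VmSave {
  current_snapshot_tier: string;
  last_checkpoint: string;
  recovery_snapshot: string;
  reimage_count: number;
  last_observation: { [key: string]: unknown };
}

interface ReimageInput {
  vm: VmSave;
  trust: number;
  activeQuestsOnVm: string[];
  availableSnapshots: string[];
  trustPenalty: number; // severity-based in the design; a single number here is an assumption
}

interface ReimageResult {
  restoreFrom: string | null; // snapshot to revert to, or null if no baseline survives
  vm: VmSave;
  trust: number;
  questsToReset: string[];
}

// Pick the VM's recovery snapshot, else baseline.clean; apply the trust
// penalty; list in-progress quests to reset. The input is not mutated.
function applyReimage(input: ReimageInput): ReimageResult {
  const has = (name: string): boolean => input.availableSnapshots.indexOf(name) !== -1;
  let restoreFrom: string | null = null;
  if (has(input.vm.recovery_snapshot)) {
    restoreFrom = input.vm.recovery_snapshot;
  } else if (has("baseline.clean")) {
    restoreFrom = "baseline.clean";
  }
  return {
    restoreFrom: restoreFrom,
    vm: {
      current_snapshot_tier: "live",
      last_checkpoint: input.vm.last_checkpoint,
      recovery_snapshot: input.vm.recovery_snapshot,
      reimage_count: input.vm.reimage_count + 1,
      last_observation: {}, // pre-reimage evidence is gone
    },
    trust: input.trust - input.trustPenalty,
    questsToReset: input.activeQuestsOnVm.slice(),
  };
}
```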
This is not a free reset. It has visible consequences. But it allows the game
to continue rather than becoming permanently stuck.
The reimage flow is the designed escape valve, not a hidden automatic recovery.
---
## SHIFT CHECKPOINTS
At the start of each in-game shift, the game:
1. Clears all `persists: false` world flags
2. Saves all game state JSON (with non-persistent flags already zeroed)
3. Creates a named snapshot for each active VM: `checkpoint.shift-{N}`
4. Records the checkpoint reference in the save file
5. Prunes shift checkpoints beyond the retention limit
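Steps 3 and 4 could be planned as below, as a sketch that builds `virsh snapshot-create-as --domain <domain> --name <snapshot>` argument lists instead of running them. `shiftCheckpointPlan` and the libvirt domain names used in the check are assumptions; steps 1 and 2 (flag reset and JSON save) are assumed to have completed already, and pruning (step 5) is treated as a separate pass.

```typescript
// Build per-VM snapshot commands (step 3) and the save-file references (step 4).
// Commands are argv arrays so the caller can run them with whatever process
// API the server uses; nothing is executed here.
function shiftCheckpointPlan(shift: number, activeDomains: string[]) {
  const name = `checkpoint.shift-${shift}`;
  const commands: string[][] = [];
  const records: { [domain: string]: string } = {};
  for (const domain of activeDomains) {
    commands.push([
      "virsh", "--connect", "qemu:///system",
      "snapshot-create-as", "--domain", domain, "--name", name,
    ]);
    records[domain] = name; // becomes last_checkpoint for that VM
  }
  return { name, commands, records };
}
```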
This gives the player a rollback option at shift granularity if they want to
undo a disastrous session, at the cost of losing that shift's work entirely.
Shift checkpoint rollback is an explicit player action, not automatic. It is
presented as "start this shift over" and requires confirmation. It does not
undo trust changes or world-flag consequences that have already reached other
characters (e.g., dialogue already delivered, tickets already closed).
---
## DEVELOPER RESET
For authoring and testing, a separate CLI tool exists outside the game:
```bash
bash tools/vm/snapshot-all.sh --revert-to baseline.clean
```
This is not accessible in the shipped game. It completely resets all VMs to
their authored baseline. Used during content authoring and automated test runs.
---
## SAVE FILE STRUCTURE (DRAFT SCHEMA)
```json
{
"save_version": 1,
"player": {
"trust": 14,
"trust_history": [],
"unlocks": ["sudo:systemctl", "vm:build_machine"],
"current_shift": 7
},
"world": {
"flags": {
"player_ssh_configured": true,
"nginx_stable": true,
"hermes_logrotate_healthy": false,
"hermes_log_pressure_pending": true,
"hermes_disk_healthy": false
},
"flag_history": [],
"_note": "persists:false flags are zeroed at shift boundary before this snapshot is written. They survive game load within the same shift."
},
"quests": {
"completed": ["Q001", "Q002"],
"failed": [],
"active": ["Q003"],
"branch_outcomes": {
"Q002": "config-fixed-enabled"
}
},
"tickets": {
"active": ["T003"],
"closed": ["T001", "T002"]
},
"incidents": {
"active": [
{
"id": "I001",
"started_at_shift": 6,
"escalation_step_reached": 1
}
],
"resolved": []
},
"vms": {
"workstation": {
"current_snapshot_tier": "live",
"last_checkpoint": "checkpoint.shift-6",
"recovery_snapshot": "baseline.recovery",
"reimage_count": 0,
"last_observation": {}
},
"web_server": {
"current_snapshot_tier": "live",
"last_checkpoint": "checkpoint.shift-6",
"recovery_snapshot": "baseline.recovery",
"reimage_count": 0,
"last_observation": {}
}
}
}
```
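Because the server is Node.js, the draft schema maps naturally onto TypeScript types. The sketch below mirrors the field names in the JSON; `SaveFile` and `looksLikeSave` are hypothetical, and typing the history arrays as `unknown[]` is an assumption since their entry format is not defined yet.

```typescript
// Types mirroring the draft save schema (field names copied from the JSON).
interface SaveFile {
  save_version: number;
  player: { trust: number; trust_history: unknown[]; unlocks: string[]; current_shift: number };
  world: { flags: { [flag: string]: boolean }; flag_history: unknown[]; _note?: string };
  quests: {
    completed: string[];
    failed: string[];
    active: string[];
    branch_outcomes: { [questId: string]: string };
  };
  tickets: { active: string[]; closed: string[] };
  incidents: {
    active: { id: string; started_at_shift: number; escalation_step_reached: number }[];
    resolved: unknown[];
  };
  vms: {
    [vm: string]: {
      current_snapshot_tier: string;
      last_checkpoint: string;
      recovery_snapshot: string;
      reimage_count: number;
      last_observation: object;
    };
  };
}

// Minimal structural check before trusting a parsed save: version 1 and all
// top-level sections present as objects. Not a full validator.
function looksLikeSave(value: unknown): value is SaveFile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { [key: string]: unknown };
  const sections = ["player", "world", "quests", "tickets", "incidents", "vms"];
  for (const s of sections) {
    if (typeof v[s] !== "object" || v[s] === null) return false;
  }
  return v["save_version"] === 1;
}
```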
---
## DESIGN PRINCIPLES SUMMARY
- The dirty state is the game. Preserving it is the point.
- Snapshots are structured fallbacks, not the primary save mechanism.
- The game never silently reverts VM state without player awareness.
- Recovery from failure is in-world and has consequences.
- The host disk cost is real and must be managed with a retention policy.
- Developers get clean-reset tooling outside the shipped game.
- `persists: false` flags reset at shift boundary, not on load.
@@ -0,0 +1,103 @@
# SYSADMIN CHRONICLES — SNAPSHOT CHAIN
> Version 1.0
>
> This document defines what each named baseline snapshot represents,
> how the snapshot chain is built, and what assumptions quest authors
> can make about VM state at each snapshot.
---
## POLICY
Each `baseline.post-qXXX` snapshot represents the **canonical clean-branch
outcome** of quest QXXX — meaning all prior quests were resolved via their
highest-priority (best) solution branch.
Player state diverges from the baseline during play. The baseline is always
the authored "good state" for that point in the arc, built independently of
any player's actual save.
**A baseline snapshot is never built from a bad or partial branch outcome.**
If a player took the wrong branch, their VM state differs from the baseline
for all subsequent quests. That divergence is intentional and is the game.
---
## SNAPSHOT CHAIN TABLE
| Snapshot Name | VM(s) | Built After | Represents |
|---------------|-------|-------------|------------|
| `baseline.day-one` | workstation | fresh image | Brand new ares workstation. No player account SSH key. Provisioning script ran but authorized_keys absent. |
| `baseline.clean` | web_server | fresh image | Fresh hermes. nginx installed, no config errors, logrotate present, web root owned by www-data. Ready for Q002 to break it. |
| `baseline.clean` | build_machine | fresh image | Fresh vulcan. NTP disabled (for Q006 scenario). Arch base install, pacman configured to use internal repo. |
| `baseline.post-q001` | workstation | Q001 clean branch | Player SSH key in authorized_keys with correct permissions (0600 file, 0700 dir). Used as the implied state for all subsequent quests requiring SSH access. Not an explicit snapshot — workstation just stays live from Q001 onward. |
| `baseline.post-q004` | web_server | Q004 clean branch | hermes with: nginx stable+enabled, logrotate configured, web root owned by www-data recursively. All of Q002–Q004 resolved cleanly. Used as starting state for Q005 and Q007. |
| `baseline.post-q006` | build_machine | Q006 clean branch | vulcan with NTP enabled and healthy, archlinux-keyring refreshed, builds working. Used as starting state for Q008. |
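For quest authors, the table reduces to a lookup: the latest real snapshot on a VM whose "built after" quest precedes yours. This TypeScript sketch encodes the rows above; `CHAIN` and `baselineFor` are illustrative names, and `baseline.post-q001` is left out because it is implied rather than actually taken.

```typescript
// One row of the snapshot chain: the snapshot is the starting state for
// quests numbered after `afterQuest` (0 = fresh image).
interface ChainEntry {
  vm: string;
  snapshot: string;
  afterQuest: number;
}

// Real snapshots only; the workstation stays live from Q001 onward.
const CHAIN: ChainEntry[] = [
  { vm: "workstation", snapshot: "baseline.day-one", afterQuest: 0 },
  { vm: "web_server", snapshot: "baseline.clean", afterQuest: 0 },
  { vm: "web_server", snapshot: "baseline.post-q004", afterQuest: 4 },
  { vm: "build_machine", snapshot: "baseline.clean", afterQuest: 0 },
  { vm: "build_machine", snapshot: "baseline.post-q006", afterQuest: 6 },
];

// Latest authored baseline a quest on `vm` can start from,
// e.g. Q005 on web_server -> baseline.post-q004. Null for an unknown VM.
function baselineFor(vm: string, questNumber: number): string | null {
  let best: ChainEntry | null = null;
  for (const entry of CHAIN) {
    if (entry.vm === vm && entry.afterQuest < questNumber) {
      if (best === null || entry.afterQuest > best.afterQuest) {
        best = entry;
      }
    }
  }
  return best === null ? null : best.snapshot;
}
```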
---
## HOW SNAPSHOTS ARE BUILT
Snapshots are produced by `tools/vm/seed-vms.sh` in sequence:
```
1. Build base VM images from cloud-init or preseed
2. Run base configuration (hostname, users, packages, game helpers)
3. Run suppress-maintenance-noise.sh
4. Take baseline.clean snapshot
5. Run Q001-prep.sh → take no snapshot (workstation stays live)
6. Run Q002-prep.sh through Q004-prep.sh sequentially on web_server
7. Apply clean-branch outcome state manually or via a post-quest-state script
8. Take baseline.post-q004 snapshot on web_server
9. Run Q006-prep.sh on build_machine
10. Apply clean-branch outcome state on build_machine
11. Take baseline.post-q006 snapshot on build_machine
```
Steps 7 and 10 ("apply clean-branch outcome state") are done via dedicated
scripts in `tools/vm/quest-prep/`:
```
Q004-post-clean.sh — sets web root ownership, confirms logrotate, enables nginx
Q006-post-clean.sh — enables systemd-timesyncd, refreshes archlinux-keyring
```
These post-clean scripts are the authoritative definition of what "clean
branch" means for snapshot purposes.
---
## WHAT QUEST AUTHORS CAN ASSUME
When authoring a quest against `baseline.post-q004`, you can assume:
- nginx is active and enabled on hermes
- /etc/logrotate.d/nginx exists and is correct
- /var/www/axiomworks is owned by www-data recursively
- The deploy service runs as www-data and can write to /var/www/axiomworks
- No Q002/Q003/Q004 broken state exists
- Q005 and Q007 both build on this clean hermes state
When authoring a quest against `baseline.post-q006`, you can assume:
- Everything in post-q004 (hermes state)
- systemd-timesyncd is active and enabled on vulcan
- archlinux-keyring is up to date
- pacman -Syu works without signature errors
- Q008 uses this as its clean starting baseline
If your quest needs to break something that was fixed in a prior quest,
your prep script must re-break it after the post-clean baseline is applied.
Document this explicitly in your prep script's header comment.
---
## DEVELOPER RESET
To rebuild all baselines from scratch:
```bash
bash tools/vm/snapshot-all.sh --revert-to baseline.clean
bash tools/vm/seed-vms.sh
```
This is destructive and should only be run during authoring or CI.
It is not available in the shipped game.
@@ -0,0 +1,423 @@
# Story Design Context — Sysadmin Chronicles
For story designers and AI agents creating new quests and narrative content.
**Related docs:**
- `CHARACTERS.md` — character bios, relationships, story hooks
- `COMPANY_LORE.md` — world, company, tone
- `QUEST_AUTHORING.md` — technical JSON spec for implementers
This document answers: *how does story actually work in this game, and what does a quest
concept need to contain to be usable?*
---
## The Core Premise
The player is a new junior sysadmin at Axiom Works, a mid-size B2B software company.
They are replacing someone named Dale. Nobody will explain why Dale is gone.
The game is played entirely through a simulated work environment: a terminal, an email
inbox, and a company website. There are no cutscenes, no narration, no inventory, no
combat. Everything that happens is expressed through:
- **Tickets** — the player receives a ticket describing a problem
- **The terminal** — the player SSHes into VMs, investigates, and fixes things
- **Character dialogue** — characters react to how the player solved the problem
- **The next ticket** — the world moves on, and the consequences of what the player
did are baked into the next situation
That's it. Story is not told — it is accumulated from the choices the player makes
when fixing real Linux problems on real virtual machines.
---
## The Three Machines (VMs)
Every quest happens on one or more of these machines. Their narrative identities
matter as much as their technical roles.
### ares — the Workstation
The player's home machine. Debian 12. Quests here are onboarding-flavored —
establishing access, learning the environment. It's the only machine the player
can reach on day one.
*Narrative identity:* Where you start. Safe-ish. The first one you break is here.
### hermes — the Web / App Server
Debian 12. Runs nginx and the AxiomFlow demo/staging application. This is the
machine that Sarah Chen cares about, that customers can feel, and that Priya Nair
watches for security posture. Most of the early-game quests are here.
*Narrative identity:* The product's face to the world. Breaking this makes noise
immediately. The most politically visible machine.
### vulcan — the Build Machine
Arch Linux. Compiles packages, runs the internal build pipeline, serves packages
to hermes via an internal apt repo. Nikhil Sharma owns this in principle but nobody
manages it daily. Things here break silently until hermes starts serving bad software.
*Narrative identity:* The machine nobody watches until something downstream fails.
Quests here reveal that problems have upstream causes the player didn't expect.
### Planned future machines
As the story expands, new machines can be added. Each should have a clear narrative
role before it's introduced. (See `COMPANY_LORE.md` for the candidate list.)
---
## How Story Is Delivered
### Tickets as Act One
Every quest begins with a ticket in the player's inbox. The ticket is a short email
from a character describing a symptom — not a cause. The sender's perception of the
problem is usually incomplete and sometimes wrong. This is intentional: the player's
job is to investigate, not to execute instructions.
Good ticket writing:
- Describes what the sender experienced, not what the cause is
- Has the sender's voice and perspective (Sarah is outcome-focused; Dave is confused;
Priya is terse and specific)
- Does not hint at the solution
- Creates genuine stakes (site is down, builds are failing, someone is locked out)
Bad ticket writing:
- Explains the root cause ("the log file is too big")
- Has no character voice (generic IT help desk language)
- Stakes are unclear or low
### The Terminal as Act Two
The player investigates. They SSH in, run commands, read logs, check configs, look at
file ownership. The evidence is seeded into the VM baseline — it is genuinely there
to find, not procedurally generated. A good quest has a natural clue trail:
- The most obvious thing points to a second thing
- The second thing reveals the actual problem
- The fix is achievable with real Linux knowledge
The player cannot be told what to do. They can ask Marcus for hints (via dialogue
choices), but good players don't need to.
### Branching Resolution as Act Three
When the player has made changes to the VM, the game checks the state of the
system against the quest's solution branches. The branch that matches determines:
- What dialogue fires (Marcus's reaction, Sarah's reaction, Priya's follow-up)
- What trust delta the player receives
- What world flag is set (persistent story state)
- Whether an incident is triggered (a future consequence of a partial fix)
- What ticket comes next
**This is the central story mechanic.** Every quest should be designed with at
least two and ideally three resolution branches:
| Branch type | What it means |
|-------------|---------------|
| **Clean fix** | Player understood the root cause and solved it properly. High trust, no downstream risk. |
| **Acceptable fix** | Problem is solved but with a tradeoff — brittle approach, future maintenance burden, or incomplete cleanup. Lower trust. |
| **Regression** | Player fixed the symptom but made something else worse. Negative trust. Story consequences. |
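The branch check can be sketched as a best-first match over observed facts, in TypeScript to match the game server. The `Facts` keys, `Branch` fields, trust values, and `resolveBranch` are assumptions made for illustration; the real validation engine and quest JSON are specified in `QUEST_AUTHORING.md`.

```typescript
// Observed VM facts gathered by validation (hypothetical keys).
type Facts = { [fact: string]: boolean };

interface Branch {
  id: string;
  priority: number;   // higher = better outcome, checked first
  requires: Facts;    // every listed fact must have exactly this value
  trustDelta: number;
  setsFlags: string[];
  triggersIncident?: string;
}

// Evaluate branches best-first; the first whose requirements all hold wins.
function resolveBranch(branches: Branch[], facts: Facts): Branch | null {
  const ordered = branches.slice().sort((a, b) => b.priority - a.priority);
  for (const branch of ordered) {
    let matches = true;
    for (const k of Object.keys(branch.requires)) {
      if (facts[k] !== branch.requires[k]) {
        matches = false;
        break;
      }
    }
    if (matches) return branch;
  }
  return null; // no branch matched: the quest stays open
}
```

Checking the clean branch first means a player who satisfies both the clean and acceptable conditions is credited with the better outcome.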
The **regression branch** is not about punishment — it's about realism. A real
sysadmin who removes all SSH restrictions to restore one person's access has
technically solved the ticket while creating a larger problem. The story should
treat this as realistic professional consequence, not a game-over failure.
Players on a clean-fix path get more trust, unlock more access, and receive warmer
character reactions. Players on a regression path continue playing but face the
downstream effects of their choices.
---
## World Flags — Persistent Story State
World flags are string keys set when a quest's branch resolves. Most persist for
the entire playthrough (flags marked `persists: false` reset at each shift boundary)
and can be read by later quests, incidents, and dialogue.
Examples:
- `hermes_logrotate_healthy` — set when the player properly fixed log rotation
- `hermes_ssh_allowusers_fragile` — set when the player restored SSH access using
the brittle AllowUsers approach instead of the robust AllowGroups approach
- `player_ssh_configured` — set when the player successfully set up SSH on day one
World flags are how story continuity works. A later quest can check whether the
player fixed something correctly earlier and behave differently. Marcus can reference
a past fix. Priya can flag a previously introduced risk in a later audit. A problem
that was "solved" with a quick fix can recur.
**When designing a new quest, ask:** what flag should this set, and what future quests
or dialogue might reference it?
---
## Trust — The Narrative Currency
Trust is a numeric score that tracks the player's professional standing with Marcus
and the IT team. It affects:
- **VM access** — the player gains SSH access to hermes and vulcan as trust increases.
If trust drops badly, access can be revoked.
- **Documentation access** — more trusted players get access to internal runbooks
and admin guides
- **Character warmth** — Marcus's messages change tone subtly as trust grows
- **Incident visibility** — at a certain trust level, the player starts seeing
background incidents before they become critical
Trust is not displayed as a raw number. Players experience it as consequences.
**For quest designers:** each branch should have a `trust_delta` that reflects the
quality of the fix. A proper root-cause fix should earn more than a workaround.
Regression branches should cost trust. Day-one onboarding quests are lenient;
later quests at higher tiers should be less forgiving.
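One way to model access that grows with trust but is only revoked when trust "drops badly" is to give each unlock two thresholds. The thresholds, the `Unlock` shape, and `updateUnlocks` are hypothetical; unlock IDs follow the `vm:` / `sudo:` style from the draft save schema.

```typescript
// Hypothetical unlock rule: thresholds are placeholders, not authored values.
interface Unlock {
  id: string;          // e.g. "vm:hermes"
  grantAt: number;     // trust needed to gain access
  revokeBelow: number; // lower threshold: access is lost only on a bad drop
}

// Recompute unlocks after a trust change. Unlocks this table does not govern
// (e.g. ones granted directly by quests) are kept as-is.
function updateUnlocks(current: string[], trust: number, table: Unlock[]): string[] {
  const governed = table.map((u) => u.id);
  const next = current.filter((id) => governed.indexOf(id) === -1);
  for (const u of table) {
    const held = current.indexOf(u.id) !== -1;
    if ((held && trust >= u.revokeBelow) || (!held && trust >= u.grantAt)) {
      next.push(u.id);
    }
  }
  return next;
}
```

The gap between `grantAt` and `revokeBelow` keeps a single bad ticket from bouncing access on and off.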
---
## Incidents — Consequences of Incomplete Fixes
An incident is a time-delayed consequence that fires when a quest's partial-fix
branch was taken. It represents the problem coming back.
Example: The player clears a full disk by deleting a log file but doesn't restore
the logrotate config. Two in-game hours later, the disk starts filling again. Dave
notices. The player gets another ticket about the same symptom.
Incidents are not punishments — they are realistic. The world doesn't stay fixed
just because the player touched it. A player who takes clean-fix branches will
rarely see incidents. A player who takes every shortcut will find their ticket queue
filling up with problems they already "solved."
For story purposes: incidents can also carry narrative weight. If the player made a
security regression, an incident could represent an audit finding, an unusual login,
or a configuration discrepancy Priya noticed.
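The logrotate example can be sketched as a scheduled incident that a later proper fix cancels. Measuring time in in-game minutes, the `cancelIfFlag` field, and the `schedule` / `tick` names are assumptions; the draft save schema tracks incidents by shift and escalation step instead.

```typescript
interface PendingIncident {
  id: string;
  fireAtMinute: number; // in-game clock in minutes (assumed unit)
  cancelIfFlag: string; // if this world flag becomes true first, the incident never fires
}

// Schedule an incident when a partial-fix branch resolves.
function schedule(id: string, nowMinute: number, delayMinutes: number, cancelIfFlag: string): PendingIncident {
  return { id, fireAtMinute: nowMinute + delayMinutes, cancelIfFlag };
}

// On each clock tick: split pending incidents into fired and still pending,
// dropping any whose cancel flag has since been set.
function tick(pending: PendingIncident[], nowMinute: number, flags: { [f: string]: boolean }) {
  const fired: string[] = [];
  const stillPending: PendingIncident[] = [];
  for (const inc of pending) {
    if (flags[inc.cancelIfFlag]) continue; // the player went back and fixed it properly
    if (nowMinute >= inc.fireAtMinute) {
      fired.push(inc.id);
    } else {
      stillPending.push(inc);
    }
  }
  return { fired, stillPending };
}
```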
---
## The Character Conversation Model
Quest dialogue fires after a branch resolves. Three characters can speak:
### Marcus Webb
The primary voice. Appears in every quest. His post-resolution message reflects:
- What the player actually did (not just whether they succeeded)
- Whether they understood the root cause or just cleared the symptom
- A forward-looking observation (usually a quiet flag for what's coming next)
Marcus does not praise effusively or scold dramatically. He states what he observed.
His message for a clean fix is warmer and sometimes wry. His message for a regression
is brief and pointed. He never says "well done!" He might say "that's the right call."
### Sarah Chen
Speaks when the quest affects something product-facing (hermes being up or down,
deploys working or failing). Her messages are reactive — she responds to outcomes,
not process. She is not hostile unless the player makes her situation worse.
### Priya Nair
Speaks when the quest has security implications — access changes, hardening,
audit posture. She does end-of-shift reviews that grade overall performance.
Her per-quest messages are brief and evaluative. She notices things Marcus might not.
### Other characters
Dave Okonkwo files tickets. He does not have post-resolution dialogue — he
just stops or starts noticing things. Future characters (Kowalski, Nikhil, Tanya)
can speak in dialogue if quests are designed to involve them.
---
## The Narrative Arc
The overall story has six phases. Quests should be designed with their phase in mind.
The phase is usually not visible to the player — it emerges from what's happening
around them.
### Phase 1 — Normal Work
*Tier 1 quests. Early game.*
The player is new. Everything is routine. Marcus is helpful. The problems are real
but not alarming — a broken config, a full disk, a permission issue. The player is
learning the environment. The subtext is that things are slightly more wrong than
they should be, but there's nothing to point at.
Hidden layer: small anomalies in the systems that curious players can notice but
don't have context for yet.
### Phase 2 — Unease
*Tier 1/2 transition.*
The problems start to have patterns. The same kind of thing breaks twice. A fix
the player made doesn't hold the way it should. Nothing is alarming, but Marcus's
messages have a slightly different quality — he notices things he doesn't explain.
Hidden layer: a world flag from an early quest points somewhere unexpected.
### Phase 3 — Suspicion
*Tier 2 quests. Mid game.*
The player starts encountering problems they didn't cause and can't fully explain.
Access was changed by someone. A config was edited recently. A log shows an
unusual pattern. Nobody is accusing anyone. But the player now has enough context
to start asking questions — even if no quest explicitly tells them to.
This is where Dale becomes relevant again. The systems the player inherits were
last touched by Dale. Some of them have been in a particular state for a long time.
### Phase 4 — Investigation
*Tier 2/3 transition.*
The player has connected enough dots to understand that something happened before
they arrived. The quests in this phase involve digging into logs, access records,
and configuration history. The investigation is framed as professional work
(audit the access logs, trace the package build history) — but the results tell
a story.
Marcus's messages are shorter. Priya starts appearing more. Kowalski schedules a
meeting nobody explains.
### Phase 5 — Conflict
*Tier 3 quests. Late game.*
The player knows what happened. Acting on that knowledge has professional
consequences. The conflict is not physical — it is about what the player chooses
to surface, who they tell, and what they do with access they were given for one
purpose that could be used for another.
### Phase 6 — Resolution
*Endgame.*
The situation resolves. The ending the player gets depends on the world flags
accumulated across their entire playthrough — not just whether they clicked the
"good ending" button. A player who took clean-fix branches throughout, built
trust, and noticed the hidden anomalies gets a different ending than a player
who patched symptoms, lost trust, and missed everything.
---
## What Makes a Good Quest Scenario
The best quests have a **plausible mundane cause** and a **visible technical trail**.
Players should never need to guess — they should be able to find the answer by
looking at the right files and running the right commands.
### Good scenario types
- Service down → config syntax error → player traces error output to the line
- Disk full → log file enormous → logrotate config missing → player restores it
- Deploy fails → files owned by wrong user → someone ran a script as root manually
- Build failures → clock drift → NTP not running → player enables time sync
- Access locked out → sshd_config modified → wrong directive → player corrects it
- App crashes after update → bad package from internal repo → player traces to source
### What makes these work
1. **The symptom is real and urgent.** Something is actually broken.
2. **The cause is discoverable.** The evidence is in logs, config files, or system state.
3. **The fix is a real Linux operation.** Not artificial — `chown`, `systemctl`, editing
a config, fixing a cron entry, rolling back a package.
4. **Multiple approaches exist.** The quick fix works. The proper fix is better and
the game knows the difference.
5. **The character reactions are grounded.** Sarah cares about the demo being up.
Priya cares about the access control implications. Marcus cares about whether the
player understood what they were doing.
### Bad scenario types to avoid
- Problems that require packages not in the VM's guaranteed baseline (see `QUEST_AUTHORING.md`)
- Problems that require real-time events the validation engine can't check
- Problems where the "correct" fix is the only fix (no meaningful branch differentiation)
- Problems that break the fourth wall or require the player to know game-layer information
- Problems that are gotchas rather than investigations (the cause can't be found by looking)
---
## Hidden Anomalies — Environmental Storytelling
Every 3–5 quests should include something unusual in the VM environment that the player
is not told about and not required to engage with. These are not quest objectives.
They are breadcrumbs for curious players.
Examples of the kind of thing these should be:
- A user account that shouldn't exist
- A log entry from an odd time that doesn't match the official history
- A file that was modified recently but wasn't part of the quest setup
- A cron job that's been disabled but was once important
- An SSH key in authorized_keys that doesn't belong to anyone obvious
These anomalies should be consistent with the overall narrative arc — a player who
collects them across the whole game should be able to piece together what happened
before they arrived. They should never be labelled, never referenced in objectives,
and never required. They are for the players who look.
---
## Quest Output Format for Story Agents
When proposing new quests, provide the following. This is the minimum needed for
a technical author to implement the quest.
```
Quest ID: QXXX
Title: [player-facing]
Narrative phase: [1–6]
Tier: [1, 2, or 3]
Primary VM: [ares / hermes / vulcan]
Additional VMs: [if any]
Scenario summary:
What is broken, why it is broken (the root cause), and what the player
will encounter. 1–3 sentences. Written for the implementer, not the player.
Ticket:
From: [character name]
Subject: [email subject line]
Body: [the email the player receives. Written in the sender's voice.
Describes the symptom. Does not explain the cause.]
Clue trail:
What the player will find when they investigate. The evidence that leads
them to the root cause. Describe the actual files, log entries, and system
states — not the player's steps.
Solution branches:
Branch 1 (clean fix, highest trust):
What the player has done. Why it's correct. Trust delta.
Branch 2 (acceptable fix):
What the player has done. What tradeoff it introduces. Trust delta.
Branch 3 (regression, if applicable):
What the player did wrong. What it breaks. Negative trust delta.
Character reactions:
Marcus (post-resolution):
Clean: [what Marcus says]
Acceptable: [what Marcus says]
Regression: [what Marcus says]
Sarah / Priya (if relevant):
[reaction to the specific outcome that affects them]
World flags set: [list flags each branch sets]
Follow-up incident (if any): [what recurs if the acceptable-fix branch was taken]
Hidden anomaly (if any): [something unusual seeded into the VM that's not part of
the quest objectives]
Narrative notes: [anything a future quest author should know — Dale connections,
story threads this opens or closes, things characters should remember]
```
---
## The Dale Thread — Notes for Story Designers
Dale's story should emerge slowly from the systems themselves, not from exposition.
When designing quests — especially mid-to-late game — consider:
- **What did Dale last touch?** The VMs the player inherits have a history. Some
configurations were made by Dale. Some are good. Some are wrong in ways that
suggest Dale was dealing with something.
- **What was Dale trying to do?** As the investigation phase develops, the picture
should become coherent. Dale wasn't random — there was a pattern to their actions.
- **Who knew?** Marcus knew Dale. Priya may have been involved in whatever ended
Dale's tenure. Kowalski definitely knows. The player assembles this from fragments,
not a scene where someone explains it.
- **The player is inheriting Dale's problems.** Some of the broken things the player
fixes are broken because Dale broke them. Some of the broken things were broken on
purpose. The player won't know which is which until later.
The reveal of what Dale did should feel like the player figured it out, not like the
game told them.
@@ -0,0 +1,187 @@
# VM Build System
## Overview
VM provisioning uses a modular driver + profile pattern. One driver script handles
the full build pipeline; per-VM profile files declare what makes each machine
distinct. Adding a new VM means writing one profile file — no changes to the driver.
## Structure
```
tools/vm/
build-vm.sh # Driver — sources a profile and runs the build pipeline
build-workstation.sh # Wrapper → build-vm.sh profiles/workstation.sh
build-web-server.sh # Wrapper → build-vm.sh profiles/web-server.sh
build-build-machine.sh # Wrapper → build-vm.sh profiles/build-machine.sh
profiles/
workstation.sh # sc-workstation / ares — XFCE desktop (Debian)
web-server.sh # sc-web-server / hermes — nginx app server (Debian)
build-machine.sh # sc-build-machine / vulcan — build toolchain (Arch)
lib/
common.sh # Shared libvirt helpers (pool, domain, seed ISO, wait-for-IP)
```
## Invocation
```bash
# By wrapper (backwards-compatible)
./build-workstation.sh [--dry-run] [--force]
./build-web-server.sh [--dry-run] [--force]
./build-build-machine.sh [--dry-run] [--force]
# By driver directly — profile name (no extension) or explicit path
./build-vm.sh workstation [--dry-run] [--force]
./build-vm.sh profiles/web-server.sh --force
```
`--dry-run` skips all libvirt/qemu-img calls and prints what would run.
`--force` destroys and recreates a domain that already exists.
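The sketch below shows one way the driver could turn its arguments into a profile path and two flags. It is illustrative only: `parse_args`, `resolve_profile`, and the `DRY_RUN`/`FORCE`/`PROFILE_ARG` variables are hypothetical names, and treating any argument that contains `/` or ends in `.sh` as a path is an assumption about how "profile name or explicit path" is distinguished.

```bash
# Sketch only: hypothetical argument handling for build-vm.sh.
DRY_RUN=0
FORCE=0
PROFILE_ARG=""

# Parse flags in any order; the single positional argument is the profile.
parse_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --dry-run) DRY_RUN=1 ;;
      --force)   FORCE=1 ;;
      -*)        echo "unknown flag: $arg" >&2; return 2 ;;
      *)         PROFILE_ARG="$arg" ;;
    esac
  done
  if [[ -z "$PROFILE_ARG" ]]; then
    echo "usage: build-vm.sh <profile> [--dry-run] [--force]" >&2
    return 2
  fi
}

# A bare name ("workstation") maps to profiles/<name>.sh next to the driver;
# anything that looks like a path is used as given.
resolve_profile() {
  local arg="$1" driver_dir="$2"
  if [[ "$arg" == */* || "$arg" == *.sh ]]; then
    printf '%s\n' "$arg"
  else
    printf '%s\n' "$driver_dir/profiles/$arg.sh"
  fi
}
```

A driver built this way would call `parse_args "$@"` first, then `resolve_profile "$PROFILE_ARG" "$(dirname "$0")"` to get the file to source.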
## Profile Contract
A profile is a bash file sourced by `build-vm.sh`. It must set these variables:
| Variable | Example | Description |
|----------|---------|-------------|
| `DOMAIN` | `sc-web-server` | libvirt domain name |
| `HOSTNAME` | `hermes` | Guest hostname |
| `RAM_MB` | `512` | Memory in MB |
| `VCPUS` | `1` | vCPU count |
| `DISK_SIZE` | `8G` | qcow2 overlay size |
| `GRAPHICS` | `vnc` | `vnc`, `spice`, `spice-qxl`, or `none` |
| `BASE_URL` | `https://...` | URL to download base cloud image from |
| `BASE_IMAGE` | `$SC_BASE_DIR/...` | Local path to cache the base image |
It must also define `generate_user_data()` — a function that prints the complete
cloud-init `#cloud-config` YAML to stdout. The driver calls this function and writes
the output to the seed ISO. The following variables are available when the function
runs (set by the driver after sourcing the profile):
| Variable | Value |
|----------|-------|
| `PUBKEY` | Contents of `${SC_SSH_KEY}.pub` |
| `GAME_HOST_IP` | `${SC_GAME_HOST_IP:-10.42.0.1}` |
| `POOL_DIR` | Resolved libvirt pool path |
| `DISK_PATH` | `$POOL_DIR/${DOMAIN}.qcow2` |
| `SEED_ISO` | `$SC_SEED_DIR/${DOMAIN}-seed.iso` |
Profile-specific variables (e.g. `HUD_URL`, `SAGE_URL`, `PRIVKEY_INDENT`) are set
in the profile before `generate_user_data` is defined and are available inside it.
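A minimal profile for a hypothetical headless Debian VM might look like the sketch below. The domain `sc-example`, hostname `example`, user `admin`, and image file name are placeholders; only the variable names and the `generate_user_data` contract come from the tables above. It assumes the driver has already set `PUBKEY` by the time the function is called, and it installs `qemu-guest-agent` because the pipeline polls the guest agent for the VM's IP.

```bash
# profiles/example.sh: hypothetical profile, sourced by build-vm.sh.
# Required variables (see the Profile Contract table).
DOMAIN="sc-example"
HOSTNAME="example"
RAM_MB=512
VCPUS=1
DISK_SIZE="8G"
GRAPHICS="none"
# Assumption: Debian 12 generic cloud image; use whatever base the VM needs.
BASE_URL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-genericcloud-amd64.qcow2"
BASE_IMAGE="${SC_BASE_DIR}/debian-12-genericcloud-amd64.qcow2"

# Print the complete #cloud-config document to stdout.
# PUBKEY is set by the driver before this function runs.
generate_user_data() {
  cat <<EOF
#cloud-config
hostname: ${HOSTNAME}
users:
  - name: admin
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - ${PUBKEY}
packages:
  - qemu-guest-agent
runcmd:
  - systemctl start qemu-guest-agent
EOF
}
```

Running `./build-vm.sh profiles/example.sh --dry-run` would then exercise the profile without touching libvirt.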
## Writing a New Profile
1. Copy `profiles/web-server.sh` as a starting point.
2. Set the 8 required variables.
3. Write `generate_user_data()` with the cloud-init YAML for the new machine.
4. Run `./build-vm.sh profiles/my-new-vm.sh --dry-run` to validate.
5. Run without `--dry-run` to build.
No changes to the driver or any other file are needed.
## Build Pipeline (driver)
1. Parse `--dry-run` / `--force` flags
2. Resolve and source the profile file
3. Validate required variables and `generate_user_data` function exist
4. Source `lib/common.sh` (sets `SC_*` env, exposes helpers)
5. Run `ensure_vm_tooling` (checks virsh, qemu-img, virt-install, SSH keys, pool/network)
6. If domain exists and `--force` not set: exit cleanly
7. `download_if_missing` — fetch base image if not cached
8. Call `generate_user_data` → write to tmpdir, build NoCloud seed ISO
9. `destroy_domain` — remove existing domain if present
10. `create_backing_disk` — qcow2 overlay over the base image
11. `build_import_domain` runs `virt-install --import` and enables autostart
12. `wait_for_agent_ip` — poll QEMU guest agent for IP (up to 300 s)
13. Cleanup tmpdir on exit (trap)
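Step 3 can be sketched as a loop over the eight required names using bash indirect expansion, plus a `declare -F` check for the function. `validate_profile` and `REQUIRED_VARS` are hypothetical names. One caveat this surfaces: bash sets `HOSTNAME` automatically, so a profile that forgets to set it would still pass this check unless the driver unsets `HOSTNAME` before sourcing the profile.

```bash
# Hypothetical helper mirroring pipeline step 3: check the profile contract
# before any libvirt or qemu-img call is made.
REQUIRED_VARS=(DOMAIN HOSTNAME RAM_MB VCPUS DISK_SIZE GRAPHICS BASE_URL BASE_IMAGE)

validate_profile() {
  local var missing=0
  for var in "${REQUIRED_VARS[@]}"; do
    # ${!var} expands the variable whose name is stored in $var.
    if [[ -z "${!var:-}" ]]; then
      echo "profile missing required variable: $var" >&2
      missing=1
    fi
  done
  # GRAPHICS must be one of the documented backends.
  case "${GRAPHICS:-}" in
    vnc|spice|spice-qxl|none) ;;
    *) echo "invalid GRAPHICS value: ${GRAPHICS:-<unset>}" >&2; missing=1 ;;
  esac
  if ! declare -F generate_user_data >/dev/null; then
    echo "profile does not define generate_user_data()" >&2
    missing=1
  fi
  return "$missing"
}
```

Reporting every missing name before failing, rather than stopping at the first, keeps the `--dry-run` loop short when writing a new profile.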
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `SC_GAME_HOST_IP` | `10.42.0.1` | Host machine IP on the game network |
| `SC_SSH_KEY` | `~/.ssh/sc_host_key` | SSH key pair used for all host→guest connections |
| `SC_BASE_DIR` | See `common.sh` | Where base cloud images are cached |
| `SC_SEED_DIR` | See `common.sh` | Where cloud-init seed ISOs are written |
| `SC_POOL_NAME` | `sc-images` | libvirt storage pool |
| `SC_NETWORK_NAME` | `sc-internal` | libvirt network |
| `LIBVIRT_DEFAULT_URI` | `qemu:///system` | Override to `qemu:///session` for user-mode libvirt |
| `SC_WORKSTATION_GRAPHICS` | `spice` | Override workstation graphics backend |
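The defaults in the table can be applied with `${VAR:=default}` so that anything already set in the caller's environment wins. This is a sketch of what `lib/common.sh` might do, not its actual contents; `SC_BASE_DIR` and `SC_SEED_DIR` are omitted because their defaults are only documented as living in `common.sh`.

```bash
# Sketch: apply documented defaults without clobbering caller overrides.
# ":" is a no-op; the ${VAR:=default} expansion assigns only if VAR is unset or empty.
: "${SC_GAME_HOST_IP:=10.42.0.1}"
: "${SC_SSH_KEY:=$HOME/.ssh/sc_host_key}"
: "${SC_POOL_NAME:=sc-images}"
: "${SC_NETWORK_NAME:=sc-internal}"
: "${LIBVIRT_DEFAULT_URI:=qemu:///system}"
: "${SC_WORKSTATION_GRAPHICS:=spice}"

# Export so virsh, virt-install and child scripts see the same values.
export SC_GAME_HOST_IP SC_SSH_KEY SC_POOL_NAME SC_NETWORK_NAME \
       LIBVIRT_DEFAULT_URI SC_WORKSTATION_GRAPHICS
```

With this pattern, `LIBVIRT_DEFAULT_URI=qemu:///session ./build-vm.sh workstation --dry-run` would target user-mode libvirt without editing any file.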
## Current VMs
| Profile | Domain | Hostname | OS | RAM | vCPUs | Disk | Graphics |
|---------|--------|----------|----|-----|-------|------|----------|
| `workstation.sh` | `sc-workstation` | `ares` | Debian 12 | 2048 MB | 2 | 20 G | SPICE |
| `web-server.sh` | `sc-web-server` | `hermes` | Debian 12 | 512 MB | 1 | 8 G | VNC |
| `build-machine.sh` | `sc-build-machine` | `vulcan` | Arch Linux | 768 MB | 2 | 10 G | VNC |
## Hostname Resolution
All VMs resolve internal hostnames via static `/etc/hosts`. There is no DNS server
on the game network — this matches how small company networks often work before a
proper internal DNS is set up.
Each VM only has entries for the hosts it needs to reach:
- **ares** (workstation): knows `hermes`, `vulcan`, `portal.axiomworks.internal`, `sage.axiomworks.internal`
- **hermes**: knows `portal.axiomworks.internal`
- **vulcan**: knows `hermes` (deploy target), `portal.axiomworks.internal`
The `.axiomworks.internal` domain is fictional but realistic — real companies use
private suffixes like `.internal` or `.corp` for their infrastructure.
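Because the entries are static, they can be generated from a small table. The sketch below prints the lines each VM needs; `hosts_entries` is a hypothetical helper, and mapping `sage.axiomworks.internal` to the host IP is an assumption based on the host serving Sage at `/sage/`.

```bash
# Hypothetical helper: print the /etc/hosts lines for one VM.
# IPs follow the Networking Notes; portal and Sage live on the host machine.
GAME_HOST_IP="${SC_GAME_HOST_IP:-10.42.0.1}"
HERMES_IP="10.42.0.40"
VULCAN_IP="10.42.0.24"

hosts_entries() {
  case "$1" in
    ares)
      # printf reuses its format for each pair of arguments.
      printf '%s %s\n' "$HERMES_IP" hermes "$VULCAN_IP" vulcan
      printf '%s %s %s\n' "$GAME_HOST_IP" portal.axiomworks.internal sage.axiomworks.internal
      ;;
    hermes)
      printf '%s %s\n' "$GAME_HOST_IP" portal.axiomworks.internal
      ;;
    vulcan)
      printf '%s %s\n' "$HERMES_IP" hermes "$GAME_HOST_IP" portal.axiomworks.internal
      ;;
    *)
      echo "unknown host: $1" >&2
      return 1
      ;;
  esac
}
```

A profile could append this output to `/etc/hosts` from cloud-init `runcmd`. On images where cloud-init manages the file (`manage_etc_hosts: true`), appended lines may be overwritten on a later boot, so the entries would belong in the hosts template instead.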
## Networking Notes
- All VMs attach to the `sc-internal` libvirt network
- The host machine (10.42.0.1) serves the game portal (`:3000`) and Sage KB (`/sage/`)
- Fixed IPs used in `/etc/hosts` across VMs: hermes=10.42.0.40, vulcan=10.42.0.24
- These must match the DHCP reservations configured in `network-sc-internal.xml`
- IPv6 disabled on all VMs (sysctl) — not needed, reduces noise
## Performance Tuning
All VMs share a common sysctl baseline applied via `/etc/sysctl.d/`:
| Setting | Value | Rationale |
|---------|-------|-----------|
| `vm.swappiness` | 10 | Prefer RAM; swap only under real pressure |
| `vm.vfs_cache_pressure` | 50 | Keep inode cache warm longer |
| `vm.dirty_ratio` | 15–25 | Batch writes; vulcan higher for build workloads |
| IPv6 disabled | — | Removes unnecessary network overhead |
All VMs have a swap file (512 MB to 1 GB, depending on role) created at first boot.
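A minimal sketch of the shared baseline follows: one function renders the sysctl drop-in, another creates the first-boot swap file. The function names, the drop-in file name in the usage line, and the default `dirty_ratio` of 15 are assumptions; the other values come from the table above.

```bash
# Sketch only: hypothetical helpers for the shared performance baseline.

# Print the sysctl drop-in. dirty_ratio defaults to 15; a build VM might pass 25.
render_sysctl_conf() {
  local dirty_ratio="${1:-15}"
  cat <<EOF
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.dirty_ratio = ${dirty_ratio}
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
}

# Create and enable a swap file at first boot (run as root). Size is a
# fallocate length such as 512M or 1G. Does nothing if the file exists.
create_swapfile() {
  local size="$1" path="${2:-/swapfile}"
  if [[ -e "$path" ]]; then
    return 0
  fi
  fallocate -l "$size" "$path"
  chmod 600 "$path"
  mkswap "$path"
  swapon "$path"
  echo "$path none swap sw 0 0" >> /etc/fstab
}
```

`render_sysctl_conf 25 > /etc/sysctl.d/90-sc-baseline.conf && sysctl --system` would apply a higher ratio on vulcan; `create_swapfile 1G` must run as root.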
## DHCP Reservations and MAC Addresses
Fixed IPs are set via DHCP reservations in `network-sc-internal.xml` and the live
libvirt network. The reservations reference MAC addresses, which virt-install
**generates fresh on every `--force` rebuild**. After any rebuild, the old
reservation is stale and the VM will get a random IP from the pool.
After a `--force` rebuild, update the reservations:
```bash
# 1. Get the new MAC
virsh domiflist sc-web-server # (or sc-workstation, sc-build-machine)
# 2. Remove the old reservation (use the old MAC from network-sc-internal.xml)
sudo virsh net-update sc-internal delete ip-dhcp-host \
"<host mac='OLD_MAC' name='hermes' ip='10.42.0.40'/>" --live --config
# 3. Add the new one
sudo virsh net-update sc-internal add ip-dhcp-host \
"<host mac='NEW_MAC' name='hermes' ip='10.42.0.40'/>" --live --config
# 4. Update network-sc-internal.xml to match
```
The VM will pick up the reserved IP on its next DHCP renewal (or reboot).
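Steps 1 to 3 can be scripted so the new MAC never has to be copied by hand. This sketch assumes `virsh domiflist` prints the columns `Interface Type Source Model MAC` and that the VM has a single interface on `sc-internal`; `mac_on_network`, `dhcp_host_xml`, and `refresh_reservation` are hypothetical helpers. Step 4 (editing `network-sc-internal.xml`) is left manual.

```bash
# Print the MAC of the interface attached to a given network, reading
# `virsh domiflist` output on stdin (column 3 = Source, column 5 = MAC).
mac_on_network() {
  local network="$1"
  awk -v net="$network" '$3 == net { print $5; exit }'
}

# Build the <host> element that `virsh net-update ... ip-dhcp-host` expects.
dhcp_host_xml() {
  printf "<host mac='%s' name='%s' ip='%s'/>" "$1" "$2" "$3"
}

# Swap the old reservation for one that uses the domain's current MAC.
refresh_reservation() {
  local domain="$1" name="$2" ip="$3" old_mac="$4" network="${5:-sc-internal}"
  local new_mac
  new_mac="$(virsh domiflist "$domain" | mac_on_network "$network")"
  if [[ -z "$new_mac" ]]; then
    echo "no interface on $network for $domain" >&2
    return 1
  fi
  sudo virsh net-update "$network" delete ip-dhcp-host \
    "$(dhcp_host_xml "$old_mac" "$name" "$ip")" --live --config
  sudo virsh net-update "$network" add ip-dhcp-host \
    "$(dhcp_host_xml "$new_mac" "$name" "$ip")" --live --config
  echo "$name now reserved for $new_mac; update network-sc-internal.xml to match."
}
```

For hermes this would be `refresh_reservation sc-web-server hermes 10.42.0.40 52:54:00:49:9b:64`, with the old MAC taken from the reservations table. An alternative worth weighing is passing a fixed MAC to `virt-install` (`--network network=sc-internal,mac=...`) so rebuilds never invalidate the reservation.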
### Current reservations
| VM | Domain | Hostname | MAC | IP |
|----|--------|----------|-----|----|
| Workstation | sc-workstation | ares | `52:54:00:bd:aa:29` | 10.42.0.36 |
| Web server | sc-web-server | hermes | `52:54:00:49:9b:64` | 10.42.0.40 |
| Build machine | sc-build-machine | vulcan | `52:54:00:5e:9f:b9` | 10.42.0.24 |
@@ -0,0 +1,56 @@
# Workstation Polish Backlog
Captured from playtest notes. These items are intentionally left unresolved for a later pass.
## Launcher And Viewer
- ~~Make `./scripts/start-game.sh` executable by default.~~ **RESOLVED** — file is `rwxr-xr-x`.
- ~~Prevent Chromium from auto-launching on workstation login.~~ **RESOLVED** — removed the `game-hud.desktop` autostart entry from `workstation.sh`. Players open the Axiom Works portal from the desktop launcher when they want it.
- ~~Fix fullscreen toggling in the workstation viewer. The current `FULLSCREEN.txt` says `Shift+F12` but that is the cursor-release binding; fullscreen toggle is `F11`.~~ **RESOLVED** — Renamed to `VIEWER_HELP.txt`, corrected key bindings, expanded to cover fullscreen, cursor release, zoom, copy/paste, and USB redirect.
- Make sure the player can exit fullscreen without shutting down the VM.
- Investigate whether virt-viewer / the SPICE client can auto-detect and apply the host's native resolution when entering fullscreen mode. SPICE supports dynamic resolution via the vdagent service (already installed); verify that the guest `spice-vdagent` service is running and that the domain XML includes the SPICE agent channel (`<channel type='spicevmc'>` with `<target type='virtio' name='com.redhat.spice.0'/>`) so resize events actually reach the guest.
## HTTPS / TLS
- Make all in-VM websites (portal, Sage, company website) serve over HTTPS. Approach: generate a self-signed CA during workstation cloud-init, install it into Chromium's trust store and the system CA bundle, then issue a wildcard or multi-SAN cert for `*.axiomworks.corp`, `*.axiomworks.internal`, and `portal.axiomworks.internal`. Configure the game server to serve TLS (or put nginx in front for all sites), and update all internal URLs to `https://`. No browser warnings, everything looks legitimate. Not required for gameplay but raises the production feel significantly.
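The CA-plus-SAN approach could start from something like the following `openssl` sketch. The output file names, CA subject, RSA key size, and validity periods are assumptions, and `make_axiom_certs` is a hypothetical helper; the SAN list mirrors the names in the item above.

```bash
# Sketch: local CA plus one multi-SAN server certificate.
make_axiom_certs() {
  local out="$1"
  mkdir -p "$out"
  # 1. Self-signed CA; this is the certificate installed into trust stores.
  openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
    -subj "/CN=Axiom Works Internal CA" \
    -keyout "$out/ca.key" -out "$out/ca.crt" 2>/dev/null
  # 2. Server key and certificate signing request.
  openssl req -newkey rsa:2048 -nodes \
    -subj "/CN=portal.axiomworks.internal" \
    -keyout "$out/server.key" -out "$out/server.csr" 2>/dev/null
  # 3. Sign the CSR with the CA, adding the SANs that browsers check.
  printf '%s\n' \
    "subjectAltName=DNS:portal.axiomworks.internal,DNS:*.axiomworks.internal,DNS:*.axiomworks.corp" \
    "basicConstraints=CA:FALSE" \
    "extendedKeyUsage=serverAuth" > "$out/server.ext"
  openssl x509 -req -in "$out/server.csr" -CA "$out/ca.crt" -CAkey "$out/ca.key" \
    -CAcreateserial -days 825 -extfile "$out/server.ext" \
    -out "$out/server.crt" 2>/dev/null
}
```

Installing the CA would then mean copying `ca.crt` into `/usr/local/share/ca-certificates/` and running `update-ca-certificates`, plus importing it into the player's NSS database for Chromium (for example with `certutil` from `libnss3-tools`).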
## Desktop UX
- ~~Ensure the Axiom Works portal desktop icon is executable/trusted out of the box.~~ **RESOLVED**: `Portal.desktop` is provisioned with permissions `0755`, and `workstation.sh` seeds GVFS trusted metadata with a login-time reload fallback.
- Remove mail from the top of the XFCE applications menu, since the portal handles email. (Low priority — no mail client is installed, so this is unlikely to appear.)
- ~~Set Tilix as the default terminal entry in the applications menu.~~ **RESOLVED**: `update-alternatives --set x-terminal-emulator /usr/bin/tilix` and `helpers.rc` are both configured in the `workstation.sh` runcmd.
- The XFCE **Applications → System → Terminal Emulator** menu entry still launches the XFCE terminal emulator instead of Tilix. `update-alternatives` sets the system default, but XFCE's own preferred-applications config (`xfce4-terminal.desktop` precedence) overrides it for that menu entry. Possible fixes: remove `xfce4-terminal` from the installed packages, write a `~/.config/xfce4/helpers.rc` entry that explicitly maps `TerminalEmulator=tilix`, or add a `preferred-applications.xml` override in the XFCE config directory.
- ~~Keep the XFCE dark theme as the default desktop theme.~~ **RESOLVED**: `xsettings.xml` sets the `Adwaita-dark` theme in `workstation.sh`.
- ~~Tilix launched from the desktop icon opens in `~/Desktop` by default instead of `/home/player`. Fix the `Terminal.desktop` launcher to set `Path=/home/player` so the initial working directory is the home directory.~~ **RESOLVED**: `Path=/home/player` added to `Terminal.desktop` in `build-workstation.sh`.
- ~~Preserve clean desktop icon placement after removing `cidata`.~~ **RESOLVED**: `workstation.sh` seeds XFCE desktop icon layout files so Terminal and Portal sit in the chosen top-right positions and viewer help stays bottom-left after rebuilds.
## Workstation Lifecycle
- ~~Take a clean snapshot after the workstation is fully configured and validated.~~ **RESOLVED**: `seed-vms.sh` takes `baseline.day-one` and `baseline.recovery` snapshots after the workstation build.
- ~~Treat workstation shutdown as the end-of-shift game exit; save workstation state.~~ **RESOLVED (server side)**: `VMManager.ensureWorkstationLive()` in the Node.js server handles startup. The game server shuts down cleanly when `start-game.sh` exits (SIGTERM). VM suspend-on-quit is a future enhancement.
- ~~Rebuild or restore from the clean snapshot when needed, but allow the live workstation to drift during play.~~ **RESOLVED**: `always_live: true` in `workstation.json` means shift checkpoints skip the workstation; it drifts freely and is only restored from `baseline.recovery` on catastrophic failure.
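The snapshot handling can be pictured with plain `virsh` commands. This is a sketch, not the logic in `seed-vms.sh` or `VMManager`: the `run` wrapper, the `DRY_RUN` switch, and both function names are hypothetical, and it assumes internal snapshots stored in the workstation's qcow2 overlay.

```bash
# Run a command, or just print it (shell-quoted) when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%q ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Take the two baseline snapshots named in the backlog.
take_workstation_baselines() {
  local dom="${1:-sc-workstation}"
  run virsh snapshot-create-as "$dom" baseline.day-one "Clean day-one state"
  run virsh snapshot-create-as "$dom" baseline.recovery "Known-good recovery point"
}

# Catastrophic-failure path: revert to the recovery baseline and keep it running.
restore_workstation() {
  local dom="${1:-sc-workstation}"
  run virsh snapshot-revert "$dom" baseline.recovery --running
}
```

`DRY_RUN=1 restore_workstation` prints the revert command instead of running it.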
## Terminal Experience
~~All in-game terminal simulation items are obsolete~~ — the player uses a real Tilix terminal directly in the XFCE workstation VM. Arrow key history, tab completion, copy/paste, scrollback, and interactive programs (vim, htop, etc.) all work natively.
## Browser and Bookmarks
- The Chromium bookmarks bar shows the default Debian bookmarks. The game-specific bookmarks are buried under a "Managed bookmarks" folder instead of sitting directly in the bar. Move the managed bookmarks to the top-level bar and remove the default Debian entries. This is controlled by the `ManagedBookmarks` policy in `/etc/chromium/policies/managed/bookmarks.json`; restructure the JSON so items appear at bar level rather than inside a named folder.
- ~~All four managed bookmarks go to the same URL; anchors don't work.~~ **RESOLVED** — Bookmarks reduced to two: "Axiom Works Portal" and "Sage (KB)" at `/sage/`.
## Sage — Knowledge Base
- Sage is intended to be a navigable knowledge base, not just a search box. It should feel like a real internal company wiki: organized into sections and categories that a player can explore by browsing, in addition to searching. The content is the KB data already planned for the game.
- Search should be lightweight and practical — something like Meilisearch (or a similarly small embedded-first search server) that indexes the KB content and serves fast full-text results without requiring a heavy backend.
- Sage should be a completely separate web application from the Axiom Works portal. It should have its own URL, its own visual design (distinct look and feel from the portal), and its own place in the bookmarks bar. In a realistic company, documentation tools are separate products (Confluence, Notion, internal wikis) from the ticketing portal — Sage should feel the same way.
- Add a Sage bookmark to Chromium once Sage has its own URL.
## VM Performance
- ~~Guest VM RAM maxed causing hangs.~~ **RESOLVED**: `RAM_MB` raised to 1536 MB; 1 GB swap file added via `runcmd` in `build-workstation.sh` (fallocate + mkswap + fstab entry). Rebuild required to take effect.
## Visual Cleanup
- ~~Hide or remove the `cidata` desktop icon.~~ **RESOLVED**: `build-vm.sh` detaches the cloud-init seed ISO after workstation readiness, so the CD-ROM is not exposed on the desktop or in file-manager device lists. `xfce4-desktop.xml` also keeps removable/device desktop icons hidden as a fallback.
- ~~Hide the internal `VirtIO Disk` from Thunar's Computer view.~~ **RESOLVED**: `workstation.sh` installs a udev rule setting `UDISKS_IGNORE=1` on `vd*` system disk devices, keeping internal VM storage out of player-facing file-manager device lists.
@@ -0,0 +1,459 @@
# Characters — Sysadmin Chronicles
Story design reference. All characters, bios, relationships, and open story hooks.
For company/world context see `COMPANY_LORE.md`. This file focuses on the people.
---
## Active Characters
These characters have an established in-game voice and presence. Any new quest work
should treat their characterization here as canonical.
---
### The Player
**Role:** New junior sysadmin hire, day one
**Identity:** Unnamed. Player-selected portrait (5 options).
Hired to replace Dale. Nobody will explain what Dale did. Badge number is still
pending, so the player's temp credentials for their first day are held by someone in Finance.
The player is a competent professional, not a bumbling intern. They may not know
every answer but they know how to look.
The player has no spoken lines. Their character is expressed entirely through the
choices they make when fixing things — whether they understand root causes or just
clear symptoms, whether they leave systems better or just less broken.
---
### Marcus Webb
**Role:** Senior Systems Administrator
**Email:** `m.webb@axiomworks.internal`
**Reports to:** Dave Kowalski (Director of IT)
Six years at Axiom Works. Hired by Kowalski. Knows where everything is, why it's
there, and which parts were a mistake. Communicates in short, precise messages.
Does not explain things twice. Trusts competence over credentials — he will give
the player more rope as they demonstrate they know what to do with it. If they
don't, the rope gets shorter.
He was the one who onboarded the player. He assigned their first ticket. He will
assign most of the tickets that follow. His messages range from brief task
assignments to late-night observations about something that's been on his mind —
the latter usually mean something is about to become a problem.
He knows what Dale did. He has decided not to discuss it.
**Personality:** Dry. Technically precise. Does not perform enthusiasm. Occasionally
wry but never jokey. Respects players who fix root causes. Mildly annoyed by
players who fix symptoms and call it done.
**Relationships:**
- Kowalski: reports to him; respectful but not deferential
- Sarah: professional; takes her tickets seriously, occasionally says quiet things when she's wrong
- Priya: mutual professional respect; they operate in the same zone of "things that matter when they go wrong"
- Phil Ruiz (Sales VP): warm; Phil owes Marcus for saving a demo once and Marcus has never mentioned it
---
### Sarah Chen
**Role:** Product Manager, AxiomFlow
**Email:** `s.chen@axiomworks.internal`
Owns the AxiomFlow product roadmap. Coordinates between sales, engineering, and
customers. Emails Monday mornings. Cares intensely about the demo and staging
environments because those are the product she can actually see and touch. Not wrong
about their importance.
She files tickets when things break on the product-facing side. Her descriptions of
problems are accurate about symptoms and often wrong about causes — she will
confidently diagnose a permissions issue as a script bug, or a package problem as a
config error. She is not incompetent; she just doesn't have the full picture. When
the player fixes the underlying cause rather than the surface symptom, she notices.
She has a sharp edge when things get worse after someone touches them. She will say
so, clearly, without being melodramatic about it.
**Personality:** Direct. Metric-oriented. Not patient with vague timelines or "we're
looking into it." Appreciates being told what the actual problem was, not just that
it's fixed.
**Relationships:**
- Marcus: professional; trusts that her tickets will be handled, doesn't ask for much
- Player: initially impersonal (they're new); warms or cools based on outcomes
- Nikhil Sharma: upstream dependency — his build pipeline affects her deployments
---
### Priya Nair
**Role:** Head of Security & Compliance
**Email:** `p.nair@axiomworks.internal`
**Direct report:** James Osei (Security Analyst)
Leads all security reviews, access audits, and compliance programmes. Has a standing
Thursday meeting with David Park (CTO) that has existed since 2017. Was brought in
after an incident nobody discusses in public. Has been building the security function
from something informal into something that can survive a SOC 2 audit.
She frames everything in terms of what happens when things go wrong, not whether they
will. She assumes breach. She assumes misconfiguration. She is often right. She is
not someone who appreciates hearing about a production change after it has already
happened.
She will tell the player when a fix is correct and why. She will also tell them when
a fix works but leaves the environment in a worse position than before. She is not
punitive about this — she just states it.
She does shift reviews at end-of-shift and grades the player's overall performance.
Her criteria: did the work move forward, did the environment stay stable, did the
player create extra problems.
**Personality:** Precise. Consequence-focused. Calm in tone even when the content
is not calm. Economical with words. Does not use exclamation marks.
**Relationships:**
- Player: evaluative; her trust is earned by demonstrating that security is a
consideration, not an afterthought
- Marcus: peer respect; they operate in different domains with overlapping concerns
- Dave Kowalski: reports indirectly up through him for infrastructure decisions
- David Park: standing Thursday meeting; she has the CTO's ear
> **Name note for developers:** The in-game email service and some ticket files
> previously used "Priya Kapoor" and the onboarding doc used "Priya Singh."
> These are all the same character. **Priya Nair** is the canonical name.
> Email should be `p.nair@axiomworks.internal`. Update references in
> `server/src/services/EmailService.js`, `content/tickets/T007.json`, and
> `content/docs/onboarding.json`.
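Stale references can be found with a single recursive `grep`. `find_stale_priya_refs` is a hypothetical helper and only matches the two superseded full names; the old email addresses are not recorded here, so they would need a separate search.

```bash
# List file:line:text for every use of a superseded name for Priya.
# Defaults to the server/ and content/ trees when no paths are given.
find_stale_priya_refs() {
  local dirs=("$@")
  if (( ${#dirs[@]} == 0 )); then
    dirs=(server content)
  fi
  grep -rnE 'Priya (Kapoor|Singh)' "${dirs[@]}"
}
```

Run from the repository root with no arguments; an exit status of 1 means no stale names remain.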
---
### Dave Okonkwo
**Role:** Internal employee, non-technical
**Email:** `d.okonkwo@axiomworks.internal`
A regular Axiom Works employee who notices when things aren't working and files
tickets about it. He doesn't know enough to diagnose the problem — he reports
symptoms accurately and assumes the wrong cause. His reports are useful precisely
because they represent what a non-technical user actually experiences.
He is not on the company website (280 employees, most of them aren't). He's
somewhere in operations or general staff. He's not in Finance, not in IT.
> **Open decision:** Dave Okonkwo is currently the only employee-level character who
> submits tickets. The company website has Dave Kowalski as Director of IT Operations
> (Marcus's boss), which is a completely different person. This is not a naming
> inconsistency — they're two different people. However: if the story wants Kowalski
> to become an active character who also files tickets or escalates issues, that's a
> separate thread. Okonkwo and Kowalski coexist.
---
## Named Background Characters
On the company website. No current in-game presence. Available for story use —
they can send emails, appear on CC lines, be referenced in dialogue, or become
active characters in new quests.
Listed in rough order of story relevance to the IT/sysadmin context.
---
### Dave Kowalski — Director of IT Operations
Marcus's manager. The player's skip-level. Background is network engineering —
has Cisco certifications he will not volunteer unless provoked. Oversees systems
(Marcus's domain), networking (Tom Malaney), and IT support. Has been at Axiom
Works since 2015. Describes the infrastructure as "mature." Sends weekly status
emails in bullet points that never quite answer the question. When things go wrong
he schedules a meeting to "talk through the situation," which everyone has learned
is worse than a direct message.
Has said "we should really document that" more times than he can count. Has
documented very little personally. Maintains a mysterious Tuesday 2–3pm calendar
block.
Story use: source of policy pressure, indirect escalation, the person who asks
questions that reveal Marcus hasn't told the player everything.
---
### Nikhil Sharma — Platform Engineer
Owns the internal build and release pipeline, the CI infrastructure, and the
parts of deployment that nobody else wants to think about. Strong opinions about
reproducible builds. Sends Slack messages at 6am. Occasionally at 11pm.
He is the engineer most directly connected to what happens on vulcan — if a build
is broken, it's probably something Nikhil built or maintains. He has never met the
player. He almost certainly doesn't know the player exists.
Story use: the author of broken packages the player has to debug; a character who
can explain (or fail to explain) what went wrong upstream; an escalation path when
a build problem is genuinely his fault.
---
### Tanya Okafor — Head of Customer Success
Manages post-sale relationships for all AxiomFlow customers and the twelve legacy
AxiomSync accounts that haven't migrated. Uses the word "partnership" a lot.
Usually the first person to know when something is wrong in production, because a
customer has already called her before IT knows there's a problem. Her call log
is an early warning system. She is not hostile to IT but she has learned that
"we're looking into it" is not an answer she can give a customer.
Story use: pressure vector from the customer direction; source of urgency that
doesn't come from Marcus or the ticket queue; demonstrates real-world stakes when
things go down.
---
### Phil Ruiz — VP of Sales
Has been promising features to prospects since 2016. Maintains a warm relationship
with the infrastructure team because Marcus once fixed the staging environment with
twenty minutes to spare before a major demo — Phil has never forgotten this. Travels
frequently. Expense reports submitted promptly, which Marcus has noted approvingly.
Story use: indirect beneficiary when demos work; pressure source when a sales demo
is scheduled and something is broken; the person who will tell the CTO what IT did
right in a room the player will never be in.
---
### Yusuf Halabi — Engineering Manager
Reports to David Park (CTO). Manages the core AxiomFlow platform team. Runs the
Thursday architecture review. Has opinions about test coverage. Leaves pull request
comments that are technically correct and diplomatically suboptimal.
Story use: engineering-side escalation; source of tickets about internal tooling;
the person who will ask why a config change broke a downstream process.
---
### Derek Ashford — Financial Controller
Does not appear at team meetings. Does appear on CC lines of every email that
mentions cloud costs, hardware procurement, or infrastructure budget. Always
replies-all. His manager is Rachel Brandt (CFO).
Story use: background texture on procurement requests; the voice that makes any
infrastructure spending feel like a negotiation.
> **Note on "Dave from Finance":** Marcus's day-one message references "Dave from
> Finance" as the person holding the player's temp credentials. This is almost
> certainly Derek Ashford — Marcus using his first name informally, or a
> continuity error. Derek Ashford is the only Finance character plausibly holding
> IT credentials. His first name is Derek, not Dave — either the message should
> be corrected, or "Dave from Finance" is a third unnamed Finance employee.
---
### Rachel Huang — Systems Administrator
Marcus's peer on the IT team. Handles provisioning, patch cycles, and the ongoing
negotiation with Finance over cloud consolidation. Came from a managed services
background. Has strong opinions about monitoring dashboards, most of which are
correct.
Story use: the person who set something up that the player now has to maintain;
a colleague who can provide context Marcus won't; someone whose provisioning
decisions the player will encounter as infrastructure.
---
### Tom Malaney — Network Engineer
Responsible for network infrastructure across the office and hosted environments.
On-call for more holiday weekends than he would like. Thorough in documentation
when he finds time for it.
Story use: DNS, firewall, or routing problems that are not the player's fault
but become the player's problem; someone who can be reached but is slow to
respond.
---
### James Osei — Security Analyst
Priya's direct report. Handles vulnerability assessments, access reviews, and
quarterly compliance reporting. Methodical. Has a spreadsheet for everything,
which is not a criticism.
Story use: the person who runs the actual audit that Priya will summarize to the
player; a source of detailed (sometimes overwhelming) security findings.
---
### Ellen Marsh — CEO & Co-Founder
Built the first version of AxiomFlow after a decade in operations. No CS background.
Attends all-hands twice a year. Does not use Slack. Has final say on pricing and
major customer commitments.
Story use: the distant authority whose priorities shape everything; never interacts
with the player directly, but her decisions land as constraints.
---
### David Park — CTO & Co-Founder
Wrote the original rules engine in 2011. Now manages engineering managers. Still has
opinions about the data model. Has a standing Thursday meeting with Priya that hasn't
moved since 2017.
Story use: architectural decisions from above; the person Priya reports significant
security findings to.
---
### Karen Volkov — COO
Joined 2014. Responsible for the fact that the company has documented processes for
anything at all. Has opinions about infrastructure costs that surface in IT's world
via Finance. Prefers decisions with clear owners and deadlines.
---
### Rachel Brandt — CFO
Joined 2016. Approves all capital expenditure over $5,000. Working to consolidate
cloud spend. Does not enjoy surprises in the infrastructure budget. Derek Ashford
reports to her.
---
### Mei Lin — Senior Software Engineer
Has maintained AxiomSync's integration layer since 2018. Knows more about it than
anyone would prefer, including herself. Currently leading the migration tooling
project for the remaining legacy accounts.
---
### Cora Reyes — Software Engineer
Works on the AxiomDash reporting pipeline. Has submitted more internal RFCs than
anyone else on the team in the past year. Moving toward senior.
---
### Ben Portillo — Product Manager, AxiomDash
Leads product development for the analytics add-on. Works closely with large
accounts to understand what they actually want from dashboards (usually different
from what they asked for).
---
### Annika Gosse — UX Designer
Responsible for AxiomFlow's interface. Has been advocating for a redesign of the
workflow builder since 2022. Patient.
---
### Sandra Wu — HR Manager
Manages hiring, onboarding, and employee relations since 2016. Runs the new-hire
onboarding process (three days, thorough). Sends birthday emails on time, every time.
---
### Owen Blake — Office Manager
Keeps the office running. Has fixed more things than his job title implies. The
person to contact if conference room equipment stops working.
---
### Mike Kawamoto — Account Executive
Handles mid-market manufacturing accounts in the northeast. Believes strongly in
the demo environment. Closes more deals in Q4 than any other quarter.
---
### Lisa Ferreira — Customer Success Manager
Manages onboarding for new AxiomFlow deployments. Has a talent for understanding
what customers mean rather than what they say.
---
## Unresolved Characters (Story Hooks)
These are referenced in existing content but never defined. They represent the
strongest open narrative threads.
---
### Dale — The Previous Sysadmin
**Reference:** Marcus's day-one message — "You're replacing Dale. Nobody will tell you
what Dale did because it's complicated."
Dale is gone. The player has Dale's desk, Dale's access provisioning slot, and
apparently Dale's reputation: people know the player is "Dale's replacement" before
they know the player's name. The systems the player inherits are the systems Dale
last touched.
What Dale did is unknown. It is described as "complicated." Marcus knows. Possibly
Kowalski knows. Possibly Priya knows, if it was security-related.
This is the strongest existing narrative mystery in the game. It has setup and no
payoff. Dale's story could be:
- A technical incident (something Dale broke and couldn't fix)
- A policy violation (something Dale did that wasn't malicious but wasn't right)
- A trust collapse (competent but burned bridges)
- Something personal
- Any combination
The player finding out what Dale did — gradually, through the systems they work on,
through things people let slip — is a natural story spine for the whole game.
---
### "Dave from Finance" — Day One Reference
**Reference:** Marcus's day-one message — "Dave from Finance has your temp credentials.
He's on three today."
Almost certainly Derek Ashford (Financial Controller), referred to informally. But
his first name is Derek, not Dave, so this is either Marcus being casual with names,
a continuity error, or a genuinely separate, unlisted Finance employee.
Needs a decision: correct "Dave" to "Derek" in Marcus's message, or introduce a
separate "Dave from Finance" as a minor character.
---
## Key Relationships Map
```
Ellen Marsh (CEO)
└── David Park (CTO)
└── Yusuf Halabi (Eng Manager)
├── Mei Lin
├── Cora Reyes
└── Nikhil Sharma
└── Karen Volkov (COO)
└── Rachel Brandt (CFO)
└── Derek Ashford (Financial Controller)
└── Phil Ruiz (VP Sales)
├── Mike Kawamoto
└── Tanya Okafor
└── Lisa Ferreira
Dave Kowalski (Director of IT)
├── Marcus Webb ←── Player's manager
│ └── [Player]
├── Rachel Huang
└── Tom Malaney
Priya Nair (Head of Security)
└── James Osei
Sarah Chen (Product, AxiomFlow) ←── frequent ticket source
Ben Portillo (Product, AxiomDash)
Annika Gosse (UX)
```
---
## Tone Notes for New Story Work
- **Marcus talks like someone who has answered this question before.** Precise, low
affect, no wasted words. Never condescending — just efficient.
- **Sarah talks like a PM: outcome-focused, slightly impatient, specific about
what she needs.** She is not a villain. She has real deadlines.
- **Priya talks like someone who has already thought about what goes wrong.** She
doesn't speculate — she states. She's not alarming, she's matter-of-fact.
- **Dave Okonkwo talks like someone who doesn't know what the problem is** but is
trying to be helpful by reporting exactly what he observed. He should never be
made to look stupid — he's doing the right thing.
- **The company takes itself seriously.** Humor comes from the gap between official
language and reality, not from anyone being a cartoon.
- **Problems have plausible causes.** Systems broke because someone made a
reasonable decision under time pressure, not because they were careless idiots.
The player should feel like a professional, not a janitor.
@@ -0,0 +1,165 @@
# Axiom Works — Company Lore Reference
> For quest authors, dialogue writers, and ticket copy. Keep the tone dry and
> believable. The company should feel real, slightly dysfunctional, and just
> plausible enough that players recognise the type.
---
## Who They Are
**Axiom Works** is a B2B enterprise software company founded in 2011. Headquarters
is in a three-floor office park that is technically "downtown adjacent" depending
on how charitable you are with the map. They have about 280 employees. The
Glassdoor rating is 3.8 stars and management checks it obsessively.
Their flagship product is **AxiomFlow** — a workflow automation platform aimed at
mid-size manufacturers, logistics companies, and anyone who got a 90-minute demo
and thought it looked easy. Most customers are still on the workflow they set up
in 2019. The platform does what it says. Marketing says it does considerably more.
---
## Products
| Product | Description | Status |
|---------|-------------|--------|
| **AxiomFlow** | Workflow automation platform | Active, main revenue |
| **AxiomDash** | Reporting and analytics add-on | Active, profitable, under-resourced |
| **AxiomSync** | Legacy data integration layer | End-of-sale since 2021, still maintained for 12 customers who refuse to migrate |
The current marketing tagline is *"Streamline. Scale. Succeed."* It replaced
*"Work smarter, not harder"* in Q3 of last year. The one before that mentioned
AI. Nobody is sure what the AI was.
---
## Infrastructure
The company runs a mix of on-prem servers (named after Greek gods — a choice made
by a contractor in 2017 who left before documenting anything) and a handful of
cloud instances that accounting keeps trying to consolidate.
| Host | Role | Notes |
|------|------|-------|
| **ares** | Player workstation | XFCE desktop, where the player works |
| **hermes** | Web/app server | nginx, staging and demo environment for AxiomFlow |
| **vulcan** | Build machine | Arch Linux, compiles artifacts, runs scheduled jobs |
### Planned future systems
As the game grows, additional machines will be added. Candidates:
| Proposed host | Role | Greek connection |
|---|---|---|
| **poseidon** | Database server | Foundation, depths, reliability |
| **apollo** | Mail / notification server | Messenger, communication |
| **athena** | Internal tooling (ticketing, wiki) | Wisdom, knowledge management |
| **argus** | Monitoring / alerting | The hundred-eyed watcher |
| **mnemosyne** | Backup / storage | Memory, persistence |
---
## Characters
### Dave Kowalski — Director of IT Operations
The player's skip-level manager. Has been at Axiom Works since 2015. Hired Marcus.
Oversees three teams: systems (Marcus's domain), networking, and IT support. His
background is in networking; he has Cisco certifications he won't bring up unless someone
else brings up Cisco certifications first. Sends weekly status emails formatted in bullet
points that never quite answer the question you were asking. When things go wrong he
schedules a meeting to "talk through the situation," which everyone has learned is
worse than an email. Maintains a calendar block from 2 to 3pm on Tuesdays that nobody
has ever asked about. Has said "we should really document that" approximately 400 times.
Describes the infrastructure as "mature."
### Marcus Webb — Senior Sysadmin
The player's manager and the person who assigned them the ticket. Has been at
Axiom Works for six years. Knows where all the bodies are buried. Communicates
primarily in terse Slack messages and occasionally very long emails sent at 11pm.
Trusts competence over process. Gets irritated by people who confuse symptoms
with root causes.
### Priya Nair — Security / Compliance
Runs security reviews and has opinions about everything. Usually right. Tends to
frame concerns in terms of what will happen when things go wrong rather than
whether they will. Was brought in after an incident nobody talks about in public.
### Sarah Chen — Product Manager
Represents the product team's perspective in the ticket queue. Cares about demo
environments more than production ones because demos are what she can see. Not
technically wrong about their importance. Emails at 8am on Mondays.
### Derek Ashford — Financial Controller
Does not appear in person. Appears on CC lines of emails where infrastructure
costs are being discussed. Always hits reply-all. His full name is Derek Ashford.
His manager is Rachel Brandt (CFO).
---
## Background Characters (non-interactive, for world texture)
These characters exist on the company website and in lore but do not appear in
quests or dialogue. Use them for verisimilitude — email headers, CC lines, internal
wiki author credits, that sort of thing.
### Ellen Marsh — CEO & Co-Founder
Built AxiomFlow after a decade in operations. Not technical. Attends all-hands
twice a year. Has final say on pricing and major customer commitments. Does not
use Slack. The player will never interact with her.
### David Park — CTO & Co-Founder
Wrote the original rules engine. Now manages engineering managers. Still has
opinions about the data model. Has a standing Thursday meeting with security
that hasn't moved since 2017.
### Karen Volkov — COO
Joined 2014. Responsible for the fact that Axiom Works has documented processes
for anything. Has opinions about infrastructure costs. Prefers decisions with
clear owners and deadlines.
### Rachel Brandt — CFO
Joined 2016. Approves all capital expenditure over $5,000. Does not enjoy
surprises in the infrastructure budget. Derek reports to her.
### Phil Ruiz — VP of Sales
Has been promising features to prospects since 2016. Has a warm relationship
with the infrastructure team because Marcus once saved a demo with 20 minutes to
spare. Expense reports submitted promptly.
### Tanya Okafor — Head of Customer Success
Manages all post-sale customer relationships including the twelve AxiomSync
holdouts. Usually the first to know when something is wrong in production,
because a customer has already called her.
### Yusuf Halabi — Engineering Manager
Reports to the CTO. Manages the core AxiomFlow platform team. Has opinions
about test coverage. Runs the Thursday architecture review.
### Mei Lin — Senior Software Engineer
Has maintained AxiomSync's integration layer since 2018. Knows more about it
than anyone would prefer.
### Nikhil Sharma — Platform Engineer
Owns the build and release pipeline and internal CI infrastructure. Occasionally
sends Slack messages at 6am.
### Sandra Wu — HR Manager
Manages hiring, onboarding, and employee relations since 2016. Sends birthday
emails on time, every time. Runs the new-hire onboarding process that takes
three days.
---
## Tone Guidelines
- **Dry, not sarcastic.** The company takes itself seriously. The humour comes
from the gap between how they describe things and what's actually happening.
- **Specific, not generic.** "The AxiomSync customer in Cincinnati keeps calling"
is better than "a client is upset."
- **Plausible dysfunction.** Problems happen because of reasonable decisions made
under time pressure, not because people are incompetent. The player should feel
like a real professional, not a janitor.
- **No cartoon villains.** Derek from Finance is not evil. The product team is not
stupid. They have different priorities.
- **The infrastructure has history.** It was built over time. Some parts are good.
Some parts were good in 2017. The player's job is to keep it working.
@@ -0,0 +1,419 @@
# Quest Authoring
Use this guide when adding new JSON quests under `content/quests/`.
Quest files describe observed VM state. They are not command scripts and they
should model real Linux behavior, not puzzle logic detached from the system.
For complete worked files, see `docs/AUTHORING_EXAMPLES.md`.
## Quest JSON Schema
### Root Fields
| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Quest ID, for example `Q005`. |
| `title` | string | Player-facing quest title. |
| `tier` | int | Difficulty tier, usually `1`, `2`, or `3`. |
| `primary_vm` | string | Main VM for the quest. Current authored values are `workstation`, `web_server`, and `build_machine`. |
| `required_vms` | string[] | Every VM the quest touches. Include all VMs used in clues, validation, or prep. |
| `ticket_id` | string | Links to `content/tickets/<id>.json`. |
| `baseline_snapshot` | string | Snapshot name that the prep script should restore or build from. |
| `summary` | string | Short internal scenario summary. |
| `clue_fingerprint` | object | Advisory description of the evidence seeded into the baseline. |
| `objectives` | object[] | Objective list shown to the player and used for progress checks. |
| `solution_branches` | object[] | Branches the validator can resolve to. Higher-priority valid branches win. |
| `pressure_profile` | string or null | Optional pressure/escalation profile name. |
| `blast_radius` | string[] | Incident IDs that this quest can affect or trigger. |
| `unlock_requirements` | string[] | Prerequisites such as `world_flag:` entries. |
| `tags` | string[] | Search and classification tags. |
| `internal_notes` | string | Author-only notes for reviewers. |
| `_note` | string | Optional author-only comment. Existing content uses this at root and inside nested objects. |
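The root-field table lends itself to a mechanical check. The sketch below is hypothetical and is not the contents of `tools/content/validate-content.js`: it assumes quests are already parsed into plain objects, assumes the listed root fields are required, and checks that every VM named in an objective, branch, or clue rule also appears in `required_vms`.
```javascript
// Hypothetical content check for one parsed quest object (not the real
// validate-content.js). Assumption: these root fields are required.
const REQUIRED_FIELDS = [
  "id", "title", "tier", "primary_vm", "required_vms",
  "ticket_id", "baseline_snapshot", "objectives", "solution_branches",
];
// Collect every `vm` named in a rule tree: `and`/`or` nest under `rules`, `not` under `rule`.
function collectVms(rule, out = new Set()) {
  if (!rule || typeof rule !== "object") return out;
  if (typeof rule.vm === "string") out.add(rule.vm);
  for (const child of rule.rules || []) collectVms(child, out);
  if (rule.rule) collectVms(rule.rule, out);
  return out;
}
// Return human-readable problems; an empty list means every check passed.
function checkQuest(quest) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (!(field in quest)) errors.push(`missing field: ${field}`);
  }
  const required = new Set(quest.required_vms || []);
  if (quest.primary_vm && !required.has(quest.primary_vm)) {
    errors.push(`primary_vm ${quest.primary_vm} is not in required_vms`);
  }
  // Objectives, branches, and clue evidence can all name VMs.
  const used = new Set();
  for (const o of quest.objectives || []) collectVms(o.validation, used);
  for (const b of quest.solution_branches || []) collectVms(b.validation, used);
  const evidence = (quest.clue_fingerprint && quest.clue_fingerprint.evidence) || [];
  for (const e of evidence) collectVms(e, used);
  for (const vm of used) {
    if (!required.has(vm)) errors.push(`vm ${vm} is used but not in required_vms`);
  }
  return errors;
}
```
Collecting VMs into one set first means a VM that is used many times but is missing from `required_vms` is reported once, not once per rule.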
### `clue_fingerprint`
`clue_fingerprint` is advisory. It documents what evidence the baseline already
contains so content reviewers can confirm the clue trail is real.
| Field | Type | Description |
| --- | --- | --- |
| `description` | string | Plain-language explanation of the clue trail. |
| `evidence` | object[] | Evidence items that point to the issue. Use the same general shape as the relevant validation type. |
Common evidence shapes in existing content:
- File and log evidence usually includes `type`, `vm`, `path`, and `contains`
- State evidence may include `type`, `vm`, `service`, `state`, or `enabled`
- Ownership evidence may include `type`, `vm`, `path`, `user`, and `group`
- Scalar evidence may include `threshold_percent`, `port`, or `command` depending on the clue
Existing clue fingerprints also use clue-only labels such as `service_state_is`,
`service_enabled_is`, and `expected_user`. Treat those as descriptive baseline
metadata, not runtime validation names.
## Objectives
| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Stable objective ID. |
| `description` | string | Player-facing objective text. |
| `check_mode` | string | `passive` or `explicit`. Use `passive` by default. |
| `validation` | object | Rule object evaluated by `ValidationService`. |
Objectives are for feedback and progress tracking. They do not choose the
winning solution branch.
## Solution Branches
| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Stable branch ID. |
| `label` | string | Optional short label used in content review and debugging. |
| `priority` | int | Higher wins when multiple branches validate. Priorities must be unique per quest. |
| `validation` | object | Rule object evaluated for this branch. |
| `trust_delta` | float | Trust change applied when this branch wins. Positive for better fixes, negative for risky or damaging ones. |
| `follow_up_dialogue` | string | Dialogue ID to trigger after resolution. |
| `follow_up_incident` | string | Incident ID to trigger after resolution, if the branch intentionally leaves a latent problem. |
| `follow_up_ticket` | string | Next ticket ID in the quest chain. |
| `world_flags` | string[] | Flags to set when the branch wins. |
| `_note` | string | Optional author-only comment. |
### Branch Authoring Guide
- Use branch priority to rank the quality of valid solutions.
- Put the clean, robust fix at the highest priority.
- Use lower priorities for brittle workarounds, partial fixes, or outcomes that
leave future risk behind.
- Use `trust_delta` to reflect the quality of the fix, not just whether the
quest technically completed.
- Use `follow_up_ticket` when a winning branch should advance the story to the
next ticket.
- Use `follow_up_incident` only when that branch intentionally seeds a later
recurrence or operational cost.
- Keep priorities unique. If two branches can both pass with the same priority,
the content should be rewritten.
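Here is a minimal sketch of the priority rule. It assumes each branch's validation has already been reduced to a boolean by a caller-supplied `isValid` callback. That callback is hypothetical and is not the `ValidationService` API.
```javascript
// Pick the winning branch: among branches whose validation passes,
// the highest `priority` wins. Returns null when no branch passes.
function resolveBranch(branches, isValid) {
  let winner = null;
  for (const branch of branches) {
    if (!isValid(branch)) continue;
    if (winner === null || branch.priority > winner.priority) winner = branch;
  }
  return winner;
}
// Authoring check: list any priority used by more than one branch in a quest.
function duplicatePriorities(branches) {
  const seen = new Set();
  const dupes = new Set();
  for (const { priority } of branches) {
    if (seen.has(priority)) dupes.add(priority);
    seen.add(priority);
  }
  return [...dupes];
}
```
Rejecting duplicate priorities at authoring time means the strict `>` comparison never has to break a tie at runtime.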
## Validation Rule Types
Design notes sometimes use shorthand names like `file_mode_matches` or
`command_exits_zero`. In authored JSON, use the runtime rule names below.
- `file_mode_matches` -> `file_mode`
- `file_owner_matches` -> `file_owner`
- `service_state_matches` -> `service_state`
- `service_is_enabled` -> `service_enabled`
- `process_is_running` -> `process_running`
- `port_is_listening` -> `port_listening`
- `package_is_installed` -> `package_installed`
- `command_exits_zero` -> `command_assert`
| JSON type | Fields | Notes |
| --- | --- | --- |
| `file_exists` | `vm`, `path` | Passes when the file exists. |
| `file_absent` | `vm`, `path` | Inverse of `file_exists`. |
| `directory_exists` | `vm`, `path` | Passes when the directory exists. |
| `file_contains` | `vm`, `path`, `contains` | Passes when the file contains the given text. |
| `log_contains` | `vm`, `path`, `contains` | Alias for `file_contains` used by some clue fingerprints. |
| `file_mode` | `vm`, `path`, `mode` | Checks the exact file mode string, such as `0600`. |
| `file_owner` | `vm`, `path`, `user`, `group` | Checks exact ownership. |
| `file_owner_is_not` | `vm`, `path`, `user`, `group` | Negated ownership check. |
| `service_state` | `vm`, `service`, `state` | Checks the active state, such as `active`, `inactive`, or `failed`. |
| `service_enabled` | `vm`, `service`, `enabled` | Checks boot-time enablement. The `enabled` field defaults to `true`. |
| `process_running` | `vm`, `process` | Passes when the named process is running. |
| `process_user` | `vm`, `process`, `user` | Passes when the named process runs as the given user. |
| `port_listening` | `vm`, `port`, `listening` | Checks whether a port is listening. The `listening` field defaults to `true`. |
| `package_installed` | `vm`, `package` | Passes when the package is installed. |
| `mount_present` | `vm`, `path` | Passes when the mount is present. |
| `disk_usage_below` | `vm`, `path`, `threshold_percent` | Passes when disk usage is below the threshold. `percent` is accepted in older content. |
| `disk_usage_above` | `vm`, `path`, `threshold_percent` | Passes when disk usage is above the threshold. `percent` is accepted in older content. |
| `command_assert` | `vm`, `command` | Fallback rule for command-based checks. Use sparingly. |
| `and` | `rules` | All sub-rules must pass. |
| `or` | `rules` | Any sub-rule may pass. |
| `not` | `rule` | Inverts the inner rule. |
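During content review, a small lookup can catch design-note shorthand that slipped into authored JSON. This is a hypothetical lint helper, not part of the documented loader. It walks the same `rules`/`rule` nesting that `and`, `or`, and `not` use.
```javascript
// Design-note shorthand -> runtime rule name, per the mapping above.
const SHORTHAND = {
  file_mode_matches: "file_mode",
  file_owner_matches: "file_owner",
  service_state_matches: "service_state",
  service_is_enabled: "service_enabled",
  process_is_running: "process_running",
  port_is_listening: "port_listening",
  package_is_installed: "package_installed",
  command_exits_zero: "command_assert",
};
// Return one message per shorthand `type` found anywhere in a rule tree.
function findShorthand(rule, found = []) {
  if (!rule || typeof rule !== "object") return found;
  if (Object.prototype.hasOwnProperty.call(SHORTHAND, rule.type)) {
    found.push(`use "${SHORTHAND[rule.type]}" instead of "${rule.type}"`);
  }
  for (const child of rule.rules || []) findShorthand(child, found);
  if (rule.rule) findShorthand(rule.rule, found);
  return found;
}
```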
### Validation Notes
- Prefer state-based checks over command checks.
- Use `and` and `or` to model genuinely alternative states, not to hide weak
authoring.
- `command_assert` is a fallback. If a real state rule exists, use that first.
- Some older quest files include extra fields such as `protocol` or
`installed`. The loader ignores unknown keys, but new quests should stick to
the documented fields above.
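The combinators compose as a boolean tree. The sketch below shows one way an evaluator could walk that tree. It assumes a hypothetical `probe(rule)` function that performs a single leaf check (SSH, file stat, and so on) and returns a boolean. The real `ValidationService` may be structured differently and presumably has to await its SSH calls, but the sketch keeps `probe` synchronous for brevity. The normalization step folds in the documented defaults (`enabled` and `listening` default to `true`) and aliases (`log_contains`, legacy `percent`).
```javascript
// Apply the documented defaults and aliases before a leaf rule is probed.
function normalizeLeaf(rule) {
  const r = { ...rule };
  if (r.type === "log_contains") r.type = "file_contains";
  if (r.type === "service_enabled" && r.enabled === undefined) r.enabled = true;
  if (r.type === "port_listening" && r.listening === undefined) r.listening = true;
  if ((r.type === "disk_usage_below" || r.type === "disk_usage_above") &&
      r.threshold_percent === undefined && r.percent !== undefined) {
    r.threshold_percent = r.percent; // older content uses `percent`
  }
  return r;
}
// Walk an and/or/not tree; `probe` checks one leaf rule against a VM.
// every()/some() short-circuit, so later leaves are skipped once the result is known.
function evaluate(rule, probe) {
  switch (rule.type) {
    case "and":
      return rule.rules.every((r) => evaluate(r, probe));
    case "or":
      return rule.rules.some((r) => evaluate(r, probe));
    case "not":
      return !evaluate(rule.rule, probe);
    default:
      return Boolean(probe(normalizeLeaf(rule)));
  }
}
```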
## Prep Script Requirements
Each quest needs a prep script at `tools/vm/quest-prep/QXXX-prep.sh`.
- The script must be idempotent.
- It must set up the starting VM state for the quest.
- It runs at image build time, not when the player starts the quest.
- It should install required packages only from local or pre-baked sources.
- It may create logs, users, groups, permissions, or broken config files that
form the scenario.
- It must not rely on a live player session.
When a quest continues an existing chain, the prep script should restore the
prior clean snapshot first, then apply the new scenario changes, and finally
take the next baseline snapshot.
## VM Provisioning Pipeline
A new quest requires a VM baseline before it can be played. The full authoring
workflow from scratch to playable quest:
### 1. Write the prep script
Create `tools/vm/quest-prep/QXXX-prep.sh`. Requirements:
- Must be idempotent — safe to run twice on the same domain.
- Accepts the domain name as $1 and an optional `--dry-run` flag as $2.
- Must not prompt for input or depend on internet access.
- Sources `tools/vm/lib/common.sh` for shared helpers (`run`, `step`, `ok`, etc.).
Typical operations: break a config file, chown a directory, remove a logrotate
config, add a cron entry, delete a key. Nothing that would be undone by the
player before the quest starts.
### 2. Register the quest in seed-vms.sh
Open `tools/setup/seed-vms.sh` and:
1. Add a `require_file` check near the top (`STEP 1 — Pre-flight checks`):
   ```bash
   require_file "$QUEST_PREP/QXXX-prep.sh" "QXXX prep script"
   ```
2. Add a `run_prep_and_snapshot` call in `STEP 4 — Run quest-prep scripts`:
   ```bash
   run_prep_and_snapshot "QXXX" "sc-<vm-domain>" "baseline.<snapshot-name>"
   ```
The snapshot name must match the quest's `baseline_snapshot` field.
### 3. Baseline snapshot chain
Each VM has its own chain. Only the CLEAN branch resolution of a quest is used
as the baseline for the next quest. Brittle-branch resolutions are never
snapshotted.
| VM | Snapshot chain |
|----|----------------|
| `sc-workstation` | `baseline.day-one` (Q001 only) |
| `sc-web-server` | `baseline.clean` → `baseline.post-q002` → `baseline.post-q003` → `baseline.post-q004` |
| `sc-build-machine` | `baseline.clean` → `baseline.post-q006` |
A prep script that builds on a prior quest must revert to the prior snapshot
before applying its changes.
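The chain table can be encoded as data. A content check can then confirm that a quest's `baseline_snapshot` exists in the chain for its VM and find the snapshot a prep script should revert to first. This is a sketch with stated assumptions: `SNAPSHOT_CHAINS` and `priorSnapshot` are hypothetical names, the data would have to be kept in step with `seed-vms.sh` by hand, and the `primary_vm` to domain mapping is inferred from the tables above.
```javascript
// Snapshot chains per libvirt domain, copied from the table above.
const SNAPSHOT_CHAINS = {
  "sc-workstation": ["baseline.day-one"],
  "sc-web-server": ["baseline.clean", "baseline.post-q002", "baseline.post-q003", "baseline.post-q004"],
  "sc-build-machine": ["baseline.clean", "baseline.post-q006"],
};
// Assumed mapping from quest `primary_vm` values to domain names.
const DOMAIN_FOR_VM = {
  workstation: "sc-workstation",
  web_server: "sc-web-server",
  build_machine: "sc-build-machine",
};
// Return the snapshot to restore before building `snapshot`,
// null if it starts its chain, or throw if the name is unknown.
function priorSnapshot(primaryVm, snapshot) {
  const chain = SNAPSHOT_CHAINS[DOMAIN_FOR_VM[primaryVm]];
  if (!chain) throw new Error(`unknown VM: ${primaryVm}`);
  const i = chain.indexOf(snapshot);
  if (i === -1) throw new Error(`${snapshot} is not in the ${primaryVm} chain`);
  return i === 0 ? null : chain[i - 1];
}
```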
### 4. VM baseline package set
Each authored VM has a guaranteed minimum set of packages that players can rely on
during gameplay. New quests must not assume packages outside this set unless the
quest prep script installs them.
| VM | OS | Guaranteed packages |
|----|----|---------------------|
| `sc-workstation` (ares) | Ubuntu 24.04 | `qemu-guest-agent`, `openssh-server`, `sudo`, `bash-completion`, `hostname`, `ssh` client (system) |
| `sc-web-server` (hermes) | Debian 12 | `qemu-guest-agent`, `openssh-server`, `sudo`, `nginx`, `logrotate`, `rsync`, `curl`, `hostname`, `ssh` client |
| `sc-build-machine` (vulcan) | Arch Linux | `qemu-guest-agent`, `openssh`, `sudo`, `base-devel`, `archlinux-keyring`, `inetutils` (provides `hostname`, `ping`), `ssh` client |
`hostname`, `whoami`, `id`, `ls`, `cat`, `echo`, `ps`, `df`, `du`, `free`,
`systemctl`, `journalctl` are available on all VMs.
The in-game terminal auto-adds `-C` to bare `ls` calls so column output renders
correctly. If a quest step requires `ls -l` or another explicit format, pass it
explicitly — the auto-`-C` injection only fires when no layout flag is present.
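The `-C` rule can be stated as a function over the argument list. This is an illustrative sketch only. `withColumnFlag` is a hypothetical name, and the set of flags treated as "layout flags" is an assumption based on GNU `ls`, where `-1`, `-C`, `-l`, `-x`, `-m`, `-g`, `-n`, `-o`, `--format`, and `--full-time` all select or imply an output format.
```javascript
// Short ls flags that pick an output layout (GNU ls: -g, -n, -o imply long format).
const LAYOUT_SHORT = new Set(["1", "C", "l", "x", "m", "g", "n", "o"]);
// Short flags whose value may be attached (-Ifoo, -w80): stop scanning at them.
const TAKES_VALUE = new Set(["I", "T", "w"]);
// Given ls arguments (without "ls" itself), prepend "-C" only when no layout flag is present.
function withColumnFlag(args) {
  for (const arg of args) {
    if (arg === "--") break; // everything after "--" is a path
    if (arg.startsWith("--format") || arg === "--full-time") return args;
    if (arg.startsWith("-") && !arg.startsWith("--")) {
      // Combined short flags such as -la: check each letter.
      for (const c of arg.slice(1)) {
        if (TAKES_VALUE.has(c)) break;
        if (LAYOUT_SHORT.has(c)) return args;
      }
    }
  }
  return ["-C", ...args];
}
```
Stopping at value-taking letters keeps an attached pattern such as `-Inode_modules` from being misread as `-n` or `-o`.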
### 5. Run the pipeline
```bash
# Dry run first — shows what would execute without touching VMs
bash tools/setup/seed-vms.sh --dry-run
# Full build — requires libvirt and all three sc-* domains to exist
bash tools/setup/seed-vms.sh
# Prep + snapshot only (skip the image build step)
bash tools/setup/seed-vms.sh --skip-build
# Single VM only
bash tools/setup/seed-vms.sh --vm web_server
```
### 6. Validate
After seed-vms.sh completes:
```bash
# Check content integrity (including baseline_snapshot field)
node tools/content/validate-content.js
# Verify snapshots exist on each domain
virsh snapshot-list sc-web-server
virsh snapshot-list sc-build-machine
```
## Multi-Solution Quest Example
```json
{
  "id": "Q099",
  "title": "Cron Runs as Root",
  "tier": 2,
  "primary_vm": "web_server",
  "required_vms": ["web_server"],
  "ticket_id": "T099",
  "baseline_snapshot": "baseline.clean",
  "_note": "Minimal example: the nightly cron job should run as www-data, not root.",
  "summary": "A site-sync cron entry was copied from a root shell. It still runs, but it now leaves root-owned cache files behind.",
  "clue_fingerprint": {
    "description": "The cron file exists, but it names root as the executor. The cache directory is already polluted with root-owned files.",
    "evidence": [
      { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "root /opt/site-sync/bin/sync-cache.sh" },
      { "type": "file_owner_is_not", "vm": "web_server", "path": "/var/www/axiomworks/cache", "user": "www-data" }
    ]
  },
  "objectives": [
    {
      "id": "sync-safe",
      "description": "The cron job runs as www-data and the scheduler is active",
      "check_mode": "passive",
      "validation": {
        "type": "and",
        "rules": [
          { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "www-data /opt/site-sync/bin/sync-cache.sh" },
          {
            "type": "or",
            "rules": [
              { "type": "service_state", "vm": "web_server", "service": "cron", "state": "active" },
              { "type": "process_running", "vm": "web_server", "process": "cron" }
            ]
          }
        ]
      }
    }
  ],
  "solution_branches": [
    {
      "id": "correct-cron",
      "label": "Correct Cron User",
      "priority": 100,
      "validation": {
        "type": "and",
        "rules": [
          { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "www-data /opt/site-sync/bin/sync-cache.sh" },
          {
            "type": "or",
            "rules": [
              { "type": "service_state", "vm": "web_server", "service": "cron", "state": "active" },
              { "type": "process_running", "vm": "web_server", "process": "cron" }
            ]
          }
        ]
      },
      "trust_delta": 2,
      "world_flags": ["site_sync_healthy"],
      "follow_up_dialogue": "marcus-Q099-complete-clean",
      "follow_up_ticket": "T100",
      "_note": "Preferred fix: keep the job and run it with the correct user."
    },
    {
      "id": "disabled-cron",
      "label": "Brittle Disable",
      "priority": 40,
      "validation": {
        "type": "file_absent",
        "vm": "web_server",
        "path": "/etc/cron.d/site-sync"
      },
      "trust_delta": -1,
      "world_flags": ["site_sync_brittle"],
      "follow_up_dialogue": "marcus-Q099-complete-brittle",
      "_note": "The job was deleted instead of repaired. It stops the symptom, but it is not a durable fix."
    }
  ],
  "pressure_profile": null,
  "blast_radius": [],
  "unlock_requirements": ["world_flag:player_ssh_configured"],
  "tags": ["cron", "permissions", "web_server"],
  "internal_notes": "Example only."
}
```
## Multi-VM Quest Example
```json
{
  "id": "Q098",
  "title": "Build Sync Writes Bad Ownership",
  "tier": 2,
  "primary_vm": "build_machine",
  "required_vms": ["workstation", "build_machine", "web_server"],
  "ticket_id": "T098",
  "baseline_snapshot": "baseline.post-q006",
  "_note": "The build machine is pushing release files to the web server, but the ownership is wrong and the deploy helper is still running.",
  "summary": "A deployment helper on the build machine is writing release files to the web server with root ownership. The helper must be stopped and the output repaired so the web server can manage the files again.",
  "clue_fingerprint": {
    "description": "The deploy helper is still running on build_machine. On web_server, the release artifact is owned by root instead of www-data.",
    "evidence": [
      { "type": "file_contains", "vm": "build_machine", "path": "/opt/deploy/bin/push-release.sh", "contains": "rsync -a --chown=root:root" },
      { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" },
      { "type": "file_owner_is_not", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" }
    ]
  },
  "objectives": [
    {
      "id": "release-owned-correctly",
      "description": "The web release file is owned by www-data and the deploy helper is stopped",
      "check_mode": "passive",
      "validation": {
        "type": "and",
        "rules": [
          { "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" },
          { "type": "not", "rule": { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" } }
        ]
      }
    }
  ],
  "solution_branches": [
    {
      "id": "deploy-stopped-owner-fixed",
      "label": "Stop Helper and Fix Ownership",
      "priority": 100,
      "validation": {
        "type": "and",
        "rules": [
          { "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" },
          { "type": "not", "rule": { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" } }
        ]
      },
      "trust_delta": 2,
      "world_flags": ["release_permissions_fixed"],
      "follow_up_dialogue": "marcus-Q098-complete-clean",
      "_note": "This branch validates both VMs: the release file is fixed on web_server and the helper is no longer running on build_machine."
    }
  ],
  "pressure_profile": null,
  "blast_radius": [],
  "unlock_requirements": ["world_flag:player_ssh_configured"],
  "tags": ["deploy", "permissions", "multi-vm", "build_machine", "web_server"],
  "internal_notes": "Example only."
}
```
## Quest Chain Authoring
Use `follow_up_ticket` to chain the campaign in sequence. The winning branch
emits the next ticket, and `QuestDirector` activates the next quest from that
ticket.
| Quest | Clean branch `follow_up_ticket` |
| --- | --- |
| `Q001` | `T002` |
| `Q002` | `T003` |
| `Q003` | `T004` |
| `Q004` | `T005` |
Keep the chain on the clean, high-priority branch. If a brittle branch should
continue the story differently, use its own `follow_up_ticket` or
`follow_up_incident` intentionally.
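The chaining rule can be sketched as a lookup from the winning branch's `follow_up_ticket` to the quest whose `ticket_id` matches. This illustrates the idea under stated assumptions and is not `QuestDirector`'s actual code. `indexByTicket` and `nextQuest` are hypothetical names. The sketch also assumes each ticket ID belongs to at most one quest and treats a follow-up ticket with no matching quest as a content error.
```javascript
// Index quests by ticket so a follow-up ticket can activate its quest.
function indexByTicket(quests) {
  const byTicket = new Map();
  for (const quest of quests) {
    if (byTicket.has(quest.ticket_id)) {
      throw new Error(`ticket ${quest.ticket_id} is used by more than one quest`);
    }
    byTicket.set(quest.ticket_id, quest);
  }
  return byTicket;
}
// Given the winning branch, return the next quest, or null if the chain ends here.
function nextQuest(winningBranch, byTicket) {
  const ticket = winningBranch.follow_up_ticket;
  if (!ticket) return null;
  const quest = byTicket.get(ticket);
  if (!quest) throw new Error(`follow_up_ticket ${ticket} has no quest`);
  return quest;
}
```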
@@ -0,0 +1,161 @@
# Sysadmin Chronicles — Spec Lock
This file preserves the project owner's intended new system design. Treat it as binding.
## 1. Narrative spine
The story progression is:
```text
Normal Work → Unease → Suspicion → Investigation → Conflict → Resolution
```
Every quest must map to one of these phases.
## 2. Required quest structure
Every proposed quest must include:
- Title
- Narrative Phase
- Objective
- Linux Concepts
- Systems Used
- Hidden Hook (optional)
- Failure Conditions
- Behavior Impact
For implementation, these may be expanded into JSON fields, but these concepts must remain present.
## 3. Core systems
### 3.1 Player behavior tracking
Track:
- `curiosity` — exploration, anomaly investigation, reading beyond ticket scope
- `obedience` — completing assigned work, following stated priorities, ignoring suspicious extras
- `risk` — reckless changes, broad permissions, deleting evidence, unsafe shortcuts
These influence:
- Access levels
- Narrative progression
- Endings
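One possible representation of the behavior profile is three counters that tagged player actions adjust. The action names and weights below are hypothetical. The spec fixes only the three axes and the kinds of behavior that feed each one.
```javascript
// Hypothetical action tags mapped to behavior deltas.
const ACTION_EFFECTS = {
  read_beyond_ticket: { curiosity: 1 },
  investigated_anomaly: { curiosity: 2 },
  completed_assigned_ticket: { obedience: 1 },
  ignored_suspicious_extra: { obedience: 1 },
  broad_permission_change: { risk: 2 },
  deleted_evidence: { risk: 3 },
};
function emptyProfile() {
  return { curiosity: 0, obedience: 0, risk: 0 };
}
// Return a new profile with the action's deltas applied; unknown actions change nothing.
function recordAction(profile, action) {
  const effects = ACTION_EFFECTS[action] || {};
  const next = { ...profile };
  for (const [axis, delta] of Object.entries(effects)) next[axis] += delta;
  return next;
}
```
Returning a new object rather than mutating the old one keeps each step easy to log or persist alongside the existing save state.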
### 3.2 Trust and suspicion compatibility
The existing system already uses `trust_delta`, world flags, and branch quality. Preserve that.
Map old and new systems like this:
- `trust` = professional standing produced mostly by solution quality and branch outcomes
- `suspicion` = management/security attention caused by investigative, risky, or unusual behavior
- `curiosity`, `obedience`, `risk` = the new behavior profile controlling narrative route
Do not replace trust. Extend it.
### 3.3 Access system
Player permissions evolve:
```text
basic_user → sudo → root
```
Access is affected by:
- Trust from competent task completion
- Suspicion from investigation behavior
- Risk from careless or destructive changes
- Narrative phase
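The spec names the inputs to access but not the formula. The sketch below shows one way to combine them. Every threshold is invented for illustration: trust and phase gate promotion, while high suspicion or risk holds the player back one level.
```javascript
const LEVELS = ["basic_user", "sudo", "root"];
// Hypothetical thresholds, for illustration only.
function accessLevel({ trust, suspicion, risk, phase }) {
  let level = 0;
  if (trust >= 5 && phase >= 2) level = 1;
  if (trust >= 10 && phase >= 4) level = 2;
  // Suspicious or careless behavior costs one step of access, never below basic_user.
  if (suspicion >= 8 || risk >= 6) level = Math.max(0, level - 1);
  return LEVELS[level];
}
```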
### 3.4 Boss system / management pressure
The boss system acts as a dynamic constraint, not a cutscene machine.
Phase scaling:
- Phase 1: Annoying
- Phase 2: Dismissive
- Phase 3: Suspicious
- Phase 4: Monitoring
- Phase 5: Interfering
- Phase 6: Outcome-dependent
Functions:
- Interrupt tasks
- Reassign priorities
- Restrict access
- Add pressure through tickets, emails, delayed approvals, audits, or access review
In the current company context, this can be represented by Marcus, Kowalski, Priya, or policy pressure depending on the situation. Do not turn one character into a cartoon villain.
### 3.5 Hidden narrative system
Hidden hooks are embedded in normal quests.
Examples:
- Unknown services
- Suspicious cron jobs
- Hidden users
- Network anomalies
- Unexpected SSH keys
- Odd timestamps
- Config history that does not match the official story
Rules:
- Never explicitly flagged
- Optional discovery only
- Not required to complete the assigned ticket
- Must be discoverable through real sysadmin behavior
- Should accumulate into a coherent hidden story over time
## 4. Quest generation constraints
- Reuse existing game systems
- Do not introduce unnecessary mechanics
- Scale difficulty with player progression
- Preserve the observed-VM-state design from existing quest authoring
- Prefer real Linux behavior over puzzle logic
## 5. Difficulty scaling
- Phase 1: Explicit instructions
- Phase 2: Partial hints
- Phase 3: Minimal guidance
- Phase 4+: Problem-solving only
This applies to ticket wording, hints, clue obviousness, and branch tolerance.
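
The scaling rule is simple enough to state as a function of phase. The `hintsShown` helper is a hypothetical illustration of applying the rule to a quest's authored hints. The fractions it uses are assumptions.

```typescript
type Guidance = "explicit" | "partial" | "minimal" | "problem_solving";

// Section 5: phase 1 explicit, 2 partial, 3 minimal, 4+ problem-solving only.
function guidanceFor(phase: number): Guidance {
  if (phase <= 1) return "explicit";
  if (phase === 2) return "partial";
  if (phase === 3) return "minimal";
  return "problem_solving";
}

// HYPOTHETICAL: how many of a quest's authored hints to surface per level.
function hintsShown(phase: number, authoredHints: string[]): string[] {
  const g = guidanceFor(phase);
  let shown = 0;
  if (g === "explicit") shown = authoredHints.length;
  else if (g === "partial") shown = Math.ceil(authoredHints.length / 2);
  else if (g === "minimal") shown = Math.min(1, authoredHints.length);
  return authoredHints.slice(0, shown);
}
```

Phase 6 falls into the last case, so endgame quests get no hints, which matches the "Phase 4+" rule.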
## 6. Endings
Endings are determined by behavior over the playthrough:
- `corporate_loop` — obedient path / bad ending
- `burnout` — passive path / neutral ending
- `exposure` — investigative path / good ending
- `chaos` — destructive/high-risk path
No ending should be selected by a single obvious final button. The route should emerge from world flags, behavior variables, access state, and discovered/acted-on hidden hooks.
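
A resolver that reads accumulated state, rather than a final choice, might look like the sketch below. It uses only the behavior variables and a count of hidden hooks acted on. A full version would also read world flags and access state. The thresholds and the order of the checks are assumptions.

```typescript
type Ending = "corporate_loop" | "burnout" | "exposure" | "chaos";

interface EndState {
  curiosity: number;
  obedience: number;
  risk: number;
  hooksActedOn: number; // hidden hooks discovered AND acted on
}

// HYPOTHETICAL threshold; the design names the inputs, not the numbers.
const CHAOS_RISK = 10;

function resolveEnding(s: EndState): Ending {
  // Chaos needs sustained high risk, not one reckless choice.
  if (s.risk >= CHAOS_RISK) return "chaos";
  // Exposure: the player investigated and acted on what they found.
  if (s.hooksActedOn >= 3 && s.curiosity > s.obedience) return "exposure";
  // Corporate loop: obedience dominated the playthrough.
  if (s.obedience > s.curiosity) return "corporate_loop";
  // Otherwise the passive path.
  return "burnout";
}
```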
## 7. Design principles
- Discovery over exposition
- Systems over scripts
- Freedom over forced narrative
- Realism with subtle distortion
## 8. Non-goals
Do not:
- Build a linear-only story
- Rely on cutscenes
- Over-explain mechanics
- Remove player agency
- Turn the mystery into explicit quest markers
- Rewrite established characters to fit a new plot
+423
@@ -0,0 +1,423 @@
# Story Design Context — Sysadmin Chronicles
For story designers and AI agents creating new quests and narrative content.
**Related docs:**
- `CHARACTERS.md` — character bios, relationships, story hooks
- `COMPANY_LORE.md` — world, company, tone
- `QUEST_AUTHORING.md` — technical JSON spec for implementers
This document answers: *how does story actually work in this game, and what does a quest
concept need to contain to be usable?*
---
## The Core Premise
The player is a new junior sysadmin at Axiom Works, a mid-size B2B software company.
They are replacing someone named Dale. Nobody will explain why Dale is gone.
The game is played entirely through a simulated work environment: a terminal, an email
inbox, and a company website. There are no cutscenes, no narration, no inventory, no
combat. Everything that happens is expressed through:
- **Tickets** — the player receives a ticket describing a problem
- **The terminal** — the player SSHes into VMs, investigates, and fixes things
- **Character dialogue** — characters react to how the player solved the problem
- **The next ticket** — the world moves on, and the consequences of what the player
did are baked into the next situation
That's it. Story is not told — it is accumulated from the choices the player makes
when fixing real Linux problems on real virtual machines.
---
## The Three Machines (VMs)
Every quest happens on one or more of these machines. Their narrative identities
matter as much as their technical roles.
### ares — the Workstation
The player's home machine. Ubuntu 24.04. Quests here are onboarding-flavored —
establishing access, learning the environment. It's the only machine the player
can reach on day one.
*Narrative identity:* Where you start. Safe-ish. The first one you break is here.
### hermes — the Web / App Server
Debian 12. Runs nginx and the AxiomFlow demo/staging application. This is the
machine that Sarah Chen cares about, that customers can feel, and that Priya Nair
watches for security posture. Most of the early-game quests are here.
*Narrative identity:* The product's face to the world. Breaking this makes noise
immediately. The most politically visible machine.
### vulcan — the Build Machine
Arch Linux. Compiles packages, runs the internal build pipeline, serves packages
to hermes via an internal apt repo. Nikhil Sharma owns this in principle but nobody
manages it daily. Things here break silently until hermes starts serving bad software.
*Narrative identity:* The machine nobody watches until something downstream fails.
Quests here reveal that problems have upstream causes the player didn't expect.
### Planned future machines
As the story expands, new machines can be added. Each should have a clear narrative
role before it's introduced. (See `COMPANY_LORE.md` for the candidate list.)
---
## How Story Is Delivered
### Tickets as Act One
Every quest begins with a ticket in the player's inbox. The ticket is a short email
from a character describing a symptom — not a cause. The sender's perception of the
problem is usually incomplete and sometimes wrong. This is intentional: the player's
job is to investigate, not to execute instructions.
Good ticket writing:
- Describes what the sender experienced, not what the cause is
- Has the sender's voice and perspective (Sarah is outcome-focused; Dave is confused;
Priya is terse and specific)
- Does not hint at the solution
- Creates genuine stakes (site is down, builds are failing, someone is locked out)
Bad ticket writing:
- Explains the root cause ("the log file is too big")
- Has no character voice (generic IT help desk language)
- Stakes are unclear or low
### The Terminal as Act Two
The player investigates. They SSH in, run commands, read logs, check configs, look at
file ownership. The evidence is seeded into the VM baseline — it is genuinely there
to find, not procedurally generated. A good quest has a natural clue trail:
- The most obvious thing points to a second thing
- The second thing reveals the actual problem
- The fix is achievable with real Linux knowledge
The player cannot be told what to do. They can ask Marcus for hints (via dialogue
choices), but good players don't need to.
### Branching Resolution as Act Three
When the player has made changes to the VM, the game checks the state of the
system against the quest's solution branches. The branch that matches determines:
- What dialogue fires (Marcus's reaction, Sarah's reaction, Priya's follow-up)
- What trust delta the player receives
- What world flag is set (persistent story state)
- Whether an incident is triggered (a future consequence of a partial fix)
- What ticket comes next
**This is the central story mechanic.** Every quest should be designed with at
least two and ideally three resolution branches:
| Branch type | What it means |
|-------------|---------------|
| **Clean fix** | Player understood the root cause and solved it properly. High trust, no downstream risk. |
| **Acceptable fix** | Problem is solved but with a tradeoff — brittle approach, future maintenance burden, or incomplete cleanup. Lower trust. |
| **Regression** | Player fixed the symptom but made something else worse. Negative trust. Story consequences. |
The **regression branch** is not about punishment — it's about realism. A real
sysadmin who removes all SSH restrictions to restore one person's access has
technically solved the ticket while creating a larger problem. The story should
treat this as realistic professional consequence, not a game-over failure.
Players on a clean-fix path get more trust, unlock more access, and receive warmer
character reactions. Players on a regression path continue playing but face the
downstream effects of their choices.
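
The matching step can be sketched in three steps:

1. Sort the branches by priority.
2. Test each branch against the observed VM state.
3. Take the first branch that matches.

The `Branch` shape and the flat key/value `Observed` snapshot are hypothetical stand-ins for whatever the validator actually collects over SSH.

```typescript
// HYPOTHETICAL snapshot of observed VM state, e.g. collected over SSH.
type Observed = Record<string, string>;

interface Branch {
  id: string;
  kind: "clean" | "acceptable" | "regression";
  priority: number;                  // higher is checked first
  matches: (s: Observed) => boolean; // condition on observed state
  trust_delta: number;
  world_flags: string[];
}

// Returns the highest-priority branch whose condition holds, or null if the
// VM state matches no branch yet (the ticket stays open).
function resolveBranch(branches: Branch[], state: Observed): Branch | null {
  const ordered = branches.slice().sort((a, b) => b.priority - a.priority);
  for (const br of ordered) {
    if (br.matches(state)) return br;
  }
  return null;
}
```

Ordering matters because a regression usually also satisfies a looser branch's check. In the SSH example, the locked-out user can log in either way. The more specific branch therefore needs the higher priority.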
---
## World Flags — Persistent Story State
World flags are string keys set when a quest's branch resolves. They persist for
the entire playthrough and can be read by later quests, incidents, and dialogue.
Examples:
- `hermes_logrotate_healthy` — set when the player properly fixed log rotation
- `hermes_ssh_allowusers_fragile` — set when the player restored SSH access using
the brittle AllowUsers approach instead of the robust AllowGroups approach
- `player_ssh_configured` — set when the player successfully set up SSH on day one
World flags are how story continuity works. A later quest can check whether the
player fixed something correctly earlier and behave differently. Marcus can reference
a past fix. Priya can flag a previously introduced risk in a later audit. A problem
that was "solved" with a quick fix can recur.
**When designing a new quest, ask:** what flag should this set, and what future quests
or dialogue might reference it?
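
Reading a flag in later content is a membership test. In this sketch, a later audit ticket picks its variant based on how the earlier SSH ticket resolved. `hermesAuditVariant` and its return values are hypothetical. The flag name comes from the examples above.

```typescript
// World flags persist for the playthrough; later content only reads them.
type WorldFlags = ReadonlyArray<string>;

function hasFlag(flags: WorldFlags, flag: string): boolean {
  return flags.indexOf(flag) >= 0;
}

// HYPOTHETICAL: a later security audit ticket picks its variant from how
// the earlier SSH lockout was resolved.
type AuditVariant = "routine_audit" | "fragile_ssh_finding";

function hermesAuditVariant(flags: WorldFlags): AuditVariant {
  return hasFlag(flags, "hermes_ssh_allowusers_fragile")
    ? "fragile_ssh_finding"
    : "routine_audit";
}
```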
---
## Trust — The Narrative Currency
Trust is a numeric score that tracks the player's professional standing with Marcus
and the IT team. It affects:
- **VM access** — the player gains SSH access to hermes and vulcan as trust increases.
If trust drops badly, access can be revoked.
- **Documentation access** — more trusted players get access to internal runbooks
and admin guides
- **Character warmth** — Marcus's messages change tone subtly as trust grows
- **Incident visibility** — at a certain trust level, the player starts seeing
background incidents before they become critical
Trust is not displayed as a raw number. Players experience it as consequences.
**For quest designers:** each branch should have a `trust_delta` that reflects the
quality of the fix. A proper root-cause fix should earn more than a workaround.
Regression branches should cost trust. Day-one onboarding quests are lenient;
later quests at higher tiers should be less forgiving.
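
Since trust is never shown as a number, its effects can be derived from the current score each time the score changes. Revocation then comes for free: a player who drops below a threshold loses that unlock on the next evaluation. The thresholds and the `unlocksFor` helper are assumptions for illustration.

```typescript
interface TrustUnlocks {
  machines: string[];
  runbooks: boolean;
  earlyIncidentWarnings: boolean;
}

// HYPOTHETICAL thresholds: the doc says trust gates VM access, documentation,
// and early incident visibility, but gives no numbers.
function unlocksFor(trust: number): TrustUnlocks {
  const machines = ["ares"]; // reachable on day one
  if (trust >= 2) machines.push("hermes");
  if (trust >= 5) machines.push("vulcan");
  return {
    machines,
    runbooks: trust >= 4,
    earlyIncidentWarnings: trust >= 8,
  };
}
```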
---
## Incidents — Consequences of Incomplete Fixes
An incident is a time-delayed consequence that fires when a quest's partial-fix
branch was taken. It represents the problem coming back.
Example: The player clears a full disk by deleting a log file but doesn't restore
the logrotate config. Two in-game hours later, the disk starts filling again. Dave
notices. The player gets another ticket about the same symptom.
Incidents are not punishments — they are realistic. The world doesn't stay fixed
just because the player touched it. A player who takes clean-fix branches will
rarely see incidents. A player who takes every shortcut will find their ticket queue
filling up with problems they already "solved."
For story purposes: incidents can also carry narrative weight. If the player made a
security regression, an incident could represent an audit finding, an unusual login,
or a configuration discrepancy Priya noticed.
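
The delayed consequence can be sketched as a queue keyed on in-game time. The partial-fix branch schedules an incident, and a later clock tick releases it as a new ticket. `IncidentQueue`, its method names, and the minute-based clock are hypothetical.

```typescript
// HYPOTHETICAL incident record; times are in-game minutes.
interface PendingIncident {
  id: string;
  fireAt: number;   // in-game minute at which it fires
  ticketId: string; // ticket the player receives when it fires
}

class IncidentQueue {
  private pending: PendingIncident[] = [];

  // Called when a partial-fix branch resolves.
  schedule(id: string, now: number, delayMinutes: number, ticketId: string): void {
    this.pending.push({ id, fireAt: now + delayMinutes, ticketId });
  }

  // Called as in-game time advances; returns due incidents and removes
  // them from the queue.
  tick(now: number): PendingIncident[] {
    const due = this.pending.filter((i) => i.fireAt <= now);
    this.pending = this.pending.filter((i) => i.fireAt > now);
    return due;
  }
}
```

In the disk example, the branch that deletes the log without restoring logrotate would call `schedule` with a delay of 120 in-game minutes.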
---
## The Character Conversation Model
Quest dialogue fires after a branch resolves. Three characters can speak:
### Marcus Webb
The primary voice. Appears in every quest. His post-resolution message reflects:
- What the player actually did (not just whether they succeeded)
- Whether they understood the root cause or just cleared the symptom
- A forward-looking observation (usually a quiet flag for what's coming next)
Marcus does not praise effusively or scold dramatically. He states what he observed.
His message for a clean fix is warmer and sometimes wry. His message for a regression
is brief and pointed. He never says "well done!" He might say "that's the right call."
### Sarah Chen
Speaks when the quest affects something product-facing (hermes being up or down,
deploys working or failing). Her messages are reactive — she responds to outcomes,
not process. She is not hostile unless the player makes her situation worse.
### Priya Nair
Speaks when the quest has security implications — access changes, hardening,
audit posture. She does end-of-shift reviews that grade overall performance.
Her per-quest messages are brief and evaluative. She notices things Marcus might not.
### Other characters
Dave Okonkwo files tickets. He does not have post-resolution dialogue — he
just stops or starts noticing things. Future characters (Kowalski, Nikhil, Tanya)
can speak in dialogue if quests are designed to involve them.
---
## The Narrative Arc
The overall story has six phases. Quests should be designed with their phase in mind.
The phase is usually not visible to the player — it emerges from what's happening
around them.
### Phase 1 — Normal Work
*Tier 1 quests. Early game.*
The player is new. Everything is routine. Marcus is helpful. The problems are real
but not alarming — a broken config, a full disk, a permission issue. The player is
learning the environment. The subtext is that things are slightly more wrong than
they should be, but there's nothing to point at.
Hidden layer: small anomalies in the systems that curious players can notice but
don't have context for yet.
### Phase 2 — Unease
*Tier 1/2 transition.*
The problems start to have patterns. The same kind of thing breaks twice. A fix
the player made doesn't hold the way it should. Nothing is alarming, but Marcus's
messages have a slightly different quality — he notices things he doesn't explain.
Hidden layer: a world flag from an early quest points somewhere unexpected.
### Phase 3 — Suspicion
*Tier 2 quests. Mid game.*
The player starts encountering problems they didn't cause and can't fully explain.
Access was changed by someone. A config was edited recently. A log shows an
unusual pattern. Nobody is accusing anyone. But the player now has enough context
to start asking questions — even if no quest explicitly tells them to.
This is where Dale becomes relevant again. The systems the player inherits were
last touched by Dale. Some of them have been in a particular state for a long time.
### Phase 4 — Investigation
*Tier 2/3 transition.*
The player has connected enough dots to understand that something happened before
they arrived. The quests in this phase involve digging into logs, access records,
and configuration history. The investigation is framed as professional work
(audit the access logs, trace the package build history) — but the results tell
a story.
Marcus's messages are shorter. Priya starts appearing more. Kowalski schedules a
meeting nobody explains.
### Phase 5 — Conflict
*Tier 3 quests. Late game.*
The player knows what happened. Acting on that knowledge has professional
consequences. The conflict is not physical — it is about what the player chooses
to surface, who they tell, and what they do with access they were given for one
purpose that could be used for another.
### Phase 6 — Resolution
*Endgame.*
The situation resolves. The ending the player gets depends on the world flags
accumulated across their entire playthrough — not just whether they clicked the
"good ending" button. A player who took clean-fix branches throughout, built
trust, and noticed the hidden anomalies gets a different ending than a player
who patched symptoms, lost trust, and missed everything.
---
## What Makes a Good Quest Scenario
The best quests have a **plausible mundane cause** and a **visible technical trail**.
Players should never need to guess — they should be able to find the answer by
looking at the right files and running the right commands.
### Good scenario types
- Service down → config syntax error → player traces error output to the line
- Disk full → log file enormous → logrotate config missing → player restores it
- Deploy fails → files owned by wrong user → someone ran a script as root manually
- Build failures → clock drift → NTP not running → player enables time sync
- Access locked out → sshd_config modified → wrong directive → player corrects it
- App crashes after update → bad package from internal repo → player traces to source
### What makes these work
1. **The symptom is real and urgent.** Something is actually broken.
2. **The cause is discoverable.** The evidence is in logs, config files, or system state.
3. **The fix is a real Linux operation.** Not artificial — `chown`, `systemctl`, editing
a config, fixing a cron entry, rolling back a package.
4. **Multiple approaches exist.** The quick fix works. The proper fix is better and
the game knows the difference.
5. **The character reactions are grounded.** Sarah cares about the demo being up.
Priya cares about the access control implications. Marcus cares about whether the
player understood what they were doing.
### Bad scenario types to avoid
- Problems that require packages not in the VM's guaranteed baseline (see `QUEST_AUTHORING.md`)
- Problems that require real-time events the validation engine can't check
- Problems where the "correct" fix is the only fix (no meaningful branch differentiation)
- Problems that break the fourth wall or require the player to know game-layer information
- Problems that are gotchas rather than investigations (the cause can't be found by looking)
---
## Hidden Anomalies — Environmental Storytelling
Every 3–5 quests should include something unusual in the VM environment that the player
is not told about and not required to engage with. These are not quest objectives.
They are breadcrumbs for curious players.
Examples of the kind of thing these should be:
- A user account that shouldn't exist
- A log entry from an odd time that doesn't match the official history
- A file that was modified recently but wasn't part of the quest setup
- A cron job that's been disabled but was once important
- An SSH key in authorized_keys that doesn't belong to anyone obvious
These anomalies should be consistent with the overall narrative arc — a player who
collects them across the whole game should be able to piece together what happened
before they arrived. They should never be labelled, never referenced in objectives,
and never required. They are for the players who look.
---
## Quest Output Format for Story Agents
When proposing new quests, provide the following. This is the minimum needed for
a technical author to implement the quest.
```
Quest ID: QXXX
Title: [player-facing]
Narrative phase: [1–6]
Tier: [1, 2, or 3]
Primary VM: [ares / hermes / vulcan]
Additional VMs: [if any]
Scenario summary:
  What is broken, why it is broken (the root cause), and what the player
  will encounter. 1–3 sentences. Written for the implementer, not the player.
Ticket:
  From: [character name]
  Subject: [email subject line]
  Body: [the email the player receives. Written in the sender's voice.
         Describes the symptom. Does not explain the cause.]
Clue trail:
  What the player will find when they investigate. The evidence that leads
  them to the root cause. Describe the actual files, log entries, and system
  states — not the player's steps.
Solution branches:
  Branch 1 (clean fix, highest trust):
    What the player has done. Why it's correct. Trust delta.
  Branch 2 (acceptable fix):
    What the player has done. What tradeoff it introduces. Trust delta.
  Branch 3 (regression, if applicable):
    What the player did wrong. What it breaks. Negative trust delta.
Character reactions:
  Marcus (post-resolution):
    Clean: [what Marcus says]
    Acceptable: [what Marcus says]
    Regression: [what Marcus says]
  Sarah / Priya (if relevant):
    [reaction to the specific outcome that affects them]
World flags set: [list flags each branch sets]
Follow-up incident (if any): [what recurs if the acceptable-fix branch was taken]
Hidden anomaly (if any): [something unusual seeded into the VM that's not part of
  the quest objectives]
Narrative notes: [anything a future quest author should know — Dale connections,
  story threads this opens or closes, things characters should remember]
```
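
Before a concept reaches an implementer, the mechanical parts of the template can be checked automatically. This sketch assumes the template is captured as a structured object. `QuestConcept`, the `Q`-plus-three-digits ID pattern, and the messages are hypothetical. Planned future machines would need to be added to `KNOWN_VMS` when they are introduced.

```typescript
// HYPOTHETICAL structured form of the story-agent template.
interface QuestConcept {
  questId: string; // e.g. "Q001"
  phase: number;   // 1-6
  tier: number;    // 1, 2, or 3
  primaryVm: string;
  additionalVms: string[];
  branchKinds: Array<"clean" | "acceptable" | "regression">;
  followUpIncident?: string;
}

const KNOWN_VMS = ["ares", "hermes", "vulcan"];

// Returns a list of problems; an empty list means the concept has the
// minimum a technical author needs.
function checkConcept(q: QuestConcept): string[] {
  const problems: string[] = [];
  if (!/^Q\d{3}$/.test(q.questId)) problems.push("quest ID should look like Q001");
  if (q.phase < 1 || q.phase > 6) problems.push("narrative phase must be 1-6");
  if ([1, 2, 3].indexOf(q.tier) < 0) problems.push("tier must be 1, 2, or 3");
  for (const vm of [q.primaryVm].concat(q.additionalVms)) {
    if (KNOWN_VMS.indexOf(vm) < 0) problems.push("unknown VM: " + vm);
  }
  if (q.branchKinds.length < 2) problems.push("needs at least two resolution branches");
  if (q.branchKinds.indexOf("clean") < 0) problems.push("needs a clean-fix branch");
  if (q.followUpIncident && q.branchKinds.indexOf("acceptable") < 0) {
    problems.push("follow-up incident needs an acceptable-fix branch to trigger it");
  }
  return problems;
}
```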
---
## The Dale Thread — Notes for Story Designers
Dale's story should emerge slowly from the systems themselves, not from exposition.
When designing quests — especially mid-to-late game — consider:
- **What did Dale last touch?** The VMs the player inherits have a history. Some
configurations were made by Dale. Some are good. Some are wrong in ways that
suggest Dale was dealing with something.
- **What was Dale trying to do?** As the investigation phase develops, the picture
should become coherent. Dale wasn't random — there was a pattern to their actions.
- **Who knew?** Marcus knew Dale. Priya may have been involved in whatever ended
Dale's tenure. Kowalski definitely knows. The player assembles this from fragments,
not a scene where someone explains it.
- **The player is inheriting Dale's problems.** Some of the broken things the player
fixes are broken because Dale broke them. Some of the broken things were broken on
purpose. The player won't know which is which until later.
The reveal of what Dale did should feel like the player figured it out, not like the
game told them.
+133
@@ -0,0 +1,133 @@
# Sysadmin Chronicles — New System Canon Packet
This packet combines the new quest-system spec with the established story/implementation context.
## Core sentence
The player is not “on a main quest.” The player is doing sysadmin work. The story leaks through systems.
## Hard canon
- Company: Axiom Works
- Products: AxiomFlow, AxiomDash, AxiomSync
- Tone: plausible B2B software company; dry corporate dysfunction; no cartoon villains
- Infrastructure naming: Greek-god hostnames
- Current machines:
  - `ares` — player workstation, Ubuntu 24.04
  - `hermes` — web/app/demo server, Debian 12, nginx
  - `vulcan` — build machine, Arch Linux, internal build/release pipeline
- Player: competent new junior sysadmin, replacing Dale, no spoken lines
- Dale: previous sysadmin; central unresolved mystery; reveal through systems, not exposition
## Character preservation rule
Character portraits already match the current bios and are on the in-game company website.
Allowed:
- Compress bios for prompt use
- Clarify contradictions
- Add operational story use
- Preserve and sharpen existing voice
Not allowed:
- Changing names already shown on the company site
- Changing role, personality, authority level, implied visual vibe, or age band
- Making characters cartoon villains
- Creating changes that would require new portraits
## Active character use
### Marcus Webb
Senior Systems Administrator. Primary technical contact and ticket voice. Dry, terse, precise. Trusts competence over credentials. Gives more rope as the player proves competence. Knows what Dale did but avoids discussing it directly. Respects root-cause fixes and dislikes symptom-patching.
Use for: quest assignments, technical follow-up, access/trust gates, quiet hints, sometimes late-night observations.
### Sarah Chen
Product Manager, AxiomFlow. Outcome-focused, direct, concerned with demos/staging/product-visible failures. Often right about symptoms and wrong about root cause. Notices proper underlying fixes.
Use for: product-facing tickets, hermes/demo pressure, stakeholder reactions.
### Priya Nair
Head of Security & Compliance. Canonical email: `p.nair@axiomworks.internal`. Replace old references to Priya Kapoor or Priya Singh. Calm, precise, consequence-focused. Assumes breach/misconfiguration professionally. No alarmism. No exclamation marks.
Use for: access audits, security consequences, end-of-shift review, risky-fix evaluation.
### Dave Okonkwo
Non-technical employee and ticket source. Reports symptoms accurately, misdiagnoses causes plausibly, helpful rather than stupid.
Use for: ordinary employee impact reports.
### Dave Kowalski
Director of IT Operations. Marcus's manager and player's skip-level. Policy pressure, bullet-point status emails, meetings as implied threat, “we should document that” energy.
Use for: boss/management pressure, access restriction, escalation, status demands.
### Derek Ashford
Financial Controller. Appears on CC lines around costs/procurement. Always replies-all. Treat “Dave from Finance” as a likely continuity error unless the user decides otherwise.
Use for: budget/procurement pressure.
## Background character use
Use sparingly for flavor and pressure, not because every named character needs screen time.
- Nikhil Sharma — build/release pipeline and vulcan
- Tanya Okafor — customer pressure
- Phil Ruiz — sales/demo pressure
- Yusuf Halabi — engineering escalation
- Rachel Huang — sysadmin peer/provisioning
- Tom Malaney — DNS/routing/networking
- James Osei — audit details
- Ellen Marsh / David Park / Karen Volkov / Rachel Brandt — distant executive pressure
## Quest/story delivery model
Every quest is delivered through existing game systems:
1. Ticket/email describes a symptom.
2. Player investigates real VM state.
3. Player applies real Linux/admin fixes.
4. Validator resolves the matching solution branch.
5. Dialogue reacts to the actual branch.
6. World flags, trust, incidents, behavior variables, and access state persist.
7. Later quests read those consequences.
## Existing implementation concepts to preserve
- JSON quests under `content/quests/`
- Tickets under `content/tickets/`
- VM prep scripts under `tools/vm/quest-prep/QXXX-prep.sh`
- Observed-state validation
- Clue fingerprints
- Solution branches
- `trust_delta`
- `world_flags`
- `follow_up_ticket`
- `follow_up_incident`
- Incidents as delayed consequences
- Baseline snapshots
## New system additions
Add or strengthen:
- Narrative phases
- Behavior variables: curiosity, obedience, risk
- Suspicion as management/security attention
- Access levels: basic_user, sudo, root
- Boss/management pressure phase scaling
- Hidden hook discovery state
- Behavior-driven endings
- Debug tools for narrative state
## Design warning
Do not use the new system as an excuse to throw away the current strengths. The existing branch/world-flag/trust model is good. It needs to become the backbone of the new narrative system, not get replaced by a generic quest tracker wearing a fake mustache.
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -0,0 +1,633 @@
# Sysadmin Chronicles — Redesign Audit
## A. Executive summary
### Is this design usable?
**Yes, but not implementation-ready.**
The redesign mostly preserves the intended shape: sysadmin work first, story leaking through systems, behavior-driven outcomes, no melodramatic lore dump. It is a strong revision compared to the earlier failure mode it describes.
But it still has several hard problems that would bite during implementation.
### Does it preserve the user's spec?
**Mostly.**
It preserves the narrative spine, quest format, behavior variables, trust/world-flag compatibility, hidden-hook philosophy, and character tone. It does **not** fully preserve:
- the `basic_user → sudo → root` access model
- Phase 4+ difficulty scaling
- “chaos” as behavior-driven rather than one obvious trap
- quest authoring constraints around unique branch priorities and required VM declarations
- clean separation between hidden-hook discovery and clean-branch validation in a few quests
### Biggest risks
1. **Root access exists in the overview but not in the access progression.** The spec requires `basic_user → sudo → root`; the redesign only actually defines `basic_user`, `sudo`, SSH-to-vulcan, and temporary investigation access. That is not the same thing.
2. **Q039 can hard-route to chaos from one button-like decision.** The redesign says making the proxy change sets `final_config_made` and activates chaos, while its own calibration later says a single reckless action should not route to chaos. That is a logic fork eating its own tail.
3. **Hidden-hook detection is under-specified and technically fragile.** The redesign admits this. Detecting “player read a file” is not naturally compatible with state-based validation unless audit logging, shell wrappers, or deliberate breadcrumb creation are implemented.
4. **Q036 introduces an external host while claiming no additional VM.** Quest authoring requires every VM touched in clues, validation, or prep to be listed. Q036 connects to `10.0.0.47`, but `Additional VMs` is `none` and `Systems Used` only lists `build_machine`.
5. **Q034 has duplicate branch priorities.** The authoring guide explicitly says priorities must be unique; Branch 2 and Branch 3 both use priority 40.
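
Risks 4 and 5 are mechanical and could be caught by a lint pass before review. The field names below (`primary_vm`, `additional_vms`, `referenced_hosts`, `priority`) are assumptions for illustration. The real quest JSON schema is defined in `QUEST_AUTHORING.md` and is not reproduced here. In particular, `referenced_hosts` would have to be collected from the clues, validation, and prep script.

```typescript
// HYPOTHETICAL subset of a quest definition, enough for two checks.
interface QuestJson {
  id: string;
  primary_vm: string;
  additional_vms: string[];
  referenced_hosts: string[]; // every host touched by clues, validation, or prep
  branches: { id: string; priority: number }[];
}

function lintQuest(q: QuestJson): string[] {
  const errors: string[] = [];

  // Branch priorities must be unique (the Q034 problem).
  const seen: { [priority: number]: string } = {};
  for (const b of q.branches) {
    if (seen[b.priority] !== undefined) {
      errors.push(q.id + ": branches " + seen[b.priority] + " and " + b.id + " share priority " + b.priority);
    } else {
      seen[b.priority] = b.id;
    }
  }

  // Every host used must be declared (the Q036 problem).
  const declared = [q.primary_vm].concat(q.additional_vms);
  for (const h of q.referenced_hosts) {
    if (declared.indexOf(h) < 0) errors.push(q.id + ": " + h + " is used but not declared");
  }
  return errors;
}
```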
---
## B. Spec-preservation table
| Spec item | Status | Notes |
|---|---:|---|
| Narrative spine | **Preserved** | Uses all six phases in order: Normal Work, Unease, Suspicion, Investigation, Conflict, Resolution. Matches binding spec. |
| Every quest maps to one phase | **Preserved** | All Q001–Q048 have a `Narrative Phase`. |
| Required quest structure | **Mostly preserved** | Quest entries consistently include title, phase, objective, Linux concepts, systems used, hidden hook/no hook, failure conditions, and behavior impact. Some entries have weak/partial behavior impact. |
| Behavior tracking: curiosity / obedience / risk | **Preserved** | Rules are explicit and mostly useful. |
| Suspicion | **Preserved** | Defined as management/security attention and connected to access/pressure. |
| Trust compatibility | **Preserved** | Keeps `trust_delta`, world flags, branches, follow-up tickets/incidents. |
| Access system | **Partially preserved** | Per-machine access is good. But `root` is not actually modeled beyond being named once. Spec requires `basic_user → sudo → root`. |
| Boss / management pressure | **Preserved** | Good: pressure is operational, not cutscene-driven. |
| Hidden narrative system | **Mostly preserved** | Hooks are embedded into sysadmin work. Some are too tightly coupled to “best branch” behavior, making them less optional than intended. |
| Difficulty scaling | **Partially preserved** | Phases 1–5 mostly work. Phase 6 explicitly returns to Tier 1, but spec says Phase 4+ should be problem-solving only. |
| Endings | **Partially preserved** | Behavior-driven overall, but `final_config_made` as a standalone chaos trigger is too single-choice and contradicts the stated calibration. |
| Design principles | **Mostly preserved** | Strong on systems over scripts and discovery over exposition. Weak spot: some late quests become explicit forensic tasks. |
| Non-goals | **Mostly preserved** | No cutscenes, no obvious “pick ending” button. But Q039 risks becoming the obvious “bad ending button.” |
| Character preservation | **Preserved** | No major portrait-breaking changes. Priya rename is canon cleanup, not a redesign. Kowalski becoming active pressure is supported by existing character docs. |
---
## C. Critical violations
### 1. Access progression does not actually implement `root`
**Location:** Access Progression Rules, Section 7.
**Problem:**
The overview names `basic_user`, `sudo`, and `root`, but the actual progression never defines when root is granted, how it differs from sudo, when it is revoked, or which quests require it. The detailed rules stop at sudo and “investigation-level access.”
**Why this violates the spec:**
SPEC_LOCK explicitly requires the permission ladder:
```text
basic_user → sudo → root
```
and says access must be affected by trust, suspicion, risk, and narrative phase.
**Corrected version:**
```md
### Levels
**basic_user:** Day one through early Phase 1. Player's own workstation account;
limited non-privileged access elsewhere only when a ticket explicitly grants it.
**sudo:** Task-scoped administrative access on a specific machine. Granted by trust
and operational need. Most admin quests use sudo, not root.
**root:** Rare, temporary break-glass or forensic-level access. Root is not a normal
promotion. It is granted only for quests where sudo is insufficient, such as filesystem
recovery, archival preservation, privileged audit capture, or service account repair.
Root access must be logged, justified, and revoked.
### Root grant rules
Root may be granted when all are true:
- Trust is positive.
- Risk is below elevated threshold.
- Suspicion is below high threshold, or access is explicitly approved by Priya.
- The current narrative phase is Investigation or Conflict.
- The quest has `requires_root: true`.
### Root restriction rules
Root is denied or revoked when:
- Risk crosses elevated threshold.
- Suspicion crosses high threshold without Priya approval.
- The player performs destructive changes outside ticket scope.
- Q031 or Q043 finds undocumented privileged activity.
### Phase gates
Phase 1: basic_user only, with no root.
Phase 2: workstation/hermes sudo possible, no root.
Phase 3: sudo on hermes/vulcan; root only for audited recovery tasks.
Phase 4: temporary root for investigation tasks when required.
Phase 5: root access becomes tightly controlled and reviewable.
Phase 6: root revoked unless the ending state explicitly preserves elevated trust.
```
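
The grant and restriction rules above translate directly into two predicates. The values for "elevated" risk and "high" suspicion are hypothetical constants. The Phase 3 audited-recovery case from the phase gates is not modeled in this sketch.

```typescript
// HYPOTHETICAL thresholds: the rules name them without giving values.
const RISK_ELEVATED = 5;
const SUSPICION_HIGH = 8;

interface RootContext {
  trust: number;
  risk: number;
  suspicion: number;
  phase: number;              // 4 = Investigation, 5 = Conflict
  priyaApproved: boolean;
  questRequiresRoot: boolean; // quest has requires_root: true
}

interface RevocationSignals {
  destructiveOutOfScope: boolean;          // changes outside ticket scope
  undocumentedPrivilegedActivity: boolean; // found by Q031 or Q043
}

// Grant only when every "Root grant rules" condition holds.
function canGrantRoot(c: RootContext): boolean {
  return (
    c.trust > 0 &&
    c.risk < RISK_ELEVATED &&
    (c.suspicion < SUSPICION_HIGH || c.priyaApproved) &&
    (c.phase === 4 || c.phase === 5) &&
    c.questRequiresRoot
  );
}

// Revoke when any "Root restriction rules" condition holds.
function mustRevokeRoot(c: RootContext, s: RevocationSignals): boolean {
  return (
    c.risk >= RISK_ELEVATED ||
    (c.suspicion >= SUSPICION_HIGH && !c.priyaApproved) ||
    s.destructiveOutOfScope ||
    s.undocumentedPrivilegedActivity
  );
}
```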
---
### 2. Q039 turns chaos into a single obvious final trap
**Location:** Q039 Branch 3 and Ending Logic.
**Problem:**
Q039 says making the config change sets `final_config_made` and “the chaos ending route activates.” Ending logic also treats `final_config_made` as a standalone chaos condition.
**Why this violates the spec:**
SPEC_LOCK says endings emerge from world flags, behavior variables, access state, and hidden hooks — not one obvious final button. The redesign also contradicts itself by saying a single reckless action should not route to chaos.
**Corrected version:**
```md
Branch 3 — Make the change without review (priority 10): Player adds the proxy pass
to 10.0.0.47 without checking prior context or escalating. The change works
technically but creates a serious security/compliance exposure. `trust_delta: -3`.
Flags: `final_config_made`, `unauthorized_proxy_enabled`.
Follow-up incident: I039 — Priya opens an urgent access/config review.
Behavior Impact:
- Make the change: R+5, S+3
Ending note:
This branch strongly contributes to `chaos` but does not activate it alone unless
the player already has high risk, maximum suspicion, or prior falsification/omission
flags.
```
And update the chaos ending logic:
```md
### Ending: `chaos`
Required conditions, any of:
- Risk above chaos threshold.
- Suspicion at maximum.
- Two or more serious falsification / evidence destruction flags.
- `final_config_made` AND at least one of:
  - risk above elevated threshold
  - `access_review_incomplete`
  - `kowalski_report_sanitized`
  - `backup_test_falsified`
  - `logs_selectively_omitted`
```
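A minimal sketch of the corrected chaos check, assuming hypothetical thresholds and assuming the "serious falsification / evidence destruction" set is exactly the three misconduct flags named in the list above (the spec does not enumerate it):

```javascript
// Hypothetical thresholds; tune against playtest data.
const RISK_ELEVATED = 70;
const RISK_CHAOS = 90;
const SUSPICION_MAX = 100;

// Assumed set of serious falsification / evidence destruction flags.
const MISCONDUCT_FLAGS = [
  "kowalski_report_sanitized",
  "backup_test_falsified",
  "logs_selectively_omitted",
];

function isChaosEnding(state) {
  const misconduct = MISCONDUCT_FLAGS.filter((f) => state.flags.has(f)).length;
  if (state.risk > RISK_CHAOS) return true;
  if (state.suspicion >= SUSPICION_MAX) return true;
  if (misconduct >= 2) return true;
  // final_config_made is a heavy contributor, never a standalone trigger.
  if (state.flags.has("final_config_made")) {
    return (
      state.risk > RISK_ELEVATED ||
      state.flags.has("access_review_incomplete") ||
      misconduct >= 1
    );
  }
  return false;
}
```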
---
### 3. Q036 uses an external host but declares no additional system
**Location:** Q036.
**Problem:**
Q036 connects to `10.0.0.47` for forensic inventory, but says `Additional VMs: none` and `Systems Used: build_machine`. That is false.
**Why this violates the spec:**
Quest authoring requires all VMs used in clues, validation, or prep to be listed. The canon packet also says the current machines are `ares`, `hermes`, and `vulcan`; if a fourth machine exists, it needs explicit implementation status.
**Corrected version:**
```md
**Quest ID:** Q036
**Title:** Authorized Access
**Narrative Phase:** Conflict
**Tier:** 3
**Primary VM:** build_machine
**Additional VMs:** external_target_10_0_0_47
**Primary Objective:** Priya, with Kowalski's authorization, has provided read-only
credentials to connect to 10.0.0.47 for a forensic inventory. Document what is running,
what data is present, and whether Axiom Works data is identifiable. Do not modify
anything.
**Linux Concepts:** SSH with specific key/user, read-only service enumeration,
`systemctl`, `ps aux`, `ss -tulpn`, `find`, `ls -lah`, checksum capture, read-only
file inspection
**Systems Used:** build_machine, external_target_10_0_0_47
```
Implementation note:
```md
external_target_10_0_0_47 must be represented as either:
- a fourth VM fixture,
- a containerized fake host reachable only from vulcan,
- or a simulated network target exposed through the validation harness.
Do not leave it as an implied off-screen system.
```
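The authoring rule can be enforced mechanically, so an undeclared host fails content validation instead of surfacing at runtime. A minimal sketch for a validator such as `validate-content.js`, assuming a hypothetical rule shape (leaf rules carry a `vm` field, `and`/`or` nodes hold `rules`, `not` holds `rule`) and a hypothetical `prep_vms` array for prep targets:

```javascript
// Collects every VM id referenced by a validation rule tree.
// Assumed shape: leaves carry `vm`; and/or hold `rules`; not holds `rule`.
function collectVms(rule, out = new Set()) {
  if (!rule || typeof rule !== "object") return out;
  if (rule.vm) out.add(rule.vm);
  for (const child of rule.rules ?? []) collectVms(child, out);
  if (rule.rule) collectVms(rule.rule, out);
  return out;
}

// Returns VM ids used in validation or prep but missing from systems_used.
function undeclaredVms(quest) {
  const used = new Set();
  for (const branch of quest.branches ?? []) collectVms(branch.validation, used);
  for (const vm of quest.prep_vms ?? []) used.add(vm); // hypothetical field
  const declared = new Set(quest.systems_used ?? []);
  return [...used].filter((vm) => !declared.has(vm));
}
```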
---
### 4. Q034 duplicate branch priorities violate authoring rules
**Location:** Q034 Branches 2 and 3.
**Problem:**
Both Branch 2 and Branch 3 use priority 40.
**Why this violates the spec:**
The authoring guide explicitly says branch priorities must be unique; duplicate priorities require rewriting.
**Corrected version:**
```md
Branch 2 — Hermes first, rotation incomplete but safely staged (priority 70):
Player restores production, starts the key rotation, but does not complete final
deployment before 2am. Builds are delayed but the trust chain is not broken.
`trust_delta: +1`.
Branch 3 — Vulcan first, hermes later (priority 50):
Completes key rotation, then restores hermes. Rotation is correct; production was
down longer than necessary. `trust_delta: +0.5`.
Branch 4 — Hermes only, rotation missed (priority 30):
Restores production, misses the key rotation window entirely. Builds break overnight.
`trust_delta: 0`. Follow-up incident: I034.
Branch 5 — Neither, escalates without triage (priority 10):
Escalates both without preserving either service. `trust_delta: -2`.
```
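Duplicate priorities are cheap to catch at authoring time. A minimal sketch of the check, assuming each branch object carries a numeric `priority`:

```javascript
// Returns every priority value that appears on more than one branch.
function duplicatePriorities(quest) {
  const counts = new Map(); // priority -> number of branches using it
  for (const b of quest.branches ?? []) {
    counts.set(b.priority, (counts.get(b.priority) ?? 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([p]) => p);
}
```

Even where the sort is stable, duplicates resolve by array position, so the outcome depends on file order rather than authored intent; the check turns that into a validation error.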
---
### 5. Phase 6 difficulty scaling conflicts with SPEC_LOCK
**Location:** Phase 6 setup and Q041.
**Problem:**
The redesign says Tier 1 returns for most Phase 6 quests and Q041 uses an explicit attached hardening checklist.
**Why this violates the spec:**
SPEC_LOCK says Phase 4+ is “Problem-solving only,” applying to ticket wording, hints, clue obviousness, and branch tolerance. Phase 6 is still Phase 4+.
**Corrected version:**
```md
### PHASE 6 — RESOLUTION (Q041–Q048)
The pressure has lifted, but the player is still expected to operate at late-game
competence. Tickets are calmer, not easier. No new hidden hooks. No explicit
walkthroughs. The ending fires from accumulated state after Q048 resolves.
```
Corrected Q041:
```md
**Quest ID:** Q041
**Title:** Hardening Pass
**Narrative Phase:** Resolution
**Tier:** 3
**Primary VM:** web_server
**Additional VMs:** none
**Primary Objective:** Post-audit review found that hermes does not meet the current
security baseline. Identify the gaps, remediate them, and verify the application
still works.
**Linux Concepts:** SSH hardening, nginx security headers, firewall rule review,
service account audit, safe sequencing of access-control changes
**Systems Used:** web_server
**Ticket Sender:** Priya Nair
**Ticket Summary:** "Hermes does not match the current post-audit baseline. Bring it
into compliance and confirm service health after the changes."
**Clue Trail:**
- Baseline document exists but does not list exact commands.
- SSH config allows settings that are no longer acceptable.
- nginx lacks required security headers.
- Firewall rules include at least one stale exposure.
- Service account permissions are broader than needed.
**Solution Branches:**
Branch 1 — Full hardening, safe sequence (priority 100): Player identifies all gaps,
verifies key auth before disabling password auth, applies nginx headers, tightens
firewall rules, scopes service permissions, and confirms service health.
`trust_delta: +2`. Flags: `hermes_hardened`.
Branch 2 — Full hardening, unsafe sequence (priority 60): Final state is correct,
but the player temporarily breaks SSH or service access during sequencing.
`trust_delta: +0.5`.
Branch 3 — Partial hardening (priority 30): Some gaps fixed, others missed.
`trust_delta: 0`.
**Hidden Hook:** None.
**Failure Conditions:** SSH access lost without recovery path; nginx broken; admin
panel exposed after remediation.
**Behavior Impact:**
- Full hardening: O+1
- Unsafe sequence: R+1
```
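The correction ties difficulty to the phase number rather than to tone. A minimal sketch, assuming phase ids `normal_work` through `resolution` (names assumed here) and a hypothetical `guided` mode for phases before 4; only the Phase 4+ rule comes from SPEC_LOCK:

```javascript
// Narrative spine in order; index + 1 is the phase number (ids assumed).
const PHASES = ["normal_work", "unease", "suspicion", "investigation", "conflict", "resolution"];

// Phase 4+ is problem-solving only, including Phase 6 (Resolution).
// "guided" for earlier phases is a hypothetical placeholder.
function hintModeFor(phase) {
  const n = PHASES.indexOf(phase) + 1;
  if (n === 0) throw new Error(`unknown narrative phase: ${phase}`);
  return n >= 4 ? "problem_solving_only" : "guided";
}
```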
---
## D. Moderate issues
### Repetition
- The INT-0194 thread appears often enough that it risks becoming “the glowing main quest breadcrumb.” The system can keep it, but not every major midgame hook should name the same ticket number.
- Several quests use the same “audit / document / archive” pattern. Realistic, yes. Varied, no. At some point the player is just doing paperwork with grep. That is accurate corporate simulation, but accuracy alone is not game design.
### Weak Linux concepts
- Q020, Q031, Q040 are documentation-heavy. They have Linux-adjacent evidence gathering, but the technical center is reporting. Keep them, but make sure validation requires real commands/artifacts, not just “player wrote report.”
- Q037 “trace where customer email got infrastructure details” needs concrete technical evidence: mail headers, CRM export logs, nginx access logs, document access logs, or ticket attachments. Otherwise it becomes story fog.
### Weak hidden hooks
- Q015's hook is effectively part of the best branch: Branch 1 requires inspecting the binary, and the hook is set by inspecting the binary. That makes the hook less optional. It should be possible to complete the audit perfectly without recognizing the broader INT-0194 meaning.
- Some “hook discovered” C bonuses duplicate branch C bonuses. Q015 explicitly says Hook C+2 is “already in Branch 1 impact,” which is begging for a double-count bug.
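One way to rule out the double count is to record each hook once and skip its bonus when the winning branch already includes it. A minimal sketch, assuming a hypothetical `includes_hooks` array on branches and a hypothetical hook shape `{ id, impact }`:

```javascript
// Applies a hidden-hook bonus at most once per hook, and not at all when the
// winning branch already counts the same bonus (the Q015 double count).
// `discovered` is a Set of hook ids and is updated in place.
function applyHookBonus(behavior, discovered, hook, branch) {
  if (discovered.has(hook.id)) return behavior;
  discovered.add(hook.id);
  if ((branch.includes_hooks ?? []).includes(hook.id)) return behavior;
  const next = { ...behavior };
  for (const [axis, delta] of Object.entries(hook.impact)) {
    next[axis] = (next[axis] ?? 0) + delta;
  }
  return next;
}
```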
### Pacing problems
- Phase 3 and Phase 4 are both audit/investigation-heavy. The difference is conceptually clear, but the activity palette may blur in play.
- Phase 6 “normal work again” is good thematically, but making it easier contradicts the locked difficulty model.
### Character conflicts
No major portrait-breaking character changes found.
- **Priya Nair cleanup is correct.** Character docs already say Priya Nair is canonical and older Kapoor/Singh references should be updated.
- **Kowalski becoming active pressure is allowed.** His existing bio supports policy pressure, meetings, and indirect escalation.
- **Sarah remains within role.** Q039's Sarah request is plausible because she does not know the IP's context. That works.
### Implementation ambiguity
- “Written report” branches need concrete artifacts: exact paths, expected content markers, checksum files, archive names, or validation commands.
- `suspicion_delta` is required in the implementation notes but omitted from many quest behavior-impact summaries. That is fine for prose, but JSON conversion must normalize missing values to `0`.
- Hidden-hook detection needs a single approved strategy before implementation. Mixing state detection, auditd, and hint detection ad hoc will turn validation into soup with line numbers.
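The C/O/R/S shorthand in the behavior-impact summaries can be normalized mechanically during JSON conversion, with every omitted axis written out as `0`. A minimal sketch, assuming the long-form field names `curiosity`, `obedience`, `risk`, and `suspicion`; unknown tokens fail loudly rather than being dropped:

```javascript
// Maps prose shorthand axes to normalized field names (names assumed).
const AXES = { C: "curiosity", O: "obedience", R: "risk", S: "suspicion" };

// Parses shorthand like "O+1, C+1" or "R+5, S+3" into explicit deltas,
// filling every unmentioned axis with 0 so JSON never omits a field.
function parseBehaviorImpact(text) {
  const out = { curiosity: 0, obedience: 0, risk: 0, suspicion: 0 };
  for (const part of text.split(",")) {
    const m = part.trim().match(/^([CORS])([+-]\d+(?:\.\d+)?)$/);
    if (!m) throw new Error(`unrecognized behavior token: "${part.trim()}"`);
    out[AXES[m[1]]] += Number(m[2]);
  }
  return out;
}
```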
---
## E. Implementation risks
| Area | Risk | Fix |
|---|---|---|
| Data model | New fields are defined, but `root` is not represented in real progression. | Add `access_level` enum values and root grant/revoke rules. |
| Quest validation | Some quests rely on reports/documentation rather than VM state. | Require generated files with expected fields, checksums, timestamps, and source evidence. |
| Save/load | New global state fields must persist: curiosity, obedience, risk, suspicion, per-machine access, hidden hooks. | Version save schema; migrate old saves with defaults. |
| Behavior tracking | C/O/R/S values are prose shorthand, not normalized data. | Convert every quest to explicit per-branch deltas, including zeroes. |
| Access progression | Temporary access, sudo, and root are conceptually blended. | Separate `access_level`, `temporary_grants`, and `access_review_flags`. |
| Hidden hook state | Detecting “read this file” is hard with pure state validation. | Use auditd or command wrappers for file-read hooks; do not fake it with impossible `command_assert` assumptions. |
| Ending evaluation | Q039 currently overrides behavior history. | Make `final_config_made` a heavy chaos contributor, not a standalone ending trigger. |
| External host | `10.0.0.47` is not declared as a system/VM. | Implement as fourth VM/container/simulated host and declare it in quest metadata. |
| Branch priority | Q034 duplicate priorities will create nondeterministic branch resolution. | Make all branch priorities unique. |
| Phase scaling | Phase 6 becomes easy mode. | Keep calmer tone, not lower complexity. |
---
## F. Corrected quest entries
Only entries that need fixing are included.
---
### Corrected Q033 — The Revocation Request
```md
**Quest ID:** Q033
**Title:** The Revocation Request
**Narrative Phase:** Conflict
**Tier:** 3
**Primary VM:** build_machine
**Additional VMs:** none
**Primary Objective:** Kowalski has asked the player to remove the `pipeline-svc`
account from vulcan before an external audit. The request is operationally legitimate.
The account is also central to the investigation.
**Linux Concepts:** `userdel`, account archival, sudo log preservation, crontab
preservation, change documentation, evidence-safe account removal
**Systems Used:** build_machine
**Solution Branches:**
Branch 1 — Archive then remove (priority 100): Player archives sudo logs, auth log
entries, crontab entries, relevant home/service files, and documents the account's
investigation relevance before removing the account. `trust_delta: +3`.
Flags: `pipeline_svc_removed_with_trail`.
Branch 2 — Remove as instructed (priority 60): Player removes the account without
additional archival. The request is completed, but investigation continuity is
damaged. `trust_delta: 0`. Flags: `pipeline_svc_removed_clean`.
Branch 3 — Ask Marcus/Priya before acting (priority 50): Player asks before removal.
They are told to archive first, then remove. `trust_delta: +1`. May resolve into
Branch 1 if archival is completed.
Branch 4 — Refuse outright without operational explanation (priority 10): Player
does not remove the account and does not provide a usable reason. `trust_delta: -2`.
Flags: `revocation_refused_without_basis`.
**Hidden Hook:** None.
**Failure Conditions:** Player leaves the account active without escalation; player
creates replacement privileged accounts; player removes logs or home data destructively.
**Behavior Impact:**
- Archive then remove: O+1, C+1
- Remove as instructed: O+2
- Refuse outright: S+3, R+1
```
---
### Corrected Q034 — Two Tickets
```md
**Quest ID:** Q034
**Title:** Two Tickets
**Narrative Phase:** Conflict
**Tier:** 3
**Primary VM:** web_server
**Additional VMs:** build_machine
**Primary Objective:** Two tickets arrive simultaneously — one from Marcus for signing
key rotation on vulcan, one from Sarah for a production outage on hermes. Triage and
complete both if possible.
**Linux Concepts:** GPG signing key rotation, nginx/application troubleshooting,
service restoration, sequencing time-sensitive administrative work
**Systems Used:** web_server, build_machine
**Solution Branches:**
Branch 1 — Both completed, hermes first (priority 100): Player restores hermes,
then completes the key rotation in the correct sequence before the deadline.
`trust_delta: +3`. Flags: `conflict_both_resolved`.
Branch 2 — Hermes first, rotation safely staged but late (priority 70): Production
is restored; key rotation is partially staged but misses final deployment. Builds are
delayed but trust chain is not broken. `trust_delta: +1`. Follow-up incident: I034.
Branch 3 — Vulcan first, hermes later (priority 50): Rotation is correct, but
production outage lasts longer than necessary. `trust_delta: +0.5`.
Branch 4 — Hermes only, rotation missed (priority 30): Production is restored;
builds break overnight due to expired signing key. `trust_delta: 0`.
Follow-up incident: I034.
Branch 5 — Neither, escalates without triage (priority 10): Player escalates both
without stabilizing either service. `trust_delta: -2`.
**Hidden Hook:** None.
**Failure Conditions:** Key rotation done out of sequence breaks package verification;
player makes hermes worse while fixing it.
**Behavior Impact:**
- Both completed: O+2
- Safe partial triage: O+1
- Out-of-sequence key rotation: R+2
- Neither stabilized: R+2, S+1
```
---
### Corrected Q036 — Authorized Access
```md
**Quest ID:** Q036
**Title:** Authorized Access
**Narrative Phase:** Conflict
**Tier:** 3
**Primary VM:** build_machine
**Additional VMs:** external_target_10_0_0_47
**Primary Objective:** Priya, with Kowalski's authorization, has provided read-only
credentials to connect to 10.0.0.47 for a forensic inventory. Document what is
running, what data is present, and whether Axiom Works data is identifiable. Do not
modify anything.
**Linux Concepts:** `ssh` with specific key/user, read-only service enumeration,
`systemctl`, `ps aux`, `ss -tulpn`, directory inspection, checksum capture, read-only
file review
**Systems Used:** build_machine, external_target_10_0_0_47
**Solution Branches:**
Branch 1 — Document only (priority 100): Player inventories services, open ports,
processes, data-store layout, timestamps, and identifiable Axiom Works data without
modifying anything. `trust_delta: +3`. Flags: `unknown_host_documented`.
Branch 2 — Minimal engagement (priority 50): Player confirms host is running and
data is present but does not fully inventory. `trust_delta: +1`.
Branch 3 — Modifies or deletes (priority 10): Player stops services, deletes files,
changes permissions, or otherwise alters the target. `trust_delta: -3`.
Flags: `forensic_target_modified`.
**Hidden Hook:** The data store contains `axiomworks-internal/employees`, suggesting
employee activity profiles rather than only session logs. Sets
`hook_employee_profile_data`.
**Failure Conditions:** Player modifies target state; player exceeds authorized
scope; player cannot provide evidence-backed inventory.
**Behavior Impact:**
- Full documentation: O+2, C+2
- Minimal engagement: O+0
- Modify/delete: R+5, S+3
- Hook discovered: C+2
```
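Branch 3 is easiest to detect from observed state if the harness captures checksums of the target's data store before and after the player's session. A minimal sketch of the comparison, assuming the harness produces `{ path: sha256 }` manifests (a hypothetical format; capture would happen over SSH like other validation):

```javascript
// Compares checksum manifests captured before and after the session.
// Any difference means the forensic target was modified.
function manifestDiff(before, after) {
  const changed = [];
  const removed = [];
  const added = [];
  for (const [path, sum] of Object.entries(before)) {
    if (!Object.hasOwn(after, path)) removed.push(path);
    else if (after[path] !== sum) changed.push(path);
  }
  for (const path of Object.keys(after)) {
    if (!Object.hasOwn(before, path)) added.push(path);
  }
  const modified = changed.length + removed.length + added.length > 0;
  return { changed, removed, added, modified };
}
```

Content checksums do not catch permission or service-state changes, so this would complement, not replace, mode and service checks.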
---
### Corrected Q039 — The Last Normal Ticket
```md
**Quest ID:** Q039
**Title:** The Last Normal Ticket
**Narrative Phase:** Conflict
**Tier:** 3
**Primary VM:** web_server
**Additional VMs:** none
**Primary Objective:** Sarah files a configuration request: add an upstream proxy
pass to nginx for a new reporting integration. The IP in the request is 10.0.0.47.
**Linux Concepts:** nginx `proxy_pass`, upstream block configuration, config testing
with `nginx -t`, safe reload, escalation when config touches known-risk infrastructure
**Systems Used:** web_server
**Solution Branches:**
Branch 1 — Refuse and escalate (priority 100): Player does not make the change,
notifies Priya with the IP and context, and tells Sarah the request is on hold pending
review. `trust_delta: +3`. Flags: `final_config_refused`.
Branch 2 — Ask Marcus first (priority 70): Player checks with Marcus before acting.
Marcus redirects them to Priya. If the player escalates to Priya, this may resolve
as Branch 1. `trust_delta: +1`.
Branch 3 — Make the change without review (priority 10): Player adds the proxy pass
to 10.0.0.47 without checking the IP context. The config works but creates a serious
security/compliance exposure. `trust_delta: -3`. Flags: `final_config_made`,
`unauthorized_proxy_enabled`. Follow-up incident: I039.
**Hidden Hook:** None.
**Failure Conditions:** nginx config is syntactically broken; player changes unrelated
proxy routes; player hides or misreports the change.
**Behavior Impact:**
- Refuse and escalate: O+2, C+1
- Ask Marcus first: O+1
- Make the change: R+5, S+3
**Narrative Notes:** Branch 3 must not force `chaos` by itself. It is a major risk
event; chaos requires accumulated risk/suspicion or additional serious misconduct.
```
---
### Corrected Q041 — Hardening Pass
```md
**Quest ID:** Q041
**Title:** Hardening Pass
**Narrative Phase:** Resolution
**Tier:** 3
**Primary VM:** web_server
**Additional VMs:** none
**Primary Objective:** Post-audit review found that hermes does not match the current
security baseline. Identify the gaps, remediate them, and verify the application
still works.
**Linux Concepts:** SSH hardening, nginx security headers, firewall rule review,
service account audit, safe sequencing of access-control changes
**Systems Used:** web_server
**Ticket Sender:** Priya Nair
**Ticket Summary:** "Hermes does not match the current post-audit baseline. Bring it
into compliance and confirm service health after the changes."
**Clue Trail:**
- Baseline document exists but does not give exact commands.
- SSH configuration allows at least one setting that violates baseline.
- nginx lacks required headers.
- Firewall rules include stale exposure.
- Service account permissions are broader than required.
**Solution Branches:**
Branch 1 — Full hardening, safe sequence (priority 100): Player identifies all gaps,
applies fixes in safe order, validates access, confirms nginx health, and documents
final state. `trust_delta: +2`. Flags: `hermes_hardened`.
Branch 2 — Full hardening, unsafe sequence (priority 60): Final state is correct,
but player temporarily breaks SSH or service availability while sequencing changes.
`trust_delta: +0.5`.
Branch 3 — Partial hardening (priority 30): Some baseline gaps remain. `trust_delta: 0`.
**Hidden Hook:** None.
**Failure Conditions:** SSH access lost without recovery; nginx broken; admin panel
still exposed; service account remains overprivileged.
**Behavior Impact:**
- Full hardening: O+1
- Unsafe sequence: R+1
```
---
## G. Final recommendation
### Ready for implementation spec?
**No.**
Close, but no. The redesign is directionally right, but several issues are implementation-grade problems, not wording nits.
### Must fix first
1. **Define real root access progression.**
2. **Fix Q039 and chaos ending logic so one choice does not hard-select the ending.**
3. **Declare and implement `10.0.0.47` properly or remove direct connection to it.**
4. **Fix duplicate Q034 priorities.**
5. **Normalize Phase 6 to “calm but still problem-solving,” not Tier 1 hand-holding.**
6. **Choose one hidden-hook detection strategy before writing JSON/prep scripts.**
After those are fixed, this can become an implementation spec. Right now it is a strong story/system design draft with a few landmines buried exactly where the validator will step on them.
@@ -0,0 +1,958 @@
# Sysadmin Chronicles — Repo-Aware Implementation Plan
**Generated from:** Prompt 05 repo inspection
**Date:** 2026-05-01
**Scope:** Integrating the redesigned quest/story system into the existing codebase without breaking current content or runtime
---
## 1. Current Architecture Summary
### 1.1 Where quest logic lives
**Primary service:** `server/src/services/QuestEngine.js`
- Stores quest entries in a `Map<questId, entry>` where entry = `{ state, started_at, completed_at, branch_id }`
- States: `locked | active | completed | failed`
- Activation: checks `unlock_requirements` against current `world_flags` in save state
- Completion: called by `TicketService.markComplete()` after branch validation succeeds
- Initial quests (no `unlock_requirements`) auto-activate on first load
**Orchestration:** `server/src/services/TicketService.js`
- `markComplete(ticketId)` is the central transaction:
1. Runs `ValidationEngine.resolveBranch(quest)` to find winning branch
2. Applies `branch.world_flags` to save state
3. Calls `trustSystem.adjust(branch.trust_delta)`
4. Calls `questEngine.complete()`
5. Sends follow-up dialogue email if trust delta ≤ 0
6. Activates follow-up ticket via `_activateFollowUpTicket()`
7. Emits `ticket:completed` event
There is no `BehaviorTracker`, no `NarrativePhaseTracker`, no `AccessLevelSystem`, no `EndingEvaluator`. These are fully absent.
### 1.2 Where quest data lives
- Quest JSON: `content/quests/Q*.json` — 8 quests authored (Q001–Q008)
- Tickets: `content/tickets/T*.json` — 8+ tickets, linked 1:1 to quests via `linked_quest`
- Dialogue: `content/dialogue/*.json` — per-character, per-quest reaction files
- Incidents: `content/incidents/I*.json` — recurring consequence definitions (3 authored)
- Pressure profiles: `content/pressure_profiles/*.json` — time-based escalation sequences (4 authored)
- World flags registry: `content/world_flags/world_flags.json` — canonical flag declarations
- Trust unlocks: `content/progression/trust_unlocks.json` — 5 unlock thresholds defined
- VM profiles: `content/vm_profiles/*.json` — workstation, web_server, build_machine
**Missing content subdirectories:** There is no `content/narrative_phases/`, no `content/behavior_profiles/`, no `content/endings/`, no `content/hidden_hooks/`. These need to be created.
### 1.3 How quests start and complete
1. Server loads via `contentLoader.load()` then initializes services from `saveState.get()`
2. `QuestEngine.initialize()` restores quest state from save; auto-activates quests with no requirements
3. `TicketService.initialize()` cross-references quest state to activate/resolve ticket entries
4. Player submits a `POST /api/tickets/:id/complete` request
5. `TicketService.markComplete()` runs full validation → branch resolution → state mutation → events
6. Follow-up ticket activates if specified on the winning branch; next quest auto-starts
### 1.4 How player state is saved
**File:** `~/.local/share/sysadmin-chronicles/save.json` (configurable via `SAVE_DIR`)
**Schema version:** 2
**Current top-level keys:**
```
schema_version, created_at, last_saved, trust, shift_number,
shift_started_at, world_flags, progression, quests, tickets,
mail, certifications, current_shift_stats, shift_history,
pressure, incidents, sage, player_portrait
```
`SaveState.set(partial)` performs a shallow merge, with special handling for arrays and plain objects. Writes are queued and serialized.
**Missing keys:** `behavior` (curiosity/obedience/risk), `narrative_phase`, `suspicion`, `access_level`, `hidden_hooks_discovered`. These must be added with defaults at `schema_version: 3`.
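Because the new keys only need defaults, the v2 to v3 upgrade can stay additive. A minimal sketch, assuming the defaults below. The `behavior` and `narrative_phase` values follow Tasks 3 and 4. The `suspicion: 0` default and the empty per-machine `access_level` map are assumptions:

```javascript
// Defaults for keys introduced in schema_version 3 (partly assumed).
const V3_DEFAULTS = {
  behavior: { curiosity: 50, obedience: 50, risk: 50 },
  narrative_phase: "normal_work",
  suspicion: 0,
  access_level: {},
  hidden_hooks_discovered: [],
};

// Upgrades a parsed save object without overwriting any existing value.
// Returns the input unchanged if it is already at version 3 or later.
function migrateSave(save) {
  if ((save.schema_version ?? 1) >= 3) return save;
  const out = { ...save };
  for (const [key, value] of Object.entries(V3_DEFAULTS)) {
    // JSON round-trip so each save gets its own copy of default objects/arrays.
    if (!(key in out)) out[key] = JSON.parse(JSON.stringify(value));
  }
  out.schema_version = 3;
  return out;
}
```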
### 1.5 How UI displays quest information
Quest display is minimal. The `TicketsPanel.svelte` component shows:
- Ticket ID, subject, priority badge, status
- A "Mark Complete" button that triggers `POST /api/tickets/:id/complete`
- Linked quest ID as static text in the detail view
- No quest progress, no objectives display, no narrative phase, no behavior indicators
`HeaderBar.svelte` shows:
- Trust score (as text label: Probationary/Settling In/Reliable/Entrusted) and meter bar
- Shift number and countdown
- Certification count
There is no behavior dashboard, no narrative phase indicator, no access level display, no hidden hook discovery log. The `/api/state` route does expose `worldFlags` and `progression` to the frontend but neither is currently rendered.
### 1.6 How branch resolution works
`ValidationEngine.resolveBranch(quest)` iterates branches sorted by descending priority, runs each branch's `validation` rule tree against live VM state via SSH, and returns the first passing branch. All validation runs real SSH commands against the QEMU/libvirt VMs. No mocking. The engine supports: `and`, `or`, `not`, `file_exists/absent/contains/mode/owner`, `service_state/enabled`, `process_running/user`, `port_listening`, `package_installed`, `mount_present`, `disk_usage_below/above`, `command_assert`.
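The resolution loop described above can be sketched as follows. This is an offline stand-in, not the real `ValidationEngine`: leaf checks are injected through a `check` callback instead of running SSH, the leaf type `fact` is hypothetical, and the `rules`/`rule` child keys are assumed:

```javascript
// Evaluates a rule tree; `check` stands in for SSH-backed leaf checks.
async function evaluate(rule, check) {
  switch (rule.type) {
    case "and":
      for (const r of rule.rules) if (!(await evaluate(r, check))) return false;
      return true;
    case "or":
      for (const r of rule.rules) if (await evaluate(r, check)) return true;
      return false;
    case "not":
      return !(await evaluate(rule.rule, check));
    default:
      return check(rule); // leaf rule
  }
}

// Returns the highest-priority branch whose validation passes, or null.
async function resolveBranch(quest, check) {
  const ordered = [...quest.branches].sort((a, b) => b.priority - a.priority);
  for (const branch of ordered) {
    if (await evaluate(branch.validation, check)) return branch;
  }
  return null;
}
```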
---
## 2. Spec Preservation Analysis
For each SPEC_LOCK.md requirement:
| Spec requirement | Status | Notes |
|---|---|---|
| Narrative spine (6 phases) | **Missing** | No phase field on quests; no phase tracker in runtime |
| Quest must declare `narrative_phase` | **Missing** | Not in current quest schema |
| Quest must declare `behavior_impact` | **Missing** | Not in current schema; spec defines branch-level overrides |
| `curiosity` tracking | **Missing** | No BehaviorTracker service |
| `obedience` tracking | **Missing** | No BehaviorTracker service |
| `risk` tracking | **Missing** | No BehaviorTracker service |
| `trust` preserved | **Already supported** | TrustSystem.js is complete and robust |
| `suspicion` as management attention | **Missing** | No suspicion variable; concept is not tracked |
| `trust_delta` on branches | **Already supported** | Fully implemented in TicketService.markComplete |
| `world_flags` | **Already supported** | Full registry, branch application, persistence |
| Access system: `basic_user → sudo → root` | **Partially supported** | ProgressionSystem tracks `unlocked_access` strings but doesn't use the three-tier access model; no concept of `basic_user/sudo/root` as named levels |
| Trust gates access | **Already supported** | `trust_unlocks.json` → ProgressionSystem |
| Suspicion gates access | **Missing** | Suspicion doesn't exist as a tracked variable |
| Boss/management pressure phase scaling | **Partially supported** | `pressure_profiles` and `IncidentScheduler` can escalate tickets and send emails; but pressure is keyed per-quest, not per narrative phase; there is no phase-aware boss behavior model |
| Hidden hook system (no markers, optional) | **Missing** | No hidden hook schema, no discovery state, no tracker |
| Quest generation constraints (reuse systems) | **Already supported** | Design intent preserved |
| Difficulty scaling by phase | **Missing** | No phase-aware difficulty or hint logic |
| Endings: 4 types, behavior-driven | **Missing** | No EndingEvaluator; no ending content authored |
| Endings emerge from accumulated state | **Missing** | No ending evaluation logic |
| Follow-up ticket/incident chaining | **Already supported** | TicketService + IncidentScheduler |
| Observed-VM-state validation | **Already supported** | ValidationEngine is complete |
| Clue fingerprints | **Already supported** | Documented and validated |
| Baseline snapshots + prep scripts | **Already supported** | tools/vm/quest-prep/ + seed-vms.sh |
| Debug/dev tools for narrative state | **Missing** | Only `validate-content.js`; no debug route for behavior/phase state |
**Risk items:**
- `ShiftReviewService.js` hardcodes `reviewer: 'Priya Kapoor'` and sends from `p.kapoor@axiomworks.internal`. This must be corrected to Priya Nair / `p.nair@axiomworks.internal` before shipping any new content.
- `EmailService.js` CHARACTER_EMAILS has `priya: 'Priya Kapoor <p.kapoor@axiomworks.internal>'`. Same fix required.
- `content/tickets/T007.json` may still reference the old Priya name (noted in CHARACTERS.md).
- `content/docs/onboarding.json` may reference "Priya Kapoor" or "Priya Singh".
---
## 3. Gap Analysis
### Narrative phases
**Gap:** No `narrative_phase` field on quest JSON. No runtime tracker. No API endpoint to query current phase. No phase-driven behavior changes (ticket wording, hints, clue obviousness, boss mode).
### Behavior tracking (curiosity / obedience / risk)
**Gap:** Completely absent. No service, no save state key, no UI, no branch-level behavior deltas applied at completion time.
### Access progression (basic_user / sudo / root)
**Gap:** ProgressionSystem tracks opaque `unlocked_access` strings (like `"sudo:web_server:systemctl"`). The spec requires a named three-tier model. Currently trust gates access but suspicion does not.
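A named tier can be derived from the existing strings without changing what ProgressionSystem stores. A minimal sketch, assuming the `<tier>:<vm>[:<scope>]` pattern implied by `"sudo:web_server:systemctl"`; entries that do not start with a known tier are ignored:

```javascript
// Named tiers in ascending order of privilege.
const TIERS = ["basic_user", "sudo", "root"];

// Derives the highest named tier for one VM from opaque unlock strings.
// Assumes "<tier>:<vm>[:<scope>]"; unknown prefixes rank -1 and are skipped.
function tierFor(vmId, unlockedAccess) {
  let best = 0; // everyone starts at basic_user
  for (const entry of unlockedAccess) {
    const [tier, vm] = entry.split(":");
    const rank = TIERS.indexOf(tier);
    if (vm === vmId && rank > best) best = rank;
  }
  return TIERS[best];
}
```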
### Boss/management pressure (phase-scaled)
**Gap:** `IncidentScheduler` applies pressure per active quest, not per phase. There is no phase-keyed pressure mode. Kowalski is not implemented as an active character in any ticket or dialogue.
### Hidden hooks
**Gap:** No `hidden_hook` field in quest JSON. No discovery state in save. No mechanism to record what the player found. The world_flags system *could* be used for discovery state (e.g., `hidden:dale_ssh_key_found`) but nothing does this yet.
### Endings
**Gap:** Fully absent. No ending content, no EndingEvaluator, no condition set, no trigger. The four endings (corporate_loop, burnout, exposure, chaos) have no authored trigger criteria.
### Debug tooling
**Gap:** Only `validate-content.js` for content authoring. No in-game or dev-API route to inspect: current behavior scores, narrative phase, suspicion level, hidden hooks discovered, ending trajectory.
### Validation of new schema fields
**Gap:** `validate-content.js` does not check `narrative_phase`, `behavior_impact`, `hidden_hook`, `linux_concepts`, or `access_requirements`. New content will not be validated against these fields until the tool is updated.
### Name correction — Priya Nair
**Gap (immediate):** Three files hardcode the wrong canonical name. Must be fixed before new content ships.
---
## 4. Minimal-Change Implementation Plan
**Philosophy:** Extend the existing system. Do not replace working services. New functionality adds new services and new save state keys. Existing content is not broken. New fields are optional until all content is updated.
---
### Task 1 — Repo inspection (complete, no edits)
Inspect the full codebase to confirm architecture, identify all files that reference Priya Kapoor, and establish a baseline for subsequent tasks.
**Acceptance criteria:** Authored plan with confirmed file paths and line numbers.
---
### Task 2 — Extend quest schema and validation tooling
**What changes:**
- Add `narrative_phase`, `behavior_impact`, `hidden_hook`, `linux_concepts`, `systems_used`, `failure_conditions`, `access_requirements` as optional fields to the quest JSON schema
- Update `validate-content.js` to: warn when `narrative_phase` is absent, validate `narrative_phase` against the 6-value enum, check `behavior_impact` structure if present, validate `hidden_hook` shape if present, check `access_requirements.minimum_access` against known VM IDs
- Add the 6 phase values as a declared constant in the validator
**Files changed:** `tools/content/validate-content.js`
**Risk:** Low — additive only; existing quests with no new fields pass with warnings
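The Task 2 checks can be sketched as one function that returns errors and warnings separately, so legacy quests pass with warnings only. This assumes `behavior_impact` is an object keyed by branch id (plus `default`) whose values carry numeric deltas on `curiosity`, `obedience`, `risk`, and `suspicion`. That shape is an assumption, not a settled schema:

```javascript
const NARRATIVE_PHASES = ["normal_work", "unease", "suspicion", "investigation", "conflict", "resolution"];
const AXES = new Set(["curiosity", "obedience", "risk", "suspicion"]); // assumed axis names

// Returns { errors, warnings } for the new optional quest fields.
function checkNarrativeFields(quest) {
  const errors = [];
  const warnings = [];
  if (quest.narrative_phase === undefined) {
    warnings.push(`${quest.id}: narrative_phase is missing`);
  } else if (!NARRATIVE_PHASES.includes(quest.narrative_phase)) {
    errors.push(`${quest.id}: unknown narrative_phase "${quest.narrative_phase}"`);
  }
  if (quest.behavior_impact !== undefined) {
    for (const [key, deltas] of Object.entries(quest.behavior_impact)) {
      for (const [axis, v] of Object.entries(deltas ?? {})) {
        if (!AXES.has(axis) || typeof v !== "number") {
          errors.push(`${quest.id}: behavior_impact.${key}.${axis} is invalid`);
        }
      }
    }
  }
  return { errors, warnings };
}
```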
---
### Task 3 — Behavior tracking service
**What changes:**
- New service: `server/src/services/BehaviorTracker.js`
- Tracks `curiosity`, `obedience`, `risk` as numeric values (0–100, start 50)
- Method: `apply(behaviorImpact)` — adds branch-level deltas
- Method: `getSnapshot()` — returns `{ curiosity, obedience, risk }`
- Method: `initialize(state)` — loads from save state
- Persists via `saveState.set({ behavior: ... })`
- Emits `behavior:changed` event on change
- Add `behavior` key to `SaveState._defaultState()` with schema_version bump to 3
- `SaveState._applyDefaults()` already merges new keys safely — no migration needed for existing saves
- Wire `behaviorTracker.initialize(state)` into `server/src/index.js` `initializeServices()`
- Call `behaviorTracker.apply(branch.behavior_impact ?? quest.behavior_impact?.[branch.id] ?? quest.behavior_impact?.default ?? {})` inside `TicketService.markComplete()` after the branch is selected
**Files changed:** `server/src/services/BehaviorTracker.js` (new), `server/src/services/SaveState.js`, `server/src/index.js`, `server/src/services/TicketService.js`
**Risk:** Low — additive; behavior impact fields are optional in quest JSON so existing quests don't crash
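The Task 2 validator treats `behavior_impact` as a quest-level map keyed by branch, while the Task 3 acceptance criteria describe a branch-level `behavior_impact` object. A minimal sketch that accepts both shapes, assuming branch-level data wins over the quest-level map and a `default` entry is the last fallback (the precedence and the `resolveBehaviorImpact` name are illustrative, not existing code):

```javascript
// Hypothetical helper: picks the behavior_impact delta object for a completed branch.
// Assumed precedence: branch-level object, then the quest map keyed by branch ID,
// then the quest map's `default` entry; null means "no behavior change".
function resolveBehaviorImpact(quest, branch) {
  return (
    branch.behavior_impact ??
    quest.behavior_impact?.[branch.id] ??
    quest.behavior_impact?.default ??
    null
  );
}

// Example quest fragment (illustrative branch IDs).
const quest = {
  behavior_impact: {
    clean_fix: { curiosity_delta: 2, obedience_delta: 1 },
    default: { risk_delta: 3 },
  },
};

console.log(resolveBehaviorImpact(quest, { id: 'clean_fix' }));
// { curiosity_delta: 2, obedience_delta: 1 }
console.log(resolveBehaviorImpact(quest, { id: 'brittle_fix' }));
// { risk_delta: 3 }
```

Returning `null` rather than `{}` lets the caller skip `behaviorTracker.apply()` entirely for quests that predate the new fields.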
---
### Task 4 — Narrative phase tracker
**What changes:**
- New service: `server/src/services/NarrativePhaseTracker.js`
- Maintains current phase as one of: `normal_work | unease | suspicion | investigation | conflict | resolution`
- Phase is derived from completed quests: it is the highest phase among the quests completed so far
- Method: `getPhase()` — returns current string
- Method: `advance(questId)` — checks the completed quest's `narrative_phase` field and updates phase if it is higher on the spine
- Method: `initialize(state)` — restores from `state.narrative_phase`
- Persists via `saveState.set({ narrative_phase: ... })`
- Emits `narrative:phase_changed` event
- Add `narrative_phase` key to `SaveState._defaultState()` with value `'normal_work'`
- Call `narrativePhaseTracker.advance(questId)` inside `QuestEngine.complete()` after state mutation
- Expose `narrativePhase` in `/api/state` response (`server/src/routes/state.js`)
**Files changed:** `server/src/services/NarrativePhaseTracker.js` (new), `server/src/services/SaveState.js`, `server/src/services/QuestEngine.js`, `server/src/routes/state.js`, `server/src/index.js`
**Risk:** Low — additive; quests without `narrative_phase` field default to `normal_work`, which never advances the tracker
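The advance-only-forward rule reduces to comparing ranks on the spine. A sketch of the comparison `advance(questId)` could perform once it has looked up the quest; `nextPhase` is a hypothetical pure helper, and treating unknown phase strings as "no change" is an assumption consistent with quests lacking `narrative_phase` never advancing the tracker:

```javascript
// Narrative spine, lowest to highest.
const PHASE_ORDER = ['normal_work', 'unease', 'suspicion', 'investigation', 'conflict', 'resolution'];

// Hypothetical helper: returns the phase after completing a quest tagged `questPhase`.
// Missing or unknown quest phases never move the tracker, and it never moves backward.
function nextPhase(currentPhase, questPhase) {
  const questRank = PHASE_ORDER.indexOf(questPhase);
  if (questRank === -1) return currentPhase;
  return questRank > PHASE_ORDER.indexOf(currentPhase) ? questPhase : currentPhase;
}

console.log(nextPhase('normal_work', 'unease')); // unease
console.log(nextPhase('unease', 'normal_work')); // unease
console.log(nextPhase('unease', undefined)); // unease
```

The tracker would persist and emit `narrative:phase_changed` only when the returned phase differs from the current one.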
---
### Task 5 — Hidden hook discovery state
**What changes:**
- New save state key: `hidden_hooks_discovered` — array of hook IDs (strings)
- `SaveState._defaultState()` adds `hidden_hooks_discovered: []`
- New service: `server/src/services/HiddenHookTracker.js`
- Method: `discover(hookId)` — adds hookId to discovered list, persists, emits `hidden_hook:discovered`
- Method: `isDiscovered(hookId)` — boolean check
- Method: `getDiscovered()` — returns array
- Method: `initialize(state)` — restores from save
- New API route (dev/admin only): `GET /api/debug/hidden-hooks` — returns discovered hooks and all declared hooks from quest JSON
- Hidden hook discovery is triggered when the player finds specific files, users, or cron entries via terminal commands. The prep script seeds the artifact; discovery is then detected either by a new optional validation check run on terminal activity, or by registering the hook as a special objective with `check_mode: "passive"` and a `behavior_impact` of `curiosity_delta: 2`
**Design note:** The simplest integration is: hidden hook discovery = passive objective with `hidden: true` flag. When a `hidden: true` objective validates, `HiddenHookTracker.discover()` is called instead of updating quest progress. This reuses the existing ValidationEngine without a new runtime mechanism.
**Files changed:** `server/src/services/HiddenHookTracker.js` (new), `server/src/services/SaveState.js`, `server/src/index.js`, `server/src/routes/state.js`
**Risk:** Low — discovery mechanism is opt-in per quest
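The design note's routing rule (a validated `hidden: true` objective records a discovery instead of advancing the quest) can be sketched as a single branch. `handleValidatedObjective`, the `tracker` and `progress` parameters, and using the objective's own `id` as the hook ID are all assumptions for illustration:

```javascript
// Hypothetical routing step run after ValidationEngine reports an objective as passing.
// tracker: anything with discover(hookId); progress: anything with add(objectiveId).
function handleValidatedObjective(objective, tracker, progress) {
  if (objective.hidden === true) {
    // Hidden objectives never count toward branch resolution.
    tracker.discover(objective.id);
    return 'hook_discovered';
  }
  progress.add(objective.id);
  return 'progress_updated';
}

// Example with in-memory stand-ins (a Set already has add()).
const found = new Set();
const completed = new Set();
const tracker = { discover: (id) => found.add(id) };
handleValidatedObjective({ id: 'unknown_passwd_user', hidden: true, check_mode: 'passive' }, tracker, completed);
handleValidatedObjective({ id: 'ssh_key_installed' }, tracker, completed);
console.log([...found], [...completed]); // [ 'unknown_passwd_user' ] [ 'ssh_key_installed' ]
```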
---
### Task 6 — Access level system
**What changes:**
- Extend `ProgressionSystem` with a named three-tier concept:
- `basic_user` — default, always available
- `sudo` — granted by trust threshold (already exists as `unlocked_access` strings, just unnamed)
- `root` — granted at higher trust threshold
- Add `content/progression/access_levels.json` — defines access level thresholds (trust + suspicion gates)
- Add `suspicion` key to `SaveState._defaultState()` with value `0`
- Add `suspicion` tracking to `BehaviorTracker` (or a thin `SuspicionTracker`) — updated whenever `risk` behavior delta fires
- Suspicion threshold: if `suspicion >= 70`, revoke certain access levels (mirror of trust revoke logic)
- Add an `accessLevel` computed field to the `/api/state` response: `basic_user | sudo | root`, derived from the current `unlocked_access` set
- `trust_unlocks.json` entries can remain as-is; the `access_level` label is a derived label for UI/debug use
**Files changed:** `server/src/services/ProgressionSystem.js` (extend with `getAccessLevel()` helper), `server/src/services/SaveState.js`, `server/src/routes/state.js`, `content/progression/access_levels.json` (new)
**Risk:** Medium — `suspicion` as an access gate requires careful tuning; start with suspicion as display-only, gate access only in Task 7 when boss pressure is wired
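The files-to-add table describes `access_levels.json` as an array of `{ level, trust_threshold, suspicion_ceiling, grants, revokes }`. Assuming that shape, a sketch of picking the highest level the player qualifies for; the threshold numbers are placeholders taken from the Task 6 acceptance criteria and the `suspicion >= 70` rule, and `deriveAccessLevel` is a hypothetical name. Per the risk note, the suspicion ceiling would stay display-only until Task 7.

```javascript
// Placeholder thresholds mirroring the proposed access_levels.json shape
// (grants/revokes omitted). Real values belong in the content file.
const ACCESS_LEVELS = [
  { level: 'basic_user', trust_threshold: 0, suspicion_ceiling: Infinity },
  { level: 'sudo', trust_threshold: 55, suspicion_ceiling: 70 },
  { level: 'root', trust_threshold: 60, suspicion_ceiling: 70 },
];

// Hypothetical helper: highest level whose trust gate is met and whose
// suspicion ceiling has not been reached. Falls back to basic_user.
function deriveAccessLevel(trust, suspicion, levels = ACCESS_LEVELS) {
  let best = { level: 'basic_user', trust_threshold: -Infinity };
  for (const entry of levels) {
    const eligible = trust >= entry.trust_threshold && suspicion < entry.suspicion_ceiling;
    if (eligible && entry.trust_threshold > best.trust_threshold) best = entry;
  }
  return best.level;
}

console.log(deriveAccessLevel(50, 0)); // basic_user
console.log(deriveAccessLevel(62, 0)); // root
console.log(deriveAccessLevel(62, 75)); // basic_user
```

This gates on numbers only; the `getAccessLevel()` helper proposed for `ProgressionSystem` reads the unlocked key set instead, so the thresholds and the unlock table would need to agree.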
---
### Task 7 — Boss/management pressure (phase-scaled)
**What changes:**
- Add `content/pressure_profiles/kowalski_phase_*.json` — 6 phase-keyed boss pressure profiles:
- Phase 1: Annoying (routine status email)
- Phase 2: Dismissive (reply-all on a ticket)
- Phase 3: Suspicious (access review CC)
- Phase 4: Monitoring (meeting scheduled)
- Phase 5: Interfering (access restriction trigger)
- Phase 6: Outcome-dependent (depends on world flags)
- Extend `IncidentScheduler` to also process a `phase_pressure` tracker:
- When `narrativePhaseTracker.getPhase()` changes, activate the matching phase pressure profile
- Phase pressure escalation steps are sent via `emailService.send()` from Kowalski or Priya
- Add `follow_up_mail` field support to incident escalation steps (already possible via `emailService.send()`)
- Restrict access on phase 5 via `progressionSystem.revokeUnlock()` driven by a world flag set by phase 5 pressure
**Files changed:** `server/src/services/IncidentScheduler.js` (extend), `server/src/services/NarrativePhaseTracker.js` (emit event on change), `content/pressure_profiles/` (new files)
**Risk:** Medium — phase pressure interacts with trust/suspicion; test pressure escalation in isolation before linking to access revoke
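When the phase changes, the scheduler needs two lookups: which profile matches the new phase, and which of its steps are due. A sketch assuming profiles use the `trigger_phase` and `escalation_steps[].trigger_after_seconds` fields from the Task 7 content files, that elapsed time is measured from profile activation, and that sent steps are tracked by index (all hypothetical; `profileForPhase` and `dueSteps` are illustrative names):

```javascript
// Hypothetical helper: the pressure profile whose trigger_phase matches, or null.
function profileForPhase(profiles, phase) {
  return profiles.find((profile) => profile.trigger_phase === phase) ?? null;
}

// Hypothetical helper: steps whose delay has elapsed and that have not been sent.
// Each due step would then be delivered through emailService.send().
function dueSteps(profile, elapsedSeconds, sentIndexes) {
  return profile.escalation_steps
    .map((step, index) => ({ step, index }))
    .filter(({ step, index }) => step.trigger_after_seconds <= elapsedSeconds && !sentIndexes.has(index));
}

const profiles = [
  { id: 'kowalski_phase_1', trigger_phase: 'normal_work', escalation_steps: [{ trigger_after_seconds: 300 }, { trigger_after_seconds: 600 }] },
  { id: 'kowalski_phase_2', trigger_phase: 'unease', escalation_steps: [{ trigger_after_seconds: 180 }] },
];

const active = profileForPhase(profiles, 'normal_work');
console.log(dueSteps(active, 450, new Set()).map(({ index }) => index)); // [ 0 ]
```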
---
### Task 8 — Ending evaluation
**What changes:**
- New service: `server/src/services/EndingEvaluator.js`
- Evaluates the active ending route from world state at any time (not just at game end)
- Method: `evaluate()` — returns the current ending label (`corporate_loop | burnout | exposure | chaos`) and a confidence object
- Criteria (derived from SPEC_LOCK.md):
- `exposure`: high curiosity, narrative_phase reached `investigation` or `conflict`, hidden hooks discovered ≥ N
- `corporate_loop`: high obedience, low curiosity, trust > 70, few hidden hooks discovered
- `burnout`: low obedience AND low curiosity, trust medium-low, many unresolved incidents
- `chaos`: high risk, many negative trust_deltas, suspicion high, destructive world flags present
- Method: `checkTrigger()` — called at quest completion; if conditions are fully met and phase = `resolution`, fires `ending:triggered` event
- New API endpoint: `GET /api/debug/ending` — returns current ending trajectory (dev only)
- The ending trigger should NOT be a single button. `EndingEvaluator` is called passively on `quest:completed` events.
**Files changed:** `server/src/services/EndingEvaluator.js` (new), `server/src/index.js`, `server/src/routes/state.js`
**Risk:** Medium — ending criteria tuning requires extensive playtesting; ship as observable-only first, and gate the actual ending cutscene/screen behind separate content work in Task 10
---
### Task 9 — Debug/dev tools
**What changes:**
- New route file: `server/src/routes/debug.js` — only active when `NODE_ENV !== 'production'`
- `GET /api/debug/state` — full save state dump
- `GET /api/debug/behavior` — current behavior snapshot (curiosity/obedience/risk/suspicion)
- `GET /api/debug/phase` — current narrative phase
- `GET /api/debug/ending` — current ending trajectory
- `GET /api/debug/hidden-hooks` — discovered + undiscovered hooks
- `POST /api/debug/set-behavior` — override behavior variables (for testing branches)
- `POST /api/debug/set-phase` — force a narrative phase (for testing phase-specific pressure)
- `POST /api/debug/discover-hook/:id` — manually fire hook discovery (for testing)
- Wire debug router into `server/src/index.js` behind `NODE_ENV` guard
- Add a minimal debug panel to the frontend (dev only): collapsible overlay showing behavior, phase, ending trajectory — controlled by `?debug=1` query param
**Files changed:** `server/src/routes/debug.js` (new), `server/src/index.js`, `frontend/src/App.svelte` (conditional debug panel), `frontend/src/components/DebugPanel.svelte` (new)
**Risk:** Low — debug routes are gated; frontend panel is conditional
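Two small guards carry the gating described above. A sketch with hypothetical helper names: the server check mirrors the `NODE_ENV !== 'production'` condition (so an unset `NODE_ENV` counts as development), and the frontend check parses the query string with `URLSearchParams` so that only an exact `debug=1` parameter enables the panel, rather than any query string containing that substring.

```javascript
// Server side: debug routes are mounted unless NODE_ENV is exactly 'production'.
// An unset NODE_ENV enables them, so production launches must set it explicitly.
function debugRoutesEnabled(env) {
  return env.NODE_ENV !== 'production';
}

// Frontend side: show DebugPanel only for an exact debug=1 query parameter.
// In App.svelte this would be called with window.location.search.
function isDebugMode(search) {
  return new URLSearchParams(search).get('debug') === '1';
}

console.log(debugRoutesEnabled({ NODE_ENV: 'production' })); // false
console.log(debugRoutesEnabled({})); // true
console.log(isDebugMode('?tab=mail&debug=1')); // true
console.log(isDebugMode('?xdebug=10')); // false
```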
---
### Task 10 — Content integration
**What changes:**
- Add new fields to all 8 existing quests: `narrative_phase`, `behavior_impact`, `hidden_hook`, `linux_concepts`, `failure_conditions`, `access_requirements`
- Fix Priya's name in: `server/src/services/ShiftReviewService.js`, `server/src/services/EmailService.js`, `content/tickets/T007.json`, `content/docs/onboarding.json`
- Register any new world flags needed by the new fields in `content/world_flags/world_flags.json`
- Author the first hidden hooks as passive objectives in Q005-Q008 (per STORY_DESIGN_CONTEXT.md: every 3-5 quests)
- Add phase-pressure content files for phases 1-3 (phases 4-6 are content-authored later as story expands)
- Author Kowalski as a pressure sender in the phase 2 and 3 profiles
**Files changed:** All 8 quest JSONs, `content/tickets/T007.json`, `content/docs/onboarding.json`, `server/src/services/ShiftReviewService.js`, `server/src/services/EmailService.js`, `content/world_flags/world_flags.json`, `content/pressure_profiles/` (new files)
**Risk:** Medium — touching all quest files; run `validate-content.js` after every file change
---
### Task 11 — Validation and tests
**What changes:**
- Update `validate-content.js`:
- Error on unrecognized `narrative_phase` value
- Warn on missing `narrative_phase`
- Validate `behavior_impact` structure (numeric deltas)
- Validate `hidden_hook` structure if present
- Warn if `linux_concepts` is empty
- Check `access_requirements.minimum_access` values against known VM IDs
- Add unit tests:
- `BehaviorTracker.test.js` — apply deltas, persistence, initialize from state
- `NarrativePhaseTracker.test.js` — advance rules, phase ordering, initialize
- `EndingEvaluator.test.js` — all 4 endings, boundary conditions
- `HiddenHookTracker.test.js` — discover, isDiscovered, persistence
- Extend existing tests:
- `ValidationEngine.test.js` — confirm hidden objectives with `hidden: true` don't affect normal branch resolution
- `TicketService.test.js` — confirm `behavior_impact` is applied at completion, confirm no-op when field absent
- Manual test checklist (see Task 11 Codex prompt)
**Files changed:** `tools/content/validate-content.js`, `server/src/services/BehaviorTracker.test.js` (new), `server/src/services/NarrativePhaseTracker.test.js` (new), `server/src/services/EndingEvaluator.test.js` (new), `server/src/services/HiddenHookTracker.test.js` (new)
**Risk:** Low — tests are additive
---
## 5. Files Likely to Change
| File | Why | What changes | Risk |
|---|---|---|---|
| `server/src/services/SaveState.js` | New save keys needed | Add `behavior`, `narrative_phase`, `suspicion`, `hidden_hooks_discovered` to `_defaultState()`; bump `schema_version` to 3 | Low — `_applyDefaults` merges safely |
| `server/src/services/QuestEngine.js` | Phase advancement hook | Call `narrativePhaseTracker.advance()` in `complete()`; import new service | Low |
| `server/src/services/TicketService.js` | Behavior application | Call `behaviorTracker.apply()` after branch selection in `markComplete()` | Low — branch.behavior_impact is optional |
| `server/src/services/ShiftReviewService.js` | Name correction | Change `'Priya Kapoor'` to `'Priya Nair'`; fix `p.kapoor` to `p.nair` in email From line | Low — one-liner |
| `server/src/services/EmailService.js` | Name correction | Change `CHARACTER_EMAILS.priya` to `'Priya Nair <p.nair@axiomworks.internal>'` | Low — one-liner |
| `server/src/services/IncidentScheduler.js` | Phase pressure | Add `_processPhasePressure()` method triggered by phase change event | Medium |
| `server/src/services/ProgressionSystem.js` | Access level label | Add `getAccessLevel()` that derives `basic_user`, `sudo`, or `root` from the current `unlocked_access` set | Low |
| `server/src/routes/state.js` | Expose new state | Add `behavior`, `narrativePhase`, `accessLevel`, `suspicion` to GET /api/state response | Low |
| `server/src/index.js` | Wire new services | Import and `initialize()` new services in the correct order; add debug router | Low |
| `tools/content/validate-content.js` | Validate new schema fields | Add phase enum check, behavior_impact structure check, hidden_hook shape check | Low — additive |
| `content/world_flags/world_flags.json` | New flags needed | Add entries for any new flags emitted by hidden hooks and phase pressure profiles | Low |
| `content/tickets/T007.json` | Priya name | Update `from` field if it uses old email | Low |
| `content/docs/onboarding.json` | Priya name | Update any references to Priya Kapoor or Priya Singh | Low |
| All 8 quest JSONs | New fields | Add `narrative_phase`, `behavior_impact`, `hidden_hook`, `linux_concepts`, `failure_conditions`, `access_requirements` | Medium — large surface |
---
## 6. Files Likely to Be Added
| File | Purpose | Expected structure |
|---|---|---|
| `server/src/services/BehaviorTracker.js` | Track curiosity/obedience/risk/suspicion | Class with `initialize()`, `apply(impact)`, `getSnapshot()`, `_persist()` |
| `server/src/services/NarrativePhaseTracker.js` | Track and advance narrative phase | Class with `initialize()`, `advance(questId)`, `getPhase()`, `_persist()` |
| `server/src/services/HiddenHookTracker.js` | Record hidden hook discoveries | Class with `initialize()`, `discover(id)`, `isDiscovered(id)`, `getDiscovered()` |
| `server/src/services/EndingEvaluator.js` | Evaluate ending trajectory from world state | Class with `evaluate()`, `checkTrigger()`, pure computation over save state snapshot |
| `server/src/routes/debug.js` | Dev-only debug API | Express router, gated on `NODE_ENV !== 'production'` |
| `frontend/src/components/DebugPanel.svelte` | Dev-only debug overlay | Collapsible panel, shown on `?debug=1`, polling `/api/debug/state` |
| `content/progression/access_levels.json` | Named access level threshold definitions | Array of `{ level, trust_threshold, suspicion_ceiling, grants, revokes }` |
| `content/pressure_profiles/kowalski_phase_1.json` | Phase 1 boss pressure | `escalation_steps` with Kowalski emails at time thresholds |
| `content/pressure_profiles/kowalski_phase_2.json` | Phase 2 boss pressure | Dismissive Kowalski CC patterns |
| `content/pressure_profiles/kowalski_phase_3.json` | Phase 3 boss pressure | Suspicious Kowalski, Priya CC |
| `server/src/services/BehaviorTracker.test.js` | Unit tests for BehaviorTracker | Jest test file using existing `IncidentScheduler.test.js` as pattern |
| `server/src/services/NarrativePhaseTracker.test.js` | Unit tests for NarrativePhaseTracker | Jest test file |
| `server/src/services/EndingEvaluator.test.js` | Unit tests for EndingEvaluator | Jest test file, covers all 4 endings |
| `server/src/services/HiddenHookTracker.test.js` | Unit tests for HiddenHookTracker | Jest test file |
---
## 7. Data Migration Plan
### Existing quests (Q001-Q008)
**Strategy: Wrap into new schema (backward-compatible extension)**
- Do NOT replace existing quests. Do NOT create a "legacy" tier.
- Add new fields to each existing quest file. The fields are additive.
- `ContentLoader.js` already loads all quest files and passes them to `QuestEngine`. New fields are simply available at resolution time.
- Missing new fields in old quests: the runtime treats `narrative_phase: undefined` as `normal_work`; `behavior_impact: undefined` as no behavior change; `hidden_hook: null` as no hook.
- This means existing quests continue to work with zero runtime errors before Task 10 runs.
### Save state migration
- `schema_version` bumps from `2` to `3`
- `SaveState._applyDefaults()` already merges new keys safely: old saves that lack `behavior`, `narrative_phase`, `suspicion`, `hidden_hooks_discovered` will receive the default values (`50/50/50`, `'normal_work'`, `0`, `[]`) on next load
- No destructive migration. No migration script needed.
- Old saves loaded under the new schema will behave as if the player is in Phase 1 with neutral behavior — which is correct for a save that predates the new system.
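The migration relies on `_applyDefaults()` filling in missing keys. Its actual implementation is not shown here; a hypothetical sketch of a merge with that effect, assuming defaults come from a freshly built `_defaultState()` object and that nested objects such as `behavior` are merged one level deep so a partial `behavior` entry also picks up missing sub-keys:

```javascript
const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Hypothetical merge: saved values win; missing top-level keys take the default;
// plain-object defaults are merged one level deep. `defaults` should be a fresh
// object per call so arrays like hidden_hooks_discovered are not shared across saves.
function applyDefaults(saved, defaults) {
  const result = { ...defaults, ...saved };
  for (const [key, value] of Object.entries(defaults)) {
    if (isPlainObject(value) && isPlainObject(saved[key])) {
      result[key] = { ...value, ...saved[key] };
    }
  }
  return result;
}

// Illustrative schema_version 2 save that predates the new keys.
const defaults = {
  schema_version: 3,
  behavior: { curiosity: 50, obedience: 50, risk: 50, suspicion: 0 },
  narrative_phase: 'normal_work',
  suspicion: 0,
  hidden_hooks_discovered: [],
};
const merged = applyDefaults({ schema_version: 2, trust: 53 }, defaults);
console.log(merged.narrative_phase, merged.behavior.curiosity, merged.hidden_hooks_discovered);
// normal_work 50 []
```

This sketch leaves `schema_version` as saved; whether the loader re-stamps it to 3 is up to the real `SaveState` code.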
### Tickets, dialogue, incidents
- No migration needed. Existing files continue to load and function.
- New dialogue files for phase pressure and boss escalation are additive.
---
## 8. Testing Plan
### Unit tests (new)
| Test file | What it covers |
|---|---|
| `BehaviorTracker.test.js` | Delta application, clamping (0-100), initialize from state, persist, event emission |
| `NarrativePhaseTracker.test.js` | Phase ordering (spine), advance-only-forward rule, initialize from state, persist |
| `EndingEvaluator.test.js` | All 4 endings by state construction, boundary conditions, tie-break rules |
| `HiddenHookTracker.test.js` | Discover, isDiscovered, idempotent discover, initialize from state |
### Integration tests (extend existing)
| Test | Assertion |
|---|---|
| `TicketService.test.js` — behavior applied | After `markComplete`, save state `behavior.curiosity` changes by branch delta |
| `TicketService.test.js` — behavior absent | Quest with no `behavior_impact` completes without error |
| `ValidationEngine.test.js` — hidden objective | `hidden: true` objective validates passively without blocking branch resolution |
| `IncidentScheduler.test.js` — phase pressure | Phase change event triggers correct pressure profile activation |
### Save/load compatibility checks
- Load an existing (schema_version 2) save: all new keys initialized to defaults, no error
- Complete a new quest with new schema fields: save state includes correct behavior deltas
- Restart server with schema_version 3 save: all new keys correctly restored
- Test `SAVE_DIR` override with new schema
### Manual test checklist
1. Complete Q001 clean fix → confirm `player_ssh_configured` flag set, trust = 53
2. Complete Q001 brittle fix → confirm trust penalty, `player_loose_permissions` flag set
3. After any quest completion → confirm `behavior` object in `/api/state` (via debug route) has changed
4. With `?debug=1` → confirm debug panel visible in frontend
5. Complete Q001-Q003 → confirm narrative phase advances from `normal_work`
6. Navigate terminal to a hidden anomaly (e.g., unknown user in `/etc/passwd`) → confirm `/api/debug/hidden-hooks` shows new entry
7. Force phase 3 via debug route → confirm Kowalski pressure profile activates
8. Force behavior state to `{ curiosity: 80, obedience: 20, risk: 30 }` + discover at least two hidden hooks + reach resolution phase → confirm EndingEvaluator returns `exposure`
9. Force behavior state to `{ curiosity: 20, obedience: 80, risk: 20 }` + trust ≥ 65 + reach resolution phase → confirm `corporate_loop`
10. Run `node tools/content/validate-content.js` — zero errors with all existing + updated quests
11. Run `npm test` — all existing tests pass; all new unit tests pass
### Content validation checks
- After Task 10: run `validate-content.js --verbose` on all 8 updated quests
- Confirm all new `narrative_phase` values are valid enum members
- Confirm all new `behavior_impact` fields have numeric deltas
- Confirm no undeclared world flags introduced
- Confirm all `hidden_hook` IDs are unique across quests
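The uniqueness check on `hidden_hook` IDs is not part of the Task 2 prompt. A sketch of one, assuming it runs over the same `quests` map the validator already iterates (quest ID → `{ data, fname }`) and reads the `hidden_hook.id` field; `findDuplicateHookIds` is a hypothetical name:

```javascript
// Hypothetical check: hook IDs declared by more than one quest.
function findDuplicateHookIds(quests) {
  const owners = new Map(); // hook ID -> quest IDs that declare it
  for (const [qid, { data: quest }] of Object.entries(quests)) {
    const hookId = quest.hidden_hook?.id;
    if (typeof hookId !== 'string') continue;
    if (!owners.has(hookId)) owners.set(hookId, []);
    owners.get(hookId).push(qid);
  }
  return [...owners]
    .filter(([, qids]) => qids.length > 1)
    .map(([hookId, qids]) => ({ hookId, qids }));
}

const quests = {
  Q005: { data: { hidden_hook: { id: 'orphan_cron' } }, fname: 'Q005.json' },
  Q006: { data: { hidden_hook: null }, fname: 'Q006.json' },
  Q007: { data: { hidden_hook: { id: 'orphan_cron' } }, fname: 'Q007.json' },
};
for (const { hookId, qids } of findDuplicateHookIds(quests)) {
  // In validate-content.js this would be reported with err().
  console.log(`duplicate hidden_hook id '${hookId}' in ${qids.join(', ')}`);
}
// duplicate hidden_hook id 'orphan_cron' in Q005, Q007
```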
---
## 9. Codex Delegation Prompts
### Task 2 — Extend validate-content.js
```
File: tools/content/validate-content.js
Extend the existing content validation tool. Do not change any existing checks. Add these new checks after the existing quest validation block:
1. Define a constant at the top of the file:
const VALID_NARRATIVE_PHASES = new Set(["normal_work","unease","suspicion","investigation","conflict","resolution"]);
2. In the quest validation loop (the `for (const [qid, { data: quest, fname }] of Object.entries(quests))` block), add after the existing checks:
// narrative_phase
if (!quest.narrative_phase) {
warn(`${ctx}: missing 'narrative_phase' field`);
} else if (!VALID_NARRATIVE_PHASES.has(quest.narrative_phase)) {
err(`${ctx}: unknown narrative_phase '${quest.narrative_phase}'`);
}
// behavior_impact
if (quest.behavior_impact != null) {
for (const [branchKey, impact] of Object.entries(quest.behavior_impact)) {
if (impact === null || typeof impact !== 'object') {
err(`${ctx}: behavior_impact[${branchKey}] must be an object of numeric deltas`);
continue;
}
for (const field of ['curiosity_delta','obedience_delta','risk_delta','suspicion_delta']) {
if (impact[field] !== undefined && typeof impact[field] !== 'number') {
err(`${ctx}: behavior_impact[${branchKey}].${field} must be a number`);
}
}
}
}
// hidden_hook shape (if present and not null)
if (quest.hidden_hook !== undefined && quest.hidden_hook !== null) {
if (typeof quest.hidden_hook.id !== 'string') {
err(`${ctx}: hidden_hook.id must be a string`);
}
}
// access_requirements
if (quest.access_requirements?.minimum_access) {
for (const [vmId] of Object.entries(quest.access_requirements.minimum_access)) {
if (!vmProfiles[vmId]) {
err(`${ctx}: access_requirements.minimum_access references unknown VM '${vmId}'`);
}
}
}
Acceptance criteria:
- `node tools/content/validate-content.js` runs without JS errors
- Existing quest files produce only warnings for missing narrative_phase, not errors
- A test quest with narrative_phase: "invalid_phase" produces one error
- All other existing checks continue to pass
```
---
### Task 3 — BehaviorTracker service
```
Create file: server/src/services/BehaviorTracker.js
Use ES module syntax (import/export) matching the existing service style (see SaveState.js and TrustSystem.js as patterns).
The class must:
- Store { curiosity, obedience, risk, suspicion } — all numeric 0-100, starting at 50/50/50/0
- initialize(state): load from state.behavior (use defaults if absent)
- apply(impact): accept an object with optional fields { curiosity_delta, obedience_delta, risk_delta, suspicion_delta }, add each to the corresponding score, clamp to [0,100], persist, emit 'behavior:changed' via eventBus
- getSnapshot(): return a plain { curiosity, obedience, risk, suspicion } object
- _persist(): call saveState.set({ behavior: this.getSnapshot() })
Export a singleton: export const behaviorTracker = new BehaviorTracker();
Then make these changes:
1. In server/src/services/SaveState.js, in _defaultState(), add this key alongside the existing ones:
behavior: { curiosity: 50, obedience: 50, risk: 50, suspicion: 0 },
and change schema_version from 2 to 3.
2. In server/src/index.js, import behaviorTracker from './services/BehaviorTracker.js' and add behaviorTracker.initialize(state) in initializeServices() after trustSystem.initialize(state).
3. In server/src/services/TicketService.js, in the markComplete() method, after the line `questEngine.complete(quest.id, { branchId: branch.id });`, add:
const behaviorImpact = branch.behavior_impact ?? quest.behavior_impact?.[branch.id] ?? quest.behavior_impact?.default ?? null;
if (behaviorImpact) { behaviorTracker.apply(behaviorImpact); }
(Add the import at the top of the file.)
Acceptance criteria:
- npm test passes (existing tests unchanged)
- GET /api/debug/state (if debug route exists) shows behavior object
- After completing a quest whose branch has behavior_impact.curiosity_delta: 2, the save.json shows behavior.curiosity incremented by 2
```
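A sketch of the class the Task 3 prompt asks for. To keep it standalone, `saveState` and `eventBus` are injected through the constructor instead of imported (the real service would use the existing singletons), and two choices go beyond the prompt: nothing is persisted or emitted when an impact leaves every score unchanged, matching the "emits on change" wording in the Task 3 plan, and the `{ before, after }` event payload is an assumed shape.

```javascript
const DEFAULT_BEHAVIOR = { curiosity: 50, obedience: 50, risk: 50, suspicion: 0 };
const clamp = (n) => Math.min(100, Math.max(0, n));

class BehaviorTracker {
  constructor({ saveState, eventBus }) {
    this._saveState = saveState;
    this._eventBus = eventBus;
    this._scores = { ...DEFAULT_BEHAVIOR };
  }

  // Missing keys in an older save fall back to the defaults.
  initialize(state) {
    this._scores = { ...DEFAULT_BEHAVIOR, ...(state.behavior ?? {}) };
  }

  // impact: { curiosity_delta?, obedience_delta?, risk_delta?, suspicion_delta? }
  apply(impact) {
    const before = this.getSnapshot();
    for (const key of Object.keys(DEFAULT_BEHAVIOR)) {
      const delta = impact?.[`${key}_delta`];
      if (typeof delta === 'number') this._scores[key] = clamp(this._scores[key] + delta);
    }
    const after = this.getSnapshot();
    // No-op impacts (absent, empty, or fully clamped) neither persist nor emit.
    if (Object.keys(after).every((key) => after[key] === before[key])) return;
    this._persist();
    this._eventBus.emit('behavior:changed', { before, after });
  }

  getSnapshot() {
    return { ...this._scores };
  }

  _persist() {
    this._saveState.set({ behavior: this.getSnapshot() });
  }
}

// In-memory stand-ins for the injected dependencies.
const saved = [];
const tracker = new BehaviorTracker({
  saveState: { set: (patch) => saved.push(patch) },
  eventBus: { emit: (name) => console.log(name) },
});
tracker.initialize({});
tracker.apply({ curiosity_delta: 2 }); // prints behavior:changed
console.log(saved[0].behavior.curiosity); // 52
```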
---
### Task 4 — NarrativePhaseTracker service
```
Create file: server/src/services/NarrativePhaseTracker.js
Use ES module syntax matching existing service patterns.
Phase ordering (spine): normal_work < unease < suspicion < investigation < conflict < resolution
The class must:
- Store _phase as a string, initialized from state.narrative_phase or defaulting to 'normal_work'
- PHASE_ORDER constant: ['normal_work','unease','suspicion','investigation','conflict','resolution']
- initialize(state): restore _phase from state.narrative_phase
- advance(questId): look up the quest from contentLoader, read its narrative_phase field; if the quest's phase rank is strictly higher than current phase rank, update _phase, persist, emit 'narrative:phase_changed' event with { from, to }; if narrative_phase field is absent or undefined, do nothing
- getPhase(): return current _phase string
- _persist(): saveState.set({ narrative_phase: this._phase })
Export singleton: export const narrativePhaseTracker = new NarrativePhaseTracker();
Then make these changes:
1. In server/src/services/SaveState.js _defaultState(), add:
narrative_phase: 'normal_work',
2. In server/src/services/QuestEngine.js complete() method, after this._persist(), add:
narrativePhaseTracker.advance(questId);
(Add the import at top of file.)
3. In server/src/routes/state.js, add narrativePhase: narrativePhaseTracker.getPhase() to the GET / response object.
Import narrativePhaseTracker at top of the file.
4. In server/src/index.js, import and initialize narrativePhaseTracker in initializeServices() after questEngine.initialize(state).
Acceptance criteria:
- npm test passes
- After completing Q001, GET /api/state returns narrativePhase: 'normal_work'
- If a quest with narrative_phase: 'unease' is completed after Q001, GET /api/state returns narrativePhase: 'unease'
- Phase never goes backward: completing a 'normal_work' quest after an 'unease' quest does not revert the phase
```
---
### Task 5 — HiddenHookTracker service
```
Create file: server/src/services/HiddenHookTracker.js
ES module syntax, matching existing service patterns.
The class must:
- Store _discovered as a Set of hook ID strings
- initialize(state): load from state.hidden_hooks_discovered (array), build Set
- discover(hookId): if not already discovered, add to Set, persist, emit 'hidden_hook:discovered' with { hookId }; idempotent if already discovered
- isDiscovered(hookId): boolean
- getDiscovered(): return [...this._discovered] sorted
- _persist(): saveState.set({ hidden_hooks_discovered: [...this._discovered] })
Export singleton: export const hiddenHookTracker = new HiddenHookTracker();
Then:
1. In server/src/services/SaveState.js _defaultState(), add:
hidden_hooks_discovered: [],
2. In server/src/index.js, import and call hiddenHookTracker.initialize(state) in initializeServices().
3. In server/src/routes/state.js, add hiddenHooksDiscovered: hiddenHookTracker.getDiscovered() to the response.
Acceptance criteria:
- npm test passes
- POST /api/debug/discover-hook/test-hook (if debug route exists) adds 'test-hook' to state
- GET /api/state returns hiddenHooksDiscovered: ['test-hook']
- Calling discover() twice with the same ID results in exactly one entry in the array
```
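A matching sketch for the Task 5 tracker, with the same kind of injected stand-ins for `saveState` and `eventBus`. Returning a boolean from `discover()` and persisting the sorted list (so save files stay deterministic) are assumptions beyond the prompt:

```javascript
class HiddenHookTracker {
  constructor({ saveState, eventBus }) {
    this._saveState = saveState;
    this._eventBus = eventBus;
    this._discovered = new Set();
  }

  initialize(state) {
    this._discovered = new Set(state.hidden_hooks_discovered ?? []);
  }

  // Idempotent: a repeat discovery neither persists nor emits. Returns true if new.
  discover(hookId) {
    if (this._discovered.has(hookId)) return false;
    this._discovered.add(hookId);
    this._persist();
    this._eventBus.emit('hidden_hook:discovered', { hookId });
    return true;
  }

  isDiscovered(hookId) {
    return this._discovered.has(hookId);
  }

  getDiscovered() {
    return [...this._discovered].sort();
  }

  _persist() {
    this._saveState.set({ hidden_hooks_discovered: this.getDiscovered() });
  }
}

const hooks = new HiddenHookTracker({
  saveState: { set: () => {} },
  eventBus: { emit: () => {} },
});
hooks.initialize({ hidden_hooks_discovered: ['zeta_user'] });
hooks.discover('test-hook');
hooks.discover('test-hook');
console.log(hooks.getDiscovered()); // [ 'test-hook', 'zeta_user' ]
```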
---
### Task 6 — Access level extension
```
Make these targeted changes to existing files:
1. In server/src/services/ProgressionSystem.js, add this method to the ProgressionSystem class:
getAccessLevel() {
if (this._access.has('sudo:workstation:full') || this._access.has('sudo:web_server:full') || this._access.has('sudo:build_machine:full')) {
return 'root';
}
if (this._access.has('sudo:workstation:systemctl') || this._access.has('ssh:web_server') || this._access.has('ssh:build_machine')) {
return 'sudo';
}
return 'basic_user';
}
2. In server/src/routes/state.js, add to the GET / response:
accessLevel: progressionSystem.getAccessLevel(),
Import progressionSystem if not already imported.
3. Create file: content/progression/access_levels.json with this content:
{
"_description": "Named access level definitions. Derived from ProgressionSystem unlocked_access keys.",
"levels": [
{ "name": "basic_user", "description": "Default access. Workstation only. No sudo." },
{ "name": "sudo", "description": "Sudo on workstation; SSH to hermes or vulcan." },
{ "name": "root", "description": "Full sudo on at least one remote host." }
]
}
Acceptance criteria:
- npm test passes
- GET /api/state returns accessLevel: 'basic_user' for a fresh save
- After trust reaches 55, accessLevel returns 'sudo'
- After trust reaches 60 and sudo:web_server:full is granted, accessLevel returns 'root'
```
---
### Task 7 — Phase pressure content files
```
Create three new pressure profile files in content/pressure_profiles/:
File: content/pressure_profiles/kowalski_phase_1.json
Content:
{
"id": "kowalski_phase_1",
"label": "Dave Kowalski — Phase 1: Routine Pressure",
"description": "Normal managerial check-ins. Annoying but not threatening.",
"trigger_phase": "normal_work",
"escalation_steps": [
{
"trigger_after_seconds": 300,
"notification": "Quick check-in — how are you getting on with the ticket queue? Let me know if anything is blocking you. Dave K.",
"notification_severity": "info",
"sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
"subject": "Status check"
},
{
"trigger_after_seconds": 600,
"notification": "Following up on my earlier note. We should really document that workflow once you get a moment.",
"notification_severity": "info",
"sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
"subject": "Re: Status check"
}
]
}
File: content/pressure_profiles/kowalski_phase_2.json
Content:
{
"id": "kowalski_phase_2",
"label": "Dave Kowalski — Phase 2: Dismissive",
"description": "Kowalski is aware something is recurring. Manages upward, not inward.",
"trigger_phase": "unease",
"escalation_steps": [
{
"trigger_after_seconds": 180,
"notification": "I've had a couple of questions from Sarah's team about stability. Nothing critical, but let's make sure we're on top of it. Noted for the weekly update. D.",
"notification_severity": "info",
"sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
"subject": "FYI — product team questions"
}
]
}
File: content/pressure_profiles/kowalski_phase_3.json
Content:
{
"id": "kowalski_phase_3",
"label": "Dave Kowalski — Phase 3: Suspicious",
"description": "Kowalski is getting questions from above. Starts involving Priya.",
"trigger_phase": "suspicion",
"escalation_steps": [
{
"trigger_after_seconds": 120,
"notification": "I've scheduled a brief sync for Thursday to talk through recent changes on the infrastructure side. Priya will join. Nothing to worry about — just a routine review.",
"notification_severity": "warning",
"sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
"subject": "Thursday sync — infra review"
}
]
}
Acceptance criteria:
- node tools/content/validate-content.js passes with no new errors
- All three files have unique 'id' fields that pass content loader's ID detection
```
---
### Task 8 — EndingEvaluator service
```
Create file: server/src/services/EndingEvaluator.js
ES module syntax.
ENDING_CRITERIA constant (all conditions must be met for that ending to be active):
- exposure: curiosity >= 65, hidden_hooks_discovered.length >= 2, narrative_phase rank >= 'investigation'
- corporate_loop: obedience >= 65, curiosity <= 40, trust >= 65
- burnout: curiosity <= 35, obedience <= 40 (passive disengagement)
- chaos: risk >= 65, trust <= 40
The class must:
- evaluate(): read current saveState, compute which endings' criteria are met, return { active: 'exposure'|'corporate_loop'|'burnout'|'chaos'|'undetermined', candidates: [...] } — if multiple match, prefer in this order: exposure > chaos > corporate_loop > burnout
- checkTrigger(): call evaluate(); if narrative_phase is 'resolution' and active is not 'undetermined', emit 'ending:triggered' with { ending: active }; return the result
PHASE_RANK constant: { normal_work:0, unease:1, suspicion:2, investigation:3, conflict:4, resolution:5 }
Import saveState, narrativePhaseTracker, hiddenHookTracker, behaviorTracker.
Export singleton: export const endingEvaluator = new EndingEvaluator();
Wire into index.js: import endingEvaluator. No initialize() call is needed; it reads state on demand.
Listen for 'quest:completed' on eventBus: call endingEvaluator.checkTrigger() each time.
Acceptance criteria:
- npm test passes
- evaluate() with curiosity=70, hiddenHooksDiscovered=['h1','h2'], phase='investigation' returns active: 'exposure'
- evaluate() with obedience=70, curiosity=35, trust=70 returns active: 'corporate_loop'
- evaluate() with no conditions met returns active: 'undetermined'
```
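The criteria and tie-break order above reduce to a small pure function. This sketch assumes a flat snapshot object (`curiosity`, `obedience`, `risk`, `trust`, `phase`, `hiddenHooksDiscovered`) that the real `evaluate()` would assemble from saveState and the trackers. `evaluateEndings` and `ENDING_PRIORITY` are hypothetical names, and the criteria are written as predicate functions as a design choice:

```javascript
// Phase ordering from the Task 8 spec.
const PHASE_RANK = {
  normal_work: 0, unease: 1, suspicion: 2,
  investigation: 3, conflict: 4, resolution: 5,
};

// Each predicate receives the (assumed) flat snapshot; every condition must hold.
const ENDING_CRITERIA = {
  exposure: (s) =>
    s.curiosity >= 65 &&
    (s.hiddenHooksDiscovered || []).length >= 2 &&
    (PHASE_RANK[s.phase] ?? -1) >= PHASE_RANK.investigation,
  corporate_loop: (s) => s.obedience >= 65 && s.curiosity <= 40 && s.trust >= 65,
  burnout: (s) => s.curiosity <= 35 && s.obedience <= 40,
  chaos: (s) => s.risk >= 65 && s.trust <= 40,
};

// Tie-break order when several endings match: exposure > chaos > corporate_loop > burnout.
const ENDING_PRIORITY = ['exposure', 'chaos', 'corporate_loop', 'burnout'];

function evaluateEndings(snapshot) {
  // Filtering the priority list keeps candidates in tie-break order.
  const candidates = ENDING_PRIORITY.filter((name) => ENDING_CRITERIA[name](snapshot));
  return { active: candidates[0] ?? 'undetermined', candidates };
}
```

Because `candidates` is already sorted by priority, `active` is simply its first entry. `checkTrigger()` would then only need to compare the current phase against `'resolution'`.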
---
### Task 9 — Debug routes and frontend panel
```
Create file: server/src/routes/debug.js
ES module syntax. Only register routes if process.env.NODE_ENV !== 'production'.
Routes:
GET /api/debug/state — return full saveState.get()
GET /api/debug/behavior — return behaviorTracker.getSnapshot()
GET /api/debug/phase — return { phase: narrativePhaseTracker.getPhase() }
GET /api/debug/ending — return endingEvaluator.evaluate()
GET /api/debug/hidden-hooks — return { discovered: hiddenHookTracker.getDiscovered(), total: N }
POST /api/debug/set-behavior — body: { curiosity, obedience, risk, suspicion }; call behaviorTracker._override(body) (add _override method that directly sets values without deltas)
POST /api/debug/set-phase — body: { phase }; if valid phase, directly set _phase on narrativePhaseTracker and persist (add _forcePhase method)
POST /api/debug/discover-hook/:id — call hiddenHookTracker.discover(req.params.id); return getDiscovered()
In server/src/index.js, add:
import debugRouter from './routes/debug.js';
// After the other app.use() calls:
if (process.env.NODE_ENV !== 'production') {
app.use('/api/debug', debugRouter);
}
Create file: frontend/src/components/DebugPanel.svelte
- Shows only when window.location.search includes 'debug=1'
- Polls GET /api/debug/behavior, GET /api/debug/phase, GET /api/debug/ending every 5 seconds
- Displays: behavior scores (curiosity/obedience/risk/suspicion), current phase, ending trajectory
- Minimal styling: position fixed, bottom right, semi-transparent, small font
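In frontend/src/App.svelte, import DebugPanel and add this to the <script> block (use get() rather than has() so that ?debug=0 does not enable the panel):
const showDebug = new URLSearchParams(window.location.search).get('debug') === '1';
Then render it conditionally in the markup:
{#if showDebug}
<DebugPanel />
{/if}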
Acceptance criteria:
- npm test passes
- In development: GET /api/debug/behavior returns behavior snapshot
- Visiting /?debug=1 shows the debug panel in the browser
- In production (NODE_ENV=production): GET /api/debug/behavior returns 404
```
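The set-behavior and set-phase routes write directly into tracker state, so it helps to validate request input first. Here is a sketch of hypothetical validation helpers that the route handlers could call before invoking `_override` or `_forcePhase`. It assumes behavior scores share the 0 to 100 range implied by the clamping tests in Task 11. `sanitizeBehaviorOverride`, `isValidPhase`, and `isDebugRequested` are illustrative names, not existing functions:

```javascript
const VALID_PHASES = ['normal_work', 'unease', 'suspicion', 'investigation', 'conflict', 'resolution'];
const BEHAVIOR_KEYS = ['curiosity', 'obedience', 'risk', 'suspicion'];

// Keep only known behavior keys whose values are finite numbers, clamped to
// the assumed 0..100 range. Unknown keys and non-numeric values are dropped.
function sanitizeBehaviorOverride(body) {
  const out = {};
  if (!body || typeof body !== 'object') return out;
  for (const key of BEHAVIOR_KEYS) {
    const value = body[key];
    if (typeof value === 'number' && Number.isFinite(value)) {
      out[key] = Math.min(100, Math.max(0, value));
    }
  }
  return out;
}

// set-phase should reject anything outside the known phase list.
function isValidPhase(phase) {
  return VALID_PHASES.includes(phase);
}

// Same rule as the App.svelte check: only ?debug=1 enables the panel.
function isDebugRequested(search) {
  return new URLSearchParams(search).get('debug') === '1';
}
```

With this split, an invalid request can be answered with a 400 before any tracker state changes.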
---
### Task 10 — Fix Priya's name and update Q001-Q008
```
Part A — Fix Priya's name. Make these exact changes:
1. In server/src/services/EmailService.js, find this line:
priya: 'Priya Kapoor <p.kapoor@axiomworks.internal>',
Change it to:
priya: 'Priya Nair <p.nair@axiomworks.internal>',
2. In server/src/services/ShiftReviewService.js:
a. Find: reviewer: 'Priya Kapoor'
Change to: reviewer: 'Priya Nair'
b. Find: from: 'Priya Kapoor <p.kapoor@axiomworks.internal>'
Change to: from: 'Priya Nair <p.nair@axiomworks.internal>'
3. In content/tickets/T007.json: in the 'from' and 'body' fields, replace 'Priya Kapoor' or 'Priya Singh' with 'Priya Nair', and replace 'p.kapoor' with 'p.nair' so the address reads p.nair@axiomworks.internal.
4. In content/docs/onboarding.json: if 'Priya Kapoor' or 'Priya Singh' appears, replace with 'Priya Nair'.
Part B — Add new fields to existing quests. For each quest Q001-Q008, add these fields using the values in the list below. The values are written in shorthand; when adding them, quote every key so the result is valid JSON. Do not change any existing fields. Do not reformat the JSON beyond what is needed to add the new fields.
Q001: narrative_phase: "normal_work", linux_concepts: ["ssh-keygen","authorized_keys","file permissions"], failure_conditions: ["SSH keys not added","authorized_keys permissions too broad"], behavior_impact: { "correct-key": { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 }, "loose-permissions": { curiosity_delta: 0, obedience_delta: 0, risk_delta: 1, suspicion_delta: 1 }, default: { curiosity_delta: 0, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { workstation: "basic_user" }, requires_root: false, temporary_grants_allowed: [] }
Q002: narrative_phase: "normal_work", linux_concepts: ["nginx","systemctl","sshd_config"], failure_conditions: ["nginx not running","service not enabled at boot"], behavior_impact: { default: { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { web_server: "basic_user" }, requires_root: false, temporary_grants_allowed: [] }
Q003: narrative_phase: "normal_work", linux_concepts: ["logrotate","disk usage","df","du"], failure_conditions: ["disk still above threshold","logrotate not restored"], behavior_impact: { default: { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { web_server: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
Q004: narrative_phase: "normal_work", linux_concepts: ["chown","file ownership","deploy scripts"], failure_conditions: ["web root ownership not fixed","deploy service still failing"], behavior_impact: { default: { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { web_server: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
Q005: narrative_phase: "unease", linux_concepts: ["cron","crontab","user field","backup management"], failure_conditions: ["cron still running as root","disk not cleared","backup directory ownership not fixed"], behavior_impact: { "full-fix": { curiosity_delta: 1, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 }, "cron-fixed-only": { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 }, "disk-cleared-only": { curiosity_delta: 0, obedience_delta: 0, risk_delta: 1, suspicion_delta: 1 }, default: { curiosity_delta: 0, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: { "id": "q005_backup_agent_history", "description": "backup-agent home directory contains a .bash_history with unusual commands that predate the current cron misconfiguration.", "discovery_method": "Player reads /home/backup-agent/.bash_history", "significance": "Dale configured this cron job. The history shows it was changed deliberately, not by accident." }, access_requirements: { minimum_access: { web_server: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
Q006: narrative_phase: "unease", linux_concepts: ["NTP","systemd-timesyncd","Arch Linux","pacman","package keys"], failure_conditions: ["NTP not enabled at boot","package manager still broken"], behavior_impact: { default: { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: null, access_requirements: { minimum_access: { build_machine: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
Q007: narrative_phase: "suspicion", linux_concepts: ["sshd_config","AllowGroups","AllowUsers","access hardening"], failure_conditions: ["Priya still locked out","SSH restrictions removed entirely"], behavior_impact: { default: { curiosity_delta: 1, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: { "id": "q007_dale_ssh_key", "description": "An SSH key in hermes /root/.ssh/authorized_keys does not match any current staff. The fingerprint matches no documented key.", "discovery_method": "Player reads /root/.ssh/authorized_keys on hermes", "significance": "Dale had root SSH access to hermes that was never formally revoked." }, access_requirements: { minimum_access: { web_server: "sudo" }, requires_root: false, temporary_grants_allowed: ["sudo:web_server:sshd"] }
Q008: narrative_phase: "suspicion", linux_concepts: ["apt","package pinning","apt-preferences","internal package mirror","vulcan build pipeline"], failure_conditions: ["axiomworks-app still broken","bad package not traced to build machine"], behavior_impact: { default: { curiosity_delta: 1, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 } }, hidden_hook: { "id": "q008_build_log_anomaly", "description": "vulcan's build log for 2.1.1 shows it was triggered by a manual invocation, not the automated pipeline, at 02:14.", "discovery_method": "Player reads /var/log/build-pipeline.log on vulcan and notices the timestamp and manual trigger field", "significance": "The bad build was triggered manually. Someone made the broken build, and it was not the pipeline." }, access_requirements: { minimum_access: { build_machine: "sudo", web_server: "sudo" }, requires_root: false, temporary_grants_allowed: [] }
After all changes, run: node tools/content/validate-content.js
Confirm: zero errors. Warnings about missing narrative_phase should now be gone for all 8 quests.
```
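Q001 and Q005 define named branches plus a `default`, while the other quests define only `default`. The sketch below shows one way the server could choose the deltas for a completed branch. It assumes a hypothetical `resolveImpact(quest, branchId)` helper that falls back to `default` and then to all-zero deltas; how the completed branch id is determined is not specified here:

```javascript
// All four deltas at zero: used when a quest defines no usable impact.
const ZERO_IMPACT = { curiosity_delta: 0, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 };

// Hypothetical helper: pick the delta set for the completed branch,
// falling back to the quest's 'default' entry, then to all zeros.
function resolveImpact(quest, branchId) {
  const impacts = (quest && quest.behavior_impact) || {};
  const hasBranch =
    typeof branchId === 'string' && Object.prototype.hasOwnProperty.call(impacts, branchId);
  const chosen = hasBranch ? impacts[branchId] : impacts.default;
  // Spread over ZERO_IMPACT so a partial entry still yields all four keys.
  return { ...ZERO_IMPACT, ...(chosen || {}) };
}

// Example input: Q001's behavior_impact from the list above.
const q001 = {
  behavior_impact: {
    'correct-key': { curiosity_delta: 0, obedience_delta: 1, risk_delta: 0, suspicion_delta: 0 },
    'loose-permissions': { curiosity_delta: 0, obedience_delta: 0, risk_delta: 1, suspicion_delta: 1 },
    default: { curiosity_delta: 0, obedience_delta: 0, risk_delta: 0, suspicion_delta: 0 },
  },
};
```

The own-property check prevents a branch id such as `toString` from matching an inherited method.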
---
### Task 11 — Unit tests and validation extension
```
Part A — Write unit tests for all new services.
Create file: server/src/services/BehaviorTracker.test.js
Use the existing IncidentScheduler.test.js or ShiftReviewService.test.js as the pattern for test structure.
Tests to include:
1. initialize() with no state.behavior: curiosity=50, obedience=50, risk=50, suspicion=0
2. initialize() with existing state.behavior: values restored correctly
3. apply({ curiosity_delta: 5 }): curiosity increases by 5
4. apply({ risk_delta: -10 }): risk decreases by 10, floor at 0
5. apply({ suspicion_delta: 200 }): suspicion clamps at 100
6. apply({}): no change, no error
7. apply(null): no change, no error (defensive)
8. getSnapshot(): returns plain object with all four keys
Create file: server/src/services/NarrativePhaseTracker.test.js
Tests:
1. initialize() with no state.narrative_phase: returns 'normal_work'
2. advance() with quest having narrative_phase 'unease': phase becomes 'unease'
3. advance() with quest having higher phase than current: phase advances
4. advance() with quest having lower phase than current: phase does NOT change
5. advance() with quest missing narrative_phase field: phase does NOT change
6. getPhase(): returns current phase string
Create file: server/src/services/EndingEvaluator.test.js
Tests (each builds a mock state):
1. exposure: curiosity=70, hiddenHooksDiscovered=['a','b'], phase='investigation' → active: 'exposure'
2. corporate_loop: obedience=70, curiosity=35, trust=70 → active: 'corporate_loop'
3. burnout: curiosity=30, obedience=35 → active: 'burnout'
4. chaos: risk=70, trust=35 → active: 'chaos'
5. no conditions: active: 'undetermined'
6. exposure wins over chaos when both met: active: 'exposure'
Create file: server/src/services/HiddenHookTracker.test.js
Tests:
1. initialize() with no state: getDiscovered() returns []
2. discover('h1'): getDiscovered() returns ['h1']
3. discover('h1') twice: getDiscovered() returns ['h1'] (idempotent)
4. isDiscovered('h1'): true after discovery
5. isDiscovered('h2'): false before discovery
Part B — Run validation.
After all changes: run `npm test` from the server directory. All tests must pass.
Run `node tools/content/validate-content.js`. Zero errors.
Part C — Manual verification checklist.
Confirm each item by inspection or running the game:
[ ] Fresh save: GET /api/state returns behavior: {curiosity:50,obedience:50,risk:50,suspicion:0}, narrativePhase:'normal_work', accessLevel:'basic_user'
[ ] Complete Q001 via the correct-key branch: behavior.obedience increments by 1, phase stays normal_work
[ ] Complete Q005: phase advances to 'unease'; after the player reads /home/backup-agent/.bash_history, q005_backup_agent_history appears in GET /api/debug/hidden-hooks
[ ] Complete Q007: phase advances to 'suspicion', q007_dale_ssh_key hook discoverable on hermes
[ ] ShiftReviewService sends from Priya Nair <p.nair@axiomworks.internal>
[ ] GET /api/debug/ending with forced state returns correct ending label
[ ] /?debug=1 shows debug panel in browser
[ ] node tools/content/validate-content.js: zero errors
```
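For reference, here is an in-memory sketch of two trackers whose behavior the BehaviorTracker and NarrativePhaseTracker tests in Task 11 pin down. It assumes `initialize()` receives the loaded save state as a plain object, and it omits persistence entirely; the real services read and write through saveState. This is a sketch for checking the test expectations, not the implementation:

```javascript
// Starting scores for a fresh save (BehaviorTracker test 1).
const DEFAULT_BEHAVIOR = { curiosity: 50, obedience: 50, risk: 50, suspicion: 0 };
const BEHAVIOR_KEYS = Object.keys(DEFAULT_BEHAVIOR);
const PHASE_RANK = {
  normal_work: 0, unease: 1, suspicion: 2,
  investigation: 3, conflict: 4, resolution: 5,
};

const clamp = (n) => Math.min(100, Math.max(0, n));
const isPhase = (p) =>
  typeof p === 'string' && Object.prototype.hasOwnProperty.call(PHASE_RANK, p);

class BehaviorTracker {
  // Restore saved scores; a missing or non-numeric key falls back to its default.
  initialize(state) {
    const saved = (state && state.behavior) || {};
    this._scores = {};
    for (const key of BEHAVIOR_KEYS) {
      this._scores[key] = typeof saved[key] === 'number' ? saved[key] : DEFAULT_BEHAVIOR[key];
    }
  }

  // Add each <key>_delta that is a finite number, clamping the result to 0..100.
  apply(impact) {
    if (!impact) return; // null/undefined: no change, no error
    for (const key of BEHAVIOR_KEYS) {
      const delta = impact[`${key}_delta`];
      if (Number.isFinite(delta)) {
        this._scores[key] = clamp(this._scores[key] + delta);
      }
    }
  }

  // Return a copy so callers cannot mutate tracker state.
  getSnapshot() {
    return { ...this._scores };
  }
}

class NarrativePhaseTracker {
  initialize(state) {
    const saved = state && state.narrative_phase;
    this._phase = isPhase(saved) ? saved : 'normal_work';
  }

  // Move forward only: a lower, unknown, or missing phase leaves it unchanged.
  advance(quest) {
    const next = quest && quest.narrative_phase;
    if (isPhase(next) && PHASE_RANK[next] > PHASE_RANK[this._phase]) {
      this._phase = next;
    }
  }

  getPhase() {
    return this._phase;
  }
}
```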
---
*End of implementation plan.*