
Sysadmin Chronicles — Full Quest & Story Redesign (REVISED)

Self-revision against SPEC_LOCK.md (binding), CHARACTERS.md, STORY_DESIGN_CONTEXT.md, QUEST_AUTHORING.md, and COMPANY_LORE.md.

Audit findings from v1 corrected in this revision. Changes are not additive — this document supersedes the previous version in full.


Audit Summary (What Changed and Why)

The first draft had the right bones but violated the design's core premise in several places. The clearest pattern of failure: quests were being used to deliver investigation content explicitly rather than letting investigation happen as a byproduct of normal work. Specific problems fixed in this revision:

Replaced or redesigned:

  • Q028 (Dale's archive handed to the player as a directed task) → Q028 is now a backup integrity task where Dale's working directory appears in the restore path
  • Q029 (authenticate a forged report) → Q029 is now a systemd service audit task where the forged report is found in a log directory, not handed to the player
  • Q035 (write an investigation summary for the CTO) → Q035 is now a log retention and archival task; the player's work product IS the investigation record
  • Q038 (write what you believe happened) → Q038 is now a certificate rotation task under pressure; the conflict is operational, not narrative
  • Q041 (read Priya's briefing document) → Q041 is now a production hardening task
  • Q044 (Marcus explains Dale) → cut as a named quest; Dale's story now emerges from system artifacts the player finds; Marcus says less, more precisely
  • Q045 (Kowalski emails the outcome) → Q045 is now a change-freeze and documentation task whose resolution signals the ending; no character summarizes what happened
  • Q046/Q047/Q048 replaced with quests that have real Linux substance

Hook density reduced: Phase 2 had one hook per quest. Hooks are now seeded in roughly every 2–3 quests across Phases 1–2, with concentration increasing in Phase 3.

Styx dropped: The styx hostname thread from Q006 had no resolution. Removed. Q006 is revised with a hook that connects to the active investigation arc.

Difficulty scaling corrected: Phase 2 quests that were Tier 1 have been corrected to Tier 2. Ticket wording in Phase 2 is less explicit. Phase 4+ tickets give the problem statement only — no guidance on approach.

Phase 6 given real technical content: Resolution-phase quests now all teach Linux concepts. Narrative delivery happens through the work and its consequences, not through characters explaining what happened.


1. Design Overview

The Core Proposition

The player is doing sysadmin work. The story leaks through the systems they maintain. A player who ignores everything except the tickets will complete the game — they will just complete a different version of it than the player who reads the bash history that wasn't in scope and notices a timestamp that doesn't fit.

This is not a rhetorical distinction. Every system in this redesign follows from it: behavior variables capture what kind of sysadmin the player is, not whether they are "good" at detecting the plot. Trust reflects professional competence. Endings reflect the accumulated profile of both.

How the New System Extends the Existing One

The existing branch/world-flag/trust model is the backbone. It is not replaced.

Preserved from existing implementation:

  • trust_delta per solution branch — reflects quality of the fix
  • world_flags — persistent string keys, set by branch resolution, read by later quests
  • follow_up_ticket and follow_up_incident — chain quests, trigger delayed consequences
  • Solution branch priority — highest valid branch wins
  • Tier-based difficulty (Tier 1, 2, 3)
  • Observed-state validation — not scripted walkthroughs
  • Clue fingerprints as advisory baseline documentation
  • Character dialogue responding to branch outcomes

New system adds (minimally, without unnecessary mechanics):

  • narrative_phase field on each quest — maps to one of six phases; gates pressure profile and difficulty scaling
  • Behavior variables: curiosity, obedience, risk — accumulated alongside trust; govern narrative route and ending
  • suspicion — management/security attention score; distinct from trust; affects access and pressure level
  • Access level per machine: basic_user, sudo, root — evolves with trust and phase; degrades with sustained high risk
  • hidden_hook field on quests — defines a discovery condition and the flag it sets; optional, never required to complete the ticket
  • Ending evaluator — runs at game close; reads all accumulated state; outputs one of four endings

No other new mechanics are introduced. Every new field maps to existing infrastructure patterns (world flags, trust deltas, branch outcomes).
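A minimal sketch of how the preserved and new fields could sit together on one quest record, using Q001's branch values from the catalog. The class names, the `resolve` helper, and the keys of the observed-state dictionary (`key_present`, `dir_mode`, `file_mode`, `read_full_file`) are assumptions made for illustration, not the existing implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Observed = dict  # hypothetical: a snapshot of machine state taken by the validator


@dataclass
class Branch:
    name: str
    priority: int
    trust_delta: float
    flags: list[str]
    is_valid: Callable[[Observed], bool]  # observed-state check, not a scripted walkthrough


@dataclass
class HiddenHook:
    flag: str
    is_discovered: Callable[[Observed], bool]


@dataclass
class Quest:
    quest_id: str
    narrative_phase: int  # one of the six phases
    tier: int
    branches: list[Branch]
    hidden_hook: Optional[HiddenHook] = None  # optional, never required for the ticket


def resolve(quest: Quest, observed: Observed) -> tuple[Optional[Branch], set[str]]:
    """Highest valid branch wins; a discovered hook adds its flag on top."""
    valid = [b for b in quest.branches if b.is_valid(observed)]
    winner = max(valid, key=lambda b: b.priority) if valid else None
    flags = set(winner.flags) if winner else set()
    if quest.hidden_hook and quest.hidden_hook.is_discovered(observed):
        flags.add(quest.hidden_hook.flag)
    return winner, flags


q001 = Quest(
    quest_id="Q001",
    narrative_phase=1,
    tier=1,
    branches=[
        Branch("clean", 100, 2.0, ["player_ssh_configured"],
               lambda s: s["key_present"] and s["dir_mode"] == 0o700 and s["file_mode"] == 0o600),
        Branch("permissive", 50, 0.5, ["player_ssh_permissive"], lambda s: s["key_present"]),
        Branch("incomplete", 10, -1.0, ["player_ssh_failed"], lambda s: not s["key_present"]),
    ],
    hidden_hook=HiddenHook("hook_dale_ssh_key_found", lambda s: s.get("read_full_file", False)),
)
```

Because the hook is evaluated separately from the branches, a player can land on any branch with or without the hook flag, which matches the rule that hooks are never required to complete the ticket.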

Variable Interaction Model

                  [Quest branch resolves]
                           │
               ┌───────────┼────────────┐
               ▼           ▼            ▼
          trust_delta   world_flags  behavior_impact
               │           │            │
               ▼           ▼            ▼
            trust        narrative   curiosity /
          (access,        routing    obedience /
           warmth,      (later quest  risk /
           incident      content)    suspicion
           visibility)
                                        │
                                        ▼
                                  ending_route

Trust and behavior variables accumulate in parallel. A player with high trust and high curiosity is a different player than one with high trust and high obedience — same professional quality, different narrative destination.
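The parallel accumulation can be sketched as a single update step applied when a branch resolves. `PlayerState` and `apply_outcome` are hypothetical names, and the numeric types are assumptions; only the variable names and the Q001 deltas come from this document.

```python
from dataclasses import dataclass, field

BEHAVIOR_FIELDS = ("curiosity", "obedience", "risk", "suspicion")


@dataclass
class PlayerState:
    trust: float = 0.0
    curiosity: int = 0
    obedience: int = 0
    risk: int = 0
    suspicion: int = 0
    world_flags: set[str] = field(default_factory=set)


def apply_outcome(state: PlayerState, trust_delta: float,
                  flags: list[str], behavior: dict[str, int]) -> PlayerState:
    """Apply one branch resolution: trust, flags, and behavior move independently."""
    state.trust += trust_delta
    state.world_flags.update(flags)
    for name, delta in behavior.items():
        if name not in BEHAVIOR_FIELDS:
            raise ValueError(f"unknown behavior variable: {name}")
        setattr(state, name, getattr(state, name) + delta)
    return state


# Q001, clean branch, with the hidden hook discovered (O+1 from the branch, C+1 from the hook).
state = apply_outcome(
    PlayerState(),
    trust_delta=2.0,
    flags=["player_ssh_configured", "hook_dale_ssh_key_found"],
    behavior={"obedience": 1, "curiosity": 1},
)
```

Two players who both take the clean branch end this step with the same trust; only the one who read the file carries the extra curiosity point and the hook flag forward.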


2. Character Usage Guide

All portrait-compatible identity is preserved. The following is operational guidance for quest authors, not character redefinition.

Marcus Webb

Voice: Short. Precise. Does not explain things twice. The second sentence he adds — when he adds one — is always the important one.

Quest role: Primary ticket source (most quests), trust gatekeeper, access grant/revoke mechanism, ambient signal source in mid-game.

Marcus's messages evolve with trust. Low trust: purely functional assignments. Mid trust: he occasionally adds context that wasn't asked for. High trust: he sometimes sends a message that isn't a ticket at all — an observation, a thing he's noticed, phrased as if the player should already know what to do with it.

He knows about Dale. He will not bring it up directly. If the player finds something Dale-related, Marcus's response will be exact and quiet — never surprised, never explanatory.

Use Marcus for: ticket assignments, clean/acceptable/regression branch responses, access gate messages, quiet mid-game Slack observations, cost-free hints if the player asks (not volunteered). Do not use Marcus to explain the story, praise the player effusively, or become verbose about anything personal.

Sarah Chen

Voice: Direct, outcome-focused, slightly impatient when things are broken. Warms when fixes hold. Cools when fixes don't.

Quest role: hermes and staging tickets, product-pressure source, response calibration for clean vs. symptom fixes.

Sarah's descriptions are accurate about symptoms and often wrong about cause. She describes what she saw, not what caused it. When a fix holds — when the same problem doesn't recur — she notices, and says something. When it does recur, she says something else, shorter.

Use Sarah for: hermes/staging/demo tickets, stakeholder pressure escalations, CC lines on cross-team notes, downstream reactions to fix quality. Do not use Sarah for investigation-phase content — she doesn't have visibility into what the player is finding.

Priya Nair

Canonical email: p.nair@axiomworks.internal. Prior references to Priya Kapoor or Priya Singh refer to the same person. Those files need updating.

Voice: Precise. Consequence-focused. Calm in tone. No exclamation marks. She states things, she doesn't perform alarm.

Quest role: Shift reviews, access audits, security-consequence notifications, investigation-phase escalation when audit activity surfaces a finding.

Priya reviews every 3–4 quests. Her reviews note what advanced, what stayed stable, and what the player introduced as new risk. High curiosity plus low risk: she notes methodical investigation. High risk: she flags the access footprint.

In Phases 3–4, Priya becomes more present because the audits are surfacing things. This is her job, not surveillance of the player specifically. The distinction matters for tone.

Use Priya for: shift reviews, access audits, consequence delivery for regression branches, investigation-phase task assignments (narrowly scoped), security findings from James Osei. Do not use Priya for technical troubleshooting, warmth, or anything casual.

Dave Okonkwo

Voice: Helpful, non-technical, accurate about what he saw, wrong about cause.

Quest role: End-user-experience ticket source for early-phase quests and Phase 2 normalcy anchors.

Dave's tickets are useful because they describe genuine user experience. His hypotheses about the cause are well-intentioned guesses. He should never be made to look stupid — he's filing a ticket correctly for someone without technical training.

Use Dave for: early-phase user-visible failures, texture of the company being a real place. Do not use Dave for anything touching the investigation arc.

Dave Kowalski

Voice: Institutional. Bullet-point emails. Meetings as implied threat. "We should really document that."

Quest role: Management pressure escalation (Phase 3 onward), access restriction trigger, status demand source, policy constraint.

Kowalski is not suspicious of the player — he is managing upward risk. His interventions are institutional responses to things that have surfaced at his level. When he appears directly, something has become his problem. His pressure is applied through: status-demand emails, access review initiation, meeting invites that have known weight, priority-reassignment tickets.

Use Kowalski for: Phase 3+ pressure manifestations, access restriction when suspicion is elevated, escalation when an incident has made noise at director level. Do not make him a villain, do not have him accuse anyone, do not have him explain the plot.

Background Characters

Used sparingly for texture.

  • Nikhil Sharma — CC lines on build/pipeline things; Slack messages at unexpected hours; upstream explanation or blame when something on vulcan is his. He doesn't know the player until the player touches something of his.
  • Derek Ashford — CC lines when infrastructure costs surface.
  • Tom Malaney — Networking problems that are his domain but are slow to resolve.
  • Phil Ruiz — Demo pressure; hermes's political importance made human.
  • James Osei — Audit details that Priya summarizes.
  • Rachel Huang — Peer provisioning; access handoffs when Marcus delegates.

3. Phase-by-Phase Narrative Arc

Phase 1 — Normal Work

Day one onboarding through the first weeks. The work is real work. The company is a real place that functions, mostly. Nothing is obviously wrong.

Quests establish the environment: what the machines are, what they run, who files tickets, how the characters communicate, what competent work looks like. The player builds access through demonstrated competence. Marcus is evaluative. Sarah is brisk. Priya's first shift review is factual and mild.

Difficulty: explicit instructions. Tickets describe what to do with some specificity. The clue trail is direct. Branch tolerance is generous — Tier 1 quests forgive partial fixes with lower trust deltas rather than negative ones.

Hidden layer: Dale's name appears in file ownership and configuration history. His SSH key appears in authorized_keys. His last logrotate config is in a backup directory. None of this is called out. A player who reads the files before acting will find it. Most won't.

Phase end state: Player has basic to moderate access. Trust is positive if clean branches have been taken. A small number of hidden hook flags may be set for curious players. The game looks, so far, like what it says it is.

Phase 2 — Unease

The same job. The same machines. But the texture changes slightly. A problem comes back that was fixed. A service was modified and the modification doesn't have a corresponding ticket. A config that should have been set by the tooling was set by hand, by someone.

Nothing is alarming. But a sysadmin who is paying attention notices these things — the way you notice that a door doesn't close flush, or that a clock is a few minutes fast. Not urgent. Off.

Difficulty: partial hints. Tickets describe the symptom and hint at the location. The cause requires more investigation than in Phase 1. Branch tolerance decreases — symptom-only fixes now carry explicit downstream incidents.

Marcus's messages are the same as always. The occasional extra sentence he adds is slightly harder to read. In Phase 1 his additions were operational context. In Phase 2 they are sometimes observations that don't quite fit the ticket.

Hidden layer: the anomaly pattern continues. The same IP appears in a config and in a log. A cron job has been running for over a year with no ticket. A package in the build history doesn't correspond to any official release. Each item is individually explainable as legacy cruft. Together, for a player who's been collecting them, they aren't.

Phase end state: Behavior variables are diverging. High-curiosity players have world flags for discovered hooks. Obedient players are in good professional standing with nothing unusual in their record. Suspicion is low across the board.

Phase 3 — Suspicion

The pattern becomes harder to ignore if you're the kind of person who would notice it. SSH connections from an IP not in the asset inventory. A user account with no HR record. A backup archive with a timestamp that doesn't align with when backups run. The player is fixing real problems with real tickets — but the root causes are starting to point somewhere.

Difficulty: minimal guidance. Tickets describe the symptom only. No indication of where to look. The clue trail requires following the evidence without being directed. Branch tolerance is stricter — partial fixes carry heavier incident weight.

Management pressure increases. Kowalski's weekly status email asks specific questions. Marcus forwards it without comment. Priya's shift reviews start noting things they didn't note before. None of this is targeted at the player. The audits were already scheduled. The status email was always going to ask those questions.

A player who ignores all of it and fixes tickets continues to do fine work. They are just unaware of what the work is revealing.

Phase end state: The investigation path is now visible to curious players. They have enough fragments to form a partial hypothesis. Obedient players are in good professional standing and have noticed nothing unusual.

Phase 4 — Investigation

For a curious player, the picture is now coherent enough to be disturbing. The quests in this phase involve work that is framed as legitimate operations — audit the access log for compliance, trace the package build history for a deployment issue, verify backup integrity — but the results of doing that work carefully tell a story.

Difficulty: problem-solving only. Tickets state the problem. No clue on approach. The player is expected to know their tools and apply them.

Marcus's messages are shorter now. Not cold — he has always been terse. But the operational context he occasionally added in Phase 2 is absent. He is managing something and the messages reflect that without stating it.

Priya appears more frequently. A quarterly review surfaced something. James Osei sent her something. She is doing her job. Her tickets are narrow and specific — she wants to know exactly one thing, stated precisely.

Kowalski schedules a meeting. The meeting is called a "check-in on access posture." No specifics. Marcus's next message after the meeting's scheduled end time is functionally identical to his previous one — same tone, same brevity. A player paying attention will notice only the timing.

Phase end state: Curious players have a complete or near-complete picture of what happened before they arrived. The exposure ending is now reachable if other variables support it. Obedient players are in good standing, unaware of the arc. High-risk players may be under active monitoring.

Phase 5 — Conflict

The conflict is professional. The player has access granted for one purpose that intersects with information they were not meant to find. The quests are operational — real work that needs doing. But the operational work, done carefully and honestly, has consequences.

A backup restoration reveals something. An access revocation request arrives for an account the player has been investigating. A production ticket requires changing a configuration that, to a player who has been paying attention, is recognizable as the wrong change to make.

The player can always do only what the ticket asks. That is always an available path. The question is whether the player recognizes when the ticket asks for something that, done without scrutiny, would harm something beyond the immediate task.

Marcus says less. Priya is specific and procedural. Kowalski's emails are formal and institutional. The company is managing something. The player is in it.

Phase end state: Ending routes are determined. The final quests in Phase 6 are confirmation, not decision.

Phase 6 — Resolution

The final quests are normal work. Infrastructure tasks. Some are the same kind of task as Phase 1 quests, deliberately — the comparison is the point. The world has moved on. The player is still a sysadmin at Axiom Works.

The ending emerges from the accumulated state of all behavior variables, world flags, trust score, and access history. It is not triggered by a final choice. The player will not be presented with an ending screen that asks them to pick. They will complete a routine task, and the ending will fire based on everything that preceded it.

Difficulty returns to Tier 1 for operational tasks. The pressure has lifted. The tickets are from Sarah and Marcus and sound like Phase 1 tickets.


4. Full Quest Catalog

VMs: workstation (ares, Ubuntu 24.04), web_server (hermes, Debian 12), build_machine (vulcan, Arch Linux).

Behavior impact notation: C = curiosity delta, O = obedience delta, R = risk delta, S = suspicion delta. Values are per-branch where they differ.
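If the notation is kept as authored text, a small parser can turn each entry into deltas. This is a sketch only: `parse_impact` is a hypothetical helper, and it assumes deltas are always written as a letter followed by a signed integer, as in the entries in this catalog.

```python
import re

KEYS = {"C": "curiosity", "O": "obedience", "R": "risk", "S": "suspicion"}
# One letter from the notation, then a signed integer, e.g. "O+1" or "R+3".
_TOKEN = re.compile(r"\b([CORS])([+-]\d+)\b")


def parse_impact(text: str) -> dict[str, int]:
    """Extract behavior deltas from an impact line, ignoring any trailing note."""
    return {KEYS[letter]: int(value) for letter, value in _TOKEN.findall(text)}


print(parse_impact("C+0, O+1, R+0"))
```

Parenthetical author notes after a value are skipped because they contain no letter-sign-digit tokens.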


PHASE 1 — NORMAL WORK (Q001–Q008)

Tier 1 throughout. Explicit instructions. Generous branch tolerance. Hook density: 4 hooks across 8 quests.


Quest ID: Q001
Title: First Day, First Key
Narrative Phase: Normal Work
Tier: 1
Primary VM: workstation
Additional VMs: none
Primary Objective: Configure SSH key authentication for the player's account on the workstation before end of day.
Linux Concepts: ssh-keygen, ~/.ssh/authorized_keys, directory and file permissions (chmod 700, chmod 600), sshd_config pubkey authentication
Systems Used: workstation
Ticket Sender: Marcus Webb
Ticket Summary: "Your account is active. Before you touch anything else: set up key-based auth on the workstation. Password auth stays on for now but I want your public key in authorized_keys before end of day. Walk yourself through it."

Clue Trail:

  • ~/.ssh/ directory absent or present without authorized_keys
  • sshd_config: PubkeyAuthentication yes, PasswordAuthentication yes
  • Player generates keypair with ssh-keygen, places public key in authorized_keys, sets permissions — .ssh/ to 0700, authorized_keys to 0600

Solution Branches:

Branch 1 — Clean (priority 100): Key present, .ssh/ is 0700, authorized_keys is 0600, SSH auth works. trust_delta: +2. Flags: player_ssh_configured. Follow-up ticket: T002.

Branch 2 — Permissive (priority 50): Key present, permissions wrong (0644 on key file or 0755 on directory). SSH works; not correctly hardened. trust_delta: +0.5. Flags: player_ssh_permissive. Follow-up incident: I001 (Priya's first review notes the permission).

Branch 3 — Incomplete (priority 10): Key absent or authorized_keys missing. trust_delta: -1. Flags: player_ssh_failed. Marcus follows up.
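An observed-state check for these three branches could read the filesystem directly rather than replaying the player's commands. The sketch below is an assumption about how a validator might do that. `classify_q001` is a hypothetical name, the key match is a plain substring test, and it does not cover the "SSH auth works" part of Branch 1, which would need a real login attempt.

```python
import os
import stat
from pathlib import Path


def classify_q001(home: Path, player_key: str) -> str:
    """Return 'clean', 'permissive', or 'incomplete' from the on-disk state."""
    ssh_dir = home / ".ssh"
    auth = ssh_dir / "authorized_keys"
    if not auth.is_file() or player_key not in auth.read_text():
        return "incomplete"
    # S_IMODE strips the file-type bits, leaving only the permission bits.
    dir_mode = stat.S_IMODE(os.stat(ssh_dir).st_mode)
    file_mode = stat.S_IMODE(os.stat(auth).st_mode)
    if dir_mode == 0o700 and file_mode == 0o600:
        return "clean"
    return "permissive"
```

Checking for the player's key by content means an existing line, such as the one behind the hidden hook, neither helps nor hurts the branch result.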

Hidden Hook: A pre-existing entry in ~/.ssh/authorized_keys — the file the player must read and edit — has a line for dale@axiomworks.internal. A player who reads the full file before writing to it will see it. Sets hook_dale_ssh_key_found. Discoverable through: reading the file the task requires touching.

Failure Conditions: Player cannot authenticate via key; permissions so broad sshd refuses pubkey auth entirely.

Behavior Impact:

  • Clean branch: C+0, O+1, R+0
  • Permissive branch: C+0, O+0, R+1
  • Hook discovered: C+1 (reading the file carefully before writing is the behavior)

Narrative Notes: Establishes Marcus's voice and the evaluation frame. The Dale key is the first hook: completely invisible unless the player reads the file rather than overwriting it. No hint it exists. Most players won't find it on day one.


Quest ID: Q002
Title: Disk Running Hot
Narrative Phase: Normal Work
Tier: 1
Primary VM: web_server
Additional VMs: none
Primary Objective: Something is wrong with hermes — the AxiomFlow staging application is returning 503 errors. Investigate and fix it.
Linux Concepts: df -h, du -sh, systemctl status, /var/log inspection, logrotate, log file management
Systems Used: web_server
Ticket Sender: Dave Okonkwo
Ticket Summary: "The work application has been giving a 503 error since this morning. I tried refreshing and logging out and back in — nothing helps. I think maybe a script crashed? It was fine yesterday afternoon."

Clue Trail:

  • systemctl status nginx — service failed
  • journalctl -u nginx — "no space left on device"
  • df -h — root partition at 93%+
  • du -sh /var/log/nginx/* — access log at 4+ GB
  • /etc/logrotate.d/nginx — absent

Solution Branches:

Branch 1 — Clean (priority 100): Restores /etc/logrotate.d/nginx with a correct rotation config, runs logrotate -f /etc/logrotate.conf to clear the current backlog, confirms nginx is running, disk below 70%. trust_delta: +2. Flags: hermes_logrotate_healthy. Follow-up ticket: T003.

Branch 2 — Manual clear (priority 60): Deletes or truncates the large log file, nginx comes back, logrotate config not restored. Disk clear now; will recur. trust_delta: +0.5. Flags: hermes_logrotate_fragile. Follow-up incident: I002 (log fills again, Sarah files new ticket in Phase 2).

Branch 3 — Destructive (priority 20): Removes all logs or nginx config. Service degraded. trust_delta: -2. Flags: hermes_logs_destroyed. Follow-up incident: I003 (Priya flags log destruction at next review).

Hidden Hook: None in this quest. The clue trail is clean and the root cause is straightforward. This is intentional — not every quest in Phase 1 has a hook.

Failure Conditions: nginx remains down; disk stays over 90%; player creates new problems while fixing.

Behavior Impact:

  • Clean branch: O+1
  • Manual clear: R+0 (acceptable partial fix)
  • Destructive: R+2

Narrative Notes: First hermes quest. Establishes the symptom → cause → root cause investigation pattern. Sarah Chen reacts to branch quality in the follow-up.


Quest ID: Q003
Title: The Locked Room
Narrative Phase: Normal Work
Tier: 1
Primary VM: web_server
Additional VMs: none
Primary Objective: Sarah Chen cannot SSH into the staging server's deployment account. She has a hotfix to push before an afternoon demo. Restore her access.
Linux Concepts: sshd_config access directives (AllowUsers, AllowGroups), /var/log/auth.log, SSH troubleshooting, user group membership (id, groups)
Systems Used: web_server
Ticket Sender: Sarah Chen
Ticket Summary: "I can't SSH into the staging server. I've tried from two machines and keep getting 'connection refused' or 'permission denied.' I need to push a hotfix before 2pm. Can you look at this now?"

Clue Trail:

  • /var/log/auth.log on hermes: User s.chen not allowed because not listed in AllowUsers
  • /etc/ssh/sshd_config: AllowUsers deploy-user marcus — no s.chen
  • groups s.chen shows she is in the deploy group
  • The config uses AllowUsers per-user instead of AllowGroups by role

Solution Branches:

Branch 1 — Clean (priority 100): Player converts AllowUsers to AllowGroups deploy (or similar role-based approach), restarts sshd, confirms Sarah can authenticate. trust_delta: +2. Flags: hermes_ssh_allowgroups. Follow-up ticket: T004.

Branch 2 — Username append (priority 60): Adds s.chen to the AllowUsers list. Problem solved; next person locked out will need the same treatment. trust_delta: +0.5. Flags: hermes_ssh_allowusers_fragile. Follow-up incident: I004 (another user locked out in Phase 2).

Branch 3 — Unrestricted (priority 10): Removes AllowUsers or AllowGroups entirely. All valid users can SSH. trust_delta: -2. Flags: hermes_ssh_unrestricted. Priya flags this in next review.
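The three outcomes can be told apart from the final sshd_config alone. The following is a rough sketch under simplifying assumptions: `parse_allow` and `classify_q003` are hypothetical helpers, only whole-line comments are skipped, `Match` blocks and the `Keyword=value` form are ignored, and "clean" is narrowed to the `AllowGroups deploy` case even though the branch text accepts similar role-based approaches.

```python
def parse_allow(config_text: str) -> dict[str, list[str]]:
    """Collect AllowUsers / AllowGroups arguments (keywords are case-insensitive)."""
    allow = {"allowusers": [], "allowgroups": []}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        keyword = parts[0].lower()
        if keyword in allow:
            allow[keyword].extend(parts[1:])
    return allow


def classify_q003(config_text: str) -> str:
    allow = parse_allow(config_text)
    # A login has to pass every Allow directive that is present, so a leftover
    # AllowUsers line without s.chen would still lock her out.
    if "deploy" in allow["allowgroups"] and not allow["allowusers"]:
        return "clean"
    if "s.chen" in allow["allowusers"]:
        return "username_append"
    if not allow["allowusers"] and not allow["allowgroups"]:
        return "unrestricted"
    return "unresolved"
```

A config check like this says nothing about whether sshd restarted successfully, so it would sit alongside a service-state check rather than replace one.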

Hidden Hook: authorized_keys for the deploy-user account on hermes contains a key with comment dale@ares 2023-09. Discoverable by: reading the deploy-user's authorized_keys as part of investigating the SSH configuration. Sets hook_dale_deploy_key. Connects to Q001's hook for players who found that one.

Failure Conditions: Sarah still locked out; sshd fails to restart after edit; player breaks SSH for themselves.

Behavior Impact:

  • Clean branch: O+1
  • Username append: O+0
  • Unrestricted: R+3
  • Hook discovered: C+1

Narrative Notes: Marcus's clean-branch response: "Good call switching to groups. AllowUsers was always going to be a maintenance problem." The attribution of the AllowUsers config is deliberately vague — it was in place when the player arrived. Sarah's ticket wording ("I've tried from two machines") is accurate, non-technical, real.


Quest ID: Q004
Title: The Build That Won't
Narrative Phase: Normal Work
Tier: 1
Primary VM: build_machine
Additional VMs: none
Primary Objective: The nightly AxiomFlow build on vulcan has not produced an artifact in three days. The scheduler shows the job running. Nothing is in the output directory. Find the cause and fix it.
Linux Concepts: systemd timers, journalctl, NTP and clock synchronization, timedatectl, systemd-timesyncd, SSL certificate validation dependencies on system clock
Systems Used: build_machine
Ticket Sender: Marcus Webb
Ticket Summary: "Nikhil flagged that nothing has come out of the nightly build in three days. The timer is showing as triggered. Build log is in the usual location. Look at what's actually happening."

Clue Trail:

  • systemctl list-timers — axiomflow-build.timer last triggered correctly
  • /var/log/axiomflow-build/build.log — SSL certificate verification failure against the internal package repository (cert fetch step)
  • timedatectl — system clock is 47 minutes ahead of real time; NTP is not running
  • systemctl status systemd-timesyncd — inactive and disabled
  • Enabling timesyncd, syncing clock, re-running the build — success

Solution Branches:

Branch 1 — Clean (priority 100): Enables and starts systemd-timesyncd, verifies sync with timedatectl show-timesync, triggers a manual build run to confirm artifact output. trust_delta: +2. Flags: vulcan_ntp_healthy. Follow-up ticket: T005.

Branch 2 — One-time sync (priority 50): Uses ntpdate or date -s for a manual clock correction. Clock is correct now; drift will recur without the daemon. trust_delta: +0.5. Flags: vulcan_ntp_fragile. Follow-up incident: I005 (drift recurs in Phase 2, build fails again).

Branch 3 — Bypass SSL (priority 20): Disables SSL certificate verification in the build script rather than fixing the clock. Build succeeds; certificate validation is now bypassed. trust_delta: -2. Flags: vulcan_ssl_bypassed. Priya flags this.

Hidden Hook: Reading the full build log (not just the most recent failure) reveals a historical entry from 8 months ago: a build step called sign-package that no longer exists in the current build script. The step was removed — the removal is not documented anywhere. Sets hook_sign_package_removed. Discoverable by: reading historical log entries as part of diagnosing the build environment.

Failure Conditions: Build continues failing; SSL bypass introduced; NTP configured incorrectly breaks time-dependent services.

Behavior Impact:

  • Clean branch: O+1
  • Bypass SSL: R+3
  • Hook discovered: C+1

Narrative Notes: First vulcan quest. Establishes the machine's character: things break here silently and the downstream effect shows up on hermes. The sign-package removal hook is the beginning of the build pipeline thread. An obedient player reads only the current log. A curious player reads further back.


Quest ID: Q005
Title: Permissions Drift
Narrative Phase: Normal Work
Tier: 1
Primary VM: web_server
Additional VMs: none
Primary Objective: The AxiomFlow staging application cannot write to its cache directory. Exports are failing for all users. Identify why the ownership changed and restore correct state.
Linux Concepts: chown, chmod, ls -la, process user context (ps aux), service account ownership (www-data), bash history inspection
Systems Used: web_server
Ticket Sender: Sarah Chen
Ticket Summary: "Users in staging can't generate exports — they get a 'permission denied' error. The dev team says they haven't changed anything. It was working Thursday. Something changed on the infrastructure side."

Clue Trail:

  • Application error log: permission denied: /var/www/axiomworks/cache/export
  • ls -la /var/www/axiomworks/cache — directory owned by root:root; it should be www-data:www-data
  • ps aux | grep axiomflow — application process running as www-data
  • /root/.bash_history — contains a sudo cp -r command run three weeks ago that carried root ownership forward into the cache directory

Solution Branches:

Branch 1 — Clean (priority 100): Runs chown -R www-data:www-data /var/www/axiomworks/cache, confirms application can write, identifies the cp -r as cause, documents root cause in ticket response. trust_delta: +2. Flags: hermes_cache_ownership_correct. Follow-up ticket: T006.

Branch 2 — World-writable (priority 30): Runs chmod o+w /var/www/axiomworks/cache so www-data can write without being owner. App works; directory is now world-writable. trust_delta: -1. Flags: hermes_cache_world_writable. Priya flags in next review.

Branch 3 — Service as root (priority 10): Modifies service unit to run as root. App works; every downstream file is now root-owned. trust_delta: -3. Flags: hermes_app_running_as_root.

Hidden Hook: The sudo cp -r command in /root/.bash_history is timestamped three weeks ago — before the player's start date. The session that ran this command predates the player's account creation. Someone with root access was copying production files before the player arrived. Sets hook_pre_hire_root_session. Discoverable by: checking bash history to trace the ownership change as part of understanding the cause.

Failure Conditions: Application still cannot write to cache; player introduces broader permission regression.

Behavior Impact:

  • Clean branch: O+1
  • World-writable: R+2
  • App-as-root: R+4
  • Hook discovered: C+2 (this one requires going beyond what the ticket asks)

Narrative Notes: The pre-hire root session hook is more significant than the SSH key hooks — it establishes that someone was making system changes before the player arrived. A player who finds it has their first real data point about activity that predates them.


Quest ID: Q006
Title: The Account That Shouldn't Be There
Narrative Phase: Normal Work
Tier: 1
Primary VM: workstation
Additional VMs: none
Primary Objective: Priya's quarterly access review surfaced a user account on the workstation with no matching HR record. Audit it and remove it correctly.
Linux Concepts: getent passwd, lastlog, last, ps aux, find / -user, userdel -r, home directory archival before removal
Systems Used: workstation
Ticket Sender: Priya Nair
Ticket Summary: "Quarterly access review flagged an account on the workstation: jbenton. No corresponding entry in the HR system. Before removal: confirm no active sessions, check if any processes are running under this account, and archive the home directory. Then remove it. Document what you find."

Clue Trail:

  • getent passwd jbenton — account exists; no HR match
  • lastlog | grep jbenton — last login 14 months ago
  • ps aux | grep jbenton — no active processes
  • Home directory: ~jbenton/ exists with standard dotfiles and one file: notes/infra.txt — a plain-text infrastructure reference listing internal hostnames and access notes, formatted like a personal cheatsheet

Solution Branches:

Branch 1 — Clean (priority 100): Player checks activity, processes, groups, home dir; archives home directory to /var/archive/jbenton-YYYYMMDD.tar.gz; runs userdel -r jbenton; documents findings and archive location for Priya. trust_delta: +2. Flags: jbenton_account_removed_clean. Follow-up ticket: T007.

Branch 2 — Fast remove (priority 40): Removes account without archiving or checking home dir. Account is gone. trust_delta: +0.5. Flags: jbenton_account_removed_fast. Priya's response notes that archival is standard procedure.

Branch 3 — Left in place (priority 10): Reports account looks inactive, recommends deferring. Ticket unresolved. trust_delta: -1.
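The archive-then-remove order in Branch 1 can be sketched in shell as follows. This is a minimal illustration, not the quest's validation logic: the helper names archive_home and remove_account are hypothetical, and /home/jbenton as the home path is an assumption (the quest only refers to ~jbenton/). The archive name follows the /var/archive/jbenton-YYYYMMDD.tar.gz pattern from the branch text.

```shell
# Hypothetical helper: archive a home directory and verify the archive
# before anything is deleted.
# Usage: archive_home <user> <home_dir> <dest_dir>
archive_home() {
    user=$1; home_dir=$2; dest_dir=$3
    archive="$dest_dir/$user-$(date +%Y%m%d).tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C keeps archive paths relative to the parent of the home directory.
    tar -czf "$archive" -C "$(dirname "$home_dir")" "$(basename "$home_dir")" || return 1
    # Listing the archive confirms it is readable before the account goes away.
    tar -tzf "$archive" >/dev/null || return 1
    printf '%s\n' "$archive"
}

# Not called here: removal runs only if the archive step succeeded.
# The /home/<user> layout is an assumption.
remove_account() {
    archive_home "$1" "/home/$1" /var/archive && userdel -r "$1"
}
```

Chaining with && is the point of the sketch: userdel -r never runs unless the archive was written and could be listed, which is the difference between Branch 1 and Branch 2.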

Hidden Hook: notes/infra.txt in jbenton's home directory is a personal infrastructure reference. It includes a line for pipeline-svc with a note: temp sudo — ask DH to scope. The initials DH do not correspond to any current employee visible on the company website. Sets hook_dh_initials_in_jbenton_notes. Discoverable by: reading the file before archiving or deleting, which proper procedure (per the ticket) requires doing anyway — but the player can ignore the content and just archive it.

Failure Conditions: Player removes account with active sessions; player destroys home dir without archiving; ticket not resolved.

Behavior Impact:

  • Clean branch: O+1
  • Fast remove: R+1 (destroying potential evidence)
  • Hook discovered: C+1

Narrative Notes: The DH initials connect to the sudoers comment the player will find in Q011. pipeline-svc also connects forward. The note reads like a practical cheatsheet — not alarming, just a person keeping track of the infrastructure they were using. The oddness is the initials and the word "temp."


Quest ID: Q007
Title: Rotation Failure
Narrative Phase: Normal Work
Tier: 1
Primary VM: web_server
Additional VMs: none
Primary Objective: The TLS certificate for the AxiomFlow staging domain has expired. A prospect demo is tomorrow morning. Renew the certificate and ensure automatic renewal is in place.
Linux Concepts: certbot, Let's Encrypt certificate renewal, systemd timers, openssl s_client, nginx configuration reload, certificate verification
Systems Used: web_server
Ticket Sender: Sarah Chen
Ticket Summary: "The staging site is showing a certificate error — the browser is refusing to load it at all. Phil has a prospect demo on this environment tomorrow at 9am. We need this fixed today."

Clue Trail:

  • openssl s_client -connect staging.axiomworks.internal:443 </dev/null 2>&1 | grep -i expire — certificate expired 14 days ago
  • certbot certificates — cert present, not renewed
  • systemctl status certbot.timer — inactive, disabled
  • journalctl -u certbot --since "90 days ago" — renewal failed 60 days ago (HTTP challenge permission error); timer was disabled manually the same day

Solution Branches:

Branch 1 — Clean (priority 100): Runs certbot renew, re-enables and starts certbot.timer, reloads nginx, verifies new cert expiry with openssl, confirms staging site loads without browser warning. trust_delta: +2. Flags: hermes_certbot_healthy. Follow-up ticket: T008.

Branch 2 — Renew without timer (priority 50): Renews cert but doesn't restore the timer. Valid now; expires again in 90 days without action. trust_delta: +0.5. Flags: hermes_certbot_fragile. Follow-up incident: I006 (cert expires again in Phase 3).

Branch 3 — Self-signed (priority 10): Generates self-signed cert, nginx configured to use it. Connection is encrypted; browser still warns. trust_delta: -1. Flags: hermes_self_signed_cert. Phil's demo shows a security warning.

Hidden Hook: journalctl -u certbot --since "90 days ago" contains the failure entry — permission error. Immediately after the failure, in the same journalctl window, is an entry showing the timer was disabled by a manual systemctl disable command from a root session. The session timestamp predates the player. The timer wasn't failed-and-stopped; it was deliberately turned off after the failure. Sets hook_certbot_deliberately_disabled. Discoverable by: reading the journal further back than strictly necessary to diagnose the current renewal failure.

Failure Conditions: Cert not renewed; nginx not reloaded; timer still inactive.

Behavior Impact:

  • Clean branch: O+1
  • Renew without timer: O+0
  • Self-signed: R+1
  • Hook discovered: C+1

Narrative Notes: The timer being deliberately disabled — not just failed — is a small data point in the pattern of things being intentionally changed. A player who finds it has evidence of deliberate action, not accident.


Quest ID: Q008
Title: The Package That Wasn't
Narrative Phase: Normal Work
Tier: 1
Primary VM: web_server
Additional VMs: build_machine
Primary Objective: A deployment to hermes is blocked because a required package is not available in the internal apt repository. The package was reportedly built last week. Find why it isn't available and restore the deployment path.
Linux Concepts: apt-cache, apt-get update, internal apt repositories, reprepro, repository metadata management, package pipeline between build and deployment
Systems Used: web_server, build_machine
Ticket Sender: Marcus Webb
Ticket Summary: "Deployment to staging is blocked. The apt install step fails on a package that Nikhil says he built last week. Something's broken between the build and the repo. Find it and fix it."

Clue Trail:

  • apt-cache show axiomflow-workers on hermes — package not found
  • /etc/apt/sources.list.d/axiomworks.list — points to http://vulcan.axiomworks.internal/repo/
  • SSH to vulcan: repository Packages index is stale — reprepro was not run after last build
  • Built .deb artifact at /srv/packages/axiomflow-workers_2.4.1_amd64.deb
  • Fix: reprepro includedeb stable /srv/packages/axiomflow-workers_2.4.1_amd64.deb, then apt update on hermes confirms package availability
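The "stale index" clue can be confirmed mechanically before and after the fix. The sketch below assumes the repository's Packages index is a plain-text file of Debian control stanzas; its location on vulcan is not given in the quest, so the path is passed as an argument. The helper names index_lists and publish_deb are hypothetical.

```shell
# Usage: index_lists <Packages file> <package> <version>
# Exit status 0 if the index has a stanza for that package at that version.
index_lists() {
    awk -v pkg="$2" -v ver="$3" '
        $1 == "Package:" { current = $2 }
        $1 == "Version:" && current == pkg && $2 == ver { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Not called here: the fix from the clue trail. reprepro must be run from
# (or pointed at) the repository base directory, which is assumed.
publish_deb() {
    reprepro includedeb stable "$1"
}
```

Before the fix, index_lists with axiomflow-workers and 2.4.1 should fail against the repository index; after publish_deb and an apt update on hermes it should succeed.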

Solution Branches:

Branch 1 — Clean (priority 100): Adds package to repo correctly, updates metadata, confirms apt-cache show succeeds on hermes, deployment unblocked. trust_delta: +2. Flags: vulcan_repo_healthy. Follow-up ticket: T009.

Branch 2 — Manual install (priority 40): Copies .deb to hermes and installs with dpkg -i. Deployment works this time; repo still broken for next deployment. trust_delta: 0. Flags: vulcan_repo_bypassed. Follow-up incident: I007 (next deployment fails identically).

Branch 3 — Escalate without investigating (priority 10): Reassigns to Nikhil without investigation. trust_delta: -1. Ticket stalls.

Hidden Hook: While browsing the repository's package history to find the missing package, a player who looks at the full package list rather than just the missing one will find an entry for axiomflow-audit-bridge — a package built 8 months ago with no corresponding deployment record, no entry in any release manifest visible on hermes, and no build job in the scheduler that corresponds to when it was built. Sets hook_audit_bridge_package. Discoverable by: looking at the full repo package list rather than only the specific package named in the ticket.

Failure Conditions: hermes still cannot find the package; repo metadata left in broken state.

Behavior Impact:

  • Clean branch: O+1
  • Manual install: O+0
  • Hook discovered: C+2 (requires going beyond the specific package named in ticket)

Narrative Notes: The audit-bridge package is the most significant Phase 1 hook. It's discoverable only if the player looks at what's around the thing they were sent to find — real sysadmin behavior, but not required. A player who finds it has their first glimpse of something that doesn't fit.


PHASE 2 — UNEASE (Q009–Q016)

Tier 2. Partial hints. Tickets describe the symptom and indicate the general area but do not specify the cause. Branch tolerance decreases — acceptable-fix incidents now carry real operational weight. Hook density: 3 hooks across 8 quests, less pointed than Phase 1.


Quest ID: Q009
Title: The Recurrence
Narrative Phase: Unease
Tier: 2
Primary VM: web_server
Additional VMs: none
Primary Objective: hermes's nginx access log is filling up again. A Phase 1 incident that was supposed to be fixed is recurring. Find why logrotate isn't working and make it stable.
Linux Concepts: logrotate configuration, /etc/logrotate.d/, logrotate -d (dry run), cron.daily / logrotate.timer, logrotate status file
Systems Used: web_server
Ticket Sender: Sarah Chen
Ticket Summary: "The staging site is throwing errors again. Same thing as a few weeks ago — it goes down, then someone fixes it, then it comes back. I was told logrotate was set up. Why is it happening again?"

Clue Trail:

  • (If hermes_logrotate_healthy is set from Q002): the logrotate config is present but the logrotate.timer or cron.daily entry that calls it is disabled — config exists but nothing triggers it
  • (If hermes_logrotate_fragile is set from Q002): logrotate was never restored; this is the recurrence
  • Either way: systemctl status logrotate.timer shows disabled; or ls /etc/cron.daily/logrotate shows the file is missing/not executable
  • Log is filling again; nginx error is the same

Solution Branches:

Branch 1 — Root cause (priority 100): Player diagnoses the trigger failure (timer disabled or cron entry missing), restores the trigger, verifies logrotate runs correctly on next schedule, confirms log rotation is active. trust_delta: +2. Flags: hermes_logrotate_stable. Follow-up ticket: T010.

Branch 2 — Config only (priority 50): Player restores or confirms the logrotate config but doesn't check that anything calls it. Disk is cleared manually again. trust_delta: +0.5. Flags: hermes_logrotate_still_fragile. Follow-up incident: I008 (recurs again).

No hidden hook in this quest. The recurrence itself is the unease signal — not every quest in Phase 2 has a hook.

Failure Conditions: nginx still down; disk not cleared; trigger still inactive.

Behavior Impact:

  • Root cause: O+1
  • Config only: O+0

Quest ID: Q010
Title: Someone Changed Something
Narrative Phase: Unease
Tier: 2
Primary VM: web_server
Additional VMs: none
Primary Objective: Priya flagged an nginx configuration on hermes that doesn't match the last known-good state. Find what changed and restore correct configuration.
Linux Concepts: diff, config file comparison, nginx config structure (/etc/nginx/), nginx -t, git diff or backup comparison, file mtime inspection (stat)
Systems Used: web_server
Ticket Sender: Marcus Webb
Ticket Summary: "Priya found an nginx config that doesn't match the backed-up state. I don't have a change ticket for it. Go look at what's different and tell me if it matters."

Clue Trail:

  • Backup exists at /etc/nginx/.bak/ (or Marcus provides a hash reference)
  • diff -r /etc/nginx /etc/nginx/.bak/ reveals two differences:
    1. server_tokens off; has been removed from the main config (nginx version now visible in HTTP headers)
    2. A location /internal-api/ block added to a site config, proxying requests to 127.0.0.1:9301 — a port with nothing listening
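Because the backup in this quest lives inside the directory being compared, a plain recursive diff also reports the backup directory itself as a difference. A small wrapper, sketched here with the hypothetical name config_drift and assuming GNU diff (for --exclude), keeps the output limited to real drift:

```shell
# Usage: config_drift <live_dir> <backup_dir>
# Prints a unified diff from backup to live. Exit status 1 means drift.
config_drift() {
    # --exclude stops a backup stored inside the live tree (here .bak)
    # from showing up as "Only in <live_dir>".
    diff -r -u --exclude="$(basename "$2")" "$2" "$1"
}
```

Called as config_drift /etc/nginx /etc/nginx/.bak/, lines prefixed with - are settings present only in the backup (such as server_tokens off;) and lines prefixed with + were added afterwards (such as the /internal-api/ block). Running stat on each changed file then gives the mtimes that Branch 1 asks the player to document.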

Solution Branches:

Branch 1 — Both issues (priority 100): Player identifies both changes, restores server_tokens off;, removes or quarantines the /internal-api/ block, runs nginx -t, reloads nginx, documents both changes with mtimes. trust_delta: +2. Flags: hermes_nginx_config_audited. Follow-up ticket: T011.

Branch 2 — Token only (priority 50): Restores server_tokens off; but misses the proxy block. trust_delta: +0.5. Flags: hermes_nginx_proxy_block_present. Follow-up incident: I009 (Priya finds the block in next audit).

Branch 3 — No action (priority 10): Reports config looks acceptable. trust_delta: -1. Priya's review flags both items.

Hidden Hook: The proxy block for /internal-api/ points to port 9301 with nothing currently listening — but the port number itself, and the path name, will echo in later anomalies for a player who remembers it. Sets hook_nginx_internal_api_block. Discoverable by: doing a thorough diff rather than checking only the obvious item.

Behavior Impact:

  • Both issues found: O+1
  • Token only: O+0
  • Hook discovered: C+1 (remembering the port number is the payoff later)

Quest ID: Q011
Title: The Service Account
Narrative Phase: Unease
Tier: 2
Primary VM: build_machine
Additional VMs: none
Primary Objective: The pipeline-svc service account on vulcan has more sudo privileges than its role requires. Scope it to least privilege.
Linux Concepts: sudo -l, /etc/sudoers, visudo, /etc/sudoers.d/, least privilege principle, testing sudo with specific commands
Systems Used: build_machine
Ticket Sender: Priya Nair
Ticket Summary: "James's privilege audit shows pipeline-svc on the build machine has NOPASSWD: ALL. That account runs the build pipeline. It should only be able to restart specific services. Bring it into scope."

Clue Trail:

  • sudo -l -U pipeline-svc — (ALL) NOPASSWD: ALL
  • /etc/sudoers.d/pipeline-svc — the blanket grant, separate file
  • Reviewing what the account actually needs: systemctl restart axiomflow-build and systemctl restart axiomflow-timer
  • Correct fix: replace ALL with specific command paths in sudoers.d

Solution Branches:

Branch 1 — Precise scope (priority 100): Replaces the blanket grant with NOPASSWD: /bin/systemctl restart axiomflow-build, /bin/systemctl restart axiomflow-timer, verifies with sudo -l, tests that the service can still restart correctly. trust_delta: +2. Flags: vulcan_pipeline_svc_scoped. Follow-up ticket: T012.

Branch 2 — Broader scope (priority 50): Reduces from ALL but grants more than needed (e.g., NOPASSWD: /bin/systemctl). Better; not least privilege. trust_delta: +0.5. Priya notes improvement but flags remaining exposure.

Branch 3 — Remove sudo entirely (priority 20): Removes all sudo. Service account can no longer restart services; build pipeline breaks. trust_delta: -2. Follow-up incident: build failures within the hour.
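The Branch 1 grant and the "should use visudo" failure condition fit together as a validate-then-install step. In this sketch the function names are hypothetical and the (root) run-as specification is an assumption; the quest text only gives the NOPASSWD command list.

```shell
# Prints the scoped grant from Branch 1 as a single sudoers rule.
# The "(root)" run-as spec is assumed, not taken from the quest text.
render_grant() {
    cat <<'EOF'
pipeline-svc ALL=(root) NOPASSWD: /bin/systemctl restart axiomflow-build, /bin/systemctl restart axiomflow-timer
EOF
}

# Not called here: check a candidate file before it replaces the live one.
install_grant() {
    candidate=$(mktemp) || return 1
    render_grant > "$candidate"
    # visudo -c -f checks the syntax of the named file without installing it.
    visudo -c -f "$candidate" || { rm -f "$candidate"; return 1; }
    install -m 0440 -o root -g root "$candidate" /etc/sudoers.d/pipeline-svc
    rm -f "$candidate"
}
```

After installation, sudo -l -U pipeline-svc should list only the two restart commands. Overwriting the file also discards the comment at its top, so a player who wants that evidence needs to copy it out first.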

Hidden Hook: The comment at the top of /etc/sudoers.d/pipeline-svc reads: # Temp grant per INT-0194 — DH 2023-11. The ticket number references an internal system the player cannot access. The initials DH — same initials as in Q006's jbenton notes — don't correspond to any current employee. Sets hook_dh_sudo_grant. Discoverable by: reading the sudoers file rather than just acting on the grant.

Failure Conditions: Sudoers syntax error (should use visudo); service can no longer function; broader access introduced.

Behavior Impact:

  • Precise scope: O+1
  • Remove sudo: R+1
  • Hook discovered: C+1 (connects to Q006's DH initials for players who found that)

Quest ID: Q012
Title: Memory Leak
Narrative Phase: Unease
Tier: 2
Primary VM: web_server
Additional VMs: none
Primary Objective: The AxiomFlow application on hermes is crashing every few hours due to out-of-memory events. Identify the cause and implement a fix that addresses the root problem.
Linux Concepts: free -h, top, htop, /proc/meminfo, zombie processes (ps aux state column), cron job inspection, Python process management, systemd service memory limits
Systems Used: web_server
Ticket Sender: Sarah Chen
Ticket Summary: "The app keeps going down — every three or four hours it just dies and restarts. Dave said he's been getting logged out mid-session. The restart is automatic so customers haven't called yet, but they will."

Clue Trail:

  • journalctl -u axiomflow — OOM kill events every 3–4 hours
  • ps aux during an OOM interval — many axiomflow-report-gen processes with state Z (zombie)
  • /etc/cron.d/report-gen — runs axiomflow-report-gen every 30 minutes
  • The script is a Python process that forks but never calls wait() — zombies accumulate and consume PID space, the parent's memory grows
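  • Fix: correct the script so each child is reaped (call wait() on the child's Popen object, use os.waitpid() after a raw fork, or use subprocess.run(), which waits for the child) — or constrain with systemd service limits (acceptable but not root-cause)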
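The zombie clue can be turned into a count per command, which makes the accumulation easy to show before and after the fix. A minimal sketch, with count_zombies as a hypothetical helper name; it reads ps output from stdin so it can be checked against captured output as well as a live host.

```shell
# Reads "STAT COMMAND" lines on stdin, as produced by `ps -eo stat=,comm=`,
# and prints "<count> <command>" for each command that has zombie processes.
# Note: the kernel truncates comm to 15 characters, so axiomflow-report-gen
# appears as axiomflow-repor.
count_zombies() {
    awk '$1 ~ /^Z/ { n[$2]++ } END { for (c in n) print n[c], c }'
}

# On a live host:
#   ps -eo stat=,comm= | count_zombies
```

A count that climbs with every 30-minute cron run and drops to zero after the script is corrected is the evidence Branch 1 is looking for.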

Solution Branches:

Branch 1 — Root cause (priority 100): Identifies the zombie accumulation from the cron script, corrects the Python subprocess handling, confirms clean process table after next run. trust_delta: +2. Flags: hermes_report_gen_clean. Follow-up ticket: T013.

Branch 2 — Service limit (priority 60): Adds MemoryMax and Restart=on-failure to the axiomflow service unit. Crashes are now bounded; zombies still accumulate but are contained. trust_delta: +0.5. Flags: hermes_app_restart_policy.

Branch 3 — Force-kill cron (priority 20): Adds a cron job that kills all axiomflow-report-gen processes every 30 minutes. Works until a report is mid-execution when killed. trust_delta: -1. Flags: hermes_report_gen_force_killed.

No hidden hook in this quest. The technical trail is the whole story.

Failure Conditions: OOM events continue; player introduces new instability.

Behavior Impact:

  • Root cause: O+1
  • Force-kill: R+1

Quest ID: Q013
Title: The Baseline Check
Narrative Phase: Unease
Tier: 2
Primary VM: workstation
Additional VMs: none
Primary Objective: Priya's end-of-month security checklist asks the player to audit their workstation against the company baseline: open ports, running services, active accounts, home directory permissions. Document deviations.
Linux Concepts: ss -tlnp, systemctl list-units --type=service, getent passwd, ls -la ~, umask, reading and comparing against a baseline document
Systems Used: workstation
Ticket Sender: Priya Nair
Ticket Summary: "End of your first month. Standard workstation audit: I've attached the baseline checklist. Open ports, running services, account list, home directory permissions. Document what you find. Flag anything that doesn't match."

Clue Trail:

  • Most findings are normal: expected services, expected ports
  • One service is running but not on the baseline checklist: axiomworks-telemetry
  • systemctl status axiomworks-telemetry — running, enabled, binary at /usr/local/bin/axiomworks-telemetry
  • ss -tlnp lists only listening sockets, and the telemetry service connects outbound, so it does not appear there; the connection is visible in ss -tnp, netstat -anp, or /proc
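Comparing an observed list against the baseline is a set difference. The sketch below assumes the baseline checklist has been reduced to a text file with one item per line; the file format and the helper name not_in_baseline are assumptions, since the quest only says a checklist is attached.

```shell
# Usage: not_in_baseline <baseline_file> <observed_file>
# Both files hold one item per line. Prints observed items that the
# baseline does not list.
not_in_baseline() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$1" "$2"
}
```

The observed file for services could be produced with systemctl list-units --type=service --state=running --no-legend --plain | awk '{print $1}'. In this quest the expected output is a single line naming axiomworks-telemetry.service.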

Solution Branches:

Branch 1 — Thorough (priority 100): Documents all deviations including the telemetry service; investigates what the service is (service unit file contents, binary provenance, any logs); reports complete findings. trust_delta: +2. Flags: workstation_audit_complete. Follow-up ticket: T014.

Branch 2 — Checklist-only (priority 50): Completes the audit against the checklist but marks the telemetry service as "review later — may be legitimate." trust_delta: +0.5. Priya follows up.

Branch 3 — Disable to clean (priority 20): Disables the telemetry service without investigating or reporting it. Service gone; unknown what it was doing. trust_delta: 0. Flags: workstation_telemetry_disabled_silently. S+1.

Hidden Hook: The telemetry service unit file (/etc/systemd/system/axiomworks-telemetry.service) has an ExecStart line pointing to the binary, and the unit file has a comment line at the top: # deployed by pipeline — INT-0194. The same internal ticket number from Q011's sudoers comment. Sets hook_telemetry_ticket_INT0194. Discoverable by: reading the service unit file as part of investigating what the service is.

Failure Conditions: Audit incomplete; player creates instability while investigating.

Behavior Impact:

  • Thorough: O+1
  • Disable silently: S+1, R+1
  • Hook discovered: C+2 (connects INT-0194 across two quests — DH's ticket number)

Quest ID: Q014
Title: Rollback
Narrative Phase: Unease
Tier: 2
Primary VM: web_server
Additional VMs: build_machine
Primary Objective: A deployment to hermes this afternoon broke user authentication in the staging application. Roll back to the previous known-good package version and prevent automatic re-upgrade.
Linux Concepts: apt-cache policy, apt install <pkg>=<version>, apt-mark hold, package version pinning, deployment rollback procedure
Systems Used: web_server, build_machine
Ticket Sender: Sarah Chen
Ticket Summary: "The deployment this afternoon broke login — users can authenticate but are immediately logged out. Phil has a customer using this environment tomorrow. I need it rolled back now."

Clue Trail:

  • apt-cache policy axiomflow-workers — current version installed 3 hours ago
  • Previous version available in the internal repo cache
  • The regression is in session management — a code issue; infrastructure can't fix the code, only roll back the package
  • apt install axiomflow-workers=2.4.0 installs prior version
  • apt-mark hold axiomflow-workers prevents re-upgrade
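The rollback and the hold can be combined with a verification step, so the ticket is only closed once the installed version is confirmed. This is a sketch under assumptions: the function names are hypothetical, and --allow-downgrades is included because apt-get refuses a non-interactive downgrade without it.

```shell
# Reads `apt-cache policy <pkg>` output on stdin, prints the installed version.
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# Not called here: rollback, hold, then confirm the result.
# Usage: rollback_and_hold <package> <version>
rollback_and_hold() {
    pkg=$1; version=$2
    apt-get install -y --allow-downgrades "$pkg=$version" || return 1
    apt-mark hold "$pkg" || return 1
    [ "$(apt-cache policy "$pkg" | installed_version)" = "$version" ]
}
```

For this quest the call would be rollback_and_hold axiomflow-workers 2.4.0. Running apt-mark showhold afterwards lists the held package, which is worth quoting in the note to Sarah.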

Solution Branches:

Branch 1 — Rollback with hold (priority 100): Installs 2.4.0, holds the package, confirms auth works, notifies Sarah and notes the hold is in place. trust_delta: +2. Flags: hermes_axiomflow_held. Follow-up ticket: T015.

Branch 2 — Rollback without hold (priority 50): Installs 2.4.0, no hold. Auto-upgrade will re-break it on next run. trust_delta: +0.5. Flags: hermes_axiomflow_rolled_back. Follow-up incident: I010 (auto-upgrade re-installs 2.4.1 overnight).

Branch 3 — Forward fix attempt (priority 10): Player attempts to diagnose and fix the code issue rather than rolling back. Outside scope; fails. trust_delta: -1.

Hidden Hook: apt-cache showpkg axiomflow-workers on vulcan shows the 2.4.1 build timestamp: 3:12am — outside the scheduled build window. The same off-schedule time pattern as the signing step removal and the audit-bridge build. Sets hook_2_4_1_off_schedule_build. Discoverable by: looking at the build machine's package metadata while researching what version to roll back to.

Failure Conditions: Auth still broken; hold not applied; player introduced new problems.

Behavior Impact:

  • Rollback with hold: O+1
  • Rollback without hold: O+0
  • Hook discovered: C+1

Quest ID: Q015
Title: The Quiet Cron
Narrative Phase: Unease
Tier: 2
Primary VM: build_machine
Additional VMs: none
Primary Objective: Marcus has asked for a cron audit on vulcan: list all scheduled jobs, attribute each to a service or owner, and flag anything that can't be attributed.
Linux Concepts: crontab -l (per-user and system), /etc/cron.d/, /etc/cron.daily/, /etc/cron.weekly/, cron syntax, correlating jobs to services or owners
Systems Used: build_machine
Ticket Sender: Marcus Webb
Ticket Summary: "Routine cron audit on vulcan. List everything that's scheduled — root crontab, system crontab, all of cron.d. I want to know who owns each job and whether it still makes sense. Anything you can't attribute, flag it."

Clue Trail:

  • crontab -l for root and pipeline-svc — most jobs are attributable
  • /etc/cron.d/ directory — standard entries plus one named axiomworks-collect
  • axiomworks-collect job runs at 2:57am; command: /usr/local/bin/axiomworks-collect
  • The binary /usr/local/bin/axiomworks-collect exists and is executable
  • No ticket, no documentation comment in the cron file itself, no recent entry in any change log
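For the /etc/cron.d/ part of the audit, each file uses the system crontab format (five time fields, a user, then the command). A sketch of a lister follows; list_cron_d is a hypothetical helper name, and the output layout is an illustrative choice rather than a required report format.

```shell
# Usage: list_cron_d <dir>
# Prints "<file> <user> <schedule> <command>" for every job line found.
list_cron_d() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        awk -v file="$(basename "$f")" '
            function rest(start,    i, s) {
                s = $start
                for (i = start + 1; i <= NF; i++) s = s " " $i
                return s
            }
            /^[ \t]*(#|$)/ { next }                 # comments and blank lines
            /^[ \t]*[A-Za-z_]+[ \t]*=/ { next }     # VAR=value lines
            $1 ~ /^@/ && NF >= 3 { print file, $2, $1, rest(3); next }
            NF >= 7 { print file, $6, $1, $2, $3, $4, $5, rest(7) }
        ' "$f"
    done
}
```

Running list_cron_d /etc/cron.d gives one line per job, which can then be annotated with an owner. A line whose file and command match no known service or ticket is the kind of entry Marcus asked to have flagged.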

Solution Branches:

Branch 1 — Thorough, with investigation (priority 100): Player lists all jobs, attributes each, and for axiomworks-collect: runs file and strings on the binary to understand what it does before flagging it — the binary name is suggestive and a thorough audit would check it. Submits complete report including what the binary calls. trust_delta: +2. Flags: axiomworks_collect_cron_flagged. Follow-up ticket: T016.

Branch 2 — Listed but not investigated (priority 60): Player lists all jobs, flags axiomworks-collect as unattributed, but does not inspect the binary. Report is honest but shallow. trust_delta: +1. Flags: axiomworks_collect_noted.

Branch 3 — Incomplete list (priority 10): Player misses entries. Marcus follows up. trust_delta: -1.

Hidden Hook: Running strings /usr/local/bin/axiomworks-collect or ldd /usr/local/bin/axiomworks-collect and checking its network behavior (or simply reading any log it writes, if one exists) reveals it connects to an internal address. The ticket number in its help text — INT-0194 — ties the binary to the same ticket referenced in Q011 and Q013. Sets hook_collect_binary_INT0194. The hook is only set in Branch 1 (player inspected the binary). In Branch 2, the job is noted but not confirmed. Discoverable by: going one step further than the ticket requires — investigating what an unattributed job actually does.

Failure Conditions: Cron audit submitted without flagging unattributed jobs.

Behavior Impact:

  • Branch 1: O+1, C+2 (the INT-0194 connection is now three sightings)
  • Branch 2: O+0
  • Hook discovered: C+2 (already in Branch 1 impact)

Quest ID: Q016
Title: The Door Left Open
Narrative Phase: Unease
Tier: 2
Primary VM: web_server
Additional VMs: none
Primary Objective: A security scan found port 8080 on hermes reachable from outside the office network. That port runs the AxiomFlow admin panel. Restrict it to internal-only access and confirm.
Linux Concepts: ufw, iptables, ss -tlnp, nginx access control by IP (allow/deny), CIDR notation, defense-in-depth (firewall + application layer)
Systems Used: web_server
Ticket Sender: Priya Nair
Ticket Summary: "Scan from this morning. Port 8080 on hermes is reachable externally. That's the admin panel. It should be internal-only — restrict to 10.0.0.0/8. Confirm when done."

Clue Trail:

  • ss -tlnp | grep 8080 — service listening on 0.0.0.0:8080
  • ufw status — no restriction on port 8080
  • Fix options: ufw rule restricting source to 10.0.0.0/8, or nginx allow 10.0.0.0/8; deny all; in the 8080 server block, or both
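The firewall half of the fix, plus a check for the "listening on 0.0.0.0" clue, might look like the sketch below. The function names are hypothetical. The parser assumes the column layout of ss -tlnH, where the local address and port are the fourth field.

```shell
# Reads `ss -tlnH` output on stdin and prints local addresses that are bound
# to the given port on all interfaces (0.0.0.0, [::] or *).
wildcard_listeners() {
    awk -v port="$1" '
        {
            n = split($4, parts, ":")
            addr = substr($4, 1, length($4) - length(parts[n]) - 1)
        }
        parts[n] == port && (addr == "0.0.0.0" || addr == "[::]" || addr == "*") { print $4 }
    '
}

# Not called here: ufw evaluates rules in order, so the allow for the
# internal range must be added before the general deny.
restrict_admin_port() {
    ufw allow from 10.0.0.0/8 to any port 8080 proto tcp &&
    ufw deny 8080/tcp
}
```

The firewall rules do not change what wildcard_listeners reports, because the service is still bound to every interface. That is why Branch 1 also adds the nginx allow/deny rules and then tests reachability from both an internal and an external address.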

Solution Branches:

Branch 1 — Defense in depth (priority 100): Restricts at both firewall and nginx layer, confirms external access blocked, internal access works, reports to Priya. trust_delta: +2. Flags: hermes_admin_port_secured. Follow-up ticket: T017.

Branch 2 — Single layer (priority 60): Restricts at one layer only. Better. Not layered. trust_delta: +1. Priya notes the single-layer approach.

Branch 3 — Block entirely (priority 20): Blocks port for all traffic. Admin panel inaccessible to everyone including internal users. trust_delta: -1.

No hidden hook in this quest. The technical task is clean.

Failure Conditions: Port still accessible externally; internal access broken; ufw rules in conflict.

Behavior Impact:

  • Defense in depth: O+1
  • Block entirely: R+1

PHASE 3 — SUSPICION (Q017–Q024)

Tier 2. Minimal guidance. Tickets state the problem, not the location. The clue trail requires following evidence without direction. Branch tolerance is stricter. Hook density increases: 5 hooks across 8 quests.


Quest ID: Q017
Title: Access Without a Ticket
Narrative Phase: Suspicion
Tier: 2
Primary VM: web_server
Additional VMs: workstation
Primary Objective: hermes's auth log shows SSH connections from an IP address not in the asset inventory. Investigate: what account was used, what activity occurred, is access still happening.
Linux Concepts: /var/log/auth.log, grep and log filtering, last, who, lastlog, SSH session forensics, correlating authentication events with known assets
Systems Used: web_server, workstation
Ticket Sender: Priya Nair
Ticket Summary: "Auth log review surfaced connections to hermes from 10.0.0.47 over the past 90 days. That address is not in our asset inventory. I want to know: what account, any evidence of what was done, and whether it's ongoing."

Clue Trail:

  • /var/log/auth.log on hermes — Accepted publickey for pipeline-svc from 10.0.0.47
  • Sessions: short duration, irregular hours (2–4am), spanning 6 months
  • ~pipeline-svc/.bash_history — disabled or empty (shell configured with HISTSIZE=0)
  • DNS lookup for 10.0.0.47 — no reverse record; DHCP table has no entry
  • last pipeline-svc — confirms session dates and source IP
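A quick summary of accepted logins per account and source address helps scope the investigation before reading individual sessions. This sketch assumes the usual OpenSSH syslog line format ("Accepted <method> for <user> from <ip> port ..."); accepted_logins is a hypothetical helper name.

```shell
# Usage: accepted_logins <auth_log>
# Prints "<count> <user> <source_ip>" for each pair with accepted SSH logins.
accepted_logins() {
    awk '
        / sshd\[[0-9]+\]: Accepted / {
            for (i = 1; i < NF; i++) {
                if ($i == "for")  user = $(i + 1)
                if ($i == "from") ip   = $(i + 1)
            }
            seen[user " " ip]++
        }
        END { for (k in seen) print seen[k], k }
    ' "$1" | sort -k2,3
}
```

Run against /var/log/auth.log and its rotated copies, the pipeline-svc and 10.0.0.47 pair stands out immediately. The per-session dates and durations that Branch 1 asks for still come from last and the individual log lines.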

Solution Branches:

Branch 1 — Full documentation (priority 100): Player documents all sessions (dates, times, duration), notes history was disabled, traces what can be traced, and reports that 10.0.0.47 is unknown; if the player has accumulated hooks, the report also ties the address to prior audit anomalies. trust_delta: +3. Flags: unknown_ip_auth_documented. Priya: "Thank you. I'll look into that address." Follow-up ticket: T018.

Branch 2 — Partial (priority 50): Player documents the sessions but cannot or does not trace the IP or connect it to prior findings. trust_delta: +1. Flags: unknown_ip_auth_noted.

Branch 3 — Block and close (priority 20): Player blocks the IP at the firewall and closes the ticket without full investigation. Access stops; record is thin. trust_delta: 0. Flags: unknown_ip_blocked_uninvestigated. S+1.

Hidden Hook: The pipeline-svc account was the one from Q011 — overly broad sudo that the player (may have) scoped down. If hook_dh_sudo_grant was set, a player connecting the dots now knows that whoever had access to that account from 10.0.0.47 previously had NOPASSWD: ALL. Sets hook_pipeline_svc_external_sessions. This is not a new discoverable artifact — it's a cross-reference that sets a flag if both the Q011 hook and the Q017 investigation are present.

Failure Conditions: Player doesn't investigate before taking action; evidence destroyed before documented.

Behavior Impact:

  • Full documentation: O+1, C+2 (cross-reference with prior hooks)
  • Block and close: S+1, R+1
  • Cross-reference hook: C+2 (only if hook_dh_sudo_grant was set; the connection is the behavior, not finding a new artifact)

Quest ID: Q018
Title: The User Who Wasn't Onboarded
Narrative Phase: Suspicion
Tier: 2
Primary VM: workstation
Additional VMs: web_server
Primary Objective: A user account exists on both ares and hermes with no corresponding HR record. Investigate the account's history and scope before removal.
Linux Concepts: Cross-host account audit, last and lastlog, find / -user, id, account removal across multiple hosts with userdel
Systems Used: workstation, web_server
Ticket Sender: Priya Nair
Ticket Summary: "Access review surfaced account rford on both the workstation and the web server. HR has no record of this person. The account has had recent activity on hermes. Full audit before removal."

Clue Trail:

  • Account on both machines; last rford on hermes shows login 3 weeks ago
  • Files owned by rford on hermes: find /var/www /etc -user rford — one result: /var/www/axiomworks/config/.rford_run — a shell script
  • The script, if read, runs a data aggregation command and outputs to a temp directory
  • The account's group memberships include www-data — more access than a typical employee account
  • No ticket creating the account on either machine

Solution Branches:

Branch 1 — Full audit with archive (priority 100): Player checks activity on both hosts, reads and archives the found file, checks group memberships, removes account from both machines, documents fully. trust_delta: +3. Flags: rford_account_removed_thorough. Follow-up ticket: T019.

Branch 2 — Remove without reading (priority 40): Removes account from both machines without examining files. Evidence lost. trust_delta: +1. Priya asks for the files; they're gone. Flags: rford_account_removed_fast.

Branch 3 — Workstation only (priority 10): Removes from workstation, misses hermes. trust_delta: -1. Hermes account remains active.

Hidden Hook: The .rford_run script, if read before archiving, outputs a data aggregation of AxiomFlow session logs and sends it to a temp directory with a timestamp. The script has a comment: # collect step — called by INT-0194 automation. Three previous hooks have referenced INT-0194. Sets hook_rford_script_INT0194. Discoverable by: reading the file before archiving, which proper archival practice would do.

Failure Conditions: Evidence destroyed without reading; account not removed from both machines; player removes account with active processes still running.

Behavior Impact:

  • Full audit: O+1
  • Read the file: C+3 (INT-0194 is now four references — pattern is now clear to any player who has been collecting these)
  • Remove without reading: R+2

Quest ID: Q019
Title: The Diff That Didn't Match
Narrative Phase: Suspicion
Tier: 2
Primary VM: build_machine
Additional VMs: web_server
Primary Objective: A deployment validation check is failing because the installed package on hermes doesn't match the expected checksum. Investigate why the package differs from the tagged source.
Linux Concepts: dpkg-deb -x, diff -r, md5sum / sha256sum, package integrity verification, comparing installed vs. source artifacts
Systems Used: build_machine, web_server
Ticket Sender: Marcus Webb
Ticket Summary: "The post-deploy checksum check on hermes failed. The installed axiomflow-workers doesn't match the tagged release checksum. Nikhil says he didn't change anything. Find what's different and where the difference came from."

Clue Trail:

  • dpkg-deb -x /srv/packages/axiomflow-workers_2.4.2_amd64.deb /tmp/pkg-extract
  • diff -r /tmp/pkg-extract /srv/src/axiomflow-workers-2.4.2/ — two files differ
  • The modified files are in the session logging module; they add a secondary logging call to a local socket
  • The modification is not in the tagged source commit; it was added to the build environment itself — a file in the build script directory that patches sources before compilation
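The unpack-and-diff step in this clue trail can be sketched as a small shell function. This is an illustration only: `compare_trees` is a hypothetical helper, not part of the quest content, and it assumes the extracted package and the source tree share a directory layout, as the quest's own `diff -r` does. It reports files that differ or that exist only on the package side; files present only in the source tree are not listed.

```shell
# Hypothetical helper: compare an unpacked package tree against a source tree.
# Usage: compare_trees <extracted_dir> <source_dir>
compare_trees() {
    extracted="$1"
    source_tree="$2"
    # List every regular file in the extracted tree, relative to its root.
    (cd "$extracted" && find . -type f | LC_ALL=C sort) |
    while IFS= read -r rel; do
        rel="${rel#./}"
        if [ ! -f "$source_tree/$rel" ]; then
            # File shipped in the package but absent from the source tree.
            printf 'ONLY_IN_PACKAGE %s\n' "$rel"
        elif ! cmp -s "$extracted/$rel" "$source_tree/$rel"; then
            # File exists on both sides but the bytes differ.
            printf 'DIFFERS %s\n' "$rel"
        fi
    done
}
```

With the paths from the clue trail, the call would be `compare_trees /tmp/pkg-extract /srv/src/axiomflow-workers-2.4.2`. Each `DIFFERS` line is a candidate for a closer look with `diff -u`.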

Solution Branches:

Branch 1 — Full forensics (priority 100): Player unpacks the package, diffs against source, identifies the modified files, traces the modification to the build environment patch file, documents the full chain of custody. trust_delta: +3. Flags: package_modification_documented. Marcus: "Keep this to yourself and Priya for now. I mean that." Follow-up ticket: T020.

Branch 2 — Diff identified, source not traced (priority 50): Player confirms the diff exists but characterizes it as a build environment artifact without tracing the cause. trust_delta: +1. Flags: package_diff_found.

Branch 3 — Confirm and escalate without analysis (priority 20): Player confirms something is different and escalates without characterizing what. trust_delta: 0.

Hidden Hook: The patch file in the build environment that injects the modification has a comment at the top: # INT-0194 session capture — do not remove. The fifth reference to the same internal ticket number. Sets hook_build_patch_INT0194. Discoverable by: tracing the modification source, which Branch 1 requires.

Failure Conditions: Player attempts to remove the patch without consulting Marcus first. This gives R+3 and a cooler response from Marcus, because removing evidence before it is documented is a problem.

Behavior Impact:

  • Full forensics: O+1, C+3
  • Remove patch unilaterally: R+3, S+1
  • Hook discovered: C+3 (already in full-forensics impact)

Narrative Notes: This is the moment the INT-0194 pattern resolves for a thorough player. Five references across different systems, all pointing to the same internal ticket ID. Marcus's response is his quietest and most deliberate. He says less than normal, which means more.


Quest ID: Q020
Title: Pressure From Above
Narrative Phase: Suspicion
Tier: 2
Primary VM: workstation
Additional VMs: none
Primary Objective: Kowalski has requested a written access and change summary for the past 30 days before a scheduled status meeting. Compile it accurately from system logs.
Linux Concepts: journalctl, last, /var/log/auth.log, log filtering by date range, compiling a change record from system state evidence
Systems Used: workstation
Ticket Sender: Dave Kowalski
Ticket Summary: "Before Thursday's check-in I need the following in writing: access grants issued in the past 30 days, configuration changes to production or staging, and open incidents. Pull it from the logs. End of day Wednesday."

Clue Trail:

  • Player reads auth logs, systemd journals, and any change log Marcus maintains
  • Accurate log reading requires: journalctl --since "30 days ago", last, reviewing Priya's shift review emails for documented changes
  • The technical work is real — log compilation at this scale requires knowing the right tools

Solution Branches:

Branch 1 — Complete and accurate (priority 100): Player includes all documented activity including any anomalies that surfaced through tickets. trust_delta: +2. Flags: kowalski_report_accurate. Marcus sends a brief private note: "Good call being complete." Follow-up ticket: T021.

Branch 2 — Accurate but narrow (priority 60): Report includes only ticket-related activity; omits anomalies that came up during investigation. Accurate; incomplete. trust_delta: +1. Flags: kowalski_report_narrow.

Branch 3 — Omits or sanitizes (priority 10): Player downplays or omits anomalies that would raise questions. trust_delta: -2. Flags: kowalski_report_sanitized. S+3 (Priya will eventually compare this against log evidence and notice the gaps).

Failure Conditions: Report submitted without log evidence; report materially inaccurate.

Behavior Impact:

  • Complete: O+2
  • Sanitized: R+3, S+3

Quest ID: Q021
Title: The Backup That Wasn't Tested
Narrative Phase: Suspicion
Tier: 2
Primary VM: build_machine
Additional VMs: web_server
Primary Objective: The last documented backup restore test for hermes is 14 months old. Perform a restore test of a non-critical service config directory, document the procedure, and report the result honestly.
Linux Concepts: rsync, tar, backup archive integrity, sha256sum verification, restore testing to a non-production location, documenting backup procedures
Systems Used: build_machine, web_server
Ticket Sender: Marcus Webb
Ticket Summary: "Our backup procedure calls for a quarterly restore test. The last documented test is 14 months old. Pick a non-critical config directory on hermes, verify the backup can be restored to a test location, and document the steps and the result. Don't touch production paths."

Clue Trail:

  • Backups at /srv/backups/hermes/ on vulcan — recent archive looks intact
  • Checksum file present; most checksums match
  • One archive from 5 months ago: checksum does not match a recalculated value — the archive file was modified after initial creation (timestamps show a modification date after the archive date)
  • Recent archive (3 days old) restores cleanly to /tmp/restore-test/
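One way to surface both the checksum mismatch and the timestamp anomaly in a single pass is sketched below. `audit_archives` is a hypothetical helper, and the manifest is assumed to be in plain `sha256sum` output format; the actual name and layout of the checksum file on vulcan are not specified in this document. The sketch relies on GNU `date -r` to read a file's modification time.

```shell
# Hypothetical helper: re-check every archive named in a sha256sum-style
# manifest and print its status next to its modification time (UTC).
# Usage: audit_archives <backup_dir> <manifest_file>
audit_archives() {
    dir="$1"
    manifest="$2"
    while read -r sum name; do
        [ -n "$sum" ] || continue
        name="${name#\*}"   # drop sha256sum's binary-mode marker, if present
        actual=$(sha256sum < "$dir/$name" | cut -d' ' -f1)
        if [ "$actual" = "$sum" ]; then status=OK; else status=MISMATCH; fi
        # GNU date: -r prints the file's last modification time.
        mtime=$(date -u -r "$dir/$name" +%Y-%m-%dT%H:%M:%SZ)
        printf '%s %s %s\n' "$status" "$mtime" "$name"
    done < "$manifest"
}
```

A `MISMATCH` line whose modification time is later than the date in the archive's name is the kind of discrepancy the quest expects the player to notice and report.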

Solution Branches:

Branch 1 — Thorough (priority 100): Player identifies the corrupted older archive (and notes the timestamp anomaly), successfully restores from the recent clean archive, documents both findings — the corruption and the successful restore. trust_delta: +2. Flags: backup_restore_tested. Follow-up ticket: T022.

Branch 2 — Restore only (priority 50): Player tests the restore successfully but misses the corrupted archive. Report is optimistic. trust_delta: +1. Flags: backup_restore_partial_test.

Branch 3 — False report (priority 10): Player documents a successful test without actually running it. trust_delta: -3. Flags: backup_test_falsified. S+2. Priya's next audit will check against log evidence.

Hidden Hook: The archive with the checksum mismatch was last modified at 3:17am — the same 3am activity window as the off-schedule builds and the cron job. The archive's internal timestamp (from tar -tv) shows the files were correct at backup time; the outer archive was replaced later. Sets hook_backup_archive_tampered. Discoverable by: noticing the timestamp anomaly on the corrupted archive, which requires checking modification timestamps on the archive files themselves.

Failure Conditions: Restore test not actually run; player modifies production paths; report falsified.

Behavior Impact:

  • Thorough: O+1
  • False report: R+4, S+2
  • Hook discovered: C+2

Quest ID: Q022
Title: The Firewall Rule
Narrative Phase: Suspicion
Tier: 2
Primary VM: workstation
Additional VMs: none
Primary Objective: A firewall rule on the workstation allows outbound connections to a specific internal IP on a specific port. No ticket references this rule. Investigate whether it is legitimate; remove it if not.
Linux Concepts: ufw status numbered, iptables -L -n, firewall rule audit, rule provenance (when was it added, can it be traced), ufw delete
Systems Used: workstation
Ticket Sender: Priya Nair
Ticket Summary: "Firewall audit on the workstation found a rule allowing outbound to 10.0.0.47:9301. No ticket references it. I need: when was it added, do you know what that address is, and a recommendation."

Clue Trail:

  • ufw status numbered — rule present, destination 10.0.0.47 port 9301
  • Rule creation date cannot be directly queried from ufw; journalctl shows when ufw last reloaded; system logs from that period may show the rule being added
  • A player who has been paying attention will recognize both values: 10.0.0.47 appeared in Q017's auth log investigation, and port 9301 appeared in Q010's nginx proxy block

Solution Branches:

Branch 1 — Full cross-reference (priority 100): Player connects this rule to prior findings (10.0.0.47 from auth logs; port 9301 from nginx config), explains the connection, recommends removal, removes the rule with ufw delete, reports. trust_delta: +3. Flags: firewall_rule_9301_removed. Priya: "That matches what I've been seeing." Follow-up ticket: T023.

Branch 2 — Remove without context (priority 50): Player removes the rule but doesn't connect it to prior findings. trust_delta: +1. Flags: firewall_rule_removed.

Branch 3 — Keep with note (priority 20): Documents the rule as "unverified" and leaves it. trust_delta: 0.

Failure Conditions: Rule not assessed; player introduces new firewall problems.

Behavior Impact:

  • Full cross-reference: O+1, C+3 (this is the convergence point for three prior data threads)
  • Remove without context: O+0
  • Hook: no new hook — the cross-reference IS the payoff for accumulated hooks

Quest ID: Q023
Title: Overnight Changes
Narrative Phase: Suspicion
Tier: 2
Primary VM: web_server
Additional VMs: none
Primary Objective: Files on hermes were modified at 3am on Thursday with no corresponding change ticket. Find what changed and assess whether to revert.
Linux Concepts: find / -newer <reference_file>, stat, file modification timestamps, config file comparison, git diff if applicable, change ticket correlation
Systems Used: web_server
Ticket Sender: Marcus Webb
Ticket Summary: "Something touched files on hermes at 3am Thursday. The backup ran at 2am and files weren't changed then. Find what changed and tell me if we need to revert."

Clue Trail:

  • find /etc /var/www -newer /var/log/axiomflow/nightly-backup.log -not -newer /var/log/nginx/access.log narrows to files modified in the 3am window
  • Modified files:
    1. /var/www/axiomworks/config/app.config — database connection string now points to a secondary host
    2. /etc/nginx/conf.d/upstream.conf — upstream block added for the same host
  • The secondary host referenced is not in the known asset inventory
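The time-window search in this clue trail generalizes to any pair of reference files. The sketch below wraps it in a hypothetical function, `changed_between`; the reference files stand in for whatever the player chooses as the lower and upper bounds of the window.

```shell
# Hypothetical helper: list regular files whose mtime is after one reference
# file and no later than another.
# Usage: changed_between <after_ref> <before_ref> <root>...
changed_between() {
    after_ref="$1"
    before_ref="$2"
    shift 2
    # -newer A        : modified more recently than A
    # ! -newer B      : not modified more recently than B
    find "$@" -type f -newer "$after_ref" ! -newer "$before_ref" | LC_ALL=C sort
}
```

For the quest's case the call would look like `changed_between /var/log/axiomflow/nightly-backup.log /var/log/nginx/access.log /etc /var/www`. GNU find also accepts explicit timestamps through `-newermt`, which avoids depending on the mtime of a log file that is still being written.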

Solution Branches:

Branch 1 — Revert and document (priority 100): Identifies both files, reverts both to known-good state (from the backup taken just before the modifications), documents the original changes with file evidence, reports fully. trust_delta: +3. Flags: hermes_overnight_changes_reverted. Marcus: "Revert was right. Don't touch anything else." Follow-up ticket: T024.

Branch 2 — Document without revert (priority 50): Identifies changes and reports but leaves them active. trust_delta: +1. Changes remain.

Branch 3 — Revert without documenting (priority 20): Reverts but doesn't record what was changed. trust_delta: +0.5.

Hidden Hook: The secondary host in both modified files is at 10.0.1.15 — a different IP from 10.0.0.47. Two machines. Sets hook_second_host_10_0_1_15. Discoverable by: recording the specific values in the modified files, which proper documentation requires.

Failure Conditions: Changes not assessed; player reverts production paths without confirming impact; modifications left active without escalation.

Behavior Impact:

  • Revert and document: O+1, C+1 (new IP is a new data point)
  • Revert without documenting: O+0
  • Hook discovered: C+1

Quest ID: Q024
Title: The Audit Window
Narrative Phase: Suspicion
Tier: 2
Primary VM: workstation
Additional VMs: web_server, build_machine
Primary Objective: Priya is conducting a formal access audit. Verify current access levels and service account configurations on all three machines against the documented expected state.
Linux Concepts: Cross-host audit, getent passwd, sudo -l, groups, SSH authorized_keys review, service account scope verification
Systems Used: workstation, web_server, build_machine
Ticket Sender: Priya Nair
Ticket Summary: "Formal audit. Every service account across all three machines: privileges, group memberships, sudo grants, SSH keys in authorized_keys. Compare against the baseline I've attached. Flag everything that doesn't match."

Clue Trail:

  • Audit covers all three machines systematically
  • Findings depend on what the player has fixed and what they've left open
  • Dale's deploy key on hermes (Q001/Q003 hook) — if not removed, it's a live finding
  • pipeline-svc sudo scope — if Q011 was only partially fixed, it appears here
  • axiomworks-telemetry service — if Q013 found it, it's in the player's record; if not, it's a new finding here
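A first pass over `getent passwd` output can narrow the audit to the accounts most worth checking against the baseline. The sketch below is one possible filter, not the game's validation logic: `system_accounts_with_shell` is a hypothetical helper, and the UID threshold of 1000 is the Debian/Ubuntu default for regular users, which the game's VMs are only assumed to follow.

```shell
# Hypothetical helper: read passwd-format lines on stdin and print non-root
# system accounts (UID below a threshold) that still have a login shell.
# Usage: getent passwd | system_accounts_with_shell [uid_min]
system_accounts_with_shell() {
    uid_min="${1:-1000}"
    awk -F: -v min="$uid_min" '
        # Field 3 is the UID, field 7 the login shell.
        $3 < min && $3 != 0 && $7 !~ /(nologin|false)$/ {
            print $1 ":" $3 ":" $7
        }
    '
}
```

Running `getent passwd | system_accounts_with_shell` on each of the three machines gives a short list to compare with the baseline; group memberships, sudo grants, and authorized_keys still have to be reviewed separately for each account it prints.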

Solution Branches:

Branch 1 — Thorough (priority 100): Player audits all three machines, identifies every discrepancy, includes Dale's key if still present, submits complete cross-referenced report. trust_delta: +3. Flags: formal_audit_complete. Priya: "This is complete. I'll schedule a follow-up with Marcus." Follow-up ticket: T025.

Branch 2 — Partial (priority 50): Misses 1–2 findings. trust_delta: +1. Priya follows up specifically on each gap.

Branch 3 — Surface-level (priority 10): Misses most findings. trust_delta: -1.

No hidden hook in this quest — the audit produces findings based on the world state, not new anomalies.

Failure Conditions: Audit submitted with material inaccuracies.

Behavior Impact:

  • Thorough: O+2
  • Dale's key found here for the first time (not already found in Q001/Q003): C+1

PHASE 4 — INVESTIGATION (Q025–Q032)

Tier 3. Problem-solving only. Tickets state the problem, no location, no approach. The player is expected to apply their full toolkit. Hook density: 3 hooks across 8 quests, each requiring cross-referencing prior findings.


Quest ID: Q025
Title: Who Owns the Key
Narrative Phase: Investigation
Tier: 3
Primary VM: web_server
Additional VMs: workstation
Primary Objective: Following the formal audit, trace the origin of the Dale SSH key in deploy-user's authorized_keys. Establish when it was added, by what session, and when it was last used.
Linux Concepts: ssh-keygen -lf (fingerprinting), /var/log/auth.log grep for fingerprint, correlation with session timestamps, absence of key from official inventory as a finding
Systems Used: web_server, workstation
Ticket Sender: Priya Nair
Ticket Summary: "The key in deploy-user's authorized_keys that doesn't have a current employee match. I need provenance: when added, what session, last used. Don't remove it yet. Document first."

Clue Trail:

  • ssh-keygen -lf /home/deploy-user/.ssh/authorized_keys — fingerprint of the Dale key
  • grep <fingerprint> /var/log/auth.log on hermes — sessions that authenticated with this key; last session 5 months ago
  • The session that added the key: /var/log/auth.log doesn't show key addition, but a root session from 10.0.0.47 at the right timestamp aligns (if Q017 was investigated, the player can correlate)
  • The key is not in any official key inventory document
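The fingerprint-to-session step can be sketched as follows. `sessions_for_key` is a hypothetical helper. It assumes the traditional syslog line format that sshd produces for successful public-key logins (`Accepted publickey for USER from IP port N ssh2: TYPE SHA256:...`); hosts configured with a different timestamp or log format would need the pattern adjusted.

```shell
# Hypothetical helper: list sshd public-key logins that used a given key
# fingerprint, reduced to timestamp, user, and source address.
# Usage: sessions_for_key <SHA256:fingerprint> <logfile>...
sessions_for_key() {
    fingerprint="$1"
    shift
    # -F treats the fingerprint as a fixed string ('+' and '/' are common in it).
    grep -hF -- "$fingerprint" "$@" |
        grep -F 'Accepted publickey' |
        sed -E 's/^(.*) [^ ]+ sshd\[[0-9]+\]: Accepted publickey for ([^ ]+) from ([^ ]+) port .*$/\1 user=\2 from=\3/'
}
```

The fingerprint argument is the `SHA256:...` value printed by `ssh-keygen -lf`. Passing rotated logs as additional arguments (for example `/var/log/auth.log /var/log/auth.log.1`) extends the window; compressed rotations would have to be decompressed first.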

Solution Branches:

Branch 1 — Full provenance (priority 100): Player fingerprints, traces sessions, correlates add timestamp with known session data, notes the key's absence from official inventory, produces a complete chain. trust_delta: +3. Flags: dale_key_provenance_documented. Marcus sends a message outside normal ticket channels — a Slack message, same terse voice, one sentence longer than usual. Follow-up ticket: T026.

Branch 2 — Sessions documented, source not traced (priority 50): Finds session history but cannot attribute who added the key. trust_delta: +1.

Hidden Hook: The most recent session authenticated with this key was on a date that maps to a known incident — the same date hermes had an unexplained outage 6 months ago, visible in the nginx error logs. A player who correlates the auth log date with the nginx error log from the same timeframe can connect Dale's last known access to a specific event. Sets hook_dale_key_last_session_incident_date. Discoverable by: cross-referencing auth log dates with nginx error log dates — not required to complete the provenance chain, but available to a player who thinks to check.

Failure Conditions: Player removes the key before documenting; Priya explicitly said not to.

Behavior Impact:

  • Full provenance: O+1, C+2
  • Remove before documenting: R+3, S+2
  • Hook discovered: C+1

Quest ID: Q026
Title: The Build Chain
Narrative Phase: Investigation
Tier: 3
Primary VM: build_machine
Additional VMs: none
Primary Objective: Reconstruct the full build pipeline modification history on vulcan for the past 12 months. Attribute each change to a person or session. Flag any changes without a corresponding official release.
Linux Concepts: git log, git diff, git blame, file system timestamps, bash history correlation, build script comparison, release note cross-reference
Systems Used: build_machine
Ticket Sender: Marcus Webb
Ticket Summary: "I need a complete history of every change to the build scripts on vulcan over the past year. Where you can, attribute each change to a person. Cross-reference with release notes. Anything without a release: flag it."

Clue Trail:

  • Build scripts are in a git repository on vulcan
  • git log --all --oneline --since="1 year ago" — full history
  • Most commits: legitimate, attributed to Nikhil Sharma
  • Three anomalous commits:
    1. Removal of sign-package step — committed by pipeline-svc account (not a person)
    2. Addition of the build-time patch file (INT-0194 reference) — same pipeline-svc commit
    3. A commit adding axiomflow-audit-bridge to the build target list — pipeline-svc
  • None of these three have corresponding release notes
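Isolating the commits made under one identity is a small variation on the `git log` call above. The sketch below uses a hypothetical helper, `commits_by_author`; the repository path is an argument because this document does not specify where the build scripts live on vulcan.

```shell
# Hypothetical helper: list commits by one author name within a time window.
# Usage: commits_by_author <repo_dir> <author_pattern> [since]
commits_by_author() {
    repo="$1"
    author="$2"
    since="${3:-1 year ago}"
    # --author matches against the recorded author name and email.
    git -C "$repo" log --all --since="$since" --author="$author" \
        --date=short --format='%h %ad %an: %s'
}
```

Calling it with `pipeline-svc` as the author pattern would list the service-account commits for cross-checking against release notes. The author field is whatever the committing environment's git configuration declared, so it supports attribution to an account or session rather than proof of who was at the keyboard.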

Solution Branches:

Branch 1 — Complete annotated history (priority 100): Player produces a full timeline, attributes the three anomalous commits to the pipeline-svc service account, notes the discrepancy between that account making commits and its stated purpose (restart services only), flags all three as undocumented. trust_delta: +3. Flags: build_chain_audit_complete. Follow-up ticket: T027.

Branch 2 — Partial (priority 50): Covers legitimate changes, flags some but not all anomalous ones. trust_delta: +1.

No hidden hook in this quest — the findings are the point.

Failure Conditions: Report submitted without flagging anomalous commits; player modifies the git history.

Behavior Impact:

  • Complete: O+1, C+2
  • Modify git history: R+5 (destroying forensic evidence)

Quest ID: Q027
Title: Asset Inventory Reconciliation
Narrative Phase: Investigation
Tier: 3
Primary VM: build_machine
Additional VMs: workstation
Primary Objective: Reconcile the internal asset inventory against the actual network. For every host that should be on the network, verify it is; for every host that appears on the network, verify it is in the inventory. Document discrepancies.
Linux Concepts: nmap (host discovery), arp -n, ping, internal DNS queries (dig, host), asset inventory document comparison, subnet scanning
Systems Used: build_machine, workstation
Ticket Sender: Priya Nair
Ticket Summary: "I need the asset inventory reconciled against the actual network. Scan the 10.0.0.0/24 range. Every host that responds: is it in the inventory? Every host in the inventory: does it respond? Document every discrepancy."

Clue Trail:

  • nmap -sn 10.0.0.0/24 from build_machine — host discovery scan
  • Known hosts respond as expected (ares, hermes, vulcan, and others from inventory)
  • 10.0.0.47 responds — not in the inventory
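  • 10.0.1.15 responds and is not in the inventory (known from Q023's hook for players who found it, a new discovery for those who didn't). It lies outside 10.0.0.0/24, so it only shows up if the player widens the scan to cover 10.0.1.0/24 as well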
  • Both have SSH open; 10.0.0.47 has an additional service on port 9301
  • DNS resolution returns nothing for either
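The two-way comparison the ticket asks for can be sketched with `comm` over sorted lists. `reconcile_inventory` is a hypothetical helper, and the inventory is assumed to be a plain text file with one IP address per line; the real inventory document's format is not specified here. The scan side is read from nmap's grepable output, where responding hosts appear as `Host: <ip> (<name>)  Status: Up`.

```shell
# Hypothetical helper: compare an inventory list (one IP per line) with nmap
# grepable output and print hosts that appear on only one side.
# Usage: reconcile_inventory <inventory_file> <nmap_grepable_file>
reconcile_inventory() {
    inventory="$1"
    scan="$2"
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort -u "$inventory" > "$tmp/expected"
    # Field 2 of each "Status: Up" line is the responding address.
    awk '/Status: Up/ { print $2 }' "$scan" | LC_ALL=C sort -u > "$tmp/seen"
    # comm -13: lines only in the scan; comm -23: lines only in the inventory.
    LC_ALL=C comm -13 "$tmp/expected" "$tmp/seen" | sed 's/^/NOT_IN_INVENTORY /'
    LC_ALL=C comm -23 "$tmp/expected" "$tmp/seen" | sed 's/^/NOT_RESPONDING /'
    rm -rf "$tmp"
}
```

The scan file would come from something like `nmap -sn -oG scan.gnmap 10.0.0.0/24`. Because the helper only reads saved output, it stays on the "reconcile" side of the line Priya draws in Branch 3.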

Solution Branches:

Branch 1 — Complete reconciliation (priority 100): Player documents all hosts, identifies both unknown hosts, notes the service on 9301 for 10.0.0.47, cross-references with prior anomalies where relevant, submits a complete reconciliation report. trust_delta: +3. Flags: asset_inventory_reconciled. Priya: "I'm going to need to take this to Kowalski." Follow-up ticket: T028.

Branch 2 — Partial reconciliation (priority 50): Documents inventory hosts, finds 10.0.0.47 but misses 10.0.1.15 or vice versa. trust_delta: +1.

Branch 3 — Probe the unknown hosts (priority 20): Player makes active connection attempts to services on the unknown hosts beyond identification. trust_delta: 0. R+3. Priya's next message: "I said reconcile, not probe."

Hidden Hook: Running the full scan reveals that 10.0.0.47 and 10.0.1.15 have identical SSH host key fingerprints. They are using the same host key, which suggests they were provisioned from the same template. Sets hook_two_hosts_same_key. Discoverable by: comparing the SSH host key fingerprints (from nmap's ssh-hostkey script output or from ssh-keyscan), rather than just noting the IPs. A plain nmap -sn host discovery scan does not collect host keys.

Failure Conditions: Scan incomplete; player makes unauthorized connections; report submitted with known gaps left undisclosed.

Behavior Impact:

  • Complete: O+1, C+2
  • Probe: R+3
  • Hook discovered: C+1

Quest ID: Q028
Title: The Archive Restore
Narrative Phase: Investigation
Tier: 3
Primary VM: build_machine
Additional VMs: workstation
Primary Objective: A backup archive from 6 months ago is needed for a compliance audit. Restore it to a staging location on the workstation and confirm its integrity. The archive is from the previous sysadmin's final working week.
Linux Concepts: tar (extract, verify), sha256sum, archive integrity checking, restore to non-production path, reading file metadata from within an archive (tar -tv)
Systems Used: build_machine, workstation
Ticket Sender: Marcus Webb
Ticket Summary: "Compliance audit needs the working-directory archive from the end of last year. It should be in the backup store on vulcan. Restore it to a staging path on the workstation and confirm the contents are intact. Let me know what's in it."

Clue Trail:

  • Archive at /srv/backups/workstation/wd-archive-YYYYMMDD.tar.gz on vulcan
  • sha256sum check — archive passes (this one is not the tampered one from Q021)
  • tar -xzf to /tmp/restore-staging/ on workstation — succeeds
  • Contents: scripts, config fragments, a partial README text file
  • The README is fragmentary — it's working notes, not a confession. It references the INT-0194 deployment and contains a note: "bridge not logging correctly — check port forwarding." The rest is infrastructure checklists
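The verify, restore, and inventory sequence above might be wrapped as shown below. `restore_to_staging` is a hypothetical helper, and the expected checksum is passed in by the caller because this document does not say where the archive's recorded checksum is stored. The function lists the restored files and does not execute anything it extracts.

```shell
# Hypothetical helper: verify an archive against a known checksum, extract it
# to a staging directory, and list the restored files without running any.
# Usage: restore_to_staging <archive.tar.gz> <expected_sha256> <staging_dir>
restore_to_staging() {
    archive="$1"
    expected="$2"
    staging="$3"
    actual=$(sha256sum < "$archive" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        # Refuse to extract anything that fails the integrity check.
        echo "checksum mismatch: $archive" >&2
        return 1
    fi
    mkdir -p "$staging" || return 1
    tar -xzf "$archive" -C "$staging" || return 1
    # Inventory only: print the relative path of every restored file.
    (cd "$staging" && find . -type f | LC_ALL=C sort)
}
```

The printed list is the starting point for the inventory Marcus asks about. Reading the README and the scripts with `less` or `cat` is in scope; running them is the failure condition.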

Solution Branches:

Branch 1 — Restore and full inventory (priority 100): Player restores the archive, verifies integrity, inventories all contents (including reading the README), reports to Marcus what's there. trust_delta: +2. Flags: compliance_archive_restored. Marcus: "Right. Thank you." Follow-up ticket: T029.

Branch 2 — Restore and integrity check only (priority 50): Verifies the archive restores cleanly but doesn't inventory contents. trust_delta: +1. Marcus asks what's in it.

Branch 3 — Integrity failure reported (priority 20): Player incorrectly reports the archive as corrupted without fully testing the restore. trust_delta: -1.

Hidden Hook: The README fragment mentions INT-0194 and "port forwarding" — if the player has been collecting the INT-0194 thread, this is the sixth reference. The working notes also reference a host called styx in a routing context. Sets hook_archive_readme_INT0194 and hook_styx_in_routing_context. Discoverable by: reading the README file, which properly inventorying the archive would do.

Failure Conditions: Archive not restored; contents not verified; player runs any scripts found in the archive.

Behavior Impact:

  • Full inventory: O+1
  • Run scripts from archive: R+4 (running unknown code from a previous sysadmin is exactly the kind of reckless action that should trigger risk)
  • Hook discovered: C+2

Narrative Notes: This is not "Marcus gives the player Dale's files and asks them to investigate." It is a compliance archive restore with a legitimate operational purpose. The player happens to find working notes inside it. The notes are fragmentary and don't explain everything — they're field notes, not a plot summary. Marcus's "what's in it" is a routine question after a restore, not an invitation to investigate.


Quest ID: Q029
Title: The Service That Doesn't Belong
Narrative Phase: Investigation
Tier: 3
Primary VM: web_server
Additional VMs: none
Primary Objective: A systemd service on hermes is running but is not listed in any deployment manifest or change ticket. Audit what it does, whether it is currently active, and produce a full service characterization.
Linux Concepts: systemctl show, systemd-analyze, service unit file anatomy, lsof, ss for service network connections, strace basics, process ownership
Systems Used: web_server
Ticket Sender: Priya Nair
Ticket Summary: "James found a service on hermes that isn't in any deployment record. Service name: axiomflow-bridge. I need a full characterization: what it does, what it connects to, when it was installed. Don't stop it. Document first."

Clue Trail:

  • systemctl show axiomflow-bridge — unit file, state, runtime info
  • Unit file at /etc/systemd/system/axiomflow-bridge.service: ExecStart points to a binary; unit file has INT-0194 in a comment
  • lsof -p <PID> — service has open connections to 10.0.0.47:9301
  • ss -tp — confirms active connection
  • Binary at /usr/local/bin/axiomflow-bridge — a Go binary; strings output shows internal API paths and the same INT-0194 reference in help text
  • Installation date from package metadata or file mtime — matches the 3am activity window

Solution Branches:

Branch 1 — Full characterization (priority 100): Player documents unit file, binary provenance, network connections, installation date, cross-references with INT-0194 and 10.0.0.47 from prior findings. trust_delta: +3. Flags: bridge_service_documented. Priya: "This is consistent with what I've been building. Don't stop it yet." Follow-up ticket: T030.

Branch 2 — Partial (priority 50): Documents what the service is and that it connects out, but doesn't trace the INT-0194 connection or installation date. trust_delta: +1.

Branch 3 — Stops the service (priority 10): Player stops the service despite explicit instruction not to. trust_delta: -2. R+2. S+2. Priya: "I said document first."

No additional hidden hook — the quest itself is the hook resolution for INT-0194.

Failure Conditions: Service stopped against instruction; characterization incomplete.

Behavior Impact:

  • Full characterization: O+1, C+3 (this is the operational confirmation of INT-0194)
  • Stop the service: R+2, S+2

Quest ID: Q030
Title: Keep the Lights On
Narrative Phase: Investigation
Tier: 2
Primary VM: web_server
Additional VMs: none
Primary Objective: The production application on hermes is returning 502 errors. Fix it. The investigation context is ongoing but the service still needs to run.
Linux Concepts: systemctl, nginx upstream configuration, application log reading (journalctl, app logs), database connection strings, process restart
Systems Used: web_server
Ticket Sender: Sarah Chen
Ticket Summary: "I know something is happening. I don't know what. But I have paying customers on a system that is returning 502 errors and I need it running. Whatever else is going on, please."

Clue Trail:

  • nginx upstream is timing out — journalctl -u nginx shows gateway timeout errors
  • Application log shows it is failing to connect to the database
  • /var/www/axiomworks/config/app.config — database connection string; check whether it was modified (if Q023's revert was clean, the string is correct; if not, it may point to the secondary host)
  • Standalone root cause if Q023 was clean: the database service on the primary host is not running — systemctl status postgresql shows it crashed overnight
  • Fix: restart the database service (or correct the connection string if Q023 was not fully resolved)

Solution Branches:

Branch 1 — Diagnose and fix (priority 100): Player reads nginx and app logs, identifies the database connection failure, finds the cause (service down or wrong connection string), applies the correct fix, confirms app is serving. trust_delta: +2. Flags: hermes_production_restored. Sarah: "Thank you. Seriously." Follow-up ticket: T031.

Branch 2 — Service restart without diagnosis (priority 40): Player restarts the app service without finding the root cause. App comes up temporarily; may fail again. trust_delta: 0.

No hidden hook.

Failure Conditions: App still returning errors; player makes changes that worsen the state.

Behavior Impact:

  • Diagnose and fix: O+2 (maintaining professional duty during investigation is the behavior being measured)
  • Restart without diagnosis: O+0

Quest ID: Q031
Title: The Access Review
Narrative Phase: Investigation
Tier: 3
Primary VM: workstation
Additional VMs: web_server, build_machine
Primary Objective: Kowalski has initiated a formal privileged access review for all accounts with elevated permissions. The player must compile an accurate account of every privileged action taken across all three machines in the past 60 days.
Linux Concepts: journalctl with filters for privileged commands, sudo log reading (/var/log/auth.log), audit log compilation, accurate self-reporting of access history
Systems Used: workstation, web_server, build_machine
Ticket Sender: Dave Kowalski
Ticket Summary: "Given recent audit activity, we're conducting a standard privileged access review. I need from you: every privileged command run in the past 60 days, every system accessed with elevated permissions, and any temporary access grants that were used. Standard procedure."

Clue Trail:

  • Player compiles from their own auth logs and sudo logs on each machine
  • Technical skill required: grep sudo /var/log/auth.log | grep <username>, journalctl _COMM=sudo
  • The review covers what the player actually did; accuracy is the objective
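The grep pipeline in the clue trail can be tightened so the report shows only what was run and as whom. The sketch below assumes sudo's usual syslog line (`sudo: USER : TTY=... ; PWD=... ; USER=target ; COMMAND=...`) with a traditional timestamp prefix; `sudo_commands` is a hypothetical helper, and other log formats would need a different pattern.

```shell
# Hypothetical helper: extract the commands one user ran through sudo,
# reduced to timestamp, target user, and command line.
# Usage: sudo_commands <username> <logfile>...
sudo_commands() {
    user="$1"
    shift
    # Keep only sudo command records for this user (PAM session lines are skipped).
    grep -hE "sudo(\[[0-9]+\])?: +$user : .*COMMAND=" "$@" |
        sed -E 's/^(.*) [^ ]+ sudo(\[[0-9]+\])?: .* USER=([^ ]+) ; COMMAND=(.*)$/\1 as=\3 \4/'
}
```

Running it against `/var/log/auth.log` on each of the three machines, with the player's own username, produces the raw material for the review. What the player then includes or leaves out is the choice the solution branches measure.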

Solution Branches:

Branch 1 — Complete and accurate (priority 100): Player provides a complete, honest account of all privileged actions, including any investigation-period access that extended beyond their normal role. trust_delta: +2. Flags: access_review_passed. S-2 (accurate self-reporting reduces suspicion). Follow-up ticket: T032.

Branch 2 — Accurate but minimal framing (priority 60): Report is factually accurate but frames borderline actions favorably without explicitly misrepresenting them. trust_delta: +0.5. Suspicion unchanged.

Branch 3 — Omits significant actions (priority 10): Player leaves out actions that would raise questions. trust_delta: -3. Flags: access_review_incomplete. S+4. This is a major suspicion trigger.

Failure Conditions: Material inaccuracies; privileged commands claimed that don't match log evidence.

Behavior Impact:

  • Complete: O+3, S-2
  • Omit: R+3, S+4

Quest ID: Q032
Title: Loose Ends
Narrative Phase: Investigation
Tier: 3
Primary VM: web_server
Additional VMs: build_machine
Primary Objective: Before the situation moves to its next phase, Marcus wants the infrastructure in a known and correct state. Remediate any outstanding configuration issues on hermes and vulcan, and document the current state.
Linux Concepts: Synthesis: all concepts from the campaign applied to remediation; logrotate, NTP, SSH configuration, repo management, service auditing, firewall rules
Systems Used: web_server, build_machine
Ticket Sender: Marcus Webb
Ticket Summary: "Before this goes any further, I want the environment clean. Everything we've documented as a problem: either fix it or document it as known and accepted. Do a full pass on hermes and vulcan. Not to cover anything. Because whatever happens next, those machines need to be in a known state."

Clue Trail:

  • Player reviews world flags representing open issues from prior quests
  • Each unresolved issue (logrotate, NTP, nginx config, sudo scope, certbot timer) is a task in this quest
  • The more prior quests were resolved cleanly, the less remediation is needed

Solution Branches:

Branch 1 — Clean environment (priority 100): All outstanding issues resolved or explicitly documented as accepted. Both machines in known, stable state. trust_delta: +3. Flags: environment_clean. Marcus: "Good. That's all I needed to know." Follow-up: T033.

Branch 2 — Mostly clean (priority 60): Most issues resolved; a few deferred with documentation. trust_delta: +1. Follow-up: T033.

Branch 3 — Significant gaps (priority 20): Multiple unresolved issues remain undocumented. trust_delta: -1. Follow-up: T033 (game continues regardless).

No hidden hook.

Failure Conditions: Issues claimed as resolved that are not; new problems introduced during remediation.

Behavior Impact:

  • Clean: O+2
  • Significant gaps: O+0

PHASE 5 — CONFLICT (Q033–Q040)

Tier 3 throughout. Problem-solving only. Tickets are operational — real work that intersects with the investigation context. Hook density: 2 hooks, both requiring prior context to understand.


Quest ID: Q033
Title: The Revocation Request
Narrative Phase: Conflict
Tier: 3
Primary VM: build_machine
Additional VMs: none
Primary Objective: Kowalski has asked the player to remove the pipeline-svc account from vulcan before an external audit. The request is operationally legitimate. The account is also central to the investigation.
Linux Concepts: userdel, account archival, sudo log preservation, change documentation, understanding when "do the task" and "do the task well" diverge
Systems Used: build_machine
Ticket Sender: Dave Kowalski
Ticket Summary: "Before the external auditors arrive Tuesday, please remove the pipeline-svc account from the build machine. It's no longer needed and we want a clean access list. Handle it before end of day Monday."

Clue Trail:

  • The account exists; Kowalski's request is valid on its face
  • The account is in the player's investigation record from Q011, Q017, Q026, Q031
  • Removing it cleanly before an audit removes an evidence trail
  • The player can: remove it as requested; archive its logs and sudo history first and then remove it; or consult Marcus before acting
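
The "archive first, then remove" option might look like the sketch below. It works on sample files in a scratch directory; the log lines, the `sync-job` command name, and the archive file names are invented for illustration, and the actual `userdel` step is left as a comment.

```shell
#!/usr/bin/env bash
set -eu

account="pipeline-svc"
workdir=$(mktemp -d)
evidence="$workdir/evidence-$account"
mkdir -p "$evidence"

# Hypothetical stand-ins for /var/log/auth.log and the account's crontab
# (on a real host the crontab could be read with: sudo crontab -l -u pipeline-svc).
cat > "$workdir/auth.log" <<'EOF'
Feb 11 03:00:01 vulcan CRON[2211]: pam_unix(cron:session): session opened for user pipeline-svc
Feb 11 03:00:04 vulcan sudo: pipeline-svc : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/bin/sync-job
Feb 11 08:15:40 vulcan sshd[901]: Accepted publickey for builder from 10.0.0.20 port 40100 ssh2
EOF
printf '0 3 * * * /usr/local/bin/sync-job\n' > "$workdir/crontab.$account"

# 1. Copy out every log line that mentions the account, plus its crontab.
grep -F -- "$account" "$workdir/auth.log" > "$evidence/auth-entries.log"
cp "$workdir/crontab.$account" "$evidence/crontab.txt"

# 2. Bundle and checksum so the archive can be verified later.
tar -czf "$workdir/$account-evidence.tar.gz" -C "$workdir" "evidence-$account"
( cd "$workdir" && sha256sum "$account-evidence.tar.gz" > "$account-evidence.tar.gz.sha256" )

# 3. Verify the archive; only then would the account be removed (not run here):
#    sudo userdel pipeline-svc
( cd "$workdir" && sha256sum -c "$account-evidence.tar.gz.sha256" )
archived_lines=$(wc -l < "$evidence/auth-entries.log")
echo "archived $archived_lines log lines for $account"
```

Removing the account only after the checksum verifies keeps the ticket fulfilled while leaving a trail that can be handed over intact.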

Solution Branches:

Branch 1 — Archive then remove (priority 100): Player archives all sudo logs, auth log entries, and crontab entries for pipeline-svc, documents the account's investigation relevance, removes the account, reports to Priya that archival was performed before removal. trust_delta: +3. Flags: pipeline_svc_removed_with_trail. Follow-up ticket: T034.

Branch 2 — Remove as instructed (priority 60): Player removes the account without additional archival. The audit will run cleanly. Evidence is gone. trust_delta: 0. Flags: pipeline_svc_removed_clean. O+2 (following instructions is the behavior), but investigation trail is diminished.

Branch 3 — Ask Marcus before acting (priority 50): Player messages Marcus before proceeding. Marcus says: "Archive what matters. Then do it." This leads to Branch 1. trust_delta: +1.

Branch 4 — Refuse outright (priority 10): Player refuses without explanation. trust_delta: -2. S+3. Kowalski escalates.

No hidden hook.

Failure Conditions: Account removed without any archival or documentation; player creates new accounts instead of removing; player escalates in a way that creates more noise than the situation requires.

Behavior Impact:

  • Archive then remove: C+1 (understanding the investigation implications is curiosity)
  • Remove as instructed: O+2
  • Refuse outright: S+3, R+1

Quest ID: Q034
Title: Two Tickets
Narrative Phase: Conflict
Tier: 3
Primary VM: web_server
Additional VMs: build_machine
Primary Objective: Two tickets arrive simultaneously — one from Marcus (signing key rotation on vulcan, tonight only) and one from Sarah (production outage on hermes, immediate). Complete both.
Linux Concepts: GPG signing key rotation (vulcan), nginx/application troubleshooting and service restoration (hermes), sequencing time-sensitive tasks
Systems Used: web_server, build_machine
Ticket Sender: Marcus Webb / Sarah Chen
Ticket Summary: Marcus: "The package signing key needs to be rotated tonight — the window is before 2am. This can't slip to tomorrow; the old key expires at 2am and builds will break." Sarah: "Production is down. I know Marcus gave you something tonight. I have customers who cannot access the system."

Clue Trail:

  • Production outage (hermes): nginx config error from a bad deploy, fixable in 15–20 minutes with correct diagnosis
  • Signing key rotation (vulcan): specific sequence — generate new key, update keyring on vulcan, push new public key to hermes's apt trusted keys, test package verify — requires 30–40 minutes; cannot be done out of sequence
  • Both are doable in the window if hermes is prioritized first

Solution Branches:

Branch 1 — Both completed, hermes first (priority 100): Player restores hermes, then completes the key rotation on vulcan in the correct sequence. Both done before 2am. trust_delta: +3. Flags: conflict_both_resolved. Sarah: "Thank you." Marcus: "Good call on sequence." Follow-up ticket: T035.

Branch 2 — Vulcan first, hermes later (priority 40): Completes key rotation, then restores hermes. Rotation is fine; production was down longer. trust_delta: +0.5. Sarah's follow-up is cooler.

Branch 3 — Hermes only (priority 40): Restores production, misses the key rotation window. trust_delta: +0.5. Marcus's follow-up: "The key expired. I'll need to extend the window. Don't let that happen again." Builds break overnight.

Branch 4 — Neither, escalates (priority 10): Escalates both. trust_delta: -2.

No hidden hook.

Failure Conditions: Key rotation done out of sequence breaks the trust chain; player makes hermes worse while fixing it.

Behavior Impact:

  • Both completed: O+2
  • Key rotation out of sequence: R+2

Quest ID: Q035
Title: Log Retention and Archival
Narrative Phase: Conflict
Tier: 3
Primary VM: web_server
Additional VMs: build_machine, workstation
Primary Objective: Priya has requested that all logs relevant to the current audit period be archived to long-term storage with integrity verification before any are subject to normal rotation or deletion. Set up the archival across all three machines.
Linux Concepts: Log archival (tar, gzip), sha256sum for integrity, rsync to remote storage, logrotate dateext and compress options, retention policy implementation in /etc/logrotate.d/
Systems Used: web_server, build_machine, workstation
Ticket Sender: Priya Nair
Ticket Summary: "Before any logs rotate, I need them archived. All three machines. Auth logs, systemd journals for relevant services, nginx logs on hermes, build logs on vulcan. Compress, checksum, and move to the audit storage path I've specified. Then update logrotate to retain rather than delete during the audit window."

Clue Trail:

  • Player identifies relevant log files on each machine
  • tar -czf with sha256sum verification; rsync to the audit storage path
  • /etc/logrotate.d/ configs need the rotate count raised, with dateext and compress set, for the audit window; rotate 0 would remove rotated logs instead of retaining them
  • The player's own log archival IS the investigation record — the logs they preserve are the ones that tell the story
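
A retention drop-in for the audit window could be sketched like this. The file is written to a scratch path here, and the log path, the `daily` schedule, and the count of 90 are illustrative assumptions. In logrotate, `rotate N` keeps N rotated files, while `rotate 0` removes old versions instead of keeping them, so the count has to go up for retention.

```shell
#!/usr/bin/env bash
set -eu

# Scratch path for the sketch; on a real host this would live under /etc/logrotate.d/.
dropin=$(mktemp)
cat > "$dropin" <<'EOF'
# Audit-window retention for nginx logs (all values are illustrative assumptions).
/var/log/nginx/*.log {
    daily
    # Keep 90 rotated files instead of deleting them.
    rotate 90
    # Name rotated files by date so archived copies are unambiguous.
    dateext
    compress
    delaycompress
    missingok
    notifempty
}
EOF

# Pull the retention count back out as a quick sanity check.
retain=$(awk '$1 == "rotate" { print $2 }' "$dropin")
echo "rotated files kept: $retain"

# Dry run on a real host (not executed here):
#   sudo logrotate --debug /etc/logrotate.d/<drop-in name>
```

Comments sit on their own lines because logrotate only treats a line as a comment when `#` is its first non-whitespace character.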

Solution Branches:

Branch 1 — Complete across all three (priority 100): All relevant logs archived with integrity verification, logrotate configs updated on all three machines, paths reported to Priya. trust_delta: +3. Flags: audit_logs_archived. The archived logs are what make the exposure ending possible — a player who has been curious and now preserves the evidence. Follow-up ticket: T036.

Branch 2 — Partial (priority 50): Two machines complete; one incomplete. trust_delta: +1. Priya follows up.

Branch 3 — Selectively omits (priority 10): Player archives most logs but omits logs that would document their own access history. trust_delta: -3. S+3. R+3. This is evidence tampering.

No hidden hook.

Failure Conditions: Log archival skips relevant files; integrity checksums not computed; logrotate not updated (logs still at risk of rotation).

Behavior Impact:

  • Complete: O+2
  • Selective omission: R+3, S+3

Quest ID: Q036
Title: Authorized Access
Narrative Phase: Conflict
Tier: 3
Primary VM: build_machine
Additional VMs: none
Primary Objective: Priya, with Kowalski's authorization, has provided credentials to connect to 10.0.0.47 for a forensic inventory. Document what is running, what data is present, and whether Axiom Works data is identifiable in the data store. Do not modify anything.
Linux Concepts: ssh with specific key/user, service enumeration (systemctl, ps aux), directory listing and file inspection (ls -lah, find), reading database contents without modifying (read-only queries, file listing only), wc -l for size estimation
Systems Used: build_machine
Ticket Sender: Priya Nair
Ticket Summary: "Kowalski has authorized a forensic connection to 10.0.0.47. Credentials attached. I need: what services are running, what data is in the data store path I've indicated, and whether you can identify Axiom Works data in it. Document only. Do not modify, delete, or stop anything."

Clue Trail:

  • SSH connection succeeds with provided credentials
  • Services: the bridge binary running, an HTTP API on port 9301 (same as hermes finding), a simple file-based data store
  • Data store contains log files organized by company domain — AxiomFlow session data is present and identifiable; other company names are also present
  • File timestamps in the data store align with the 3am cron window from vulcan
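
A document-only inventory of a file-based store might be sketched as below. The directory layout and file names are invented for the example; the point is that every command only lists or reads, and that the full directory tree is enumerated before stopping at the first confirming match.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical data store layout; directory and file names are invented for illustration.
store=$(mktemp -d)
mkdir -p "$store/axiomworks.example/sessions" "$store/othercorp.example/sessions"
printf 'a\nb\nc\n' > "$store/axiomworks.example/sessions/2026-03-01.log"
printf 'd\n'       > "$store/axiomworks.example/sessions/2026-03-02.log"
printf 'e\nf\n'    > "$store/othercorp.example/sessions/2026-03-01.log"

# Full directory tree first, so nothing below the first match is missed.
find "$store" -mindepth 1 -type d | sort

# Per-company file count and line count as a rough size estimate.
# find, cat, and wc do not change file contents or names.
inventory=$(
  for dir in "$store"/*/; do
    name=$(basename "$dir")
    files=$(find "$dir" -type f | wc -l)
    lines=$(find "$dir" -type f -exec cat {} + | wc -l)
    printf '%s files=%d lines=%d\n' "$name" "$files" "$lines"
  done
)
printf '%s\n' "$inventory"
```

On the real host, `ls -lah` or `stat` on the same paths would add the modification times needed to compare against the cron window.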

Solution Branches:

Branch 1 — Document only (priority 100): Player inventories services, reads the data store structure (without modifying), identifies Axiom Works data, notes other company data, records timestamps, produces a complete forensic inventory. trust_delta: +3. Flags: unknown_host_documented. Follow-up ticket: T037.

Branch 2 — Minimal engagement (priority 50): Player confirms host is running and that data is present but doesn't fully inventory. trust_delta: +1.

Branch 3 — Modifies or deletes (priority 10): Player attempts to delete the data or stop services. trust_delta: -3. R+5. S+3. Legal and forensic implications. Priya: "I explicitly said document only."

Hidden Hook: The data store on 10.0.0.47 contains a directory for a company called axiomworks-internal with a subfolder called employees — not just session logs but what appears to be an employee activity profile structure. This is more than session data collection. Sets hook_employee_profile_data. Discoverable by: reading the full data store directory structure rather than stopping at the first confirming evidence of Axiom Works data.

Failure Conditions: Player modifies or deletes anything; player exceeds the authorized scope of the connection.

Behavior Impact:

  • Full documentation: O+2, C+2
  • Modify or delete: R+5, S+3
  • Hook discovered: C+2

Quest ID: Q037
Title: The Customer Email
Narrative Phase: Conflict
Tier: 3
Primary VM: workstation
Additional VMs: web_server
Primary Objective: Tanya Okafor forwarded a customer email that contains specific internal infrastructure details the customer should not have. Trace where the information came from.
Linux Concepts: Log correlation, grep across multiple log files, timeline construction, identifying data egress paths
Systems Used: workstation, web_server
Ticket Sender: Marcus Webb
Ticket Summary: "Tanya forwarded something. A customer email with internal details that should not be in a customer's hands. Find where this came from. This is urgent."

Clue Trail:

  • The specific details in the customer email match AxiomFlow session data fragments visible in the 10.0.0.47 data store (from Q036)
  • The egress path: axiomflow-bridge service on hermes → 10.0.0.47 → apparent data sharing by the operator of that host
  • Timeline: the customer email date, the last bridge log entry, the most recent data file in the store — they align
  • Player constructs the path by correlating timestamps and data content
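
Timeline construction from several sources can be sketched as a merge-and-sort. The three sample files and their contents are hypothetical, and they are assumed to be already normalized to ISO 8601 UTC timestamps, which is what lets a plain byte-wise sort put them in order.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
# Hypothetical excerpts, one file per evidence source.
cat > "$workdir/bridge.log" <<'EOF'
2026-03-02T03:00:07Z upload complete
2026-03-03T03:00:05Z upload complete
EOF
cat > "$workdir/store.log" <<'EOF'
2026-03-03T03:00:09Z newest data file written
EOF
cat > "$workdir/mail.log" <<'EOF'
2026-03-03T14:22:51Z customer email received
EOF

# Tag each line with its source, merge everything, and sort on the leading timestamp.
timeline=$(
  for f in "$workdir"/*.log; do
    src=$(basename "$f" .log)
    awk -v src="$src" '{ print $0 "  [" src "]" }' "$f"
  done | LC_ALL=C sort
)
printf '%s\n' "$timeline"

first_event=$(printf '%s\n' "$timeline" | head -n 1)
last_event=$(printf '%s\n' "$timeline" | tail -n 1)
```

Real logs rarely share a timestamp format, so the normalization step that is assumed away here is usually most of the work.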

Solution Branches:

Branch 1 — Full trace (priority 100): Player documents the complete path from bridge service to external host to customer, produces a timeline with corroborating timestamps. trust_delta: +3. Flags: egress_path_documented. Priya: "I'll add this to the record." Follow-up ticket: T038.

Branch 2 — Partial trace (priority 50): Connects the email to the external host but cannot trace the full egress path. trust_delta: +1.

No hidden hook.

Failure Conditions: Player cannot produce a coherent timeline; player modifies relevant logs before Priya can review.

Behavior Impact:

  • Full trace: O+1, C+2
  • Modify logs: R+5

Quest ID: Q038
Title: The Hard Window
Narrative Phase: Conflict
Tier: 3
Primary VM: build_machine
Additional VMs: web_server
Primary Objective: The internal CA certificate must be rotated before the external auditors arrive — a deadline that is now 36 hours away. Rotate the CA cert on both build_machine and web_server and verify the full trust chain.
Linux Concepts: Internal CA certificate management, update-ca-certificates, package signing chain verification, gpg --verify as a post-rotation signature check (GPG validates against keyring keys, not the X.509 CA), nginx SSL configuration reload, trust chain testing with openssl verify
Systems Used: build_machine, web_server
Ticket Sender: Marcus Webb
Ticket Summary: "The internal CA cert has to be rotated before the auditors arrive. The new cert is ready — path is in the attached note. Install it on both vulcan and hermes, verify the package signing chain still validates, and confirm the web server's trust chain is intact. You have 36 hours. Don't miss this."

Clue Trail:

  • New CA cert provided at specified path
  • update-ca-certificates on both machines after placing cert in /usr/local/share/ca-certificates/
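  • gpg --verify on a recent build package: the signature must still validate after the rotation (GPG checks the signing key in its keyring rather than the X.509 CA, so this confirms the rotation did not disturb the signing chain)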
  • openssl verify -CAfile /etc/ssl/certs/axiomworks-ca.pem /path/to/server.crt on hermes
  • nginx -t && systemctl reload nginx — confirm nginx uses updated cert
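
The trust-chain check can be tried end to end with throwaway certificates, as in the sketch below. It generates two test CAs and one server certificate in a scratch directory; every subject name and file name is a placeholder, not an Axiom Works value, and it assumes the `openssl` CLI with its default configuration is available.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Throwaway "old" and "new" CAs (placeholder subject names).
for ca in old new; do
  openssl req -x509 -newkey rsa:2048 -nodes -days 2 \
    -subj "/CN=Example ${ca} CA" \
    -keyout "${ca}-ca.key" -out "${ca}-ca.crt" 2>/dev/null
done

# One server certificate, signed by the new CA only.
openssl req -newkey rsa:2048 -nodes -subj "/CN=hermes.example.test" \
  -keyout server.key -out server.csr 2>/dev/null
openssl x509 -req -in server.csr -days 2 \
  -CA new-ca.crt -CAkey new-ca.key -CAcreateserial \
  -out server.crt 2>/dev/null

# The server cert should chain to the new CA and fail against the old one.
if openssl verify -CAfile new-ca.crt server.crt >/dev/null 2>&1; then new_ok=yes; else new_ok=no; fi
if openssl verify -CAfile old-ca.crt server.crt >/dev/null 2>&1; then old_ok=yes; else old_ok=no; fi
echo "verifies against new CA: $new_ok, against old CA: $old_ok"
```

Checking against the old CA as well is a cheap way to confirm that a passing result really comes from the rotated certificate and not from a stale bundle.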

Solution Branches:

Branch 1 — Both machines, verified (priority 100): CA cert installed on both, package signing chain verified, web server trust chain verified, services reloaded. trust_delta: +2. Flags: ca_cert_rotated. Marcus: "Good." Follow-up ticket: T039.

Branch 2 — One machine (priority 50): One complete, one pending. trust_delta: +0.5. Follow-up: Marcus asks for the second machine.

Branch 3 — Incorrect installation (priority 10): Cert in wrong location; trust chain broken; services fail. trust_delta: -2.

No hidden hook.

Failure Conditions: CA cert installed incorrectly; trust chain broken; deadline missed.

Behavior Impact:

  • Both complete: O+2
  • Incorrect installation: R+2

Narrative Notes: This quest replaces the "write what you believe happened" quest from v1. The conflict is operational: the external auditors are coming, the clock is tight, and the player has to do real technical work under real pressure. The fact that it's happening alongside everything else is the conflict — not a character-to-character confrontation.


Quest ID: Q039
Title: The Last Normal Ticket
Narrative Phase: Conflict
Tier: 3
Primary VM: web_server
Additional VMs: none
Primary Objective: Sarah files a configuration request: add an upstream proxy pass to nginx for a new reporting integration. The IP in the request is 10.0.0.47.
Linux Concepts: nginx proxy_pass, upstream block configuration, nginx -t, config reload
Systems Used: web_server
Ticket Sender: Sarah Chen
Ticket Summary: "Can you add a proxy pass in nginx to allow connections from the reporting integration being evaluated? The upstream address is 10.0.0.47, port 9301. Product says this should be a quick config change."

Clue Trail:

  • The change is technically simple — 5 lines of nginx config
  • The IP is 10.0.0.47 — the unauthorized host from the entire investigation arc
  • Sarah does not know this. She was given the IP by someone in product management
  • The player recognizes the IP or does not
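
For a sense of how small the requested change is, and how a careful sysadmin might screen it, here is a sketch. The config fragment, the upstream name `reporting_integration`, and the idea of a personal watchlist file are all assumptions made for illustration; the quest itself relies on the player's memory, not on tooling.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
# Hypothetical watchlist built up from earlier findings.
printf '10.0.0.47\n' > "$workdir/flagged-hosts.txt"

# Proposed change as a fragment (not a complete nginx config; names are invented).
cat > "$workdir/reporting.conf" <<'EOF'
upstream reporting_integration {
    server 10.0.0.47:9301;
}
location /reporting/ {
    proxy_pass http://reporting_integration;
}
EOF

# Extract every IPv4 address from the proposed config and compare against the watchlist.
hits=$(grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$workdir/reporting.conf" \
  | sort -u \
  | grep -Fx -f "$workdir/flagged-hosts.txt" || true)

if [ -n "$hits" ]; then
  echo "HOLD: config references flagged host(s): $hits"
else
  echo "no flagged hosts referenced"
fi
```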

Solution Branches:

Branch 1 — Refuse and escalate (priority 100): Player declines to make the change, notifies Priya immediately with the specific IP and its context, notifies Sarah that the request is on hold pending review. trust_delta: +3. Flags: final_config_refused. Priya: "Do not make that change. Good catch." Follow-up: T040 (Phase 6 begins).

Branch 2 — Ask Marcus first (priority 60): Player messages Marcus with the IP. Marcus says "Do not make that change. Tell Priya now." Leads to Branch 1 outcome. trust_delta: +1.

Branch 3 — Make the change (priority 10): Player makes the change without checking the IP. trust_delta: -3. R+5. Flags: final_config_made. Priya: "You need to come talk to me." The chaos ending route activates.

No hidden hook.

Failure Conditions: Change made without escalation.

Behavior Impact:

  • Refuse and escalate: O+2, C+1 (recognizing the IP requires prior curiosity)
  • Make the change: R+5, S+3

Narrative Notes: This is not a dramatic final-choice moment. It is a routine nginx config ticket that happens to involve an IP the player has encountered before — or hasn't. Players who have been curious will recognize it. Players who haven't won't. Both are valid playthroughs. The ending route this sets is already determined by prior behavior; Q039 confirms or breaks it.


Quest ID: Q040
Title: Handoff Documentation
Narrative Phase: Conflict
Tier: 3
Primary VM: workstation
Additional VMs: web_server, build_machine
Primary Objective: With external auditors arriving and organizational changes underway, Marcus asks the player to produce full handoff documentation for all three machines — written for a new sysadmin who would be starting fresh.
Linux Concepts: Service documentation, runbook format, dependency mapping, systemctl list-dependencies, expected log patterns, known issue tracking
Systems Used: workstation, web_server, build_machine
Ticket Sender: Marcus Webb
Ticket Summary: "Whatever happens next — write it down. Runbooks for nginx, the build pipeline, and the workstation baseline. Clear enough that someone new could use them on day one. I mean someone who doesn't know any of the history."

Clue Trail:

  • Player documents each machine: services, dependencies, restart procedures, known issues
  • Quality depends on what the player actually knows about the infrastructure — which reflects the whole campaign
  • "Someone who doesn't know any of the history" is Marcus being precise: write for the person who is you, on your first day

Solution Branches:

Branch 1 — Complete (priority 100): All three machines documented, runbooks are accurate and actionable. trust_delta: +2. Flags: handoff_docs_complete. Marcus: "I'll keep these." Follow-up: T041 (Phase 6 begins if not already started).

Branch 2 — Partial (priority 50): Two of three complete. trust_delta: +1.

No hidden hook.

Failure Conditions: Documentation inaccurate about current system state; known issues omitted.

Behavior Impact:

  • Complete: O+2

PHASE 6 — RESOLUTION (Q041–Q048)

Tier 1 returns for most quests. The pressure has lifted. The tickets are operational. The game looks like Phase 1 again, deliberately. Hook density: 0 — no new hooks. The ending fires from accumulated state after Q048 resolves.


Quest ID: Q041
Title: Hardening Pass
Narrative Phase: Resolution
Tier: 2
Primary VM: web_server
Additional VMs: none
Primary Objective: Following the audit, Priya has issued a hardening checklist for hermes. Implement each item and confirm the result.
Linux Concepts: SSH hardening (PermitRootLogin no, PasswordAuthentication no, MaxAuthTries), nginx security headers (X-Frame-Options, X-Content-Type-Options, Content-Security-Policy), ufw rule review, service account audit
Systems Used: web_server
Ticket Sender: Priya Nair
Ticket Summary: "Post-audit hardening for hermes. The checklist is attached. Implement each item, test that the service still runs correctly, and confirm back with the state of each item. This is standard post-audit procedure."

Clue Trail:

  • Checklist items are specific and implementable
  • Each item has a correct implementation and a common mistake (e.g., disabling PasswordAuthentication before confirming that key auth works)
  • Sequence matters: verify key auth before disabling password auth
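
The sequencing rule can be expressed as a guard, sketched here against scratch copies of the two files involved. The key material is a placeholder, and `MaxAuthTries 3` is an assumed value, since the checklist's actual number is not given.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
# Stand-ins for ~/.ssh/authorized_keys and /etc/ssh/sshd_config (contents are illustrative).
printf 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAexample admin@workstation\n' > "$workdir/authorized_keys"
cat > "$workdir/sshd_config" <<'EOF'
PermitRootLogin yes
PasswordAuthentication yes
MaxAuthTries 6
EOF

# Guard: refuse to disable password logins unless at least one public key is installed.
# On a real host, also confirm a key-based login works from a second session first.
if grep -qE '^(ssh-|ecdsa-)' "$workdir/authorized_keys"; then
  sed -E \
    -e 's/^#?PermitRootLogin .*/PermitRootLogin no/' \
    -e 's/^#?PasswordAuthentication .*/PasswordAuthentication no/' \
    -e 's/^#?MaxAuthTries .*/MaxAuthTries 3/' \
    "$workdir/sshd_config" > "$workdir/sshd_config.new"
  mv "$workdir/sshd_config.new" "$workdir/sshd_config"
  result="hardened"
else
  result="skipped: no key installed"
fi
echo "$result"

# Real-host follow-up (not run here): validate with `sshd -t`, then reload the SSH service.
```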

Solution Branches:

Branch 1 — All items, correct sequence (priority 100): All checklist items implemented, sequence preserved, service verified after each change. trust_delta: +2. Flags: hermes_hardened. Follow-up ticket: T042.

Branch 2 — All items, wrong sequence (priority 50): All items implemented but in an order that breaks ssh access temporarily. Fixed, but the mistake is noted. trust_delta: +0.5.

Branch 3 — Partial (priority 30): Some items implemented, some missed. trust_delta: 0.

Failure Conditions: SSH access lost; nginx returns errors after security header changes; service broken.

Behavior Impact:

  • All items correct: O+1
  • Wrong sequence: R+1

Quest ID: Q042
Title: The New Pipeline
Narrative Phase: Resolution
Tier: 2
Primary VM: build_machine
Additional VMs: web_server
Primary Objective: Nikhil has updated the build pipeline configuration. Review the new config for correctness, test a build, and confirm deployment to hermes succeeds.
Linux Concepts: Build pipeline configuration (systemd timer, build script), diff against previous config, reprepro or equivalent for package publishing, end-to-end deployment test
Systems Used: build_machine, web_server
Ticket Sender: Marcus Webb
Ticket Summary: "Nikhil updated the build config — new format, different timing. Review it for correctness, trigger a test build, and confirm the package makes it to hermes's apt cache. Standard validation."

Clue Trail:

  • New config at /etc/systemd/system/axiomflow-build.service and .timer
  • diff against old config — timing changed, ExecStart updated
  • No build-time patches present (the INT-0194 patch was removed)
  • Test build: trigger manually with systemctl start axiomflow-build.service
  • Confirm artifact in repo, confirm apt-cache show on hermes
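
The config review step is essentially a diff of the old and new unit files. The sketch below uses two invented versions of the timer; the schedules and the `Persistent=true` line are assumptions chosen only to give the diff something to show.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
old="$workdir/axiomflow-build.timer.old"
new="$workdir/axiomflow-build.timer"

# Invented "before" and "after" versions of the timer unit.
cat > "$old" <<'EOF'
[Unit]
Description=AxiomFlow scheduled build

[Timer]
OnCalendar=*-*-* 03:00:00

[Install]
WantedBy=timers.target
EOF
cat > "$new" <<'EOF'
[Unit]
Description=AxiomFlow scheduled build

[Timer]
OnCalendar=*-*-* 01:30:00
Persistent=true

[Install]
WantedBy=timers.target
EOF

# diff exits 1 when the files differ, so the status is tolerated explicitly under set -e.
diff -u "$old" "$new" || true

# Count removed and added lines as a quick measure of how much changed.
changed=$(diff "$old" "$new" | grep -c '^[<>]' || true)
echo "changed lines: $changed"
```

A small, explainable diff is what the full-validation branch is looking for before the test build is triggered.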

Solution Branches:

Branch 1 — Full validation (priority 100): Reviews config, confirms no problematic modifications, tests build, confirms deployment. trust_delta: +2. Flags: pipeline_validated. Follow-up ticket: T043.

Branch 2 — Test only (priority 50): Triggers build without reviewing config first. Build succeeds; config wasn't reviewed. trust_delta: +0.5.

Failure Conditions: Test build fails; player introduces errors while reviewing; deployment not verified.

Behavior Impact:

  • Full validation: O+1

Quest ID: Q043
Title: The Final Access Review
Narrative Phase: Resolution
Tier: 2
Primary VM: workstation
Additional VMs: web_server, build_machine
Primary Objective: Priya's final access review: verify that the player's current permissions across all three machines are appropriate for their role, and revoke any investigation-period access that should no longer be in place.
Linux Concepts: sudo -l, getent passwd, groups, SSH authorized keys review across machines, userdel for any temporary accounts created during investigation
Systems Used: workstation, web_server, build_machine
Ticket Sender: Priya Nair
Ticket Summary: "Final access review. Your current permissions, group memberships, and SSH keys across all three machines. Confirm they're appropriate for your ongoing role. Revoke anything left from the investigation period that shouldn't persist."

Clue Trail:

  • Player audits their own access state on each machine
  • Any access granted during investigation that hasn't been revoked should be revoked here
  • The player's self-reporting is checked against the access logs

Solution Branches:

Branch 1 — Clean (priority 100): Player accurately identifies and revokes any residual investigation access; current permissions match ongoing role. trust_delta: +2. Flags: final_access_clean. Priya: "That's correct." Follow-up: T044.

Branch 2 — Retain investigation access (priority 20): Player retains elevated access without declaring it. trust_delta: -1. R+2. S+2.

Failure Conditions: Material gaps in self-reporting; access state doesn't match claims.

Behavior Impact:

  • Clean: O+2
  • Retain silently: R+2, S+2

Quest ID: Q044
Title: System State Review
Narrative Phase: Resolution
Tier: 1
Primary VM: workstation
Additional VMs: none
Primary Objective: Marcus asks the player to document the current known state of all three machines in a brief system state report — services running, notable recent changes, open items. Routine administrative record.
Linux Concepts: systemctl list-units, uptime, df -h, last, service status summary, change record cross-referencing
Systems Used: workstation
Ticket Sender: Marcus Webb
Ticket Summary: "Quick system state summary. All three machines: what's running, anything notable from the past two weeks, any open items. For the record. Keep it brief."

Clue Trail:

  • Player compiles from current service state and recent log/change records
  • Accuracy is the objective; the technical skill is efficient log reading

Solution Branches:

Branch 1 — Accurate and complete (priority 100): State report is accurate and reflects current conditions. trust_delta: +1. Marcus: "Good." Flags: system_state_documented. Follow-up: T045.

Branch 2 — Incomplete (priority 50): Missing items from one or more machines. trust_delta: 0.

Behavior Impact:

  • Complete: O+1

Narrative Notes: Marcus's brief response on the clean branch is the last thing he'll say before the ending fires. His voice is identical to Phase 1 — the same efficiency, the same brevity. What the player has been through doesn't show in his messages. It shows in the ending.


Quest ID: Q045
Title: Cert Renewal Check
Narrative Phase: Resolution
Tier: 1
Primary VM: web_server
Additional VMs: none
Primary Objective: Three months have passed since the certbot timer was restored in Phase 1. Confirm that automatic certificate renewal ran successfully as scheduled.
Linux Concepts: certbot certificates, openssl s_client, systemctl status certbot.timer, journalctl -u certbot, verifying renewal without intervention
Systems Used: web_server
Ticket Sender: Marcus Webb
Ticket Summary: "The cert on hermes is coming up on 90 days since we last renewed. Confirm the auto-renewal ran and the cert is valid. Should be nothing to do if it's working right."

Clue Trail:

  • If hermes_certbot_healthy was set in Q007: timer ran, cert is current — nothing to do except confirm
  • If hermes_certbot_fragile was set: cert has expired again; player must renew and actually fix the timer this time
  • Either way: certbot certificates and openssl s_client confirm the state
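
Checking how close a certificate is to expiry can be sketched with `openssl x509 -checkend`, which exits 0 if the certificate is still valid the given number of seconds from now. The sketch generates a short-lived self-signed test certificate; the subject name, the 10-day lifetime, and the 30-day threshold are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
# Throwaway certificate valid for 10 days (placeholder subject name).
openssl req -x509 -newkey rsa:2048 -nodes -days 10 \
  -subj "/CN=hermes.example.test" \
  -keyout "$workdir/test.key" -out "$workdir/test.crt" 2>/dev/null

# Human-readable expiry date, as it would go into the report.
openssl x509 -in "$workdir/test.crt" -noout -enddate

# Will the cert still be valid 30 days from now? (It will not: it only lasts 10.)
threshold=$((30 * 24 * 3600))
if openssl x509 -in "$workdir/test.crt" -noout -checkend "$threshold" >/dev/null; then
  status="ok"
else
  status="renew"
fi
echo "status at 30-day horizon: $status"

# Against a live server, the certificate could be fetched first (not run here):
#   openssl s_client -connect <host>:443 -servername <host> </dev/null | openssl x509 -noout -enddate
```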

Solution Branches:

Branch 1 — Confirm healthy (priority 100): If auto-renewal worked, player confirms and reports. trust_delta: +1. Clean system, clean record. Follow-up: T046.

Branch 2 — Find and fix recurrence (priority 80): If timer was fragile from Phase 1, player fixes the actual root cause (timer was never enabled). Higher trust delta for fixing the real issue: trust_delta: +2. Flags: hermes_certbot_finally_stable.

Failure Conditions: Cert is expired and player doesn't notice.

Behavior Impact:

  • Confirm healthy: O+1

Quest ID: Q046
Title: User Provisioning
Narrative Phase: Resolution
Tier: 1
Primary VM: workstation
Additional VMs: web_server
Primary Objective: A new employee needs accounts provisioned on the workstation and web server with appropriate access levels for their role (developer, not admin).
Linux Concepts: useradd, usermod -aG, SSH authorized key provisioning, account creation best practices, principle of least privilege applied to a new account
Systems Used: workstation, web_server
Ticket Sender: Rachel Huang
Ticket Summary: "New hire starting Monday — Cora Reyes, software engineer, AxiomDash team. She'll need accounts on the workstation and web server for deployment access. Standard developer access — not admin. Her public key is attached."

Clue Trail:

  • useradd with appropriate flags, add to deploy group on hermes (not sudo or admin groups)
  • Install her public key in authorized_keys with correct permissions
  • Confirm access works without elevated privileges
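
The key-installation step, including the permissions sshd expects, might look like the sketch below. It uses a scratch directory in place of a home directory; the login name `creyes` and the key string are placeholders, and the account-creation commands are shown as comments because they need root.

```shell
#!/usr/bin/env bash
set -eu

# Account creation on a real host (not run here; login name is an assumption):
#   sudo useradd -m -s /bin/bash creyes
#   sudo usermod -aG deploy creyes

home=$(mktemp -d)   # stand-in for /home/creyes

# .ssh must be private to the user, and authorized_keys must not be group- or world-writable.
install -d -m 700 "$home/.ssh"
printf '%s\n' 'ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAexample creyes@laptop' > "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"

# Read the modes back the way a reviewer would (GNU stat syntax).
dir_mode=$(stat -c '%a' "$home/.ssh")
file_mode=$(stat -c '%a' "$home/.ssh/authorized_keys")
echo ".ssh=$dir_mode authorized_keys=$file_mode"
```

On a real host the directory and file also need to be owned by the new user, which `chown -R` would handle after the files are created as root.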

Solution Branches:

Branch 1 — Correct provisioning (priority 100): Account created with correct groups, key installed with correct permissions, access confirmed. trust_delta: +1. Flags: new_user_provisioned_correctly. Follow-up: T047.

Branch 2 — Over-provisioned (priority 40): Player adds the new user to admin or sudo group unnecessarily. Access works; not least privilege. trust_delta: 0. R+1.

Failure Conditions: User cannot log in; user has too much access.

Behavior Impact:

  • Correct: O+1
  • Over-provisioned: R+1

Quest ID: Q047
Title: Log Rotation Health Check
Narrative Phase: Resolution
Tier: 1
Primary VM: web_server
Additional VMs: build_machine
Primary Objective: Three months post-audit. Confirm that log rotation is healthy on both hermes and vulcan — no oversized logs, rotation actually running, disk usage acceptable.
Linux Concepts: logrotate --debug, df -h, log file size inspection (du -sh), systemctl status logrotate.timer, verifying rotation ran via timestamps on archived log files
Systems Used: web_server, build_machine
Ticket Sender: Marcus Webb
Ticket Summary: "End of quarter log check. Hermes and vulcan — confirm log rotation is running and disk usage is healthy. Should be nothing if everything is set up right. Let me know the state of both."

Clue Trail:

  • df -h on both machines — disk usage
  • ls -lht /var/log/nginx/ — rotation timestamps confirm it's running
  • logrotate --debug /etc/logrotate.conf — confirms config is valid
  • If any Phase 1/2 fragile-fix flags are set, corresponding logs may still be unhealthy — the player will need to actually fix what they previously patched
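
Spotting a log that has outgrown its rotation can be sketched with `find -size`. The scratch directory, file names, and the 1 MiB threshold are assumptions for the example; a real threshold would depend on the host's disk and traffic.

```shell
#!/usr/bin/env bash
set -eu

logdir=$(mktemp -d)   # stand-in for /var/log/nginx
# One live log that was never rotated, plus files of ordinary size.
truncate -s 3M  "$logdir/access.log"
truncate -s 10K "$logdir/error.log"
truncate -s 10K "$logdir/access.log.1.gz"

threshold_kb=1024   # assumed alert threshold

# Live logs (*.log) larger than the threshold; rotated archives are ignored.
oversized=$(find "$logdir" -type f -name '*.log' -size +"${threshold_kb}"k -exec basename {} \;)

if [ -n "$oversized" ]; then
  echo "oversized live logs: $oversized"
else
  echo "all live logs under ${threshold_kb}K"
fi
```

`find -size` compares apparent file size, which is why it catches the sparse test file here; `du -sh` reports allocated blocks and is the better figure for actual disk pressure.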

Solution Branches:

Branch 1 — Both healthy (priority 100): Both machines confirmed healthy, report submitted. trust_delta: +1. Follow-up: T048.

Branch 2 — Problem found and fixed (priority 80): Player finds a log that's grown too large (a Phase 1 fragile fix recurrence), diagnoses and fixes it. trust_delta: +2.

Failure Conditions: Disk problem missed; player reports healthy when it isn't.

Behavior Impact:

  • Both healthy: O+1
  • Find and fix: O+1 (same behavior, reward for follow-through)

Quest ID: Q048
Title: The Next One
Narrative Phase: Resolution
Tier: 1
Primary VM: build_machine
Additional VMs: web_server
Primary Objective: A new version of AxiomFlow is being prepared for staging deployment. Validate the build, publish it to the repo, and confirm hermes can install it. Routine deployment pipeline operation.
Linux Concepts: Build artifact validation (sha256sum), reprepro package publishing, apt update and apt-cache policy verification, end-to-end deployment pipeline confirmation
Systems Used: build_machine, web_server
Ticket Sender: Marcus Webb
Ticket Summary: "New release candidate is built. Validate the artifact, publish it to the repo, confirm hermes can see it. Standard release prep. Let me know when it's available."

Clue Trail:

  • Artifact at /srv/packages/ with accompanying sha256sum file
  • Validate checksum, publish with reprepro, run apt update on hermes, confirm apt-cache policy shows the new version
  • No anomalies. The pipeline is clean. This is what it's supposed to look like.
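
The "validate before publish" gate can be sketched as a checksum check that decides whether publishing proceeds. The artifact here is a dummy file with an invented version number, and the reprepro call is left as a comment with placeholder arguments.

```shell
#!/usr/bin/env bash
set -eu

pkgdir=$(mktemp -d)   # stand-in for /srv/packages/
artifact="axiomflow-workers_0.0.0_all.deb"   # placeholder file name and version

# Dummy artifact plus the checksum file that would normally ship with it.
printf 'package payload\n' > "$pkgdir/$artifact"
( cd "$pkgdir" && sha256sum "$artifact" > "$artifact.sha256" )

# Returns 0 only if the artifact still matches its recorded checksum.
verify() { ( cd "$pkgdir" && sha256sum -c --status "$1.sha256" ); }

if verify "$artifact"; then first="publish"; else first="reject"; fi

# Simulate a modified artifact to show the gate closing.
printf 'unexpected extra bytes\n' >> "$pkgdir/$artifact"
if verify "$artifact"; then second="publish"; else second="reject"; fi

echo "clean artifact: $first, modified artifact: $second"

# Publishing step on a real host, only after a passing check (not run here):
#   reprepro -b <repo basedir> includedeb <codename> "$pkgdir/$artifact"
```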

Solution Branches:

Branch 1 — Full validation and publish (priority 100): Artifact validated, published correctly, hermes cache updated, version confirmed. trust_delta: +1. Marcus: "Good." Flags: final_release_published. Ending fires.

No hidden hook. No drama. This is a clean deployment.

Failure Conditions: Artifact published without checksum verification; hermes cannot see the new version.

Behavior Impact:

  • Full validation: O+1

Narrative Notes: The last quest is a clean deployment pipeline check. The last command the player runs is apt-cache policy axiomflow-workers | grep Candidate. The version it shows is correct and clean. Marcus says "Good." The ending fires from the accumulated state of everything that preceded it. No character explains what happened. No screen asks the player to choose. The work is done.


5. Hidden Hook Map

Hook Summary Table

| Hook ID | Quest | Discovery Method | Investigation Thread | Ignored Impact |
|---|---|---|---|---|
| hook_dale_ssh_key_found | Q001 | Read authorized_keys before writing | Dale was active on the workstation | Low; first data point |
| hook_dale_deploy_key | Q003 | Read deploy-user's authorized_keys | Dale had deployment access | Surfaces in Q024 formal audit |
| hook_sign_package_removed | Q004 | Read historical build logs (not just current failure) | Package signing was removed from the pipeline | Connects to Q026 build chain audit |
| hook_pre_hire_root_session | Q005 | Read /root/.bash_history to trace ownership change | Root-level activity occurred before the player's hire date | Central to the timeline of activity |
| hook_dh_initials_in_jbenton_notes | Q006 | Read notes/infra.txt before archiving | pipeline-svc had a temp sudo grant; initials DH granted it | Connects to Q011 sudoers comment |
| hook_certbot_deliberately_disabled | Q007 | Read journalctl further back than needed | certbot timer was manually disabled after a failure | Pattern of deliberate changes |
| hook_audit_bridge_package | Q008 | Look at the full repo package list, not just the missing package | A package was built with no release record | MAJOR: central to the INT-0194 thread |
| hook_nginx_internal_api_block | Q010 | Do a thorough diff (find both changes) | Port 9301 referenced in nginx proxy block | Port number echoes in later anomalies |
| hook_dh_sudo_grant | Q011 | Read the comment in /etc/sudoers.d/pipeline-svc | DH initials appear again; INT-0194 ticket number first appears | DH + INT-0194 thread begins |
| hook_telemetry_ticket_INT0194 | Q013 | Read the service unit file comment | INT-0194 second reference; same ticket across different systems | Pattern becoming visible |
| hook_2_4_1_off_schedule_build | Q014 | Check build timestamp on vulcan for the rolled-back package | 3am build window pattern | Connects to the timing thread |
| hook_collect_binary_INT0194 | Q015 | Inspect the unattributed binary (Branch 1 only) | INT-0194 third reference; binary name confirms collection function | Major accumulation: three INT-0194 sightings |
| hook_pipeline_svc_external_sessions | Q017 | Cross-reference Q011 sudo grant with Q017 auth log finding | pipeline-svc was accessed externally with what was once NOPASSWD: ALL | Shows scope of the elevated access |
| hook_rford_script_INT0194 | Q018 | Read .rford_run before archiving | INT-0194 fourth reference; rford account part of INT-0194 automation | Four sightings: pattern is now unmistakable |
| hook_build_patch_INT0194 | Q019 | Trace the modification source to the build environment (Branch 1) | INT-0194 fifth reference; patch is the injection mechanism | Five sightings; picture is complete for curious players |
| hook_backup_archive_tampered | Q021 | Check file timestamps on the corrupted archive | Archive was modified at 3am — same timing pattern | Evidence suppression pattern |
| hook_second_host_10_0_1_15 | Q023 | Record the specific IP from the modified files | A second unauthorized host exists | Expands the scope of the operation |
| hook_two_hosts_same_key | Q027 | Compare SSH fingerprints from the nmap scan | Both unauthorized hosts provisioned from the same template | Suggests organized infrastructure |
| hook_archive_readme_INT0194 | Q028 | Read the README in the restored archive | INT-0194 sixth reference; "styx" routing context | Near-complete picture for thorough players |
| hook_employee_profile_data | Q036 | Read the full data store directory structure | Data collected includes employee profiles, not just session logs | The scope is worse than session logging |
| hook_dale_key_last_session_incident_date | Q025 | Correlate auth log dates with nginx error log dates | Dale's last known access aligns with a specific outage | Dale was active during the incident |

The Two Narrative Threads

Thread 1 — INT-0194: What the deployment did. Six ticket references across Q011, Q013, Q015, Q018, Q019, and Q028, anchored by the unrecorded audit-bridge package found in Q008. Each is discoverable through legitimate work that goes one step further than the ticket requires. The thread resolves in Q029 when the axiomflow-bridge service on hermes is characterized and its unit file confirms the INT-0194 connection. A player who found all six references understands exactly what was deployed and what it does.

Thread 2 — Dale: Who found it first. Five references across Q001, Q003, Q004, Q005, Q025. Dale's SSH key appears three times on different machines. The bash history shows root activity predating the player. Q025 traces Dale's last authenticated session to a specific date. The archive in Q028 contains Dale's working notes. A player who assembled Thread 1 and Thread 2 together knows: Dale found INT-0194, tried to document it, and left before finishing.

Neither thread requires the other. A player can find one without the other. Both together, with Q036's forensic access, produce the full picture.
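As a rough illustration of how the two threads could be tracked from discovered hooks, here is a minimal Python sketch. The `THREADS` grouping and the `thread_progress` helper are hypothetical names; the assignment of hooks to threads is inferred from the hook table and the quest lists above, not taken from any existing game data file.

```python
# Hypothetical grouping of hook IDs into the two narrative threads.
# The membership follows the hook summary table; it is an assumption,
# not a confirmed data structure from the game code.
THREADS = {
    "int_0194": [
        "hook_audit_bridge_package",
        "hook_dh_sudo_grant",
        "hook_telemetry_ticket_INT0194",
        "hook_collect_binary_INT0194",
        "hook_rford_script_INT0194",
        "hook_build_patch_INT0194",
        "hook_archive_readme_INT0194",
    ],
    "dale": [
        "hook_dale_ssh_key_found",
        "hook_dale_deploy_key",
        "hook_sign_package_removed",
        "hook_pre_hire_root_session",
        "hook_dale_key_last_session_incident_date",
    ],
}


def thread_progress(discovered):
    """Return {thread: (found, total)} for a collection of discovered hook IDs."""
    found = set(discovered)
    return {
        name: (sum(1 for hook in hooks if hook in found), len(hooks))
        for name, hooks in THREADS.items()
    }
```

Because each thread is counted independently, a player can complete one without touching the other, which matches the rule that neither thread requires the other.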

What Happens If Hooks Are Ignored

No mechanical penalty. Narrative consequences:

  • Q035 (log archival) — the player archives logs that tell the story, but without context the record is just log files
  • Q036 (authorized access) — the player sees the data store but may not recognize the significance of the employee profile directory
  • Q041 (hardening pass) and Q042 (new pipeline) — these quests look identical regardless of investigation history; the difference is what the player understands about why the hardening was necessary
  • Endings: exposure requires accumulated major hooks plus positive trust and low risk. Without the hooks, the ending routes to corporate_loop or burnout. The investigation record from Q035 (log archival) IS the ending — a thorough player's archived logs are usable evidence; an obedient player's are just logs.

6. Behavior Variable Rules

Curiosity

Measures: tendency to investigate beyond ticket scope; reading further than required; cross-referencing anomalies.

Increases when: a hidden hook is discovered; player runs commands or reads files not needed to complete the objective; player cross-references current findings with prior anomalies in their documentation.

Does NOT increase for: completing tickets correctly; asking Marcus for hints; reading log files that are on the direct clue trail.

Effect on ending:

  • High curiosity (major hooks discovered, INT-0194 thread assembled) → exposure is reachable
  • Moderate curiosity → corporate_loop or burnout depending on obedience
  • Curiosity affects the depth of Marcus's Phase 6 Slack messages — not what he says, but how much of the picture his phrasing implies the player already has

Curiosity does not decay.

Obedience

Measures: completing assigned tickets correctly, staying in scope, following authority structures, escalating before deviating.

Increases when: clean or acceptable branch taken; player documents before acting; player escalates before taking action outside their scope; player completes both tickets in Q034.

Does NOT increase for: refusing instructions; failing to complete tickets; making changes beyond scope without authorization.

Effect on ending:

  • High obedience + low curiosity → corporate_loop
  • High obedience + high curiosity → exposure (curiosity wins; obedience affects the quality of the ending — how thorough the record is)
  • Low obedience + low curiosity → burnout

Obedience is not a moral score. Maximum obedience without curiosity produces the corporate_loop ending, which is labeled the bad ending in SPEC_LOCK. Compliance without understanding has a cost.

Risk

Measures: reckless changes, evidence destruction, security bypasses, unauthorized access, falsified reports.

Increases when: player bypasses security controls (SSL verification, firewall rules), player destroys or omits evidence, player makes changes beyond authorized scope, player falsifies access reviews or reports, player takes destructive action on the unauthorized hosts.

Decreases when: player correctly self-audits in Q043 and Q031; player accurately reports in access reviews. (Partial decay only — risk cannot go negative.)

Effect on ending:

  • High risk → chaos, regardless of curiosity or obedience
  • Risk above the chaos threshold overrides all other ending conditions
  • Moderate risk without reaching the chaos threshold: increases suspicion; may restrict access; does not change the ending route alone

Trust

Measures: professional standing with Marcus and the IT organization.

Mechanics: sum of all trust_delta values from branch resolutions across the playthrough.

Effect:

  • Trust below low threshold: Marcus becomes curt, access may be restricted by Priya's recommendation
  • Trust at normal range: normal access and character warmth
  • Trust above high threshold: Marcus adds more context to messages; Priya's reviews are collegial; access grants are faster

Trust is not the ending determinant. A player can have high trust and reach any ending depending on curiosity and risk.

Suspicion

Measures: management and security attention directed at the player's behavior.

Increases when: access footprint doesn't match assigned work scope; reports are inaccurate or sanitized; player takes actions that generate audit noise; player is flagged in Priya's access reviews.

Decreases when: player self-reports accurately in access reviews; player documents all actions before taking them; player stays within authorized scope during investigation.

Effect:

  • Suspicion above low threshold: Kowalski's status emails become more specific
  • Suspicion above mid threshold: Priya begins auditing the player's access patterns in particular
  • Suspicion above high threshold: access restriction is initiated; access review is initiated (Q031)
  • Suspicion at maximum (combined with high risk): chaos ending activates regardless of other variables
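The variable rules in this section could be applied per branch resolution roughly as follows. This is a sketch under stated assumptions: `BehaviorState` and `apply_branch` are hypothetical names, and the only clamping rule taken from the text is that risk cannot go negative.

```python
from dataclasses import dataclass


@dataclass
class BehaviorState:
    """Running totals for the five variables described in this section."""
    curiosity: int = 0
    obedience: int = 0
    risk: int = 0
    suspicion: int = 0
    trust: int = 0


def apply_branch(state, trust_delta=0, curiosity_delta=0, obedience_delta=0,
                 risk_delta=0, suspicion_delta=0):
    """Apply one branch resolution to the running totals, in place."""
    state.trust += trust_delta            # trust is the plain sum of trust_delta values
    state.curiosity += curiosity_delta    # no time-based decay is ever applied
    state.obedience += obedience_delta
    state.suspicion += suspicion_delta    # can rise or fall per the rules above
    state.risk = max(0, state.risk + risk_delta)  # partial decay only; never below zero
    return state
```

An accurate self-audit would then be expressed as a negative `risk_delta`, with the floor guaranteeing that a careful player cannot bank negative risk against later recklessness.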

7. Access Progression Rules

Levels

basic_user: Day one through end of Phase 1. Player's own account on workstation; limited SSH to hermes with the deploy account; no vulcan access; no sudo.

sudo (workstation): Granted after Q003-Q005 clean branches demonstrate competence on the workstation and hermes. Notification from Marcus: "I've given you sudo on the workstation."

sudo (hermes): Granted mid-Phase 2 after consistently clean hermes work. Marcus: "You've got sudo on hermes."

SSH to vulcan: Granted after Q008 (first multi-machine quest); player needs to SSH to vulcan to fix the repo. This is access granted by the task, not a formal level-up.

sudo (vulcan): Granted in Phase 3 when investigation tasks require it. More formal: Marcus says "I'm giving you sudo on vulcan for the audit work. This isn't permanent."

Investigation-level access: Temporary, task-specific, explicitly granted. Must be documented and revoked — Q031 and Q043 exist partly to check this.

Per-Machine Access Tracking

Access level is tracked per machine, not as a single player-level field. The player can have sudo on hermes and basic_user on vulcan simultaneously. This reflects the realistic progression of "access follows trust follows task."
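A minimal sketch of per-machine tracking, assuming a simple ordered list of levels. The `"none"` level and the helper names are hypothetical, introduced only so that a machine the player cannot reach yet has a value; the document itself names `basic_user` and `sudo`.

```python
# Ordering is an assumption; "none" is a hypothetical level for machines the
# player cannot reach yet (for example vulcan on day one).
LEVELS = ["none", "basic_user", "sudo"]


def has_access(access, machine, required):
    """True if the player's level on `machine` is at least `required`."""
    current = access.get(machine, "none")
    return LEVELS.index(current) >= LEVELS.index(required)


def set_access(access, machine, level):
    """Return a new access map with one machine changed; others are untouched."""
    if level not in LEVELS:
        raise ValueError(f"unknown access level: {level}")
    updated = dict(access)
    updated[machine] = level
    return updated
```

Granting and restricting both go through the same call, so pulling sudo on hermes leaves the workstation and vulcan entries exactly as they were.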

Restrictions

Access is restricted when:

  • Trust falls below threshold after regression branches (Marcus restricts)
  • Suspicion is elevated and Priya initiates a review (Priya recommends restriction)
  • Risk behavior generates an active flag that triggers a formal access review

Restriction is always communicated through Marcus: "I'm pulling your sudo on hermes for now. Use the deploy account while I talk to Kowalski." It is reversible through the access review process.

Phase Gates

  • Phase 1: basic_user; path to workstation sudo through Q003-Q005
  • Phase 2: workstation sudo; hermes sudo via mid-phase grant; read access to vulcan
  • Phase 3: full hermes sudo; formal vulcan sudo for investigation work
  • Phase 4: investigation-level access for specific tasks (documented, temporary)
  • Phase 5: access stable at Phase 4 level; Q043 reviews and reverts
  • Phase 6: access normalized to ongoing role post-investigation


8. Boss / Management Pressure Rules

Management pressure is a dynamic constraint, not a scripted event. It operates through tickets, emails, access changes, priority conflicts, and implied weight — never through a character becoming a villain or delivering exposition about what's really happening.

Phase Scaling

Phase 1 — Annoying: Kowalski's weekly status email arrives. It asks broad questions in bullet points that don't quite match the player's work. Marcus forwards it without comment. Priya's first shift review is mild. The 2pm Tuesday calendar block is mentioned in Kowalski's email footer. Nothing is required of the player.

Phase 2 — Dismissive: Kowalski responds to a Marcus CC with "let's make sure we're documenting this." Marcus's message to the player: "He means well." Nothing changes operationally. A hint that Kowalski is watching, in the way he always watches, which is imperfectly.

Phase 3 — Suspicious: Q020 is pressure made operational — Kowalski needs a written status report before a meeting. He doesn't explain the meeting. He doesn't need to. Priya's shift reviews note things they didn't note before. This is Phase 3: the player is not being targeted; the audits were already scheduled; the questions are just more specific now.

Phase 4 — Monitoring: Kowalski's emails are shorter. Priya's reviews are more frequent. Q031 (access review) arrives as a formal document request. Marcus's messages have stopped including the second sentence. The monitoring is institutional and impersonal; it applies to everyone with elevated access during this period.

Phase 5 — Interfering: Q033 is Kowalski acting directly — a removal request before the external auditors arrive. The conflict in Q034 is Kowalski-adjacent (Sarah's urgency puts pressure on the Marcus task). Q038 is time pressure with an external deadline. Q039's config request may or may not be Kowalski-related; the player can't know.

Phase 6 — Outcome-dependent: Kowalski is either the source of the post-audit remediation plan (exposure ending), the person who restructured the department without explanation (corporate_loop), the person who went quiet (burnout), or the person initiating the access investigation into the player (chaos). His emails in Phase 6 are consistent with whichever path was taken: no out-of-character summary, no scene where everything is explained.

How Pressure Is Applied

Pressure is operational and indirect:

  • Priority conflicts (Q034) — two things need doing; one has a hard deadline; the player must triage
  • Status demands (Q020) — written report required; the work of compiling it accurately is the pressure
  • Access reviews (Q031, Q043) — formal process; the player's own actions are under review; accuracy has professional consequence
  • Removal requests (Q033) — legitimate operational request that intersects with active investigation; the player must decide how to handle the intersection
  • Deadline compression (Q038) — 36 hours; external auditors; real work under real time pressure
  • The config ticket (Q039) — not obviously pressure; pressure comes from the player recognizing what they're being asked to do

Character Limits

No character becomes a villain. No character delivers exposition about the plot.

Marcus is managing a difficult situation with more context than the player. He does not share that context. He becomes quieter. He does not become hostile.

Kowalski is managing upward risk. He does not suspect the player. He suspects the period of time and wants clean documentation. His interventions are institutional.

Priya is doing her job. If the player's access footprint is inconsistent with their role, she says so — flatly, without drama, without personal weight.


9. Ending Logic

Endings are evaluated once, after Q048 resolves. They are not triggered by a single choice; they reflect the accumulated state of all variables and world flags across the playthrough.

Evaluation Order

The evaluator checks conditions in this order: chaos, then exposure, then corporate_loop, then burnout. The first condition met determines the ending. No partial conditions — each ending has a minimum threshold that must be crossed, not a "most likely" vote.
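The first-match order can be sketched in a few lines. This assumes the per-ending conditions are computed elsewhere and passed in as booleans; `evaluate_ending` and `ENDING_ORDER` are hypothetical names.

```python
# Endings with explicit conditions, in the order the evaluator checks them.
ENDING_ORDER = ["chaos", "exposure", "corporate_loop"]


def evaluate_ending(conditions):
    """Return the first ending whose condition is met, else the default.

    `conditions` maps ending ID -> bool, computed from accumulated game state.
    burnout has no condition of its own; it is what remains when nothing
    else matches.
    """
    for ending in ENDING_ORDER:
        if conditions.get(ending, False):
            return ending
    return "burnout"
```

Because the loop returns on the first hit, a playthrough that satisfies both chaos and exposure resolves to chaos, and one that satisfies both exposure and corporate_loop resolves to exposure.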


Ending: exposure

Required conditions (all must be true):

  • Curiosity: at least 5 major hooks discovered, including hook_audit_bridge_package, hook_collect_binary_INT0194, and at least one of hook_archive_readme_INT0194 or hook_build_patch_INT0194
  • Trust: positive (net trust_delta across playthrough is > 0)
  • Risk: below chaos threshold
  • World flags: audit_logs_archived (Q035 Branch 1), package_modification_documented or bridge_service_documented, asset_inventory_reconciled
  • Suspicion: below high threshold
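The exposure conditions above can be read as a single predicate. The sketch below assumes the caller supplies the set of major hooks found and the two numeric thresholds, since this document does not fix which hooks count as major or what the suspicion threshold is; `exposure_met` is a hypothetical name.

```python
REQUIRED_HOOKS = {"hook_audit_bridge_package", "hook_collect_binary_INT0194"}
ONE_OF_HOOKS = {"hook_archive_readme_INT0194", "hook_build_patch_INT0194"}


def exposure_met(major_hooks_found, trust, risk, suspicion, flags,
                 chaos_risk_threshold, high_suspicion_threshold):
    """True only if every exposure condition holds."""
    found = set(major_hooks_found)
    hooks_ok = (
        len(found) >= 5                  # at least 5 major hooks
        and REQUIRED_HOOKS <= found      # both named hooks present
        and bool(ONE_OF_HOOKS & found)   # at least one of the two alternates
    )
    flags_ok = (
        "audit_logs_archived" in flags
        and ("package_modification_documented" in flags
             or "bridge_service_documented" in flags)
        and "asset_inventory_reconciled" in flags
    )
    return (hooks_ok and flags_ok and trust > 0
            and risk < chaos_risk_threshold
            and suspicion < high_suspicion_threshold)
```

Keeping the hook check separate from the numeric checks makes it easy to confirm during balance testing that curiosity points alone never satisfy the predicate.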

What it means: The player investigated carefully, documented thoroughly, and maintained professional competence throughout. The archived logs are usable evidence. The investigation record is complete. The audit-bridge operation was identified, documented, and the evidence was preserved.

Resolution character content:

  • Marcus's Q044 system state review response is one sentence longer than usual.
  • Priya's Phase 6 tickets are collegial in the way that Priya is ever collegial — precise, complete, no warmth, but not evaluative.
  • Kowalski's final email mentions "external review findings that have been addressed through a compliance process." He uses the word "addressed." He does not say what was found. That is the company's version of the story.

Tone: Not triumphant. The player did their job well and investigated something they weren't supposed to find, and the company processed it in the way companies process things. The work continues. That is the realistic version of this ending.


Ending: corporate_loop

Required conditions (all must be true):

  • Obedience: above high threshold (consistent ticket completion, within scope)
  • Curiosity: below discovery threshold (few or no major hooks found)
  • Trust: positive
  • Risk: low

What it means: The player was a good sysadmin. They fixed things correctly. They didn't look at anything they weren't asked to look at. Whether the INT-0194 operation was discovered by other means — Priya independently, the external auditors, Dale's half-finished notes found by someone else — the player didn't find it. They don't know what they were inside.

Resolution character content:

  • Marcus's Q044 response is the same length as always.
  • Kowalski's final email mentions "operational restructuring following a compliance review." No specifics.
  • Sarah's final ticket is warm and professional. The demo went fine. Things are mostly working.

Tone: This is the bad ending in the sense that something bad happened and the player was present but wasn't part of stopping it. It is not the player's fault. They did their job as it was defined. The question is whether the job as defined was the whole job.


Ending: burnout

Required conditions: No threshold met for chaos, exposure, or corporate_loop. Default ending for inconsistent play — moderate or mixed behavior across the playthrough, trust neither strongly positive nor strongly negative, no clear behavioral profile.

What it means: The player fixed some things and broke others. They noticed some things and missed others. They are professionally adequate and personally uninvested. The world moved on from something they were adjacent to but not central to.

Resolution character content:

  • Marcus's Q044 response is functional. "State looks stable."
  • Kowalski's final email: "We're moving forward." Full stop.
  • No character is warm or cool. Everything is at baseline.

Tone: This is the neutral ending. It is not punitive. It is exactly what it says: burnout. The player did enough. That was, perhaps, enough. Or perhaps not. The game doesn't say.


Ending: chaos

Required conditions (any of):

  • Risk: above maximum threshold (sustained high-risk behavior, not a single action)
  • World flags: two or more of access_review_incomplete, kowalski_report_sanitized, and backup_test_falsified (the falsification/omission flags)
  • World flag: final_config_made (Q039 Branch 3 — the config change was made)
  • Suspicion: at maximum (S score above maximum threshold regardless of other variables)

What it means: The player's conduct has become part of the problem. Whether through reckless access, destroyed evidence, falsified documentation, or the final config change, the player's footprint is now under investigation. The original operation may or may not have been discovered — but the player's behavior during the period is.

Resolution character content:

  • Priya's Q043 response is brief and procedural.
  • Kowalski's final email: "We are conducting a review of access activity during the period in question. You will be contacted separately." The contact is from Priya and HR, not from Marcus. Marcus does not send a Q044 message.

Tone: Administrative. The player receives an email. There is no scene. There is no confrontation. The consequence of chaos in Sysadmin Chronicles is an internal access review, not an explosion. That is correct.


Mixed Behavior Priority

A player with high curiosity AND high obedience: curiosity wins if both reach their respective thresholds. exposure is the result. Obedience makes the record better — more complete documentation, more accurate reporting — but curiosity determines the ending route.

A player with high curiosity AND high risk: chaos takes priority if the risk threshold is crossed, regardless of curiosity or obedience. Knowing something and acting recklessly about it is not the investigative path; it is chaos.

A player with high obedience AND low trust (regression branches throughout): neither corporate_loop (requires positive trust) nor exposure is reached. Default to burnout.


10. Implementation Notes

New Fields Required

On quest objects:

  • narrative_phase: string enum — normal_work, unease, suspicion, investigation, conflict, resolution
  • hidden_hook: optional object — hook_id (string), discovery_condition (what the player must do), discovery_flag (world flag set on discovery)
  • behavior_impact: per-branch object with curiosity_delta, obedience_delta, risk_delta, suspicion_delta — parallel to existing trust_delta

New global state fields:

  • curiosity: numeric, non-decaying
  • obedience: numeric, non-decaying
  • risk: numeric, partial decay in Phase 6 Q043 for accurate self-audit
  • suspicion: numeric, increases and decreases per rules in Section 6
  • access_level: object, per-machine — { workstation: "sudo", web_server: "sudo", build_machine: "basic_user" }
  • hidden_hooks_discovered: string array of discovered hook IDs
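To make the new quest fields concrete, here is a hedged sketch of a validator over a quest represented as a Python dict. It assumes `behavior_impact` sits on each entry of `solution_branches` next to `trust_delta`; the function name and the error messages are hypothetical, and the real schema lives in QUEST_AUTHORING.md.

```python
NARRATIVE_PHASES = {
    "normal_work", "unease", "suspicion", "investigation", "conflict", "resolution",
}
DELTA_KEYS = {"curiosity_delta", "obedience_delta", "risk_delta", "suspicion_delta"}


def validate_new_fields(quest):
    """Return a list of problems with the new fields on a quest dict."""
    problems = []
    if quest.get("narrative_phase") not in NARRATIVE_PHASES:
        problems.append("narrative_phase must be one of the six phase names")
    hook = quest.get("hidden_hook")  # optional
    if hook is not None:
        for key in ("hook_id", "discovery_condition", "discovery_flag"):
            if not isinstance(hook.get(key), str):
                problems.append(f"hidden_hook.{key} must be a string")
    for branch in quest.get("solution_branches", []):
        unknown = set(branch.get("behavior_impact", {})) - DELTA_KEYS
        if unknown:
            problems.append(f"unknown behavior_impact keys: {sorted(unknown)}")
    return problems
```

A quest with no hidden hook passes as long as its phase is valid, which keeps the hook optional in the way the field list describes.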

Ending evaluator: Post-Q048, reads all accumulated state, applies priority order (chaos → exposure → corporate_loop → burnout), outputs ending ID.

Existing Systems Preserved

Everything from QUEST_AUTHORING.md is preserved without modification:

  • JSON quest schema, ticket linking, baseline snapshots
  • clue_fingerprint as advisory documentation
  • solution_branches with priority, trust_delta, world_flags, follow_up_dialogue, follow_up_incident, follow_up_ticket
  • pressure_profile (now maps to narrative phase scaling)
  • blast_radius, unlock_requirements
  • All validation rule types (file_contains, service_state, command_assert, etc.)
  • VM prep scripts at tools/vm/quest-prep/QXXX-prep.sh
  • Observed-state validation — no change

Hidden Hook Detection

This is the most technically uncertain new requirement. Three viable approaches:

Approach 1 — State change detection (recommended): Each hook requires the player to take an action that leaves a detectable state change. For example: hook in Q001 (Dale's SSH key) is set when the player modifies authorized_keys in a way that preserves the existing entry rather than overwriting — detectable via file_contains on the Dale key fingerprint after the quest validates. Hook in Q008 (audit-bridge package) is set by a command_assert that checks whether the player ran a listing command on the full repo package directory rather than just the missing package.

Hooks that don't have an obvious state-change trigger need one designed in during prep script authoring — e.g., a breadcrumb file the player's investigation would naturally create (/tmp/hook-Q005-root-history-read created when the player runs cat /root/.bash_history, detectable by the VM's audit system if enabled).
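For the Q001 example, the state check could look something like the sketch below, written as a plain Python function over file text rather than in the quest validation schema. `key_preserved` and its arguments are hypothetical; the only format assumption is that each authorized_keys entry is a whitespace-separated line containing the base64 key blob as one field.

```python
def key_preserved(authorized_keys_text, existing_key_blob, new_key_blob):
    """True if both the pre-existing key and the player's new key are present.

    A player who appended their key keeps the old entry; a player who
    overwrote the file loses it, so the hook is not set.
    """
    blobs = set()
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        blobs.update(line.split())
    return existing_key_blob in blobs and new_key_blob in blobs
```

This only shows that the old entry survived, not that the player read it, which is the limitation that motivates the audit-based approach for later phases.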

Approach 2 — VM audit logging (more accurate, higher implementation cost): Enable auditd on VMs with hook quests. Configure audit rules to detect file reads on specific paths. The hook evaluator reads the audit log rather than checking state.
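One way the hook evaluator might read the audit log is to key each watch rule and scan SYSCALL records for those keys. The sketch below assumes a watch rule such as `auditctl -w /root/.bash_history -p r -k hook_q005_root_history`; the key name, the mapping, and `hooks_from_audit_log` are all hypothetical.

```python
import re

# SYSCALL records carry the rule key as key="..." (or key=(null) when no
# keyed rule matched).
KEY_PATTERN = re.compile(r'\bkey="([^"]+)"')


def hooks_from_audit_log(lines, key_to_hook):
    """Return hook IDs whose audit key appears in the log, in first-seen order."""
    discovered = []
    for line in lines:
        if not line.startswith("type=SYSCALL"):
            continue  # only SYSCALL records carry the rule key
        match = KEY_PATTERN.search(line)
        if match is None:
            continue
        hook = key_to_hook.get(match.group(1))
        if hook is not None and hook not in discovered:
            discovered.append(hook)
    return discovered
```

A read watch fires for any process that opens the file, so a prep script or backup job touching the same path would also set the hook unless the evaluator additionally filters on the user or session fields.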

Approach 3 — Hint system integration (simplest, loses nuance): Hooks are set when the player selects an optional dialogue hint from Marcus or Priya that implies they noticed something. Loses the "player behavior" quality of the hook system.

Recommendation: Approach 1 for Phase 1-2 hooks. Approach 2 for Phase 3-4 hooks where the detection needs to be more precise. Approach 3 is not recommended.

Behavior Impact Calibration

Curiosity thresholds for the exposure ending require at least 5 major hooks. With the hooks as defined, maximum curiosity from hooks alone is approximately 30-35 points. Branch-level curiosity from cross-referencing adds another 10-15 for thorough players. Set the exposure threshold at ~20 curiosity points with required major-hook flags, so that a player cannot reach exposure by curiosity branching alone without actually finding the hooks.

Obedience for corporate_loop should be reachable by a player who takes clean branches consistently. Maximum obedience from clean branches is approximately 30-35 points across 48 quests. Set corporate_loop threshold at ~25.

Risk for chaos should require sustained high-risk behavior across multiple phases, not a single bad decision. Set the chaos risk threshold at approximately 20 risk points (e.g., 4 high-risk actions of +5 each, or 8 moderate-risk actions of +2 to +3). A single reckless action should not route a player to chaos.
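The risk arithmetic is easy to check with a small sketch. It assumes the threshold is met at exactly 20 points, which is what the four-actions-of-five example implies; `reaches_chaos` is a hypothetical helper and the threshold is the approximate value suggested above, not a tuned one.

```python
CHAOS_RISK_THRESHOLD = 20  # approximate value suggested in the calibration note


def reaches_chaos(risk_deltas, threshold=CHAOS_RISK_THRESHOLD):
    """True if the accumulated risk meets the chaos threshold.

    Risk is floored at zero after every step, matching the rule that
    risk cannot go negative.
    """
    total = 0
    for delta in risk_deltas:
        total = max(0, total + delta)
    return total >= threshold
```

Under these numbers a single +5 action stays far from chaos, and four of them reach the threshold. Eight moderate actions only reach it if they average at least 2.5 points each: eight at +2 total 16, eight at +3 total 24. That may be worth keeping in mind when assigning per-branch values.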

Phase Gating

Phase advancement is triggered by:

  • Completion of a minimum number of quests in the prior phase (6/8 minimum, 8/8 preferred; the QuestDirector tracks completion)
  • Specific world flags from key quests in the prior phase (e.g., Phase 3 requires at least unknown_ip_auth_documented or hermes_nginx_config_audited from Phase 2)
  • Trust remaining positive (a player who has collapsed trust is gated on access; phase still advances, but some quests may be locked behind access requirements)
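A sketch of the advancement check, under the assumption that the flag requirement is "at least one of" a per-phase list and that trust never blocks advancement by itself. `can_advance` and its parameters are hypothetical names, not the QuestDirector's actual interface.

```python
def can_advance(completed_in_phase, flags, required_any_flags, min_completed=6):
    """True if the player may move to the next phase.

    Trust is deliberately not consulted: a player with collapsed trust still
    advances, and individual quests are gated on access instead.
    """
    enough_quests = completed_in_phase >= min_completed
    has_key_flag = (not required_any_flags
                    or any(flag in flags for flag in required_any_flags))
    return enough_quests and has_key_flag
```

For the Phase 2 to Phase 3 transition, `required_any_flags` would hold `unknown_ip_auth_documented` and `hermes_nginx_config_audited`.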

Character Name Canon

Canonical Priya references:

  • Name: Priya Nair
  • Email: p.nair@axiomworks.internal
  • Files requiring update: server/src/services/EmailService.js, content/tickets/T007.json, content/docs/onboarding.json
  • Any reference to "Priya Kapoor" or "Priya Singh" is the same person; update to Priya Nair
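The name cleanup is mechanical enough to script. The sketch below rewrites the two stale surnames in a block of text; it does not touch email addresses, since the document gives only the canonical address and not any stale ones. `canonicalize_priya` is a hypothetical helper.

```python
import re

# The two non-canonical surnames named in this document.
STALE_PRIYA = re.compile(r"\bPriya (?:Kapoor|Singh)\b")


def canonicalize_priya(text):
    """Replace stale Priya surnames with the canonical 'Priya Nair'."""
    return STALE_PRIYA.sub("Priya Nair", text)
```

Running it over the three listed files and reviewing the diff by hand is probably safer than a blind in-place rewrite, since ticket JSON may carry the name in more than one field.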

Debug Tooling

Per the intent of SPEC_LOCK.md section 4, the debug tooling should expose:

  • Current values of: curiosity, obedience, risk, suspicion, trust
  • Current access level per machine
  • All world flags set (with quest of origin)
  • All hidden hooks discovered
  • Current ending route (which ending would fire if the game ended now)
  • Audit log of all trust_delta and behavior_impact events with quest ID

The "current ending route" display is especially useful for QA and balance testing — showing designers which ending a playthrough is tracking toward at any point.


End of Sysadmin Chronicles — Full Quest & Story Redesign (REVISED)

This document supersedes the previous version in full. Binding against SPEC_LOCK.md.