# Quest Authoring
Use this guide when adding new JSON quests under `content/quests/`.

Quest files describe observed VM state. They are not command scripts and they
should model real Linux behavior, not puzzle logic detached from the system.

For complete worked files, see [`docs/AUTHORING_EXAMPLES.md`](/docs/AUTHORING_EXAMPLES.md).

## Quest JSON Schema

### Root Fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Quest ID, for example `Q005`. |
| `title` | string | Player-facing quest title. |
| `tier` | int | Difficulty tier, usually `1`, `2`, or `3`. |
| `primary_vm` | string | Main VM for the quest. Current authored values are `workstation`, `web_server`, and `build_machine`. |
| `required_vms` | string[] | Every VM the quest touches. Include all VMs used in clues, validation, or prep. |
| `ticket_id` | string | Links to `content/tickets/<id>.json`. |
| `baseline_snapshot` | string | Snapshot name that the prep script should restore or build from. |
| `summary` | string | Short internal scenario summary. |
| `clue_fingerprint` | object | Advisory description of the evidence seeded into the baseline. |
| `objectives` | object[] | Objective list shown to the player and used for progress checks. |
| `solution_branches` | object[] | Branches the validator can resolve to. When several branches validate, the one with the highest `priority` wins. |
| `pressure_profile` | string or null | Optional pressure/escalation profile name. |
| `blast_radius` | string[] | Incident IDs that this quest can affect or trigger. |
| `unlock_requirements` | string[] | Prerequisites that must be met before the quest unlocks, for example `world_flag:player_ssh_configured`. |
| `tags` | string[] | Search and classification tags. |
| `internal_notes` | string | Author-only notes for reviewers. |
| `_note` | string | Optional author-only comment. Existing content uses this at root and inside nested objects. |

### `clue_fingerprint`

`clue_fingerprint` is advisory. It documents what evidence the baseline already
contains so content reviewers can confirm the clue trail is real.

| Field | Type | Description |
| --- | --- | --- |
| `description` | string | Plain-language explanation of the clue trail. |
| `evidence` | object[] | Evidence items that point to the issue. Use the same general shape as the relevant validation type. |

Common evidence shapes in existing content:

- File and log evidence usually includes `type`, `vm`, `path`, and `contains`
- State evidence may include `type`, `vm`, `service`, `state`, or `enabled`
- Ownership evidence may include `type`, `vm`, `path`, `user`, and `group`
- Scalar evidence may include `threshold_percent`, `port`, or `command` depending on the clue

Existing clue fingerprints also use clue-only labels such as `service_state_is`,
`service_enabled_is`, and `expected_user`. Treat those as descriptive baseline
metadata, not runtime validation names.

## Objectives

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Stable objective ID. |
| `description` | string | Player-facing objective text. |
| `check_mode` | string | `passive` or `explicit`. Use `passive` by default. |
| `validation` | object | Rule object evaluated by `ValidationService`. |

Objectives are for feedback and progress tracking. They do not choose the
winning solution branch.

## Solution Branches

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Stable branch ID. |
| `label` | string | Optional short label used in content review and debugging. |
| `priority` | int | Higher wins when multiple branches validate. Priorities must be unique per quest. |
| `validation` | object | Rule object evaluated for this branch. |
| `trust_delta` | float | Trust change applied when this branch wins. Positive for better fixes, negative for risky or damaging ones. |
| `follow_up_dialogue` | string | Dialogue ID to trigger after resolution. |
| `follow_up_incident` | string | Incident ID to trigger after resolution, if the branch intentionally leaves a latent problem. |
| `follow_up_ticket` | string | Next ticket ID in the quest chain. |
| `world_flags` | string[] | Flags to set when the branch wins. |
| `_note` | string | Optional author-only comment. |

### Branch Authoring Guide

- Use branch priority to rank the quality of valid solutions.
- Put the clean, robust fix at the highest priority.
- Use lower priorities for brittle workarounds, partial fixes, or outcomes that
  leave future risk behind.
- Use `trust_delta` to reflect the quality of the fix, not just whether the
  quest technically completed.
- Use `follow_up_ticket` when a winning branch should advance the story to the
  next ticket.
- Use `follow_up_incident` only when that branch intentionally seeds a later
  recurrence or operational cost.
- Keep priorities unique. If two branches can both pass with the same priority,
  the content should be rewritten.
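
The unique-priority rule can be linted before review. The following is a rough sketch, not part of the repository tooling: `duplicate_priorities` is a hypothetical helper that scans a quest file as plain text, and it assumes `"priority"` keys appear only inside `solution_branches` and hold non-negative integers.

```bash
# Hypothetical lint helper (not an existing repo tool).
# Prints each priority value that occurs more than once in the given quest file.
# Assumes "priority" keys appear only in solution_branches and hold
# non-negative integers.
duplicate_priorities() {
  grep -o '"priority"[[:space:]]*:[[:space:]]*[0-9][0-9]*' "$1" \
    | grep -o '[0-9][0-9]*$' \
    | sort -n \
    | uniq -d
}

# Example: duplicate_priorities content/quests/Q099.json
# Empty output means every branch priority in the file is unique.
```

A text scan is enough for a quick pre-review check; a JSON-aware check would be more robust if nested objects ever reuse the `priority` key.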

## Validation Rule Types

Design notes sometimes use shorthand names like `file_mode_matches` or
`command_exits_zero`. In authored JSON, use the runtime rule names below.

- `file_mode_matches` -> `file_mode`
- `file_owner_matches` -> `file_owner`
- `service_state_matches` -> `service_state`
- `service_is_enabled` -> `service_enabled`
- `process_is_running` -> `process_running`
- `port_is_listening` -> `port_listening`
- `package_is_installed` -> `package_installed`
- `command_exits_zero` -> `command_assert`
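
When converting design notes into quest JSON, the mapping above can be applied mechanically. This is a minimal sketch; `runtime_rule_name` is a hypothetical helper name, and it passes through any name that is not one of the listed shorthands.

```bash
# Translate a design-note shorthand into the runtime rule name used in JSON.
# Names that are not shorthand are passed through unchanged.
runtime_rule_name() {
  case "$1" in
    file_mode_matches)     echo "file_mode" ;;
    file_owner_matches)    echo "file_owner" ;;
    service_state_matches) echo "service_state" ;;
    service_is_enabled)    echo "service_enabled" ;;
    process_is_running)    echo "process_running" ;;
    port_is_listening)     echo "port_listening" ;;
    package_is_installed)  echo "package_installed" ;;
    command_exits_zero)    echo "command_assert" ;;
    *)                     echo "$1" ;;
  esac
}

# Example: runtime_rule_name port_is_listening
```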

| JSON type | Fields | Notes |
| --- | --- | --- |
| `file_exists` | `vm`, `path` | Passes when the file exists. |
| `file_absent` | `vm`, `path` | Inverse of `file_exists`. |
| `directory_exists` | `vm`, `path` | Passes when the directory exists. |
| `file_contains` | `vm`, `path`, `contains` | Passes when the file contains the given text. |
| `log_contains` | `vm`, `path`, `contains` | Alias for `file_contains` used by some clue fingerprints. |
| `file_mode` | `vm`, `path`, `mode` | Checks the exact file mode string, such as `0600`. |
| `file_owner` | `vm`, `path`, `user`, `group` | Checks exact ownership. |
| `file_owner_is_not` | `vm`, `path`, `user`, `group` | Negated ownership check. |
| `service_state` | `vm`, `service`, `state` | Checks the active state, such as `active`, `inactive`, or `failed`. |
| `service_enabled` | `vm`, `service`, `enabled` | Checks boot-time enablement. The `enabled` field defaults to `true`. |
| `process_running` | `vm`, `process` | Passes when the named process is running. |
| `process_user` | `vm`, `process`, `user` | Passes when the named process runs as the given user. |
| `port_listening` | `vm`, `port`, `listening` | Checks whether a port is listening. The `listening` field defaults to `true`. |
| `package_installed` | `vm`, `package` | Passes when the package is installed. |
| `mount_present` | `vm`, `path` | Passes when the mount is present. |
| `disk_usage_below` | `vm`, `path`, `threshold_percent` | Passes when disk usage is below the threshold. `percent` is accepted in older content. |
| `disk_usage_above` | `vm`, `path`, `threshold_percent` | Passes when disk usage is above the threshold. `percent` is accepted in older content. |
| `command_assert` | `vm`, `command` | Fallback rule for command-based checks. Use sparingly. |
| `and` | `rules` | All sub-rules must pass. |
| `or` | `rules` | Passes when at least one sub-rule passes. |
| `not` | `rule` | Inverts the inner rule. |

### Validation Notes

- Prefer state-based checks over command checks.
- Use `and` and `or` to model genuinely alternative states, not to hide weak
  authoring.
- `command_assert` is a fallback. If a real state rule exists, use that first.
- Some older quest files include extra fields such as `protocol` or
  `installed`. The loader ignores unknown keys, but new quests should stick to
  the documented fields above.
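
The shorthand `command_exits_zero` suggests that a `command_assert` rule passes when its command exits with status 0. That reading is an assumption here, not a documented guarantee. The sketch below uses a hypothetical local stand-in, `command_assert_passes`, to show why the same check is usually better expressed as a state rule.

```bash
# Assumption: a command_assert rule passes when its command exits with status 0.
# This stand-in runs the command locally; the real check runs inside the VM.
command_assert_passes() {
  bash -c "$1" >/dev/null 2>&1
}

# The command below only reports "is this file absent?" through its exit code.
if command_assert_passes 'test ! -f /etc/cron.d/site-sync'; then
  echo "pass: cron file is absent"
else
  echo "fail: cron file still exists"
fi
```

The `file_absent` rule with the same `path` expresses this state directly and does not depend on shell quoting or on which tools are installed in the guest.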

## Prep Script Requirements

Each quest needs a prep script at `tools/vm/quest-prep/QXXX-prep.sh`.

- The script must be idempotent.
- It must set up the starting VM state for the quest.
- It runs at image build time, not when the player starts the quest.
- It should install required packages only from local or pre-baked sources.
- It may create logs, users, groups, permissions, or broken config files that
  form the scenario.
- It must not rely on a live player session.

When a quest continues an existing chain, the prep script should restore the
prior clean snapshot first, then apply the new scenario changes, and finally
take the next baseline snapshot.
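
Idempotency is the requirement that is easiest to get wrong. The helpers below are a minimal sketch of two repeat-safe operations; `ensure_line` and `ensure_absent` are hypothetical names, not helpers from the repository, and they act on local paths. How a real prep script reaches files inside the guest is not shown here.

```bash
# Append a line only if the file does not already contain it exactly.
# Running this twice leaves a single copy of the line.
ensure_line() {
  local file="$1" line="$2"
  grep -Fxq -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Remove a file if present; succeed silently when it is already gone.
ensure_absent() {
  rm -f -- "$1"
}

# Example (the path and line are illustrative only):
# ensure_line /tmp/site-sync.example "root /opt/site-sync/bin/sync-cache.sh"
# ensure_absent /tmp/old-logrotate.example
```

A plain `echo ... >> file` would add a duplicate entry on every run, which is the failure mode the idempotency rule is meant to prevent.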

## VM Provisioning Pipeline

A new quest requires a VM baseline before it can be played. The full authoring
workflow, from scratch to a playable quest, is:

### 1. Write the prep script

Create `tools/vm/quest-prep/QXXX-prep.sh`. Requirements:

- Must be idempotent: safe to run twice on the same domain.
- Accepts the domain name as `$1` and an optional `--dry-run` flag as `$2`.
- Must not prompt for input or depend on internet access.
- Sources `tools/vm/lib/common.sh` for shared helpers (`run`, `step`, `ok`, etc.).

Typical operations: break a config file, chown a directory, remove a logrotate
config, add a cron entry, delete a key. Do not make changes that the player
could undo before the quest starts.
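
The argument contract above can be sketched as follows. This is an illustration under stated assumptions, not the repository's template: a real script would source `tools/vm/lib/common.sh`, whose helpers are not reproduced here, so `run_step` and `prep_main` are hypothetical stand-ins that keep the example self-contained.

```bash
#!/usr/bin/env bash
# Sketch of a QXXX-prep.sh entry point. A real script would source
# tools/vm/lib/common.sh for its helpers; run_step below is a hypothetical
# stand-in so the example runs on its own.

DRY_RUN=0

# Print the command in dry-run mode; otherwise execute it.
run_step() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

prep_main() {
  local domain="${1:-}"
  if [ -z "$domain" ]; then
    echo "usage: QXXX-prep.sh <domain> [--dry-run]" >&2
    return 2
  fi
  DRY_RUN=0
  if [ "${2:-}" = "--dry-run" ]; then
    DRY_RUN=1
  fi

  echo "Preparing $domain"
  # Scenario changes go here. Each one should be safe to repeat.
  run_step echo "apply scenario changes to $domain"
}

# In a real script the last line would be: prep_main "$@"
prep_main "sc-web-server" --dry-run
```

Routing every state-changing command through one wrapper means the `--dry-run` flag cannot be forgotten for an individual step.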

### 2. Register the quest in `seed-vms.sh`

Open `tools/setup/seed-vms.sh` and:

1. Add a `require_file` check near the top (`STEP 1 — Pre-flight checks`):
   ```bash
   require_file "$QUEST_PREP/QXXX-prep.sh" "QXXX prep script"
   ```

2. Add a `run_prep_and_snapshot` call in `STEP 4 — Run quest-prep scripts`:
   ```bash
   run_prep_and_snapshot "QXXX" "sc-<vm-domain>" "baseline.<snapshot-name>"
   ```
   The snapshot name must match the quest's `baseline_snapshot` field.

### 3. Baseline snapshot chain

Each VM has its own chain. Only the CLEAN branch resolution of a quest is used
as the baseline for the next quest. Brittle-branch resolutions are never
snapshotted.

| VM | Snapshot chain |
|----|----------------|
| `sc-workstation` | `baseline.day-one` (Q001 only) |
| `sc-web-server` | `baseline.clean` → `baseline.post-q002` → `baseline.post-q003` → `baseline.post-q004` |
| `sc-build-machine` | `baseline.clean` → `baseline.post-q006` |

A prep script that builds on a prior quest must revert to the prior snapshot
before applying its changes.
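
One link of a chain can be pictured as three ordered steps: revert, prep, snapshot. The sketch below only prints the commands so it is safe to run anywhere. `plan_chain_step` is a hypothetical helper, and whether the revert and snapshot happen inside the prep script or inside `run_prep_and_snapshot` is not specified here. `virsh snapshot-revert` and `virsh snapshot-create-as` are standard libvirt subcommands.

```bash
# Print the libvirt commands for one link of a snapshot chain, in order.
# Printing instead of executing keeps the sketch safe to run anywhere.
# Arguments: domain, prior snapshot, prep script path, next snapshot.
plan_chain_step() {
  local domain="$1" prior="$2" prep="$3" next="$4"
  echo "virsh snapshot-revert $domain $prior"
  echo "bash $prep $domain"
  echo "virsh snapshot-create-as $domain $next"
}

plan_chain_step "sc-web-server" "baseline.post-q003" \
  "tools/vm/quest-prep/QXXX-prep.sh" "baseline.post-q004"
```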

### 4. VM baseline package set

Each authored VM has a guaranteed minimum set of packages that players can rely on
during gameplay. New quests must not assume packages outside this set unless the
quest prep script installs them.

| VM | OS | Guaranteed packages |
|----|----|---------------------|
| `sc-workstation` (ares) | Ubuntu 24.04 | `qemu-guest-agent`, `openssh-server`, `sudo`, `bash-completion`, `hostname`, `ssh` client (system) |
| `sc-web-server` (hermes) | Debian 12 | `qemu-guest-agent`, `openssh-server`, `sudo`, `nginx`, `logrotate`, `rsync`, `curl`, `hostname`, `ssh` client |
| `sc-build-machine` (vulcan) | Arch Linux | `qemu-guest-agent`, `openssh`, `sudo`, `base-devel`, `archlinux-keyring`, `inetutils` (provides `hostname`, `ping`), `ssh` client |

`hostname`, `whoami`, `id`, `ls`, `cat`, `echo`, `ps`, `df`, `du`, `free`,
`systemctl`, `journalctl` are available on all VMs.

The in-game terminal auto-adds `-C` to bare `ls` calls so column output renders
correctly. If a quest step requires `ls -l` or another specific format, pass the
flag explicitly; the `-C` injection only fires when no layout flag is present.

### 5. Run the pipeline

```bash
# Dry run first — shows what would execute without touching VMs
bash tools/setup/seed-vms.sh --dry-run

# Full build — requires libvirt and all three sc-* domains to exist
bash tools/setup/seed-vms.sh

# Prep + snapshot only (skip the image build step)
bash tools/setup/seed-vms.sh --skip-build

# Single VM only
bash tools/setup/seed-vms.sh --vm web_server
```

### 6. Validate

After `seed-vms.sh` completes:

```bash
# Check content integrity (including baseline_snapshot field)
node tools/content/validate-content.js

# Verify snapshots exist on each domain
virsh snapshot-list sc-web-server
virsh snapshot-list sc-build-machine
```
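
Reading `virsh snapshot-list` output by eye is error-prone once the chains grow. The sketch below wraps the check in a function; `snapshot_exists` is a hypothetical helper, and it relies on the `--name` option of `virsh snapshot-list`, which prints one snapshot name per line.

```bash
# Succeed when the named snapshot exists on the libvirt domain.
snapshot_exists() {
  local domain="$1" snapshot="$2"
  virsh snapshot-list --name "$domain" 2>/dev/null | grep -Fxq -- "$snapshot"
}

# Example:
# snapshot_exists sc-web-server baseline.post-q004 || echo "missing snapshot"
```

The exact-line match (`grep -Fx`) avoids false positives from names that share a prefix, such as `baseline.post-q00` and `baseline.post-q004`.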

## Multi-Solution Quest Example

```json
{
  "id": "Q099",
  "title": "Cron Runs as Root",
  "tier": 2,
  "primary_vm": "web_server",
  "required_vms": ["web_server"],
  "ticket_id": "T099",
  "baseline_snapshot": "baseline.clean",
  "_note": "Minimal example: the nightly cron job should run as www-data, not root.",
  "summary": "A site-sync cron entry was copied from a root shell. It still runs, but it now leaves root-owned cache files behind.",
  "clue_fingerprint": {
    "description": "The cron file exists, but it names root as the executor. The cache directory is already polluted with root-owned files.",
    "evidence": [
      { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "root /opt/site-sync/bin/sync-cache.sh" },
      { "type": "file_owner_is_not", "vm": "web_server", "path": "/var/www/axiomworks/cache", "user": "www-data" }
    ]
  },
  "objectives": [
    {
      "id": "sync-safe",
      "description": "The cron job runs as www-data and the scheduler is active",
      "check_mode": "passive",
      "validation": {
        "type": "and",
        "rules": [
          { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "www-data /opt/site-sync/bin/sync-cache.sh" },
          {
            "type": "or",
            "rules": [
              { "type": "service_state", "vm": "web_server", "service": "cron", "state": "active" },
              { "type": "process_running", "vm": "web_server", "process": "cron" }
            ]
          }
        ]
      }
    }
  ],
  "solution_branches": [
    {
      "id": "correct-cron",
      "label": "Correct Cron User",
      "priority": 100,
      "validation": {
        "type": "and",
        "rules": [
          { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/site-sync", "contains": "www-data /opt/site-sync/bin/sync-cache.sh" },
          {
            "type": "or",
            "rules": [
              { "type": "service_state", "vm": "web_server", "service": "cron", "state": "active" },
              { "type": "process_running", "vm": "web_server", "process": "cron" }
            ]
          }
        ]
      },
      "trust_delta": 2,
      "world_flags": ["site_sync_healthy"],
      "follow_up_dialogue": "marcus-Q099-complete-clean",
      "follow_up_ticket": "T100",
      "_note": "Preferred fix: keep the job and run it with the correct user."
    },
    {
      "id": "disabled-cron",
      "label": "Brittle Disable",
      "priority": 40,
      "validation": {
        "type": "file_absent",
        "vm": "web_server",
        "path": "/etc/cron.d/site-sync"
      },
      "trust_delta": -1,
      "world_flags": ["site_sync_brittle"],
      "follow_up_dialogue": "marcus-Q099-complete-brittle",
      "_note": "The job was deleted instead of repaired. It stops the symptom, but it is not a durable fix."
    }
  ],
  "pressure_profile": null,
  "blast_radius": [],
  "unlock_requirements": ["world_flag:player_ssh_configured"],
  "tags": ["cron", "permissions", "web_server"],
  "internal_notes": "Example only."
}
```

## Multi-VM Quest Example

```json
{
  "id": "Q098",
  "title": "Build Sync Writes Bad Ownership",
  "tier": 2,
  "primary_vm": "build_machine",
  "required_vms": ["workstation", "build_machine", "web_server"],
  "ticket_id": "T098",
  "baseline_snapshot": "baseline.post-q006",
  "_note": "The build machine is pushing release files to the web server, but the ownership is wrong and the deploy helper is still running.",
  "summary": "A deployment helper on the build machine is writing release files to the web server with root ownership. The helper must be stopped and the output repaired so the web server can manage the files again.",
  "clue_fingerprint": {
    "description": "The deploy helper is still running on build_machine. On web_server, the release artifact is owned by root instead of www-data.",
    "evidence": [
      { "type": "file_contains", "vm": "build_machine", "path": "/opt/deploy/bin/push-release.sh", "contains": "rsync -a --chown=root:root" },
      { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" },
      { "type": "file_owner_is_not", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" }
    ]
  },
  "objectives": [
    {
      "id": "release-owned-correctly",
      "description": "The web release file is owned by www-data and the deploy helper is stopped",
      "check_mode": "passive",
      "validation": {
        "type": "and",
        "rules": [
          { "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" },
          { "type": "not", "rule": { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" } }
        ]
      }
    }
  ],
  "solution_branches": [
    {
      "id": "deploy-stopped-owner-fixed",
      "label": "Stop Helper and Fix Ownership",
      "priority": 100,
      "validation": {
        "type": "and",
        "rules": [
          { "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks/releases/current/index.html", "user": "www-data", "group": "www-data" },
          { "type": "not", "rule": { "type": "process_running", "vm": "build_machine", "process": "deploy-sync" } }
        ]
      },
      "trust_delta": 2,
      "world_flags": ["release_permissions_fixed"],
      "follow_up_dialogue": "marcus-Q098-complete-clean",
      "_note": "This branch validates both VMs: the release file is fixed on web_server and the helper is no longer running on build_machine."
    }
  ],
  "pressure_profile": null,
  "blast_radius": [],
  "unlock_requirements": ["world_flag:player_ssh_configured"],
  "tags": ["deploy", "permissions", "multi-vm", "build_machine", "web_server"],
  "internal_notes": "Example only."
}
```

## Quest Chain Authoring

Use `follow_up_ticket` to chain the campaign in sequence. The winning branch
emits the next ticket, and `QuestDirector` activates the next quest from that
ticket.

| Quest | Clean branch `follow_up_ticket` |
| --- | --- |
| `Q001` | `T002` |
| `Q002` | `T003` |
| `Q003` | `T004` |
| `Q004` | `T005` |

Keep the chain on the clean, high-priority branch. If a brittle branch should
continue the story differently, give that branch its own `follow_up_ticket` or
`follow_up_incident` deliberately.
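
A broken chain usually shows up as a `follow_up_ticket` that names a ticket with no file behind it. The following is a rough text-based sketch, not an existing repo tool: `missing_follow_up_tickets` is a hypothetical helper, and it assumes follow-up tickets live at `<tickets_dir>/<id>.json` in the same way `ticket_id` does.

```bash
# Hypothetical lint helper: print each follow_up_ticket ID in a quest file
# that has no matching <tickets_dir>/<id>.json file.
missing_follow_up_tickets() {
  local quest_file="$1" tickets_dir="$2" id
  grep -o '"follow_up_ticket"[[:space:]]*:[[:space:]]*"[^"]*"' "$quest_file" \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | while IFS= read -r id; do
        [ -f "$tickets_dir/$id.json" ] || echo "$id"
      done
}

# Example:
# missing_follow_up_tickets content/quests/Q001.json content/tickets
```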