chore: bootstrap lean sysadmin-chronicles repo
Import the runnable game code, content, docs, scripts, and repo guidance while leaving local agent state, dependency installs, build output, and backup copies out of the published tree.
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q001",
  "character": "marcus",
  "quest_id": "Q001",
  "series_id": "marcus-main",
  "series_position": 1,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "The onboarding doc has your key and the path you need. It's in /etc/axiom/onboarding on ares once you're in. Or ask me and I'll paste it here. Either way."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "Start in your home directory. You need a .ssh folder if it does not exist yet. Then authorized_keys inside it."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "The permissions matter more than people expect. SSH will silently refuse a key if the file or the directory is group-writable. 700 on the folder, 600 on the file."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "mkdir -p ~/.ssh && chmod 700 ~/.ssh. Then echo your public key into ~/.ssh/authorized_keys and chmod 600 that file. That is the whole thing."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:player_ssh_configured",
      "body": "Good. You're in. I'll send you the next thing shortly. The coffee machine on this floor is broken, heads up."
    },
    {
      "stage": "complete-permissive",
      "trigger": "world_flag:player_loose_permissions",
      "body": "Key's in there. One thing though — check the permissions on that file. SSH is picky about it. Might not bite you today but it will eventually."
    }
  ]
}
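The permission rule in hint_2 (700 on the folder, 600 on the file, nothing group-writable) can be checked mechanically. The following is a minimal sketch, not the game's validator: the function names `ssh_permission_problems` and `demo` are hypothetical, and the key written to the temporary file is a placeholder string.

```python
import os
import stat
import tempfile


def ssh_permission_problems(ssh_dir: str) -> list:
    """Return human-readable permission problems for ssh_dir and its authorized_keys."""
    problems = []
    key_file = os.path.join(ssh_dir, "authorized_keys")
    for path, wanted in ((ssh_dir, 0o700), (key_file, 0o600)):
        if not os.path.exists(path):
            problems.append(f"{path}: missing")
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        # Group- or world-writable is the case the hint warns about.
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(f"{path}: writable by group or others ({oct(mode)})")
        elif mode != wanted:
            problems.append(f"{path}: mode {oct(mode)}, expected {oct(wanted)}")
    return problems


def demo() -> tuple:
    """Build a throwaway .ssh directory, check it while loose, then after tightening."""
    with tempfile.TemporaryDirectory() as home:
        ssh_dir = os.path.join(home, ".ssh")
        os.mkdir(ssh_dir)
        key_file = os.path.join(ssh_dir, "authorized_keys")
        with open(key_file, "w") as handle:
            handle.write("ssh-ed25519 PLACEHOLDER player@example\n")
        os.chmod(ssh_dir, 0o775)
        os.chmod(key_file, 0o664)
        loose = ssh_permission_problems(ssh_dir)
        os.chmod(ssh_dir, 0o700)
        os.chmod(key_file, 0o600)
        clean = ssh_permission_problems(ssh_dir)
    return loose, clean
```

Calling `demo()` returns two lists: the problems found with loose modes, then the problems found after `chmod 700` and `chmod 600`.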
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q002",
  "character": "marcus",
  "quest_id": "Q002",
  "series_id": "marcus-main",
  "series_position": 2,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Sarah's ticket is real. The site's down. Hermes is the web server — you can SSH from ares. Have a look at what nginx is doing."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "If nginx won't start, it usually tells you why. Try nginx -t before you touch anything else."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "Whatever the error says, it will include a file path and a line number. Go look at that exact spot."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "Config syntax errors are usually small. Missing semicolons, wrong brackets, typos on directive names. Read it carefully."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:nginx_stable",
      "body": "Good. Sarah will see it come back up. Worth checking systemctl is-enabled nginx while you're there — if someone broke the config they may have been poking around other things too."
    },
    {
      "stage": "complete-not-enabled",
      "trigger": "world_flag:nginx_unstable",
      "body": "It's running. But if that machine reboots for any reason nginx won't come back up automatically. You might want to fix that before Sarah notices."
    }
  ]
}
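Hint_2 relies on nginx error lines ending in a file path and line number. A small sketch of pulling those two values out of such a line follows. The regular expression assumes the common `[level] message in /path:line` shape, the sample line in the usage note is illustrative rather than captured from hermes, and `parse_nginx_error` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed shape: "nginx: [emerg] <message> in /path/to/file:<line>"
_NGINX_ERROR = re.compile(
    r"\[(?P<level>\w+)\] (?P<message>.+) in (?P<path>/\S+):(?P<line>\d+)\s*$"
)


def parse_nginx_error(text: str) -> Optional[dict]:
    """Return level, message, path and line number, or None if the line has no location."""
    match = _NGINX_ERROR.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    return info


SAMPLE = 'nginx: [emerg] unexpected end of file, expecting "}" in /etc/nginx/nginx.conf:42'
```

`parse_nginx_error(SAMPLE)` yields the path `/etc/nginx/nginx.conf` and the line `42`, which is the spot the hint says to open. Summary lines without a location, such as the "test failed" line, return `None`.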
@@ -0,0 +1,44 @@
{
  "id": "marcus-Q003",
  "character": "marcus",
  "quest_id": "Q003",
  "series_id": "marcus-main",
  "series_position": 3,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Dave's report is vague but something is wrong on hermes. I'd start by looking at resource utilization before assuming it's the application."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "Check disk. df -h is your friend. Web servers write logs constantly and people don't always remember to set up rotation."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "If you find a big file, don't just delete it — figure out why it got that big. Is logrotate configured for nginx? Check /etc/logrotate.d/."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "The default nginx logrotate config is in the nginx package. dpkg -L nginx | grep logrotate might give you somewhere to start. Or just write a correct one — it's about ten lines."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_logrotate_healthy",
      "body": "Nice. That was the right call — clearing the space and fixing what caused it. Logrotate problems have a way of coming back if you don't actually fix them."
    },
    {
      "stage": "complete-norotate",
      "trigger": "world_flag:hermes_log_pressure_pending",
      "body": "Space is back. But if you didn't fix the rotation config that log is going to grow again. Something to keep an eye on."
    },
    {
      "stage": "complete-down",
      "trigger": "world_flag:hermes_web_down",
      "body": "nginx is inactive now? That's worse than the disk problem. Restarting it without fixing why it died isn't a fix, it's a delay. Check what happened before you start it again."
    }
  ]
}
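Each completion message in these files is keyed on a `world_flag:` trigger. One plausible way to pick the message for the current world state is sketched below. This is an assumption about how the engine reads the `trigger` field, not its actual code, and `messages_for_flags` is a hypothetical name. The message bodies are elided to keep the example short.

```python
WORLD_FLAG_PREFIX = "world_flag:"


def messages_for_flags(messages: list, active_flags: set) -> list:
    """Return the messages whose world_flag trigger names a currently set flag."""
    hits = []
    for message in messages:
        trigger = message["trigger"]
        if not trigger.startswith(WORLD_FLAG_PREFIX):
            continue  # quest_activated, player_requested_help, etc. are handled elsewhere
        if trigger[len(WORLD_FLAG_PREFIX):] in active_flags:
            hits.append(message)
    return hits


# Stage and trigger pairs copied from marcus-Q003; bodies omitted.
Q003_MESSAGES = [
    {"stage": "intro", "trigger": "quest_activated"},
    {"stage": "complete-clean", "trigger": "world_flag:hermes_logrotate_healthy"},
    {"stage": "complete-norotate", "trigger": "world_flag:hermes_log_pressure_pending"},
    {"stage": "complete-down", "trigger": "world_flag:hermes_web_down"},
]
```

With `{"hermes_web_down"}` as the active flag set, only the `complete-down` message is selected. With no flags set, nothing fires.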
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q004",
  "character": "marcus",
  "quest_id": "Q004",
  "series_id": "marcus-main",
  "series_position": 4,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Sarah's deploy thing is interesting. If the script said it ran fine but the files didn't change, something is blocking the write. I'd look at ownership before I touch the script."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "ls -la on the web root. If those files are owned by root and the deploy runs as www-data, that's your problem."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "chown. And use -R unless you enjoy doing it twice."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "chown -R www-data:www-data /var/www/axiomworks. Then you can trigger the deploy service to confirm it takes."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_deploy_healthy",
      "body": "Good. Someone ran that deploy as root at some point. Worth figuring out who has sudo on hermes and whether they should."
    },
    {
      "stage": "complete-partial",
      "trigger": "world_flag:hermes_deploy_partial",
      "body": "Ownership is fixed on the directory but I'm not sure the files inside are correct. Sarah might still hit issues on the next deploy."
    }
  ]
}
@@ -0,0 +1,44 @@
{
  "id": "marcus-Q005",
  "character": "marcus",
  "quest_id": "Q005",
  "series_id": "marcus-main",
  "series_position": 5,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Dave's disk alert is on /var/backups this time, not /var/log. That's a different problem. Something to do with the backup job probably."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "Look at what owns the files in that directory. If it's root and the backup agent is supposed to manage them, someone ran something as the wrong user."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "Check /etc/cron.d/. Jobs in there can specify a user on the line. If there's no user field it defaults to root."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "The line format is: schedule user command. If yours is just: schedule command — that's the problem. Add the user field."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_backup_healthy",
      "body": "Good catch on the ownership cleanup too. A lot of people would have just fixed the cron line and left the old root-owned files sitting there."
    },
    {
      "stage": "complete-partial",
      "trigger": "world_flag:hermes_backup_partial",
      "body": "Cron's correct now. The old files are still owned by root though — the retention script won't be able to clean them up. Worth sorting that out before the disk fills again."
    },
    {
      "stage": "complete-wrong",
      "trigger": "world_flag:hermes_backup_root_running",
      "body": "Disk's clear. But what was actually running that job? If root is still running it that directory is going to fill up again."
    }
  ]
}
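Hint_3 describes the `/etc/cron.d` line layout as schedule, user, command. A rough sketch of checking whether the user position actually holds an account name is shown below. It assumes a five-field schedule (or a single `@daily`-style shortcut), takes the set of valid accounts as an argument rather than reading the system, and uses a hypothetical script path in the usage note.

```python
from typing import Optional


def cron_d_user(line: str) -> Optional[str]:
    """Return whatever sits in the user position of a cron.d job line, or None for non-job lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    # Environment assignments such as MAILTO=root are not job lines.
    if "=" in fields[0]:
        return None
    # "@daily"-style shortcuts use one schedule field, otherwise five.
    schedule_fields = 1 if fields[0].startswith("@") else 5
    if len(fields) <= schedule_fields:
        return None
    return fields[schedule_fields]


def missing_user_field(line: str, known_users: set) -> bool:
    """True when the user position holds something that is not a known account."""
    user = cron_d_user(line)
    return user is not None and user not in known_users
```

For example, with `known_users = {"root", "backup-agent"}`, the line `0 2 * * * /usr/local/bin/db-backup.sh` is flagged because the user position holds the command path, while `0 2 * * * backup-agent /usr/local/bin/db-backup.sh` passes.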
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q006",
  "character": "marcus",
  "quest_id": "Q006",
  "series_id": "marcus-main",
  "series_position": 6,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Vulcan is Arch. Different from what you've been working on. Package manager is pacman, not apt. Same concepts, different commands. Signature errors usually mean keyring or clock problems."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "Check what time that machine thinks it is. timedatectl. If NTP isn't running the clock drifts and GPG signatures start looking like they're from the future."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "systemctl enable --now systemd-timesyncd. Then wait a moment for sync, and try pacman again. You may also need to refresh the keyring."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "pacman -S archlinux-keyring to refresh. Then pacman -Syu should work."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:vulcan_builds_healthy",
      "body": "Clock drift breaking pacman is one of those things that seems unrelated until you've seen it twice. You'll spot it immediately next time."
    },
    {
      "stage": "complete-fragile",
      "trigger": "world_flag:vulcan_ntp_fragile",
      "body": "Timesyncd is running and builds work. It's not enabled at boot though — worth fixing that so the next reboot doesn't put you back here."
    }
  ]
}
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q007",
  "character": "marcus",
  "quest_id": "Q007",
  "series_id": "marcus-main",
  "series_position": 7,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Priya can't get into hermes. Something in the SSH config changed. Figure out what it was and restore her access without creating a new problem."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "sshd_config is where SSH restrictions live. Look for AllowUsers or AllowGroups. One of those is either missing her or was set wrong."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "AllowGroups is the right pattern — it scales. AllowUsers is a list you have to maintain manually. Either works, but think about which one you want to be maintaining in six months."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_ssh_hardened_correct",
      "body": "AllowGroups with web-admin. That's the correct way to do it. Users in the group get access, users not in the group don't. No list to maintain."
    },
    {
      "stage": "complete-fragile",
      "trigger": "world_flag:hermes_ssh_allowusers_fragile",
      "body": "Priya's back in. That AllowUsers list is going to need a line added every time someone new needs access. Worth switching to group-based before it becomes a problem."
    },
    {
      "stage": "complete-regression",
      "trigger": "world_flag:hermes_ssh_unrestricted",
      "body": "Access is restored but the hardening is gone. That restriction was there for a reason — SSH open to everyone on hermes isn't a great position to be in."
    }
  ]
}
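The three Q007 outcomes (group-based, user list, no restriction) differ only in which directive ends up in `sshd_config`. A sketch of telling them apart from the file text follows. The mapping of each label to a world flag is an assumption, the function names are hypothetical, and the account names in the usage note are illustrative.

```python
def access_directives(config_text: str) -> dict:
    """Collect the values of AllowUsers and AllowGroups, ignoring comments."""
    found = {"AllowUsers": [], "AllowGroups": []}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        for name in found:
            # sshd_config keywords are case-insensitive.
            if parts[0].lower() == name.lower():
                found[name].extend(parts[1:])
    return found


def classify(config_text: str) -> str:
    """Label a config as group-based, user-list, or unrestricted."""
    found = access_directives(config_text)
    if found["AllowGroups"]:
        return "group-based"
    if found["AllowUsers"]:
        return "user-list"
    return "unrestricted"
```

`classify("AllowGroups web-admin\n")` gives `group-based`, which corresponds to the clean completion. A config with only `AllowUsers marcus priya` gives `user-list`, and one with neither directive gives `unrestricted`.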
@@ -0,0 +1,44 @@
{
  "id": "marcus-Q008",
  "character": "marcus",
  "quest_id": "Q008",
  "series_id": "marcus-main",
  "series_position": 8,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "App's down after an update. First question is always: what changed. Sarah says a new package version came in. I'd start by looking at whether the binary actually runs."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "journalctl -u axiomworks-app. If it's failing immediately, it's probably the binary itself, not config. Try running it directly and see what the error is."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "If the binary is bad, figure out where the package came from. pacman -Qi axiomworks-app will show you the repo. If it's coming from vulcan, go look at what they built."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "You can roll back with pacman -U /var/cache/pacman/pkg/ if the old package is still cached. Or go to the repo on vulcan and look for an older version."
    },
    {
      "stage": "complete-rollback",
      "trigger": "world_flag:hermes_app_pinned_2-1-0",
      "body": "Solid. Pinning the version means the next update cycle won't pull the broken one back in. Someone needs to fix that build on vulcan at some point though."
    },
    {
      "stage": "complete-unpinned",
      "trigger": "world_flag:hermes_app_running",
      "body": "App's running again. Is the version pinned? If not the next pacman -Syu is going to pull 2.1.1 back in and you'll be back here."
    },
    {
      "stage": "complete-rebuild",
      "trigger": "world_flag:vulcan_build_fixed",
      "body": "You fixed it at the source. That's the right call if you have time for it. What was wrong with the build?"
    }
  ]
}
@@ -0,0 +1,19 @@
{
  "id": "marcus-day-one",
  "character": "marcus",
  "quest_id": "",
  "series_id": "marcus-main",
  "series_position": 0,
  "messages": [
    {
      "stage": "welcome",
      "trigger": "immediate",
      "body": "Welcome. You're replacing Dale. Nobody will tell you what Dale did because it's complicated. Your badge number is pending — Dave from Finance has your temp credentials. He's on three today."
    },
    {
      "stage": "setup",
      "trigger": "immediate",
      "body": "Your machine is ares. You'll need to set up SSH keys before anything else will work. I'll send you the first ticket once provisioning clears. Probably this morning."
    }
  ]
}
@@ -0,0 +1,14 @@
{
  "id": "priya-Q007-followup",
  "character": "priya",
  "quest_id": "Q007",
  "series_id": "priya-ops",
  "series_position": 2,
  "messages": [
    {
      "stage": "after-action",
      "trigger": "world_flag:priya_access_restored",
      "body": "Access is back. Thank you. I can finish the incident review now without SSH getting in the way."
    }
  ]
}
@@ -0,0 +1,29 @@
{
  "id": "priya-Q007",
  "character": "priya",
  "quest_id": "Q007",
  "series_id": "priya-ops",
  "series_position": 1,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "I need access to hermes restored. I was in the middle of investigating an error and now I can't get back in. Find out what changed and fix it."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_ssh_hardened_correct",
      "body": "Back in. AllowGroups is the right way to do it — using AllowUsers was going to be a maintenance problem. Good call."
    },
    {
      "stage": "complete-fragile",
      "trigger": "world_flag:hermes_ssh_allowusers_fragile",
      "body": "Access restored. That AllowUsers list is going to need updating every time someone new needs access. Might want to switch to group-based at some point."
    },
    {
      "stage": "complete-regression",
      "trigger": "world_flag:hermes_ssh_unrestricted",
      "body": "I'm back in. But it looks like all SSH restrictions are gone now. That hardening was probably there for a reason."
    }
  ]
}
@@ -0,0 +1,21 @@
{
  "id": "priya-shift-review",
  "character": "priya",
  "messages": [
    {
      "stage": "excellent",
      "trigger": "shift_review",
      "body": "Strong shift. You handled the queue cleanly and did not create extra work for anyone else."
    },
    {
      "stage": "ok",
      "trigger": "shift_review",
      "body": "Acceptable shift. The important thing is that the work moved forward and the environment stayed stable."
    },
    {
      "stage": "poor",
      "trigger": "shift_review",
      "body": "This shift needs review. Resolve the backlog cleanly next time and stop leaving avoidable mess behind."
    }
  ]
}
@@ -0,0 +1,12 @@
{
  "id": "sarah-Q003-angry",
  "character": "sarah",
  "quest_id": "Q003",
  "messages": [
    {
      "stage": "nginx-killed",
      "trigger": "world_flag:hermes_web_down",
      "body": "The site is completely down now. It was slow before — now it's returning nothing. What happened?"
    }
  ]
}
@@ -0,0 +1,22 @@
{
  "id": "sarah-Q004",
  "character": "sarah",
  "quest_id": "Q004",
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "My last deploy ran without errors but nothing changed on the site. The script didn't fail, it just... didn't do anything. Files in /var/www are owned by root for some reason."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_deploy_healthy",
      "body": "Deploy's working again. I pushed a test change and it applied. Thanks for sorting the ownership — not sure how that happened but it's fixed now."
    },
    {
      "stage": "complete-partial",
      "trigger": "world_flag:hermes_deploy_partial",
      "body": "The top-level directory is writable now but the files inside it still aren't. Next deploy is going to fail on the individual files. Can you finish the ownership fix?"
    }
  ]
}
@@ -0,0 +1,27 @@
{
  "id": "sarah-Q008",
  "character": "sarah",
  "quest_id": "Q008",
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "The app is crashing immediately after the last update. I didn't push any config changes. It was the package — axiomworks-app 2.1.1 is broken. Whatever vulcan built, it doesn't work."
    },
    {
      "stage": "complete-pinned",
      "trigger": "world_flag:hermes_app_pinned_2-1-0",
      "body": "App's running. The apt pin means we won't accidentally pull 2.1.1 in again. Someone needs to sort out what went wrong on vulcan before we can upgrade properly."
    },
    {
      "stage": "complete-rebuilt",
      "trigger": "world_flag:vulcan_build_fixed",
      "body": "App's running and the build is fixed. That's the right fix. I was hoping someone would trace it back to the source rather than just rolling back and leaving it."
    },
    {
      "stage": "complete-unpinned",
      "trigger": "world_flag:hermes_app_running",
      "body": "App's running again. Is 2.1.0 pinned in apt preferences? If not the next update cycle is going to pull 2.1.1 back in and we'll be here again."
    }
  ]
}
@@ -0,0 +1,5 @@
{
  "id": "arch-runbook",
  "title": "Vulcan Build Machine Runbook",
  "body": "Vulcan runs Arch Linux, which is a rolling release. The package manager is pacman.\n\nKey commands\nInstall: sudo pacman -S <pkg>\nRemove: sudo pacman -Rs <pkg>\nQuery installed: pacman -Q <pkg>\nCheck for updates: pacman -Sy\nUpgrade all: sudo pacman -Syu\nSearch: pacman -Ss <term>\n\nThe build mirror is pinned to reduce drift. Do not change the mirror configured in /etc/pacman.conf without approval.\n\nNTP and time sync\nCheck time state with: timedatectl show\nTime skew causes pacman key validation failures, which will then be treated as your problem.\n\nBuild dependencies\nbase-devel, cmake, and git are pre-installed.\n\nService management\nUse standard systemd tooling: systemctl and journalctl.\n\nArch is a rolling release. Package upgrades can break builds. Pin packages that must stay stable using IgnorePkg in /etc/pacman.conf."
}
@@ -0,0 +1,5 @@
{
  "id": "incident-response-guide",
  "title": "Incident Response Procedures",
  "body": "Severity levels\nCritical: site down.\nHigh: degraded service or data risk.\nMedium: noisy issue with no immediate impact.\nLow: cosmetic issue.\n\nFirst steps for any incident\nConfirm the issue is real and not a false alert.\nIdentify the affected systems.\nCheck logs before touching anything.\n\nCommon investigations\nSite down: systemctl status nginx; tail /var/log/nginx/error.log\nDisk full: df -h; du -sh /var/log/* | sort -rh | head -20\nService crash loop: journalctl -u <service> -n 50 --no-pager\nBad deploy: check /var/www/ ownership and check the deploy log.\n\nIf you cannot resolve in 30 minutes, escalate to Priya. Do not sit on a critical incident.\n\nAfter resolution, document root cause in the ticket. If recurrence risk exists, set up monitoring.\n\nIncidents are tracked in the ticket system. If you see an incident alert, check the mail panel for details and escalation status."
}
@@ -0,0 +1,5 @@
{
  "id": "nginx-runbook",
  "title": "Nginx Operations Runbook — hermes",
  "body": "This document covers routine nginx operations on hermes.\n\nConfig files\nMain config: /etc/nginx/nginx.conf\nSites enabled: /etc/nginx/sites-enabled/\nSites available: /etc/nginx/sites-available/\n\nKey commands\nSyntax check: sudo nginx -t\nReload (no downtime): sudo systemctl reload nginx\nRestart (brief downtime): sudo systemctl restart nginx\nCheck status: systemctl status nginx\nView error log: sudo tail -50 /var/log/nginx/error.log\n\nCommon errors\n[emerg] unexpected end of file: usually indicates a missing closing brace in the config.\nbind() to 0.0.0.0:80 failed (98: Address already in use): usually indicates a port conflict.\nnginx: configuration file /etc/nginx/nginx.conf test failed: run nginx -t for the actual details instead of guessing.\n\nAfter any config change, run nginx -t before restarting. Do not restart without a passing test."
}
@@ -0,0 +1,5 @@
{
  "id": "onboarding",
  "title": "IT Onboarding — Technical Setup Guide",
  "body": "Welcome to Axiom Works. Access has been provisionally approved for basic workstation use.\n\nThis document reflects current setup expectations and will become outdated without notice.\n\nYour SSH key\nYour public key is:\nssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHv3k9rQm7XqYwPlRtsMcJoNJzaFgKpBkLlnHWTbR5eq player@axiomworks\nCreate ~/.ssh if it does not exist and set mode 700.\nWrite the key to ~/.ssh/authorized_keys and set mode 600.\n\nVMs you have access to\nYou currently have access only to ares, the workstation.\nAdditional access will be granted by IT as trust increases, assuming there is a reason.\n\nDo not store credentials in /tmp or in shell history.\n\nContacts\nMarcus Webb, sysadmin, m.webb@axiomworks.internal\nPriya Nair, operations, p.nair@axiomworks.internal\nSarah Chen, development, s.chen@axiomworks.internal\n\nIf anything in this doc is wrong, it is probably Marcus's fault."
}
@@ -0,0 +1,5 @@
{
  "id": "package-mirror-guide",
  "title": "Package Mirror and Version Management — vulcan",
  "body": "vulcan uses the Axiom Works internal package mirror for reproducibility.\n\nMirror config\nThe mirror is configured in /etc/pacman.conf using the Server= line in the relevant repository section.\n\nRolling back a package\nIdentify the broken version with: pacman -Q <pkg>\nDownload the prior version from https://archive.archlinux.org/.\nIf external access is unavailable, use the mirror cache instead of improvising.\nInstall the older package with: sudo pacman -U /path/to/pkg.tar.zst\n\nPinning a package\nEdit /etc/pacman.conf\nAdd the line: IgnorePkg = <package>\nVerify with: pacman -Syu\nExpected behavior: pacman should report skipping the package due to IgnorePkg.\n\nChecking current installed version versus repository\nRepository version: pacman -Si <pkg>\nInstalled version: pacman -Q <pkg>\n\nIf axiomworks-app breaks after an update, check whether the app vendor pinned a dependency version. The most common cause is a library ABI change."
}
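The pinning step in this guide comes down to an `IgnorePkg` line in `/etc/pacman.conf`. A minimal sketch of reading which packages are pinned from the file text is given below. It assumes the usual `key = value` layout with space-separated package names, and `ignored_packages` is a hypothetical helper rather than part of the game code.

```python
def ignored_packages(pacman_conf: str) -> set:
    """Return the package names listed on uncommented IgnorePkg lines."""
    ignored = set()
    for raw in pacman_conf.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "IgnorePkg":
            # Several packages may share one line, and the line may repeat.
            ignored.update(value.split())
    return ignored


SAMPLE_CONF = "[options]\n#IgnorePkg   =\nIgnorePkg = axiomworks-app linux\n"
```

`ignored_packages(SAMPLE_CONF)` contains `axiomworks-app`, so a later `pacman -Syu` would be expected to skip it. The commented-out template line is ignored.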
@@ -0,0 +1,5 @@
{
  "id": "server-admin-guide",
  "title": "Hermes Server Administration Guide",
  "body": "Hermes runs Debian stable. The package manager is apt.\n\nService management\nServices are managed with standard systemd tooling through systemctl.\n\nLog locations\nNginx logs: /var/log/nginx/\nSystem log: /var/log/syslog\nPer-service logs: journalctl -u <service>\n\nPackage operations\nInstall packages with: sudo apt update && sudo apt install <pkg>\nDo not upgrade packages without testing. Live systems are not a lab, despite appearances.\n\nDisk management\ndf -h\ndu -sh /var/log/\nlsblk\n\nImportant paths\nWeb root: /var/www/\nNginx config: /etc/nginx/\nCron jobs: /etc/cron.d/\nUser cron spool: /var/spool/cron/\n\nLogrotate\nConfiguration lives in /etc/logrotate.d/.\nTest with: sudo logrotate --debug /etc/logrotate.conf\n\nThis VM is shared infrastructure. Changes affect live services."
}
@@ -0,0 +1,5 @@
{
  "id": "web-deploy-guide",
  "title": "Web Deployment Guide — hermes",
  "body": "The deploy process copies files to the web root. Deploys run as the deploy service account.\n\nWeb root\nPath: /var/www/axiomworks/\nRequired owner: deploy:deploy\nRequired mode: 755\n\nDeploy script\nLocation: /usr/local/bin/deploy.sh\nExecution model: runs as deploy via cron and webhook.\n\nIf deploy.sh reports success but files do not update, check ownership. The script cannot overwrite root-owned files and will silently skip them.\n\nFixing ownership\nsudo chown -R deploy:deploy /var/www/axiomworks/\n\nVerifying\nstat /var/www/axiomworks/\nExpected result: Uid: deploy, Gid: deploy\n\nDo not run deploy.sh as root. The script will overwrite ownership if run as root."
}
@@ -0,0 +1,52 @@
{
  "id": "I001",
  "title": "Log Pressure Returns on Hermes",
  "affected_vm": "web_server",
  "trigger_conditions": ["world_flag:hermes_log_pressure_pending"],
  "blast_radius_quests": [],
  "blast_radius_incidents": [],
  "escalation_steps": [
    {
      "after_seconds": 1800,
      "action": "grow_log",
      "target": "/var/log/nginx/access.log",
      "amount_mb": 500,
      "description": "Log continues growing without rotation"
    },
    {
      "after_seconds": 3600,
      "action": "grow_log",
      "target": "/var/log/nginx/access.log",
      "amount_mb": 1000
    },
    {
      "after_seconds": 5400,
      "action": "raise_ticket_priority",
      "ticket_id": "T003",
      "value": "high",
      "description": "Dave files another ticket. The site is slow again."
    },
    {
      "after_seconds": 7200,
      "action": "trigger_new_ticket",
      "ticket_id": "T003-recurrence",
      "description": "A new disk full ticket arrives from monitoring."
    }
  ],
  "cooldown_seconds": 3600,
  "world_flags": ["web_disk_pressure_active"],
  "trust_effects": {
    "ignored": -2,
    "resolved_cleanly": 0,
    "_note": "No positive trust for resolving this — it is the same problem the player already half-fixed. Resolving it properly via logrotate clears the flag."
  },
  "resolution_requirements": {
    "clear_flag": "hermes_log_pressure_pending",
    "set_flag": "hermes_logrotate_healthy",
    "validation": {
      "type": "file_exists",
      "vm": "web_server",
      "path": "/etc/logrotate.d/nginx"
    }
  }
}
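The `escalation_steps` list reads as a timeline: each step becomes due once its delay has elapsed. A sketch of how a scheduler might pick the due steps follows. It is an assumption about the engine, not its code. Note that the incident files in this commit spell the delay key two ways (`after_seconds` in I001, `trigger_after_seconds` elsewhere), so the hypothetical helper `step_delay` accepts both. Treating `amount_mb` values as additive is also an assumption.

```python
def step_delay(step: dict) -> int:
    """Read a step's delay in seconds, accepting either key spelling."""
    if "trigger_after_seconds" in step:
        return step["trigger_after_seconds"]
    return step["after_seconds"]


def due_steps(steps: list, elapsed_seconds: int) -> list:
    """Return the steps whose delay has passed, earliest first."""
    ordered = sorted(steps, key=step_delay)
    return [step for step in ordered if step_delay(step) <= elapsed_seconds]


def log_growth_mb(steps: list, elapsed_seconds: int) -> int:
    """Sum amount_mb over due grow_log steps (assumes the amounts add up)."""
    return sum(
        step.get("amount_mb", 0)
        for step in due_steps(steps, elapsed_seconds)
        if step.get("action") == "grow_log"
    )


# Timings and actions copied from I001; targets and descriptions omitted.
I001_STEPS = [
    {"after_seconds": 1800, "action": "grow_log", "amount_mb": 500},
    {"after_seconds": 3600, "action": "grow_log", "amount_mb": 1000},
    {"after_seconds": 5400, "action": "raise_ticket_priority", "ticket_id": "T003", "value": "high"},
    {"after_seconds": 7200, "action": "trigger_new_ticket", "ticket_id": "T003-recurrence"},
]
```

At 4000 seconds the two `grow_log` steps are due and the ticket steps are not, so under the additive assumption the log has grown by 1500 MB.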
@@ -0,0 +1,53 @@
{
  "id": "I002",
  "title": "Backup Pressure Continues on Hermes",
  "affected_vm": "web_server",
  "description": "The /var/backups directory keeps filling because the partial fix (either cron corrected but disk not cleared, or disk cleared but cron still runs as root) leaves the underlying problem unresolved. The backup pressure will return.",
  "trigger_flags": ["hermes_backup_partial"],
  "blast_radius_quests": ["Q005"],
  "blast_radius_incidents": ["I001"],
  "notification": "Backup pressure is building again on hermes. /var/backups is filling up.",
  "notification_severity": "warning",
  "escalation_steps": [
    {
      "trigger_after_seconds": 1200,
      "notification": "hermes: /var/backups is at 85%. Backup jobs are still accumulating root-owned files.",
      "notification_severity": "warning",
      "world_flags": []
    },
    {
      "trigger_after_seconds": 2400,
      "notification": "hermes: /var/backups is critically full. Backup jobs are failing. Dave has noticed.",
      "notification_severity": "critical",
      "world_flags": [],
      "escalates_tickets": [
        { "ticket_id": "T005", "new_priority": "high" }
      ]
    },
    {
      "trigger_after_seconds": 3600,
      "notification": "hermes: Backup agent is now crashing. Sarah is asking questions in the channel.",
      "notification_severity": "critical",
      "world_flags": ["hermes_backup_root_running"]
    }
  ],
  "world_flags": ["hermes_backup_partial"],
  "resolution_requirements": {
    "clear_flag": "hermes_backup_partial",
    "set_flag": "hermes_backup_healthy",
    "validation": {
      "type": "and",
      "rules": [
        { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/db-backup", "contains": "backup-agent" },
        { "type": "file_owner", "vm": "web_server", "path": "/var/backups/db", "user": "backup-agent", "group": "backup-agent" },
        { "type": "disk_usage_below", "vm": "web_server", "path": "/var/backups", "threshold_percent": 70 }
      ]
    }
  },
  "trust_effects": {
    "ignored": -3,
    "resolved_partially": -1,
    "resolved_cleanly": 0,
    "_note": "No trust bonus for resolving a problem you created by doing Q005 partially. Zero is the floor."
  }
}
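The `validation` block composes three checks under an `and` rule. A sketch of evaluating such a tree against a snapshot of VM state is shown below. The evaluator, the shape of the `state` dictionary, and the strict less-than reading of `threshold_percent` are all assumptions made for illustration. Only the rule types and fields come from I002.

```python
def make_checkers(state: dict) -> dict:
    """Map rule types to checks against an in-memory snapshot (hypothetical shape)."""
    return {
        "file_contains": lambda r: r["contains"] in state["files"].get(r["path"], ""),
        "file_owner": lambda r: state["owners"].get(r["path"]) == (r["user"], r["group"]),
        "disk_usage_below": lambda r: state["disk_percent"].get(r["path"], 100) < r["threshold_percent"],
    }


def evaluate(rule: dict, checkers: dict) -> bool:
    """Evaluate a rule; 'and' requires every child rule to pass."""
    if rule["type"] == "and":
        return all(evaluate(child, checkers) for child in rule["rules"])
    return checkers[rule["type"]](rule)


# Rule tree copied from I002 (the "vm" field is omitted for brevity).
I002_RULE = {
    "type": "and",
    "rules": [
        {"type": "file_contains", "path": "/etc/cron.d/db-backup", "contains": "backup-agent"},
        {"type": "file_owner", "path": "/var/backups/db", "user": "backup-agent", "group": "backup-agent"},
        {"type": "disk_usage_below", "path": "/var/backups", "threshold_percent": 70},
    ],
}

# Cron fixed, but old files still root-owned and the disk still full.
PARTIAL = {
    "files": {"/etc/cron.d/db-backup": "0 2 * * * backup-agent /usr/local/bin/db-backup.sh\n"},
    "owners": {"/var/backups/db": ("root", "root")},
    "disk_percent": {"/var/backups": 85},
}

# All three conditions satisfied.
HEALTHY = {
    "files": PARTIAL["files"],
    "owners": {"/var/backups/db": ("backup-agent", "backup-agent")},
    "disk_percent": {"/var/backups": 40},
}
```

`evaluate(I002_RULE, make_checkers(HEALTHY))` passes, while the `PARTIAL` snapshot fails on ownership and disk usage even though the cron line is correct. That matches the partial-fix scenario the incident describes.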
@@ -0,0 +1,45 @@
{
  "id": "I003",
  "title": "Upstream App Update Pressure on Vulcan",
  "affected_vm": "build_machine",
  "description": "If the player rolled back the axiomworks-app package but did not pin the version on hermes, the internal apt repo will eventually push the broken version again. The next unattended upgrade will pull it down and the app will break again.",
  "trigger_flags": ["hermes_app_running"],
  "blast_radius_quests": ["Q008"],
  "blast_radius_incidents": ["I002"],
  "notification": "Automated update on vulcan detected. The bad package version may be re-installed.",
  "notification_severity": "warning",
  "escalation_steps": [
    {
      "trigger_after_seconds": 900,
      "notification": "hermes: axiomworks-app has been updated by the scheduled apt run. App is back on the bad version.",
      "notification_severity": "critical",
      "world_flags": [],
      "escalates_tickets": [
        { "ticket_id": "T008", "new_priority": "critical" }
      ]
    },
    {
      "trigger_after_seconds": 1800,
      "notification": "vulcan: App is down again. Sarah is pinging the channel. Marcus is watching.",
      "notification_severity": "critical",
      "world_flags": []
    }
  ],
  "world_flags": [],
  "resolution_requirements": {
    "set_flag": "hermes_app_pinned_2-1-0",
    "validation": {
      "type": "and",
      "rules": [
        { "type": "package_installed", "vm": "web_server", "package": "axiomworks-app=2.1.0" },
        { "type": "file_contains", "vm": "web_server", "path": "/etc/apt/preferences.d/axiomworks-app", "contains": "Pin: version 2.1.0" }
      ]
    }
  },
  "trust_effects": {
    "ignored": -4,
    "resolved_partially": -2,
    "resolved_cleanly": 0,
    "_note": "Rollback-only is a partial fix — the pinning incident fires. Rollback-and-pin is the clean resolution and blocks this incident entirely."
  }
}
|
||||
@@ -0,0 +1,25 @@
|
||||
{
|
||||
"id": "access_blocked_escalation",
|
||||
"label": "Access Blocked Escalation",
|
||||
"description": "Fast escalation for lockout and access-control incidents. Used when another operator is blocked mid-incident and the lack of access is itself the outage multiplier.",
|
||||
"intensity": 3,
|
||||
"escalation_steps": [
|
||||
{
|
||||
"trigger_after_seconds": 300,
|
||||
"notification": "Priya is still locked out of hermes. This is now blocking incident response work.",
|
||||
"notification_severity": "warning"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 900,
|
||||
"notification": "Fifteen minutes without access. The linked ticket is being escalated.",
|
||||
"notification_severity": "warning",
|
||||
"escalate_linked_ticket": "critical"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 1800,
|
||||
"notification": "Access is still broken. This is now a security and operations problem, not just a convenience issue.",
|
||||
"notification_severity": "error",
|
||||
"escalate_linked_ticket": "critical"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,25 @@
|
||||
{
|
||||
"id": "app_outage_escalation",
|
||||
"label": "Application Outage Escalation",
|
||||
"description": "Faster escalation for Tier 2 app outage quests (Q008). Revenue impact is implied, so Priya enters earlier than in web outage profiles.",
|
||||
"intensity": 3,
|
||||
"escalation_steps": [
|
||||
{
|
||||
"trigger_after_seconds": 300,
|
||||
"notification": "App is still down on hermes. What's the status?",
|
||||
"notification_severity": "warning"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 900,
|
||||
"notification": "Fifteen minutes. Ticket is high priority now.",
|
||||
"notification_severity": "warning",
|
||||
"escalate_linked_ticket": "high"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 1800,
|
||||
"notification": "Half-hour outage. Priya is involved. This needs to be resolved.",
|
||||
"notification_severity": "error",
|
||||
"escalate_linked_ticket": "critical"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,25 @@
|
||||
{
|
||||
"id": "disk_growth_slow",
|
||||
"label": "Slow Disk Growth",
|
||||
"description": "Low-burn escalation for disk pressure quests. Suitable when the service is still mostly up but capacity is eroding and the symptoms will worsen if ignored.",
|
||||
"intensity": 1,
|
||||
"escalation_steps": [
|
||||
{
|
||||
"trigger_after_seconds": 1200,
|
||||
"notification": "Disk pressure is still building. Service is limping along, but it is not getting better on its own.",
|
||||
"notification_severity": "warning"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 2700,
|
||||
"notification": "Capacity keeps shrinking. The linked ticket is being bumped so this does not sit forgotten.",
|
||||
"notification_severity": "warning",
|
||||
"escalate_linked_ticket": "high"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 4500,
|
||||
"notification": "The host is still under disk pressure. Expect broader service issues if this keeps drifting.",
|
||||
"notification_severity": "error",
|
||||
"escalate_linked_ticket": "critical"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,22 @@
|
||||
{
|
||||
"id": "kowalski_phase_1",
|
||||
"label": "Dave Kowalski — Phase 1: Routine Pressure",
|
||||
"description": "Normal managerial check-ins. Annoying but not threatening.",
|
||||
"trigger_phase": "normal_work",
|
||||
"escalation_steps": [
|
||||
{
|
||||
"trigger_after_seconds": 300,
|
||||
"notification": "Quick check-in — how are you getting on with the ticket queue? Let me know if anything is blocking you. Dave K.",
|
||||
"notification_severity": "info",
|
||||
"sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
|
||||
"subject": "Status check"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 600,
|
||||
"notification": "Following up on my earlier note. We should really document that workflow once you get a moment.",
|
||||
"notification_severity": "info",
|
||||
"sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
|
||||
"subject": "Re: Status check"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,15 @@
|
||||
{
|
||||
"id": "kowalski_phase_2",
|
||||
"label": "Dave Kowalski — Phase 2: Dismissive",
|
||||
"description": "Kowalski is aware something is recurring. Manages upward, not inward.",
|
||||
"trigger_phase": "unease",
|
||||
"escalation_steps": [
|
||||
{
|
||||
"trigger_after_seconds": 180,
|
||||
"notification": "I've had a couple of questions from Sarah's team about stability. Nothing critical, but let's make sure we're on top of it. Noted for the weekly update. D.",
|
||||
"notification_severity": "info",
|
||||
"sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
|
||||
"subject": "FYI — product team questions"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,15 @@
|
||||
{
|
||||
"id": "kowalski_phase_3",
|
||||
"label": "Dave Kowalski — Phase 3: Suspicious",
|
||||
"description": "Kowalski is getting questions from above. Starts involving Priya.",
|
||||
"trigger_phase": "suspicion",
|
||||
"escalation_steps": [
|
||||
{
|
||||
"trigger_after_seconds": 120,
|
||||
"notification": "I've scheduled a brief sync for Thursday to talk through recent changes on the infrastructure side. Priya will join. Nothing to worry about — just a routine review.",
|
||||
"notification_severity": "warning",
|
||||
"sender": "Dave Kowalski <d.kowalski@axiomworks.internal>",
|
||||
"subject": "Thursday sync — infra review"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,25 @@
|
||||
{
|
||||
"id": "web_outage_escalation",
|
||||
"label": "Web Service Outage",
|
||||
"description": "Gentle escalation for Tier 1 web outage quests (Q002, Q003). Creates narrative urgency without punishing new players. escalate_linked_ticket resolves to the active quest's ticket_id at runtime.",
|
||||
"intensity": 2,
|
||||
"escalation_steps": [
|
||||
{
|
||||
"trigger_after_seconds": 900,
|
||||
"notification": "Hermes is still showing errors. Is someone on this?",
|
||||
"notification_severity": "warning"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 1800,
|
||||
"notification": "Site has been down thirty minutes. Ticket priority is going up.",
|
||||
"notification_severity": "warning",
|
||||
"escalate_linked_ticket": "high"
|
||||
},
|
||||
{
|
||||
"trigger_after_seconds": 3600,
|
||||
"notification": "Hour down. Priya has been copied in.",
|
||||
"notification_severity": "error",
|
||||
"escalate_linked_ticket": "critical"
|
||||
}
|
||||
]
|
||||
}
|
||||
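Note on escalate_linked_ticket: the profile description above says this field resolves to the active quest's ticket_id at runtime. A small sketch of that resolution, assuming the engine holds the profile and the active quest as plain dictionaries. `linked_ticket_escalations` is a hypothetical helper, not a function from this repo; the timings and priorities are the ones in web_outage_escalation, and Q002 is used because it names this profile.

```python
# Trimmed copies of the profile above and of quest Q002, which references it.
PROFILE = {
    "id": "web_outage_escalation",
    "escalation_steps": [
        {"trigger_after_seconds": 900},
        {"trigger_after_seconds": 1800, "escalate_linked_ticket": "high"},
        {"trigger_after_seconds": 3600, "escalate_linked_ticket": "critical"},
    ],
}
QUEST = {"id": "Q002", "ticket_id": "T002", "pressure_profile": "web_outage_escalation"}


def linked_ticket_escalations(profile, quest, elapsed_seconds):
    """List (ticket_id, new_priority) for every due step that escalates."""
    pairs = []
    for step in profile["escalation_steps"]:
        priority = step.get("escalate_linked_ticket")
        if priority is not None and step["trigger_after_seconds"] <= elapsed_seconds:
            # The profile never names a ticket; the active quest supplies it.
            pairs.append((quest["ticket_id"], priority))
    return pairs
```

Keeping the ticket out of the profile is what lets one profile serve several quests: the same steps would escalate T003 if the active quest were Q003 with this profile attached.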
@@ -0,0 +1,8 @@
|
||||
{
|
||||
"_description": "Named access level definitions. Derived from ProgressionSystem unlocked_access keys.",
|
||||
"levels": [
|
||||
{ "name": "basic_user", "description": "Default access. Workstation only. No sudo." },
|
||||
{ "name": "sudo", "description": "Sudo on workstation; SSH to hermes or vulcan." },
|
||||
{ "name": "root", "description": "Full sudo on at least one remote host." }
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,54 @@
|
||||
[
|
||||
{
|
||||
"id": "unlock:workstation:sudo:basic",
|
||||
"description": "Basic sudo access on the workstation (systemctl, journalctl, df)",
|
||||
"trust_threshold": 50.0,
|
||||
"revokes_below_trust": -1,
|
||||
"grants_access": ["sudo:workstation:systemctl", "sudo:workstation:journalctl", "sudo:workstation:df"],
|
||||
"grants_vms": [],
|
||||
"grants_docs": ["onboarding"],
|
||||
"revokes": []
|
||||
},
|
||||
{
|
||||
"id": "unlock:web_server:access",
|
||||
"description": "Access to the web server (hermes) via SSH from workstation",
|
||||
"trust_threshold": 55.0,
|
||||
"revokes_below_trust": 45.0,
|
||||
"grants_access": ["ssh:web_server", "sudo:web_server:systemctl", "sudo:web_server:nginx"],
|
||||
"grants_vms": ["web_server"],
|
||||
"grants_docs": ["nginx-runbook", "web-deploy-guide"],
|
||||
"revokes_vms": ["web_server"],
|
||||
"revokes": ["ssh:web_server", "sudo:web_server:systemctl", "sudo:web_server:nginx"]
|
||||
},
|
||||
{
|
||||
"id": "unlock:web_server:sudo:full",
|
||||
"description": "Full sudo on hermes — enables root-level fixes",
|
||||
"trust_threshold": 60.0,
|
||||
"revokes_below_trust": 45.0,
|
||||
"grants_access": ["sudo:web_server:full"],
|
||||
"grants_vms": [],
|
||||
"grants_docs": ["server-admin-guide"],
|
||||
"revokes": ["sudo:web_server:full"]
|
||||
},
|
||||
{
|
||||
"id": "unlock:build_machine:access",
|
||||
"description": "Access to the build machine (vulcan)",
|
||||
"trust_threshold": 60.0,
|
||||
"revokes_below_trust": 50.0,
|
||||
"grants_access": ["ssh:build_machine", "sudo:build_machine:pacman"],
|
||||
"grants_vms": ["build_machine"],
|
||||
"grants_docs": ["arch-runbook", "package-mirror-guide"],
|
||||
"revokes_vms": ["build_machine"],
|
||||
"revokes": ["ssh:build_machine", "sudo:build_machine:pacman"]
|
||||
},
|
||||
{
|
||||
"id": "unlock:incident:visibility",
|
||||
"description": "Incident alerts shown in HUD — player trusted enough to see system pressure",
|
||||
"trust_threshold": 55.0,
|
||||
"revokes_below_trust": -1,
|
||||
"grants_access": ["hud:incident_alerts"],
|
||||
"grants_vms": [],
|
||||
"grants_docs": ["incident-response-guide"],
|
||||
"revokes": []
|
||||
}
|
||||
]
|
||||
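Note on trust_threshold and revokes_below_trust: each unlock above grants at one trust value and revokes at a lower one, which gives a hysteresis band (for example, hermes access is granted at 55.0 and only lost below 45.0). The sketch below shows one way to evaluate that. It assumes that a revokes_below_trust of -1 means "never revoked", which the data suggests but this diff does not state; `update_unlocks` is a hypothetical name.

```python
# Two entries trimmed from the unlock list above.
UNLOCKS = [
    {"id": "unlock:workstation:sudo:basic", "trust_threshold": 50.0, "revokes_below_trust": -1},
    {"id": "unlock:web_server:access", "trust_threshold": 55.0, "revokes_below_trust": 45.0},
]


def update_unlocks(unlocks, trust, active):
    """Return the new set of active unlock ids for the given trust value."""
    result = set(active)
    for unlock in unlocks:
        floor = unlock["revokes_below_trust"]
        if trust >= unlock["trust_threshold"]:
            result.add(unlock["id"])
        elif floor >= 0 and trust < floor:  # assumption: negative floor = never revoke
            result.discard(unlock["id"])
        # Between floor and threshold nothing changes: that is the hysteresis band.
    return result


granted = update_unlocks(UNLOCKS, 56.0, set())
after_dip = update_unlocks(UNLOCKS, 50.0, granted)      # inside the band
after_drop = update_unlocks(UNLOCKS, 44.0, after_dip)   # below the revoke floor
```

The gap between the two values keeps access from flapping when trust hovers around a single threshold.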
@@ -0,0 +1,100 @@
|
||||
{
|
||||
"id": "Q001",
|
||||
"title": "Welcome Aboard",
|
||||
"tier": 1,
|
||||
"primary_vm": "workstation",
|
||||
"required_vms": ["workstation"],
|
||||
"ticket_id": "T001",
|
||||
"baseline_snapshot": "baseline.day-one",
|
||||
"summary": "The player's first task. Their SSH key was never added to the workstation's authorized_keys during provisioning. Marcus walks them through where things are. The fix is trivial but teaches navigation and file inspection.",
|
||||
"clue_fingerprint": {
|
||||
"description": "SSH key is missing from authorized_keys. The provisioning script ran but the key was never appended. Evidence is visible in ~/.ssh/authorized_keys being absent entirely and in /var/log/auth.log showing permission denied publickey.",
|
||||
"evidence": [
|
||||
{ "type": "file_absent", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys" },
|
||||
{ "type": "log_contains", "vm": "workstation", "path": "/var/log/auth.log", "contains": "Permission denied (publickey)" }
|
||||
]
|
||||
},
|
||||
"objectives": [
|
||||
{
|
||||
"id": "ssh-dir-exists",
|
||||
"description": "Ensure the .ssh directory exists with correct permissions",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "directory_exists", "vm": "workstation", "path": "/home/player/.ssh" },
|
||||
{ "type": "file_mode", "vm": "workstation", "path": "/home/player/.ssh", "mode": "0700" }
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "authorized-key-present",
|
||||
"description": "Add the provided public key to authorized_keys",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_exists", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys" },
|
||||
{ "type": "file_mode", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys", "mode": "0600" },
|
||||
{ "type": "file_owner", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys", "user": "player", "group": "player" }
|
||||
]
|
||||
}
|
||||
}
|
||||
],
|
||||
"solution_branches": [
|
||||
{
|
||||
"id": "correct-setup",
|
||||
"label": "Correct Setup",
|
||||
"priority": 100,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_exists", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys" },
|
||||
{ "type": "file_mode", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys", "mode": "0600" },
|
||||
{ "type": "file_mode", "vm": "workstation", "path": "/home/player/.ssh", "mode": "0700" },
|
||||
{ "type": "file_owner", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys", "user": "player", "group": "player" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 1,
|
||||
"world_flags": ["player_ssh_configured"],
|
||||
"follow_up_dialogue": "marcus-Q001-complete-clean",
|
||||
"follow_up_ticket": "T002"
|
||||
},
|
||||
{
|
||||
"id": "permissive-setup",
|
||||
"label": "Permissive Setup",
|
||||
"priority": 50,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_exists", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys" },
|
||||
{ "type": "file_owner", "vm": "workstation", "path": "/home/player/.ssh/authorized_keys", "user": "player", "group": "player" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 0,
|
||||
"world_flags": ["player_ssh_configured", "player_loose_permissions"],
|
||||
"follow_up_dialogue": "marcus-Q001-complete-permissive",
|
||||
"follow_up_ticket": "T002",
|
||||
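"_note": "Key is present and owned correctly but permissions are too open. sshd with StrictModes enabled refuses the key when the file or its directory is writable by group or others, so this setup can fail later. Marcus will mention this later."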
|
||||
}
|
||||
],
|
||||
"pressure_profile": null,
|
||||
"blast_radius": [],
|
||||
"unlock_requirements": [],
|
||||
"narrative_phase": "normal_work",
|
||||
"linux_concepts": ["ssh-keygen", "authorized_keys", "file permissions"],
|
||||
"failure_conditions": ["SSH keys not added", "authorized_keys permissions too broad"],
|
||||
"behavior_impact": {
|
||||
"correct-setup": { "curiosity_delta": 0, "obedience_delta": 1, "risk_delta": 0, "suspicion_delta": 0 },
|
||||
"permissive-setup": { "curiosity_delta": 0, "obedience_delta": 0, "risk_delta": 1, "suspicion_delta": 1 },
|
||||
"default": { "curiosity_delta": 0, "obedience_delta": 0, "risk_delta": 0, "suspicion_delta": 0 }
|
||||
},
|
||||
"hidden_hook": null,
|
||||
"access_requirements": {
|
||||
"minimum_access": { "workstation": "basic_user" },
|
||||
"requires_root": false,
|
||||
"temporary_grants_allowed": []
|
||||
},
|
||||
"tags": ["onboarding", "ssh", "permissions", "workstation"],
|
||||
"internal_notes": "This quest has no time pressure and no incidents. It is purely tutorial. Marcus is present and talkative. The only failure mode is giving up, which cannot happen mechanically."
|
||||
}
|
||||
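Note on solution_branches and priority: in Q001 a correctly locked-down setup satisfies both branches, because the permissive branch checks a subset of the strict one. The priority field is presumably what breaks the tie. The sketch below evaluates the nested "and" rules against an in-memory description of the files and picks the highest-priority matching branch. `evaluate`, `pick_branch`, and the `files` dictionary are assumptions for illustration; the real engine inspects the VM and its code is not shown in this diff. The "vm" key is dropped because only one VM is involved.

```python
SSH_DIR = "/home/player/.ssh"
KEYS = "/home/player/.ssh/authorized_keys"

# Branch validations trimmed from Q001 above.
BRANCHES = [
    {"id": "correct-setup", "priority": 100, "validation": {"type": "and", "rules": [
        {"type": "file_exists", "path": KEYS},
        {"type": "file_mode", "path": KEYS, "mode": "0600"},
        {"type": "file_mode", "path": SSH_DIR, "mode": "0700"},
        {"type": "file_owner", "path": KEYS, "user": "player", "group": "player"},
    ]}},
    {"id": "permissive-setup", "priority": 50, "validation": {"type": "and", "rules": [
        {"type": "file_exists", "path": KEYS},
        {"type": "file_owner", "path": KEYS, "user": "player", "group": "player"},
    ]}},
]


def evaluate(rule, files):
    """Evaluate one rule (or a nested 'and') against a fake file table."""
    kind = rule["type"]
    if kind == "and":
        return all(evaluate(child, files) for child in rule["rules"])
    entry = files.get(rule["path"])
    if kind in ("file_exists", "directory_exists"):
        return entry is not None
    if kind == "file_mode":
        return entry is not None and entry["mode"] == rule["mode"]
    if kind == "file_owner":
        return (entry is not None and entry["user"] == rule["user"]
                and entry["group"] == rule["group"])
    raise ValueError(f"unsupported rule type: {kind}")


def pick_branch(branches, files):
    """Return the highest-priority branch whose validation passes, or None."""
    matching = [b for b in branches if evaluate(b["validation"], files)]
    return max(matching, key=lambda b: b["priority"], default=None)


owner = {"user": "player", "group": "player"}
strict = {SSH_DIR: {"mode": "0700", **owner}, KEYS: {"mode": "0600", **owner}}
loose = {SSH_DIR: {"mode": "0755", **owner}, KEYS: {"mode": "0644", **owner}}
```

With `strict`, both branches match and priority 100 wins; with `loose`, only the permissive branch matches.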
@@ -0,0 +1,89 @@
|
||||
{
|
||||
"id": "Q002",
|
||||
"title": "Syntax Error in Aisle Four",
|
||||
"tier": 1,
|
||||
"primary_vm": "web_server",
|
||||
"required_vms": ["workstation", "web_server"],
|
||||
"ticket_id": "T002",
|
||||
"baseline_snapshot": "baseline.clean",
|
||||
"summary": "Someone edited nginx.conf and introduced a syntax error. Nginx will not start. The player needs to identify the broken config, fix it, and restore the service. This is a single-VM, single-symptom quest. Evidence is clear in the nginx error output. The config error is a missing semicolon on a listen directive.",
|
||||
"clue_fingerprint": {
|
||||
"description": "nginx -t reveals the syntax error. systemctl status nginx shows the unit failed with an exit code. journalctl -u nginx points at the line. The error is on the listen directive in /etc/nginx/sites-enabled/axiomworks.conf — a missing semicolon.",
|
||||
"evidence": [
|
||||
{ "type": "log_contains", "vm": "web_server", "path": "/var/log/nginx/error.log", "contains": "invalid parameter" },
|
||||
{ "type": "service_state_is", "vm": "web_server", "service": "nginx", "state": "failed" },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/nginx/sites-enabled/axiomworks.conf", "contains": "listen 80" }
|
||||
],
|
||||
"_note": "The baseline snapshot has listen 80 without a semicolon. nginx -t will report exactly which line. The player does not need to know where the file is in advance, because the error output tells them."
|
||||
},
|
||||
"objectives": [
|
||||
{
|
||||
"id": "nginx-running",
|
||||
"description": "Nginx is active and serving requests",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "web_server", "service": "nginx", "state": "active" },
|
||||
{ "type": "port_listening", "vm": "web_server", "port": 80, "protocol": "tcp", "listening": true }
|
||||
]
|
||||
}
|
||||
}
|
||||
],
|
||||
"solution_branches": [
|
||||
{
|
||||
"id": "config-fixed-enabled",
|
||||
"label": "Fixed and Enabled",
|
||||
"priority": 100,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "web_server", "service": "nginx", "state": "active" },
|
||||
{ "type": "service_enabled", "vm": "web_server", "service": "nginx", "enabled": true },
|
||||
{ "type": "port_listening", "vm": "web_server", "port": 80, "protocol": "tcp", "listening": true },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/nginx/sites-enabled/axiomworks.conf", "contains": "listen 80;" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 2,
|
||||
"world_flags": ["nginx_stable", "hermes_web_healthy"],
|
||||
"follow_up_dialogue": "marcus-Q002-complete-clean",
|
||||
"follow_up_ticket": "T003"
|
||||
},
|
||||
{
|
||||
"id": "config-fixed-not-enabled",
|
||||
"label": "Running But Not Enabled",
|
||||
"priority": 60,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "web_server", "service": "nginx", "state": "active" },
|
||||
{ "type": "service_enabled", "vm": "web_server", "service": "nginx", "enabled": false },
|
||||
{ "type": "port_listening", "vm": "web_server", "port": 80, "protocol": "tcp", "listening": true }
|
||||
]
|
||||
},
|
||||
"trust_delta": 1,
|
||||
"world_flags": ["nginx_unstable", "hermes_web_healthy"],
|
||||
"follow_up_dialogue": "marcus-Q002-complete-not-enabled",
|
||||
"follow_up_ticket": "T003",
|
||||
"_note": "Service is running now but will not survive a reboot. Marcus notes this. Sets up a later incident."
|
||||
}
|
||||
],
|
||||
"pressure_profile": "web_outage_escalation",
|
||||
"blast_radius": [],
|
||||
"_blast_radius_note": "I001 removed — I001 triggers only from Q003's quick-fix branch, not from anything in Q002. See OI-007.",
|
||||
"unlock_requirements": ["world_flag:player_ssh_configured"],
|
||||
"narrative_phase": "normal_work",
|
||||
"linux_concepts": ["nginx", "systemctl", "service configuration", "config syntax"],
|
||||
"failure_conditions": ["nginx not running", "service not enabled at boot"],
|
||||
"behavior_impact": {
|
||||
"default": { "curiosity_delta": 0, "obedience_delta": 1, "risk_delta": 0, "suspicion_delta": 0 }
|
||||
},
|
||||
"hidden_hook": null,
|
||||
"access_requirements": {
|
||||
"minimum_access": { "web_server": "basic_user" },
|
||||
"requires_root": false,
|
||||
"temporary_grants_allowed": []
|
||||
},
|
||||
"tags": ["services", "nginx", "config", "web_server"],
|
||||
"internal_notes": "This is the first quest on hermes. The player SSHes from ares. They need basic SSH connectivity to be established from Q001. The config file path and the error line number both appear in nginx -t output — no guessing required. The fun is in reading the error correctly and knowing that a failed config means the service was running fine before someone touched it."
|
||||
}
|
||||
@@ -0,0 +1,113 @@
|
||||
{
|
||||
"id": "Q003",
|
||||
"title": "The Log That Ate the Disk",
|
||||
"tier": 1,
|
||||
"primary_vm": "web_server",
|
||||
"required_vms": ["workstation", "web_server"],
|
||||
"ticket_id": "T003",
|
||||
"baseline_snapshot": "baseline.clean",
|
||||
"summary": "logrotate is installed but the nginx config for it was accidentally deleted. The access log has grown to fill most of the disk. The player needs to identify the disk pressure, find the cause, clean up the log safely, and restore log rotation. A simple 'rm the log' solution works short-term but sets up a repeat. The proper fix restores the logrotate config.",
|
||||
"clue_fingerprint": {
|
||||
"description": "df -h shows / near capacity. du on /var/log/nginx shows an enormous access.log. /etc/logrotate.d/nginx is absent. The system logrotate timer ran last night and skipped nginx because the config was missing.",
|
||||
"evidence": [
|
||||
{ "type": "disk_usage_above", "vm": "web_server", "path": "/", "threshold_percent": 90 },
|
||||
{ "type": "file_size_above", "vm": "web_server", "path": "/var/log/nginx/access.log", "threshold_bytes": 2000000000 },
|
||||
{ "type": "file_absent", "vm": "web_server", "path": "/etc/logrotate.d/nginx" }
|
||||
]
|
||||
},
|
||||
"objectives": [
|
||||
{
|
||||
"id": "disk-pressure-resolved",
|
||||
"description": "Free disk space to below 70% utilization",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "disk_usage_below",
|
||||
"vm": "web_server",
|
||||
"path": "/",
|
||||
"threshold_percent": 70
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "nginx-still-running",
|
||||
"description": "Nginx must remain operational throughout",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "service_state",
|
||||
"vm": "web_server",
|
||||
"service": "nginx",
|
||||
"state": "active"
|
||||
}
|
||||
}
|
||||
],
|
||||
"solution_branches": [
|
||||
{
|
||||
"id": "logrotate-restored",
|
||||
"label": "Proper Fix — Rotation Restored",
|
||||
"priority": 100,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "disk_usage_below", "vm": "web_server", "path": "/", "threshold_percent": 70 },
|
||||
{ "type": "file_exists", "vm": "web_server", "path": "/etc/logrotate.d/nginx" },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/logrotate.d/nginx", "contains": "rotate" },
|
||||
{ "type": "service_state", "vm": "web_server", "service": "nginx", "state": "active" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 3,
|
||||
"world_flags": ["hermes_logrotate_healthy", "hermes_disk_healthy"],
|
||||
"follow_up_dialogue": "marcus-Q003-complete-clean",
|
||||
"follow_up_ticket": "T004"
|
||||
},
|
||||
{
|
||||
"id": "log-truncated-only",
|
||||
"label": "Quick Fix — Log Cleared, No Rotation",
|
||||
"priority": 50,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "disk_usage_below", "vm": "web_server", "path": "/", "threshold_percent": 70 },
|
||||
{ "type": "service_state", "vm": "web_server", "service": "nginx", "state": "active" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 0,
|
||||
"world_flags": ["hermes_disk_healthy", "hermes_log_pressure_pending"],
|
||||
"follow_up_incident": "I001",
|
||||
"follow_up_dialogue": "marcus-Q003-complete-norotate",
|
||||
"follow_up_ticket": "T004",
|
||||
"_note": "Disk is clear but rotation is not restored. I001 triggers in a few in-game hours and fills the disk again."
|
||||
},
|
||||
{
|
||||
"id": "nginx-killed",
|
||||
"label": "Collateral — Nginx Down",
|
||||
"priority": 200,
|
||||
"validation": {
|
||||
"type": "service_state",
|
||||
"vm": "web_server",
|
||||
"service": "nginx",
|
||||
"state": "inactive"
|
||||
},
|
||||
"trust_delta": -3,
|
||||
"world_flags": ["hermes_web_down", "hermes_disk_healthy"],
|
||||
"follow_up_dialogue": "sarah-Q003-angry",
|
||||
"follow_up_dialogues": ["marcus-Q003-complete-down"],
|
||||
"_note": "Player freed disk by stopping nginx (or deleted the wrong thing). Disk may be clear but the site is down again. Negative branch — should be rare but possible."
|
||||
}
|
||||
],
|
||||
"pressure_profile": "disk_growth_slow",
|
||||
"blast_radius": ["I001"],
|
||||
"unlock_requirements": ["world_flag:player_ssh_configured"],
|
||||
"narrative_phase": "normal_work",
|
||||
"linux_concepts": ["logrotate", "disk usage", "df", "du"],
|
||||
"failure_conditions": ["disk still above threshold", "logrotate not restored", "nginx not running"],
|
||||
"behavior_impact": {
|
||||
"default": { "curiosity_delta": 0, "obedience_delta": 1, "risk_delta": 0, "suspicion_delta": 0 }
|
||||
},
|
||||
"hidden_hook": null,
|
||||
"access_requirements": {
|
||||
"minimum_access": { "web_server": "sudo" },
|
||||
"requires_root": false,
|
||||
"temporary_grants_allowed": []
|
||||
},
|
||||
"tags": ["disk", "logs", "logrotate", "nginx", "web_server"],
|
||||
"internal_notes": "This quest teaches df, du, and logrotate. The clue trail is natural — disk alert, find the big file, notice logrotate is not configured. A good player restores the logrotate config from the package default or writes a correct one. A fast player just deletes the log. Both work short-term. The incident I001 makes the fast solution a problem later."
|
||||
}
|
||||
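Note on disk_usage_above and disk_usage_below: Q003 is gated on percentage thresholds (above 90 as evidence, below 70 as the objective). How the engine measures usage is not shown in this diff. One plausible approach, sketched here, uses Python's `shutil.disk_usage`; the helper names and the injectable `usage_fn` parameter are assumptions added so the check can be exercised without a real full disk.

```python
import shutil
from collections import namedtuple


def percent_used(total_bytes, used_bytes):
    """Used space as a percentage of total space."""
    return 100.0 * used_bytes / total_bytes


def disk_usage_below(path, threshold_percent, usage_fn=shutil.disk_usage):
    """True when the filesystem holding `path` is under the threshold."""
    usage = usage_fn(path)
    return percent_used(usage.total, usage.used) < threshold_percent


# Stand-ins for a nearly full disk and for the same disk after cleanup.
FakeUsage = namedtuple("FakeUsage", "total used free")


def nearly_full(_path):
    return FakeUsage(total=100, used=92, free=8)


def cleaned_up(_path):
    return FakeUsage(total=100, used=40, free=60)
```

The percentage printed by df can differ slightly from this figure, since df works from the space available to unprivileged users, so a threshold check built this way should not be expected to match df output digit for digit.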
@@ -0,0 +1,96 @@
|
||||
{
|
||||
"id": "Q004",
|
||||
"title": "Not My Files",
|
||||
"tier": 1,
|
||||
"primary_vm": "web_server",
|
||||
"required_vms": ["workstation", "web_server"],
|
||||
"ticket_id": "T004",
|
||||
"baseline_snapshot": "baseline.clean",
|
||||
"summary": "A deployment script runs as www-data to copy files into /var/www/axiomworks. Someone ran the script manually as root and now the files are owned by root. The www-data process cannot overwrite them on the next deploy. Sarah is reporting that her last deployment silently failed to apply.",
|
||||
"clue_fingerprint": {
|
||||
"description": "The deploy script lives at /opt/deploy/deploy.sh and runs as www-data via a systemd service. ls -la on /var/www/axiomworks shows files owned by root:root instead of www-data:www-data. The deploy service log shows permission denied errors.",
|
||||
"evidence": [
|
||||
{ "type": "log_contains", "vm": "web_server", "path": "/var/log/deploy.log", "contains": "Permission denied" },
|
||||
{ "type": "file_owner_is_not", "vm": "web_server", "path": "/var/www/axiomworks", "expected_user": "www-data" },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/opt/deploy/deploy.sh", "contains": "www-data" }
|
||||
]
|
||||
},
|
||||
"objectives": [
|
||||
{
|
||||
"id": "ownership-corrected",
|
||||
"description": "Correct ownership of the web root",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "file_owner",
|
||||
"vm": "web_server",
|
||||
"path": "/var/www/axiomworks",
|
||||
"user": "www-data",
|
||||
"group": "www-data"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "deploy-can-run",
|
||||
"description": "The deploy service can execute without errors",
|
||||
"check_mode": "explicit",
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks", "user": "www-data", "group": "www-data" },
|
||||
{ "type": "service_state", "vm": "web_server", "service": "nginx", "state": "active" }
|
||||
]
|
||||
}
|
||||
}
|
||||
],
|
||||
"solution_branches": [
|
||||
{
|
||||
"id": "recursive-chown",
|
||||
"label": "Full Recursive Fix",
|
||||
"priority": 100,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks", "user": "www-data", "group": "www-data" },
|
||||
{ "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks/index.html", "user": "www-data", "group": "www-data" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 2,
|
||||
"world_flags": ["hermes_deploy_healthy"],
|
||||
"follow_up_dialogue": "marcus-Q004-complete-clean",
|
||||
"follow_up_dialogues": ["sarah-Q004-complete-clean"]
|
||||
},
|
||||
{
|
||||
"id": "partial-chown",
|
||||
"label": "Partial Fix — Top Directory Only",
|
||||
"priority": 40,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks", "user": "www-data", "group": "www-data" },
|
||||
{ "type": "file_owner", "vm": "web_server", "path": "/var/www/axiomworks/index.html", "user": "root", "group": "root" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 0,
|
||||
"world_flags": ["hermes_deploy_partial"],
|
||||
"follow_up_dialogue": "marcus-Q004-complete-partial",
|
||||
"follow_up_dialogues": ["sarah-Q004-complete-partial"],
|
||||
"_note": "chown without -R. Top dir is correct but child files are still root-owned. Deploy will still fail on individual files."
|
||||
}
|
||||
],
|
||||
"pressure_profile": null,
|
||||
"blast_radius": [],
|
||||
"unlock_requirements": ["world_flag:player_ssh_configured"],
|
||||
"narrative_phase": "normal_work",
|
||||
"linux_concepts": ["chown", "file ownership", "deploy scripts"],
|
||||
"failure_conditions": ["web root ownership not fixed", "deploy service still failing"],
|
||||
"behavior_impact": {
|
||||
"default": { "curiosity_delta": 0, "obedience_delta": 1, "risk_delta": 0, "suspicion_delta": 0 }
|
||||
},
|
||||
"hidden_hook": null,
|
||||
"access_requirements": {
|
||||
"minimum_access": { "web_server": "sudo" },
|
||||
"requires_root": false,
|
||||
"temporary_grants_allowed": []
|
||||
},
|
||||
"tags": ["permissions", "ownership", "deploy", "web_server"],
|
||||
"internal_notes": "Teaches chown -R and the importance of recursive operations. The two solution branches are differentiated by whether the player used -R. The explicit check_mode on the second objective means the player can trigger a test deploy to confirm it works."
|
||||
}
|
||||
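Note on telling the two Q004 branches apart: the validation above spot-checks the web root and index.html, which is enough to distinguish chown from chown -R in the baseline. A fuller check would walk the whole tree. The sketch below does that with `os.walk` and `os.lstat`; `not_owned_by` is a hypothetical helper, it compares numeric uids only, and the demo runs on a throwaway directory rather than /var/www/axiomworks.

```python
import os
import tempfile


def not_owned_by(root, uid):
    """Return every path under root (root included) whose owner uid differs."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in candidates:
            # lstat so a symlink is judged by its own owner, not its target's.
            if os.lstat(path).st_uid != uid:
                offenders.append(path)
    return offenders


# Demo tree: web_root/index.html and web_root/assets/app.css (made-up names).
with tempfile.TemporaryDirectory() as web_root:
    os.mkdir(os.path.join(web_root, "assets"))
    for relative in ("index.html", os.path.join("assets", "app.css")):
        with open(os.path.join(web_root, relative), "w") as handle:
            handle.write("placeholder\n")
    me = os.geteuid()
    clean = not_owned_by(web_root, me)          # everything belongs to us
    all_wrong = not_owned_by(web_root, me + 1)  # nothing belongs to uid me+1
```

An empty result corresponds to the recursive-chown branch; a result that lists children but not the root corresponds to the partial-chown branch.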
@@ -0,0 +1,130 @@
|
||||
{
|
||||
"id": "Q005",
|
||||
"title": "The Midnight Visitor",
|
||||
"tier": 2,
|
||||
"primary_vm": "web_server",
|
||||
"required_vms": ["workstation", "web_server"],
|
||||
"ticket_id": "T005",
|
||||
"baseline_snapshot": "baseline.post-q004",
|
||||
"summary": "A cron job that runs nightly database backups is executing as root instead of the dedicated backup user. It works, but it's leaving root-owned files in /var/backups/db/ that the backup user can't manage. The symptom is that the backup retention script — which runs as the backup user — fails to delete old backups, and the backup directory is filling up. Dave notices the disk warning. The root cause is a misconfigured crontab entry in /etc/cron.d/db-backup that specifies no user field (defaults to root) instead of the backup user.",
|
||||
"clue_fingerprint": {
|
||||
"description": "Disk is filling in /var/backups/db/. Files in that directory are owned by root. The backup service log shows permission denied when trying to delete old files. /etc/cron.d/db-backup has no user field on the job line — it defaults to root. /etc/passwd shows a backup-agent user exists. The correct entry should specify backup-agent as the executing user.",
|
||||
"evidence": [
|
||||
{ "type": "disk_usage_above", "vm": "web_server", "path": "/var/backups", "threshold_percent": 80 },
|
||||
{ "type": "file_owner_is_not", "vm": "web_server", "path": "/var/backups/db", "expected_user": "backup-agent" },
|
||||
{ "type": "log_contains", "vm": "web_server", "path": "/var/log/backup-agent.log", "contains": "Permission denied" },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/db-backup", "contains": "db-backup.sh" }
|
||||
]
|
||||
},
|
||||
"objectives": [
|
||||
{
|
||||
"id": "crontab-correct-user",
|
||||
"description": "The cron job runs as backup-agent, not root",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "file_contains",
|
||||
"vm": "web_server",
|
||||
"path": "/etc/cron.d/db-backup",
|
||||
"contains": "backup-agent"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "backup-dir-ownership",
|
||||
"description": "Existing backup files are owned by backup-agent",
|
||||
"check_mode": "explicit",
|
||||
"validation": {
|
||||
"type": "file_owner",
|
||||
"vm": "web_server",
|
||||
"path": "/var/backups/db",
|
||||
"user": "backup-agent",
|
||||
"group": "backup-agent"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "disk-pressure-cleared",
|
||||
"description": "Backup directory is below disk threshold",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "disk_usage_below",
|
||||
"vm": "web_server",
|
||||
"path": "/var/backups",
|
||||
"threshold_percent": 70
|
||||
}
|
||||
}
|
||||
],
|
||||
"solution_branches": [
|
||||
{
|
||||
"id": "full-fix",
|
||||
"label": "Full Fix — User Corrected and Ownership Cleaned",
|
||||
"priority": 100,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/db-backup", "contains": "backup-agent" },
|
||||
{ "type": "file_owner", "vm": "web_server", "path": "/var/backups/db", "user": "backup-agent", "group": "backup-agent" },
|
||||
{ "type": "disk_usage_below", "vm": "web_server", "path": "/var/backups", "threshold_percent": 70 }
|
||||
]
|
||||
},
|
||||
"trust_delta": 3,
|
||||
"world_flags": ["hermes_backup_healthy"],
|
||||
"follow_up_dialogue": "marcus-Q005-complete-clean"
|
||||
},
|
||||
{
|
||||
"id": "cron-fixed-only",
|
||||
"label": "Partial — Cron Fixed, Old Files Not Cleaned",
|
||||
"priority": 50,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/db-backup", "contains": "backup-agent" },
|
||||
{ "type": "disk_usage_above", "vm": "web_server", "path": "/var/backups", "threshold_percent": 70 }
|
||||
]
|
||||
},
|
||||
"trust_delta": 1,
|
||||
"world_flags": ["hermes_backup_partial"],
|
||||
"follow_up_incident": "I002",
|
||||
"follow_up_dialogue": "marcus-Q005-complete-partial"
|
||||
},
|
||||
{
|
||||
"id": "disk-cleared-only",
|
||||
"label": "Wrong Fix — Disk Cleared, Root Still Running Job",
|
||||
"priority": 30,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "disk_usage_below", "vm": "web_server", "path": "/var/backups", "threshold_percent": 70 },
|
||||
{ "type": "not", "rule": { "type": "file_contains", "vm": "web_server", "path": "/etc/cron.d/db-backup", "contains": "backup-agent" } }
|
||||
]
|
||||
},
|
||||
"trust_delta": -1,
|
||||
"world_flags": ["hermes_backup_root_running", "hermes_disk_healthy"],
|
||||
"follow_up_incident": "I002",
|
||||
"follow_up_dialogue": "marcus-Q005-complete-wrong"
|
||||
}
|
||||
],
|
||||
"pressure_profile": "disk_growth_slow",
|
||||
"blast_radius": ["I002"],
|
||||
"unlock_requirements": ["world_flag:player_ssh_configured"],
|
||||
"narrative_phase": "unease",
|
||||
"linux_concepts": ["cron", "crontab user field", "backup management", "disk usage"],
|
||||
"failure_conditions": ["cron still running as root", "disk not cleared", "backup directory ownership not fixed"],
|
||||
"behavior_impact": {
|
||||
"full-fix": { "curiosity_delta": 1, "obedience_delta": 1, "risk_delta": 0, "suspicion_delta": 0 },
|
||||
"cron-fixed-only": { "curiosity_delta": 0, "obedience_delta": 1, "risk_delta": 0, "suspicion_delta": 0 },
|
||||
"disk-cleared-only": { "curiosity_delta": 0, "obedience_delta": 0, "risk_delta": 1, "suspicion_delta": 1 },
|
||||
"default": { "curiosity_delta": 0, "obedience_delta": 0, "risk_delta": 0, "suspicion_delta": 0 }
|
||||
},
|
||||
"hidden_hook": {
|
||||
"id": "q005_backup_agent_history",
|
||||
"description": "backup-agent home directory contains a .bash_history with unusual commands that predate the current cron misconfiguration.",
|
||||
"discovery_method": "Player reads /home/backup-agent/.bash_history",
|
||||
"significance": "Dale configured this cron job. The history shows it was changed deliberately, not by accident."
|
||||
},
|
||||
"access_requirements": {
|
||||
"minimum_access": { "web_server": "sudo" },
|
||||
"requires_root": false,
|
||||
"temporary_grants_allowed": []
|
||||
},
|
||||
"tags": ["cron", "permissions", "backup", "disk", "web_server"],
|
||||
"internal_notes": "This is the first quest where the symptom (disk full) is the same as Q003 but the cause is completely different. Players who jump to 'find the big log' will find the backup directory instead and need to dig further. The cron user field omission is a real and common mistake. The three branches reward finding the root cause vs just clearing the symptom."
|
||||
}
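The `crontab-correct-user` objective above passes on any occurrence of the string `backup-agent` in `/etc/cron.d/db-backup`, including one inside a comment. A stricter check could read the user field of the job line itself. The following is a minimal sketch in Python, assuming the usual system-crontab layout (five schedule fields, then the user, then the command). The function name `cron_d_user` and the script path `/usr/local/bin/db-backup.sh` are illustrative assumptions, not part of the game engine or the quest data.

```python
import re


def cron_d_user(line: str):
    """Return the user field of a system-crontab job line.

    Returns None for blank lines, comments, environment assignments,
    and lines too short to carry both a user and a command.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # Environment assignments such as PATH=/usr/bin or MAILTO=root.
    if re.match(r"^[A-Za-z_][A-Za-z0-9_]*\s*=", stripped):
        return None
    fields = stripped.split(None, 6)
    if stripped.startswith("@"):
        # "@daily user command": one schedule token, then the user.
        return fields[1] if len(fields) >= 3 else None
    # Five schedule fields, then the user, then the command.
    return fields[5] if len(fields) >= 7 else None


# Hypothetical job lines for illustration only.
fixed = "0 2 * * * backup-agent /usr/local/bin/db-backup.sh"
comment_only = "# TODO: should run as backup-agent"
print(cron_d_user(fixed))
print(cron_d_user(comment_only))
```

A validator built this way would accept the first line and reject the second, whereas a plain substring match accepts both.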
|
||||
@@ -0,0 +1,126 @@
|
||||
{
|
||||
"id": "Q006",
|
||||
"title": "Time Is A Flat Circle",
|
||||
"tier": 2,
|
||||
"primary_vm": "build_machine",
|
||||
"required_vms": ["workstation", "build_machine"],
|
||||
"ticket_id": "T006",
|
||||
"baseline_snapshot": "baseline.clean",
|
||||
"summary": "The build machine (vulcan, Arch Linux) has clock drift. NTP is not running because the service was disabled during a noisy audit period and never re-enabled. The clock is 40 minutes behind. As a result, pacman signature verification is failing — GPG signature timestamps appear to be in the future, which pacman treats as invalid. The player gets a ticket saying builds are broken and package installs fail. They need to diagnose the actual cause (clock drift), fix it (enable and start systemd-timesyncd or ntp), and then refresh the keyring.",
|
||||
"clue_fingerprint": {
|
||||
"description": "pacman -Syu fails with signature errors. gpg --verify on a downloaded package shows the signature timestamp is in the future relative to local time. timedatectl shows NTP is inactive and the local clock is significantly behind. journalctl -u systemd-timesyncd shows the service was stopped and disabled.",
|
||||
"evidence": [
|
||||
{ "type": "service_state_is", "vm": "build_machine", "service": "systemd-timesyncd", "state": "inactive" },
|
||||
{ "type": "service_enabled_is", "vm": "build_machine", "service": "systemd-timesyncd", "enabled": false },
|
||||
{ "type": "log_contains", "vm": "build_machine", "path": "/var/log/pacman.log", "contains": "invalid or corrupted package (PGP signature)" }
|
||||
]
|
||||
},
|
||||
"objectives": [
|
||||
{
|
||||
"id": "ntp-running",
|
||||
"description": "Time synchronization is active",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "or",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "build_machine", "service": "systemd-timesyncd", "state": "active" },
|
||||
{ "type": "service_state", "vm": "build_machine", "service": "ntpd", "state": "active" },
|
||||
{ "type": "service_state", "vm": "build_machine", "service": "chronyd", "state": "active" }
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "ntp-enabled",
|
||||
"description": "Time synchronization is enabled on boot",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "or",
|
||||
"rules": [
|
||||
{ "type": "service_enabled", "vm": "build_machine", "service": "systemd-timesyncd", "enabled": true },
|
||||
{ "type": "service_enabled", "vm": "build_machine", "service": "ntpd", "enabled": true },
|
||||
{ "type": "service_enabled", "vm": "build_machine", "service": "chronyd", "enabled": true }
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "package-installs-work",
|
||||
"description": "Package manager can install without signature errors",
|
||||
"check_mode": "explicit",
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{
|
||||
"type": "or",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "build_machine", "service": "systemd-timesyncd", "state": "active" },
|
||||
{ "type": "service_state", "vm": "build_machine", "service": "ntpd", "state": "active" },
{ "type": "service_state", "vm": "build_machine", "service": "chronyd", "state": "active" }
|
||||
]
|
||||
},
|
||||
{ "type": "package_installed", "vm": "build_machine", "package": "archlinux-keyring", "installed": true }
|
||||
]
|
||||
}
|
||||
}
|
||||
],
|
||||
"solution_branches": [
|
||||
{
|
||||
"id": "timesyncd-enabled-keyring-refreshed",
|
||||
"label": "Full Fix — NTP Enabled and Keyring Refreshed",
|
||||
"priority": 100,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{
|
||||
"type": "or",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "build_machine", "service": "systemd-timesyncd", "state": "active" },
|
||||
{ "type": "service_state", "vm": "build_machine", "service": "ntpd", "state": "active" },
{ "type": "service_state", "vm": "build_machine", "service": "chronyd", "state": "active" }
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "or",
|
||||
"rules": [
|
||||
{ "type": "service_enabled", "vm": "build_machine", "service": "systemd-timesyncd", "enabled": true },
|
||||
{ "type": "service_enabled", "vm": "build_machine", "service": "ntpd", "enabled": true },
{ "type": "service_enabled", "vm": "build_machine", "service": "chronyd", "enabled": true }
|
||||
]
|
||||
},
|
||||
{ "type": "package_installed", "vm": "build_machine", "package": "archlinux-keyring", "installed": true }
|
||||
]
|
||||
},
|
||||
"trust_delta": 3,
|
||||
"world_flags": ["vulcan_ntp_healthy", "vulcan_builds_healthy"],
|
||||
"follow_up_dialogue": "marcus-Q006-complete-clean"
|
||||
},
|
||||
{
|
||||
"id": "ntp-running-not-enabled",
|
||||
"label": "Running But Not Enabled at Boot",
|
||||
"priority": 50,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "build_machine", "service": "systemd-timesyncd", "state": "active" },
|
||||
{ "type": "service_enabled", "vm": "build_machine", "service": "systemd-timesyncd", "enabled": false }
|
||||
]
|
||||
},
|
||||
"trust_delta": 1,
|
||||
"world_flags": ["vulcan_ntp_fragile", "vulcan_builds_healthy"],
|
||||
"follow_up_dialogue": "marcus-Q006-complete-fragile"
|
||||
}
|
||||
],
|
||||
"pressure_profile": null,
|
||||
"blast_radius": [],
|
||||
"unlock_requirements": ["world_flag:player_ssh_configured"],
|
||||
"narrative_phase": "unease",
|
||||
"linux_concepts": ["NTP", "systemd-timesyncd", "Arch Linux", "pacman", "package keyring"],
|
||||
"failure_conditions": ["NTP not enabled at boot", "package manager still failing signature checks"],
|
||||
"behavior_impact": {
|
||||
"default": { "curiosity_delta": 0, "obedience_delta": 1, "risk_delta": 0, "suspicion_delta": 0 }
|
||||
},
|
||||
"hidden_hook": null,
|
||||
"access_requirements": {
|
||||
"minimum_access": { "build_machine": "sudo" },
|
||||
"requires_root": false,
|
||||
"temporary_grants_allowed": []
|
||||
},
|
||||
"tags": ["ntp", "time", "pacman", "arch", "build_machine", "services"],
|
||||
"internal_notes": "First quest on vulcan. Introduces Arch Linux and pacman. The clock drift → GPG failure chain is real and genuinely confusing the first time you encounter it. The use of `or` on the NTP objective allows systemd-timesyncd, ntpd, or chronyd — any of them fixes the problem. The explicit check on package installs requires the player to confirm things work, not just that NTP is running."
|
||||
}
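The clock-drift chain in Q006 comes down to a timestamp comparison: a signature made recently looks like it was made in the future when the local clock is behind. The following is a simplified sketch in Python of that comparison only. It is not how GnuPG or pacman implement their checks, and the dates and the function name `signature_appears_future` are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def signature_appears_future(sig_created, local_now, tolerance=timedelta(0)):
    """True when the signature timestamp is later than the local clock."""
    return sig_created > local_now + tolerance


# Hypothetical values: the real time, and a package signed 10 minutes ago.
true_now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
sig_created = true_now - timedelta(minutes=10)

# The quest's drifted clock is 40 minutes behind the real time.
drifted_now = true_now - timedelta(minutes=40)

print(signature_appears_future(sig_created, drifted_now))  # drifted clock
print(signature_appears_future(sig_created, true_now))     # corrected clock
```

With the drifted clock the signature is 30 minutes ahead of local time; once time synchronization corrects the clock, the same signature is in the past again.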
|
||||
@@ -0,0 +1,133 @@
|
||||
{
|
||||
"id": "Q007",
|
||||
"title": "Security Theater",
|
||||
"tier": 2,
|
||||
"primary_vm": "web_server",
|
||||
"required_vms": ["workstation", "web_server"],
|
||||
"ticket_id": "T007",
|
||||
"baseline_snapshot": "baseline.post-q004",
|
||||
"summary": "Someone ran a hardening script on hermes that set AllowUsers in sshd_config to only allow a single user: deploy-bot. Now the web-admin group cannot SSH in. Priya filed the ticket after her access was blocked mid-incident response. The AllowUsers directive is correct in intent (locking down SSH) but was applied too aggressively — it needs to include the web-admin group or the relevant users. The player must fix sshd_config and reload sshd without breaking service continuity. Complication: the player must not lock themselves out during the fix, and they must validate that the specific users Priya listed can still SSH.",
|
||||
"clue_fingerprint": {
|
||||
"description": "SSH connection attempts from web-admin accounts fail with 'Permission denied'. sshd_config contains 'AllowUsers deploy-bot' with no other entries. /etc/group shows web-admin group members. The hardening script is in /opt/security/harden-ssh.sh and its log shows it ran last night.",
|
||||
"evidence": [
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "AllowUsers deploy-bot" },
|
||||
{ "type": "log_contains", "vm": "web_server", "path": "/var/log/auth.log", "contains": "User priya from" },
|
||||
{ "type": "file_exists", "vm": "web_server", "path": "/opt/security/harden-ssh.sh" }
|
||||
]
|
||||
},
|
||||
"objectives": [
|
||||
{
|
||||
"id": "sshd-config-corrected",
|
||||
"description": "sshd_config allows the web-admin group or its members",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "or",
|
||||
"rules": [
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "AllowGroups web-admin" },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "priya" }
|
||||
]
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "sshd-still-running",
|
||||
"description": "sshd remains active after config change",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "service_state",
|
||||
"vm": "web_server",
|
||||
"service": "sshd",
|
||||
"state": "active"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "deploy-bot-still-allowed",
|
||||
"description": "deploy-bot access is preserved",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "or",
|
||||
"rules": [
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "deploy-bot" },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "AllowGroups" }
|
||||
]
|
||||
}
|
||||
}
|
||||
],
|
||||
"solution_branches": [
|
||||
{
|
||||
"id": "group-based-config",
|
||||
"label": "Proper Fix — Group-Based AllowGroups",
|
||||
"priority": 100,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "AllowGroups web-admin" },
|
||||
{ "type": "service_state", "vm": "web_server", "service": "sshd", "state": "active" },
|
||||
{ "type": "not", "rule": { "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "AllowUsers" } }
|
||||
]
|
||||
},
|
||||
"trust_delta": 4,
|
||||
"world_flags": ["hermes_ssh_hardened_correct", "priya_access_restored"],
|
||||
"follow_up_dialogue": "priya-Q007-complete-clean",
|
||||
"follow_up_dialogues": ["marcus-Q007-complete-clean"],
|
||||
"_note": "Best fix. Switches from AllowUsers (fragile, breaks with new users) to AllowGroups (durable, group membership handles access). Trust bump is higher because this is the approach that will scale."
|
||||
},
|
||||
{
|
||||
"id": "allowusers-expanded",
|
||||
"label": "Acceptable Fix — AllowUsers Expanded",
|
||||
"priority": 60,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "priya" },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "deploy-bot" },
|
||||
{ "type": "service_state", "vm": "web_server", "service": "sshd", "state": "active" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 1,
|
||||
"world_flags": ["hermes_ssh_allowusers_fragile", "priya_access_restored"],
|
||||
"follow_up_dialogue": "priya-Q007-complete-fragile",
|
||||
"follow_up_dialogues": ["marcus-Q007-complete-fragile"],
|
||||
"_note": "Access is restored but using AllowUsers. Every future new user will need to be manually added. Marcus or Priya will note this later."
|
||||
},
|
||||
{
|
||||
"id": "hardening-removed",
|
||||
"label": "Regression — SSH Restriction Removed Entirely",
|
||||
"priority": 200,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "not", "rule": { "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "AllowUsers" } },
|
||||
{ "type": "not", "rule": { "type": "file_contains", "vm": "web_server", "path": "/etc/ssh/sshd_config", "contains": "AllowGroups" } },
|
||||
{ "type": "service_state", "vm": "web_server", "service": "sshd", "state": "active" }
|
||||
]
|
||||
},
|
||||
"trust_delta": -3,
|
||||
"world_flags": ["hermes_ssh_unrestricted", "priya_access_restored"],
|
||||
"follow_up_dialogue": "priya-Q007-complete-regression",
|
||||
"follow_up_dialogues": ["marcus-Q007-complete-regression"],
|
||||
"_note": "Player fixed access by removing all restrictions. Priya's access works but the hardening is gone. This is the worst valid outcome — Priya is back in but so is everyone else."
|
||||
}
|
||||
],
|
||||
"pressure_profile": "access_blocked_escalation",
|
||||
"blast_radius": [],
|
||||
"unlock_requirements": ["world_flag:player_ssh_configured"],
|
||||
"narrative_phase": "suspicion",
|
||||
"linux_concepts": ["sshd_config", "AllowGroups", "AllowUsers", "SSH access hardening"],
|
||||
"failure_conditions": ["Priya still locked out", "SSH restrictions removed entirely"],
|
||||
"behavior_impact": {
|
||||
"default": { "curiosity_delta": 1, "obedience_delta": 0, "risk_delta": 0, "suspicion_delta": 0 }
|
||||
},
|
||||
"hidden_hook": {
|
||||
"id": "q007_dale_ssh_key",
|
||||
"description": "An SSH key in hermes /root/.ssh/authorized_keys does not match any current staff. The fingerprint matches no documented key.",
|
||||
"discovery_method": "Player reads /root/.ssh/authorized_keys on hermes",
|
||||
"significance": "Dale had root SSH access to hermes that was never formally revoked."
|
||||
},
|
||||
"access_requirements": {
|
||||
"minimum_access": { "web_server": "sudo" },
|
||||
"requires_root": false,
|
||||
"temporary_grants_allowed": ["sudo:web_server:sshd"]
|
||||
},
|
||||
"tags": ["ssh", "security", "hardening", "sshd", "web_server"],
|
||||
"internal_notes": "This quest introduces Priya as a character and establishes that the player's fixes can have security implications, not just operational ones. The 'regression' branch should feel bad — Priya's grateful but Marcus or a later audit will surface it. The proper fix (AllowGroups) tests whether the player knows the difference between AllowUsers and AllowGroups. The sshd reload vs restart distinction matters here — a player who restarts sshd drops existing connections, which is more disruptive than reload."
|
||||
}
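The `solution_branches` above combine `and`, `or`, and `not` rules over leaf checks, and each branch carries a `priority`. The following is a minimal sketch in Python of how such a rule tree could be evaluated against the Q007 branches. It assumes the engine picks the highest-priority branch whose validation passes, which the quest files suggest but do not state. The `evaluate` and `pick_branch` names are hypothetical, the `vm` field is ignored, and only two leaf types are modelled.

```python
SSHD = "/etc/ssh/sshd_config"


def contains(text):
    return {"type": "file_contains", "path": SSHD, "contains": text}


SSHD_ACTIVE = {"type": "service_state", "service": "sshd", "state": "active"}

# Abbreviated copies of the three Q007 branches.
BRANCHES = [
    {"id": "group-based-config", "priority": 100, "validation": {"type": "and", "rules": [
        contains("AllowGroups web-admin"), SSHD_ACTIVE,
        {"type": "not", "rule": contains("AllowUsers")}]}},
    {"id": "allowusers-expanded", "priority": 60, "validation": {"type": "and", "rules": [
        contains("priya"), contains("deploy-bot"), SSHD_ACTIVE]}},
    {"id": "hardening-removed", "priority": 200, "validation": {"type": "and", "rules": [
        {"type": "not", "rule": contains("AllowUsers")},
        {"type": "not", "rule": contains("AllowGroups")}, SSHD_ACTIVE]}},
]


def evaluate(rule, files, services):
    """Recursively evaluate one rule against in-memory VM state."""
    kind = rule["type"]
    if kind == "and":
        return all(evaluate(r, files, services) for r in rule["rules"])
    if kind == "or":
        return any(evaluate(r, files, services) for r in rule["rules"])
    if kind == "not":
        return not evaluate(rule["rule"], files, services)
    if kind == "file_contains":
        return rule["contains"] in files.get(rule["path"], "")
    if kind == "service_state":
        return services.get(rule["service"]) == rule["state"]
    raise ValueError(f"unsupported rule type: {kind}")


def pick_branch(branches, files, services):
    """Return the id of the highest-priority matching branch, or None."""
    matches = [b for b in branches if evaluate(b["validation"], files, services)]
    return max(matches, key=lambda b: b["priority"])["id"] if matches else None


services = {"sshd": "active"}
print(pick_branch(BRANCHES, {SSHD: "AllowGroups web-admin\n"}, services))
print(pick_branch(BRANCHES, {SSHD: "AllowUsers deploy-bot priya\n"}, services))
print(pick_branch(BRANCHES, {SSHD: ""}, services))
```

Because the `not` rules make the three validations mutually exclusive for these inputs, the priority values only matter when more than one branch can match the same state.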
|
||||
@@ -0,0 +1,129 @@
|
||||
{
|
||||
"id": "Q008",
|
||||
"title": "Bad Upstream",
|
||||
"tier": 2,
|
||||
"primary_vm": "web_server",
|
||||
"required_vms": ["workstation", "web_server", "build_machine"],
|
||||
"ticket_id": "T008",
|
||||
"baseline_snapshot": "baseline.post-q006",
|
||||
"summary": "The internal package repository on vulcan is serving a broken version of the axiomworks-app package. A deploy on hermes pulled it in through the internal apt repo and the app is now crashing on startup. The player needs to identify that the problem is in the package (not the app config), trace it back to vulcan, find the broken build artifact, and either roll back the package on hermes or fix the build and republish. This is the first multi-VM quest — investigation crosses from hermes to vulcan.",
|
||||
"clue_fingerprint": {
|
||||
"description": "The app service (axiomworks-app) on hermes is failing. journalctl shows it exits immediately with a non-zero code. The package was updated yesterday via the internal repo at http://vulcan.internal/repo. On vulcan, /srv/repo/axiomworks-app_2.1.1-1_amd64.deb is present but was built from a broken source tarball. The previous version 2.1.0-1 is also in /srv/repo/ and works correctly.",
|
||||
"evidence": [
|
||||
{ "type": "service_state_is", "vm": "web_server", "service": "axiomworks-app", "state": "failed" },
|
||||
{ "type": "log_contains", "vm": "web_server", "path": "/var/log/axiomworks-app.log", "contains": "Exec format error" },
|
||||
{ "type": "file_exists", "vm": "build_machine", "path": "/srv/repo/axiomworks-app_2.1.0-1_amd64.deb" },
|
||||
{ "type": "file_exists", "vm": "build_machine", "path": "/srv/repo/axiomworks-app_2.1.1-1_amd64.deb" }
|
||||
]
|
||||
},
|
||||
"objectives": [
|
||||
{
|
||||
"id": "app-running",
|
||||
"description": "axiomworks-app is active and running",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "service_state",
|
||||
"vm": "web_server",
|
||||
"service": "axiomworks-app",
|
||||
"state": "active"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "app-port-listening",
|
||||
"description": "App is accepting connections on expected port",
|
||||
"check_mode": "passive",
|
||||
"validation": {
|
||||
"type": "port_listening",
|
||||
"vm": "web_server",
|
||||
"port": 8080,
|
||||
"protocol": "tcp",
|
||||
"listening": true
|
||||
}
|
||||
}
|
||||
],
|
||||
"solution_branches": [
|
||||
{
|
||||
"id": "rollback-and-pin",
|
||||
"label": "Rollback to 2.1.0 and Pin Version",
|
||||
"priority": 100,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "web_server", "service": "axiomworks-app", "state": "active" },
|
||||
{ "type": "port_listening", "vm": "web_server", "port": 8080, "protocol": "tcp", "listening": true },
|
||||
{ "type": "package_installed", "vm": "web_server", "package": "axiomworks-app=2.1.0", "installed": true },
|
||||
{ "type": "file_contains", "vm": "web_server", "path": "/etc/apt/preferences.d/axiomworks-app", "contains": "Pin: version 2.1.0" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 3,
|
||||
"world_flags": ["hermes_app_running", "hermes_app_pinned_2-1-0", "vulcan_bad_build_known"],
|
||||
"follow_up_dialogue": "marcus-Q008-complete-rollback",
|
||||
"follow_up_dialogues": ["sarah-Q008-complete-pinned"],
|
||||
"_note": "Distinguished from rollback-only by an apt pin on hermes. The player must create an apt preferences file after rolling back."
|
||||
},
|
||||
{
|
||||
"id": "rebuild-and-redeploy",
|
||||
"label": "Rebuild on Vulcan and Redeploy",
|
||||
"priority": 80,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "web_server", "service": "axiomworks-app", "state": "active" },
|
||||
{ "type": "port_listening", "vm": "web_server", "port": 8080, "protocol": "tcp", "listening": true },
|
||||
{ "type": "package_installed", "vm": "web_server", "package": "axiomworks-app=2.1.1", "installed": true },
|
||||
{ "type": "file_exists", "vm": "build_machine", "path": "/srv/repo/axiomworks-app_2.1.1-2_amd64.deb" }
|
||||
]
|
||||
},
|
||||
"trust_delta": 4,
|
||||
"world_flags": ["hermes_app_running", "vulcan_build_fixed"],
|
||||
"follow_up_dialogue": "marcus-Q008-complete-rebuild",
|
||||
"follow_up_dialogues": ["sarah-Q008-complete-rebuilt"],
|
||||
"_note": "Player fixed the build on vulcan and redeployed the corrected 2.1.1 package. This is the most thorough fix and gets highest trust, but is harder and requires understanding both machines. The rebuilt .deb increments the Debian revision from -1 to -2."
|
||||
},
|
||||
{
|
||||
"id": "rollback-only",
|
||||
"label": "Rollback Only — Version Not Pinned",
|
||||
"priority": 60,
|
||||
"validation": {
|
||||
"type": "and",
|
||||
"rules": [
|
||||
{ "type": "service_state", "vm": "web_server", "service": "axiomworks-app", "state": "active" },
|
||||
{ "type": "port_listening", "vm": "web_server", "port": 8080, "protocol": "tcp", "listening": true },
|
||||
{ "type": "package_installed", "vm": "web_server", "package": "axiomworks-app=2.1.0", "installed": true },
|
||||
{ "type": "not", "rule": { "type": "file_contains", "vm": "web_server", "path": "/etc/apt/preferences.d/axiomworks-app", "contains": "Pin: version 2.1.0" } }
|
||||
]
|
||||
},
|
||||
"trust_delta": 1,
|
||||
"world_flags": ["hermes_app_running", "vulcan_bad_build_known"],
|
||||
"follow_up_incident": "I003",
|
||||
"follow_up_dialogue": "marcus-Q008-complete-unpinned",
|
||||
"follow_up_dialogues": ["sarah-Q008-complete-unpinned"],
|
||||
"_note": "App is running on 2.1.0 but not pinned. No apt preferences pin exists on hermes. The next apt upgrade will pull 2.1.1 back in. I003 re-breaks the app on the next update cycle. The not-rule on the pin file ensures this branch cannot match when rollback-and-pin already matches."
|
||||
}
|
||||
],
|
||||
"pressure_profile": "app_outage_escalation",
|
||||
"blast_radius": ["I003"],
|
||||
"unlock_requirements": [
|
||||
"world_flag:player_ssh_configured",
|
||||
"world_flag:vulcan_ntp_healthy"
|
||||
],
|
||||
"narrative_phase": "suspicion",
|
||||
"linux_concepts": ["apt", "package pinning", "apt preferences", "internal package mirror", "build pipeline"],
|
||||
"failure_conditions": ["axiomworks-app still broken", "bad package not traced to build machine"],
|
||||
"behavior_impact": {
|
||||
"default": { "curiosity_delta": 1, "obedience_delta": 0, "risk_delta": 0, "suspicion_delta": 0 }
|
||||
},
|
||||
"hidden_hook": {
|
||||
"id": "q008_build_log_anomaly",
|
||||
"description": "vulcan's build log for 2.1.1 shows it was triggered by a manual invocation, not the automated pipeline, at 02:14.",
|
||||
"discovery_method": "Player reads /var/log/build-pipeline.log on vulcan and notices the timestamp and manual trigger field",
|
||||
"significance": "The bad build was triggered manually. Someone made the broken build, and it was not the pipeline."
|
||||
},
|
||||
"access_requirements": {
|
||||
"minimum_access": { "build_machine": "sudo", "web_server": "sudo" },
|
||||
"requires_root": false,
|
||||
"temporary_grants_allowed": []
|
||||
},
|
||||
"tags": ["packages", "builds", "multi-vm", "web_server", "build_machine", "deploy"],
|
||||
"internal_notes": "This is the first quest that requires the player to move between two target VMs — hermes and vulcan. The symptom is on hermes but the root cause is on vulcan. Players who don't follow the package trail will spend a long time on hermes looking for a config problem that isn't there. The rebuild branch requires understanding the package build enough to fix the source input and republish a corrected .deb — it's hard but rewarding. The rollback branches are now correctly differentiated: rollback-and-pin requires an apt preferences pin on hermes, and rollback-only explicitly requires its absence via a not-rule."
|
||||
}
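The `rollback-and-pin` branch looks for the text `Pin: version 2.1.0` in `/etc/apt/preferences.d/axiomworks-app`. The following is a small sketch in Python of a preferences stanza that would satisfy that check, assuming the installed rollback version string is `2.1.0-1` as in the repo filename. The `render_pin` helper and the priority value are illustrative choices, not something the quest data specifies.

```python
def render_pin(package: str, version: str, priority: int = 1001) -> str:
    """Render one apt preferences stanza pinning a package to a version."""
    return (
        f"Package: {package}\n"
        f"Pin: version {version}\n"
        f"Pin-Priority: {priority}\n"
    )


stanza = render_pin("axiomworks-app", "2.1.0-1")
print(stanza, end="")

# The quest's validation is a plain substring match on the pin file.
print("Pin: version 2.1.0" in stanza)
```

A priority above 1000 tells apt to keep the pinned version even when that means downgrading, which is what the rollback from 2.1.1 requires. The stanza would be written to the path named in the validation rule.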
|
||||
@@ -0,0 +1,29 @@
|
||||
{
|
||||
"categories": [
|
||||
{
|
||||
"id": "access",
|
||||
"label": "Access & Authentication",
|
||||
"articles": ["ssh-keys", "ssh-access-controls"]
|
||||
},
|
||||
{
|
||||
"id": "web",
|
||||
"label": "Web Services",
|
||||
"articles": ["nginx-config"]
|
||||
},
|
||||
{
|
||||
"id": "storage",
|
||||
"label": "Storage & Logs",
|
||||
"articles": ["disk-logs"]
|
||||
},
|
||||
{
|
||||
"id": "sysadmin",
|
||||
"label": "System Administration",
|
||||
"articles": ["file-permissions", "cron-jobs", "time-sync"]
|
||||
},
|
||||
{
|
||||
"id": "packages",
|
||||
"label": "Package Management",
|
||||
"articles": ["package-management"]
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,40 @@
|
||||
{
|
||||
"id": "cron-jobs",
|
||||
"title": "Cron Jobs & Scheduled Tasks",
|
||||
"category": "sysadmin",
|
||||
"tags": ["cron", "crontab", "schedule", "backup", "automation"],
|
||||
"updated": "2025-12-01",
|
||||
"summary": "Cron syntax, user vs system crons, and common failure modes.",
|
||||
"sections": [
|
||||
{
|
||||
"heading": "Cron Syntax",
|
||||
"body": "<p>A crontab entry has five time fields followed by the command:</p>",
|
||||
"code": "# ┌─── minute (0–59)\n# │ ┌─── hour (0–23)\n# │ │ ┌─── day of month (1–31)\n# │ │ │ ┌─── month (1–12)\n# │ │ │ │ ┌─── day of week (0–7, 0 and 7 are Sunday)\n# │ │ │ │ │\n * * * * * /path/to/command\n\n# Examples:\n0 2 * * * /usr/local/bin/backup.sh # 2am every day\n*/15 * * * * /usr/local/bin/check.sh # every 15 minutes\n0 0 1 * * /usr/local/bin/monthly.sh # midnight on the 1st"
|
||||
},
|
||||
{
|
||||
"heading": "User Crontabs",
|
||||
"body": "<p>Each user can have their own crontab. Commands run as that user.</p>",
|
||||
"code": "crontab -e # edit your crontab\ncrontab -l # list your crontab\ncrontab -l -u alice # list alice's crontab (root only)\ncrontab -r # delete your crontab (dangerous—no confirmation)"
|
||||
},
|
||||
{
|
||||
"heading": "System Cron Directories",
|
||||
"body": "<p>Scripts dropped into these directories run at the corresponding interval without needing a crontab entry:</p>",
|
||||
"code": "/etc/cron.daily/\n/etc/cron.weekly/\n/etc/cron.monthly/\n/etc/cron.hourly/\n\n# Scripts here must be executable and owned by root.\n# They must NOT have a file extension—run-parts ignores files with dots in the name."
|
||||
},
|
||||
{
|
||||
"heading": "Ownership and the PATH Problem",
|
||||
"body": "<p>Two common failure modes:</p><p><strong>Wrong owner:</strong> A cron script in <code>/etc/cron.daily/</code> must be owned by root. If it is owned by another user, run-parts may skip it.</p><p><strong>Missing PATH:</strong> Cron does not source <code>.bashrc</code> or <code>.profile</code>. Commands that work interactively may fail in cron because the PATH only contains <code>/usr/bin:/bin</code>. Always use full paths in cron scripts, or set PATH explicitly at the top of the script.</p>",
|
||||
"code": "#!/bin/bash\nPATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n..."
|
||||
},
|
||||
{
|
||||
"heading": "Checking If a Cron Ran",
|
||||
"body": "",
|
||||
"code": "# Check syslog or the cron-specific log\ngrep CRON /var/log/syslog | tail -20\ncat /var/log/cron.log # if separate cron log is configured\n\n# Check journald\njournalctl -u cron --since \"1 hour ago\""
|
||||
},
|
||||
{
|
||||
"heading": "Capturing Cron Output",
|
||||
"body": "<p>By default, cron mails output to the user. On servers with no mail configured, errors disappear silently. Redirect to a log file instead:</p>",
|
||||
"code": "0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1"
|
||||
}
|
||||
]
|
||||
}
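The schedule fields described in the Cron Syntax section can be read as sets of allowed values. The following is a minimal sketch in Python that expands one numeric field into that set. It handles `*`, single numbers, ranges, comma lists, and `/step` only. Month and weekday names, and the rule for how cron combines day-of-month with day-of-week, are left out. The name `expand_field` is an assumption for illustration.

```python
def expand_field(field: str, low: int, high: int) -> set:
    """Expand one numeric cron field into the set of values it matches."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if step < 1 or start < low or end > high or start > end:
            raise ValueError(f"field {field!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return values


# Minute field of "*/15 * * * *" from the examples above.
print(sorted(expand_field("*/15", 0, 59)))
# Hour field of "0 2 * * *".
print(sorted(expand_field("2", 0, 23)))
```

Expanding each field this way makes it easy to check a schedule by hand: a job runs at a given minute and hour when both values are in their fields' sets.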
|
||||
@@ -0,0 +1,43 @@
|
||||
{
|
||||
"id": "disk-logs",
|
||||
"title": "Disk Space & Log Rotation",
|
||||
"category": "storage",
|
||||
"tags": ["disk", "df", "du", "logs", "logrotate", "cleanup"],
|
||||
"updated": "2025-08-22",
|
||||
"summary": "Finding what is filling the disk and keeping logs from growing unbounded.",
|
||||
"sections": [
|
||||
{
|
||||
"heading": "Checking Disk Usage",
|
||||
"body": "<p><code>df</code> shows you how full each filesystem is. <code>du</code> tells you where the space went.</p>",
|
||||
"code": "df -h # human-readable filesystem summary\ndf -h /var/log # check a specific mount\n\ndu -sh /var/log/* # top-level breakdown of /var/log\ndu -sh /var/* | sort -rh # sort by size, largest first\ndu -sh /var/log/*.log # sizes of individual log files"
|
||||
},
|
||||
{
|
||||
"heading": "Finding Large Files",
|
||||
"body": "<p>When du does not point at an obvious culprit:</p>",
|
||||
"code": "# Files over 100MB anywhere on the system\nfind / -xdev -size +100M -type f 2>/dev/null\n\n# Files in /var that have grown recently\nfind /var -xdev -mtime -1 -size +10M -type f 2>/dev/null"
|
||||
},
|
||||
{
|
||||
"heading": "Emergency Cleanup",
|
||||
"body": "<p>If disk is at 100% and a service is failing because of it:</p>",
|
||||
"code": "# Truncate a log file without deleting it (safe for running processes)\ntruncate -s 0 /var/log/nginx/access.log\n\n# Remove old compressed logs (the .gz files are already rotated)\nrm /var/log/nginx/*.gz\n\n# Clear journald logs older than 2 days\njournalctl --vacuum-time=2d"
|
||||
},
|
||||
{
|
||||
"heading": "logrotate Basics",
|
||||
"body": "<p>logrotate is the standard tool for rotating and compressing logs on a schedule. It is usually run once a day, from cron or from a systemd timer depending on the distribution. Config files live in <code>/etc/logrotate.d/</code>, one file per service.</p>"
|
||||
},
|
||||
{
|
||||
"heading": "Writing a logrotate Config",
|
||||
"body": "<p>Example for an nginx access log:</p>",
|
||||
"code": "/var/log/nginx/access.log {\n daily\n rotate 14\n compress\n delaycompress\n missingok\n notifempty\n sharedscripts\n postrotate\n /bin/kill -USR1 $(cat /run/nginx.pid 2>/dev/null) 2>/dev/null || true\n endscript\n}"
|
||||
},
|
||||
{
|
||||
"heading": "Testing logrotate",
|
||||
"body": "<p>Run logrotate manually in debug mode to verify a config without actually rotating anything:</p>",
|
||||
"code": "logrotate -d /etc/logrotate.d/nginx\n\n# To force a rotation right now (useful for testing):\nlogrotate -f /etc/logrotate.d/nginx"
|
||||
},
|
||||
{
|
||||
"heading": "Key logrotate Directives",
|
||||
"body": "<table><tr><th>Directive</th><th>Meaning</th></tr><tr><td><code>daily/weekly/monthly</code></td><td>Rotation frequency</td></tr><tr><td><code>rotate N</code></td><td>Keep N old copies</td></tr><tr><td><code>compress</code></td><td>gzip old files</td></tr><tr><td><code>delaycompress</code></td><td>Skip compressing the most recent rotation (useful when the app still has it open)</td></tr><tr><td><code>missingok</code></td><td>Do not error if the log file does not exist</td></tr><tr><td><code>notifempty</code></td><td>Skip rotation if the file is empty</td></tr><tr><td><code>size 100M</code></td><td>Rotate when file exceeds this size instead of on schedule</td></tr></table>"
|
||||
}
|
||||
]
|
||||
}
|
||||
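The size-threshold search and the truncate-in-place step from the disk-logs entry above can be tried safely before using them on a real host. The following is a minimal sketch, assuming GNU coreutils and findutils; the scratch directory, file names, and the 2 MiB threshold are hypothetical stand-ins for `/var/log` and the 100M threshold used in the entry.

```shell
#!/bin/sh
# Sketch only: works in a throwaway directory, never on /var/log.
scratch=$(mktemp -d)

# Two stand-in log files of known size: 3 MiB and 1 MiB (hypothetical names).
dd if=/dev/zero of="$scratch/access.log" bs=1024 count=3072 2>/dev/null
dd if=/dev/zero of="$scratch/error.log" bs=1024 count=1024 2>/dev/null

# Same idea as `find / -size +100M`, scaled down: files larger than 2 MiB.
big_files=$(find "$scratch" -type f -size +2M)
echo "over threshold: $big_files"

# Truncating empties the file but keeps the same inode, so a process
# that holds the file open keeps writing to the same file.
inode_before=$(stat -c '%i' "$scratch/access.log")
truncate -s 0 "$scratch/access.log"
inode_after=$(stat -c '%i' "$scratch/access.log")
size_after=$(stat -c '%s' "$scratch/access.log")
echo "size after truncate: $size_after bytes"

rm -rf "$scratch"
```

Comparing the inode before and after is what separates truncation from `rm` followed by re-creation: a deleted file that is still open keeps occupying disk space until the process closes it.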
@@ -0,0 +1,37 @@
|
||||
{
|
||||
"id": "file-permissions",
|
||||
"title": "File Ownership & Permissions",
|
||||
"category": "sysadmin",
|
||||
"tags": ["chown", "chmod", "permissions", "ownership", "ls"],
|
||||
"updated": "2025-10-07",
|
||||
"summary": "Understanding and fixing file ownership and permission bits.",
|
||||
"sections": [
|
||||
{
|
||||
"heading": "Reading the Permission String",
|
||||
"body": "<p>Run <code>ls -l</code> to see permissions. The first column looks like <code>-rwxr-xr--</code>.</p><ul><li>First character: <code>-</code> file, <code>d</code> directory, <code>l</code> symlink</li><li>Next three: owner read/write/execute</li><li>Next three: group read/write/execute</li><li>Last three: others read/write/execute</li></ul><p><code>r</code>=4, <code>w</code>=2, <code>x</code>=1. Add them up for octal notation: <code>rwx</code>=7, <code>rw-</code>=6, <code>r--</code>=4.</p>"
|
||||
},
|
||||
{
|
||||
"heading": "chown — Changing Ownership",
|
||||
"body": "<p>Change the owner and/or group of a file or directory.</p>",
|
||||
"code": "chown user file # change owner only\nchown user:group file # change owner and group\nchown :group file # change group only\n\n# Recursive — change everything under a directory\nchown -R user:group /path/to/dir"
|
||||
},
|
||||
{
|
||||
"heading": "chmod — Changing Permissions",
|
||||
"body": "",
|
||||
"code": "chmod 644 file.txt # rw-r--r-- (typical for files)\nchmod 755 /usr/local/bin/app # rwxr-xr-x (typical for executables)\nchmod 700 ~/.ssh # rwx------ (private directory)\nchmod 600 ~/.ssh/authorized_keys # rw------- (private file)\n\n# Recursive (applies to files as well as directories, so every file under the path becomes executable)\nchmod -R 755 /var/www/html\n\n# Symbolic form (add execute for owner only)\nchmod u+x script.sh"
|
||||
},
|
||||
{
|
||||
"heading": "Common Patterns",
|
||||
"body": "<table><tr><th>Mode</th><th>Numeric</th><th>Typical use</th></tr><tr><td><code>rw-r--r--</code></td><td>644</td><td>Regular files, config files</td></tr><tr><td><code>rwxr-xr-x</code></td><td>755</td><td>Directories, executables</td></tr><tr><td><code>rwx------</code></td><td>700</td><td>Private directories (e.g. ~/.ssh)</td></tr><tr><td><code>rw-------</code></td><td>600</td><td>Private files (e.g. private keys, authorized_keys)</td></tr><tr><td><code>rwxrwxr-x</code></td><td>775</td><td>Shared directories where the group needs write access</td></tr></table>"
|
||||
},
|
||||
{
|
||||
"heading": "Checking Who Owns What",
|
||||
"body": "",
|
||||
"code": "ls -la /var/www/html # list with ownership\nstat file.txt # detailed file metadata\nfind /path -user root # find files owned by root\nfind /path -not -user deploy # find files NOT owned by deploy"
|
||||
},
|
||||
{
|
||||
"heading": "A Note on Recursive chown",
|
||||
"body": "<p>When you run <code>chown -R</code>, it changes <em>everything</em> under the path—including files and subdirectories that may have intentionally different ownership. Know what you are targeting before running it on a live system. Check with <code>ls -laR</code> or <code>find</code> first.</p>"
|
||||
}
|
||||
]
|
||||
}
|
||||
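The octal arithmetic described in the file-permissions entry above can be checked directly with `stat`. This is a small sketch, assuming GNU `stat` (the `-c` format option); the scratch file name is hypothetical and nothing outside a temporary directory is touched.

```shell
#!/bin/sh
# Sketch only: operate on a scratch file in a temporary directory.
tmpdir=$(mktemp -d)
f="$tmpdir/example.txt"
touch "$f"

# 640 = owner rw- (4+2), group r-- (4), others --- (0)
chmod 640 "$f"
mode_octal=$(stat -c '%a' "$f")
mode_string=$(stat -c '%A' "$f")
echo "$mode_octal $mode_string"

# Symbolic form: add execute for the owner and read for others.
# owner rw- -> rwx (7), group stays r-- (4), others --- -> r-- (4)
chmod u+x,o+r "$f"
mode_after=$(stat -c '%a' "$f")
echo "$mode_after"

rm -rf "$tmpdir"
```

Running `stat -c '%a %A'` before and after a change is a quick way to confirm that a `chmod` did what was intended.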
@@ -0,0 +1,38 @@
|
||||
{
|
||||
"id": "nginx-config",
|
||||
"title": "nginx Configuration",
|
||||
"category": "web",
|
||||
"tags": ["nginx", "config", "syntax", "reload", "vhost"],
|
||||
"updated": "2025-09-18",
|
||||
"summary": "nginx config structure, common syntax errors, and safe reload procedure.",
|
||||
"sections": [
|
||||
{
|
||||
"heading": "Config File Layout",
|
||||
"body": "<p>nginx uses a block-based config syntax. The main file is <code>/etc/nginx/nginx.conf</code>. On Debian/Ubuntu, site configs live in <code>/etc/nginx/sites-available/</code> and are symlinked into <code>/etc/nginx/sites-enabled/</code> to activate them.</p><p>Every block opens with <code>{</code> and closes with <code>}</code>. Every directive ends with <code>;</code>. Missing either one will fail the syntax check.</p>"
|
||||
},
|
||||
{
|
||||
"heading": "Testing Config Before Reloading",
|
||||
"body": "<p>Always test before reloading. A bad config will prevent nginx from reloading, but it will <em>not</em> take down the running process—the old config stays live.</p>",
|
||||
"code": "nginx -t\n# or\nnginx -T # prints the full parsed config"
|
||||
},
|
||||
{
|
||||
"heading": "Reloading vs Restarting",
|
||||
"body": "<p>Use reload, not restart. Reload applies the new config without dropping existing connections.</p>",
|
||||
"code": "systemctl reload nginx\n\n# Only use restart if you have to—it drops active connections.\nsystemctl restart nginx"
|
||||
},
|
||||
{
|
||||
"heading": "Common Syntax Errors",
|
||||
"body": "<ul><li>Missing semicolon at the end of a directive</li><li>Missing closing brace <code>}</code> on a block</li><li>Typo in a directive name (nginx will report \"unknown directive\")</li><li>Referencing a cert file or log path that does not exist</li><li>More than one <code>server</code> block marked <code>default_server</code> for the same address and port (several vhosts sharing a port is fine; a duplicate default is not)</li></ul><p>The error message from <code>nginx -t</code> includes the file name and line number. Read it.</p>"
|
||||
},
|
||||
{
|
||||
"heading": "Useful Log Paths",
|
||||
"body": "<p>Default paths on Debian/Ubuntu:</p>",
|
||||
"code": "/var/log/nginx/error.log\n/var/log/nginx/access.log\n\n# Per-vhost logs are usually defined in the server block:\naccess_log /var/log/nginx/mysite.access.log;\nerror_log /var/log/nginx/mysite.error.log;"
|
||||
},
|
||||
{
|
||||
"heading": "Quick Vhost Template",
|
||||
"body": "<p>Minimal working vhost for a static site:</p>",
|
||||
"code": "server {\n listen 80;\n server_name example.internal;\n\n root /var/www/example;\n index index.html;\n\n location / {\n try_files $uri $uri/ =404;\n }\n\n access_log /var/log/nginx/example.access.log;\n error_log /var/log/nginx/example.error.log;\n}"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,49 @@
|
||||
{
|
||||
"id": "package-management",
|
||||
"title": "Package Management & Version Pinning",
|
||||
"category": "packages",
|
||||
"tags": ["apt", "pacman", "packages", "pinning", "rollback", "IgnorePkg"],
|
||||
"updated": "2026-01-08",
|
||||
"summary": "Installing, rolling back, and pinning packages on Debian and Arch Linux.",
|
||||
"sections": [
|
||||
{
|
||||
"heading": "Debian / Ubuntu (apt)",
|
||||
"body": "<p>Most commands need root.</p>",
|
||||
"code": "apt update # refresh package list\napt install nginx # install\napt remove nginx # remove (keep config)\napt purge nginx # remove + delete config\napt list --installed # list installed packages\napt show nginx # info about a package\ndpkg -l | grep nginx # alternative listing"
|
||||
},
|
||||
{
|
||||
"heading": "Listing Available Versions (Debian)",
|
||||
"body": "",
|
||||
"code": "apt-cache policy nginx\n# Shows installed version, candidate version, and all available versions by priority"
|
||||
},
|
||||
{
|
||||
"heading": "Installing a Specific Version (Debian)",
|
||||
"body": "",
|
||||
"code": "apt install nginx=1.22.1-9\n# Use apt-cache policy to find the exact version string first"
|
||||
},
|
||||
{
|
||||
"heading": "Pinning a Package (Debian)",
|
||||
"body": "<p>Pinning prevents apt from upgrading a specific package. Create a preferences file under <code>/etc/apt/preferences.d/</code>:</p>",
|
||||
"code": "# /etc/apt/preferences.d/nginx-pin\nPackage: nginx\nPin: version 1.22.1-9\nPin-Priority: 1001\n\n# Priority > 1000 = keep this version even if newer is available\n# After creating the file:\napt-mark hold nginx # belt-and-suspenders hold\napt-cache policy nginx # verify the pin took effect"
|
||||
},
|
||||
{
|
||||
"heading": "Arch Linux (pacman)",
|
||||
"body": "",
|
||||
"code": "pacman -Syu # update all\npacman -S nginx # install\npacman -R nginx # remove\npacman -Rs nginx # remove + unneeded deps\npacman -Q | grep nginx # list installed\npacman -Qi nginx # info about installed package"
|
||||
},
|
||||
{
|
||||
"heading": "Rolling Back a Package (Arch)",
|
||||
"body": "<p>Arch keeps a package cache in <code>/var/cache/pacman/pkg/</code>. If the current package broke something:</p>",
|
||||
"code": "ls /var/cache/pacman/pkg/nginx*\n# Find the version you want, then:\npacman -U /var/cache/pacman/pkg/nginx-1.24.0-1-x86_64.pkg.tar.zst"
|
||||
},
|
||||
{
|
||||
"heading": "Preventing Upgrades (Arch — IgnorePkg)",
|
||||
"body": "<p>After rolling back, prevent the package from upgrading on the next <code>pacman -Syu</code>:</p>",
|
||||
"code": "# /etc/pacman.conf\n[options]\n...\nIgnorePkg = nginx\n\n# Verify:\npacman -Syu\n# Should print: warning: nginx: ignoring package upgrade (1.24.0-1 => 1.25.x-y)"
|
||||
},
|
||||
{
|
||||
"heading": "When to Pin vs When to Fix",
|
||||
"body": "<p>Pinning is a stop-gap, not a solution. Document why you pinned it and set a reminder to revisit. A pinned package stops receiving security updates. If the upstream bug is fixed in a newer minor version, upgrade to that instead of staying pinned indefinitely.</p>"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,39 @@
|
||||
{
|
||||
"id": "ssh-access-controls",
|
||||
"title": "SSH Server Access Controls",
|
||||
"category": "access",
|
||||
"tags": ["ssh", "sshd_config", "AllowUsers", "AllowGroups", "security", "hardening"],
|
||||
"updated": "2025-10-29",
|
||||
"summary": "Restricting who can SSH in using sshd_config directives.",
|
||||
"sections": [
|
||||
{
|
||||
"heading": "The Config File",
|
||||
"body": "<p>SSH server configuration lives in <code>/etc/ssh/sshd_config</code>. Drop-in overrides can go in <code>/etc/ssh/sshd_config.d/*.conf</code>.</p><p><strong>Always test your config before reloading:</strong></p>",
|
||||
"code": "sshd -t\n# If it prints nothing and exits 0, the config is valid.\nsystemctl reload ssh"
|
||||
},
|
||||
{
|
||||
"heading": "AllowUsers and AllowGroups",
|
||||
"body": "<p>These are whitelist directives. If either is set, only matching users or group members can log in. If neither is set, all users may try.</p>",
|
||||
"code": "# Only these users may log in\nAllowUsers alice bob deploy\n\n# Only members of these groups may log in\nAllowGroups sshusers ops\n\n# Combining: user must match AllowUsers AND (if AllowGroups is set) be in an allowed group\n# These are independent filters—if both are set, a user must satisfy both."
|
||||
},
|
||||
{
|
||||
"heading": "DenyUsers and DenyGroups",
|
||||
"body": "<p>Blacklist alternatives. <code>DenyUsers</code> and <code>DenyGroups</code> are checked before Allow rules.</p><p>Prefer <code>AllowUsers</code>/<code>AllowGroups</code> over Deny lists: it is safer to enumerate who <em>can</em> log in than who cannot.</p>"
|
||||
},
|
||||
{
|
||||
"heading": "Other Common Restrictions",
|
||||
"body": "",
|
||||
"code": "# Disable root login entirely (recommended)\nPermitRootLogin no\n\n# Disable password authentication (once keys are working)\nPasswordAuthentication no\n\n# Change the listening port (minor obscurity, not real security)\nPort 2222\n\n# Restrict to specific network interface\nListenAddress 10.42.0.1\n\n# Idle session timeout (seconds × count before disconnect)\nClientAliveInterval 300\nClientAliveCountMax 2"
|
||||
},
|
||||
{
|
||||
"heading": "Match Blocks",
|
||||
"body": "<p>You can apply different rules to specific users, groups, or source addresses:</p>",
|
||||
"code": "# Allow password auth only from the management network\nMatch Address 10.42.0.0/24\n PasswordAuthentication yes\n\n# Give one user a restricted shell\nMatch User backup-agent\n ForceCommand /usr/local/bin/backup-only\n AllowTcpForwarding no"
|
||||
},
|
||||
{
|
||||
"heading": "Checking Who Has Access",
|
||||
"body": "<p>There is no built-in command to list all users who currently satisfy the access rules. Check manually:</p>",
|
||||
"code": "# Current AllowUsers/AllowGroups settings (main file plus any drop-ins)\ngrep -riE '(AllowUsers|AllowGroups|DenyUsers|DenyGroups)' /etc/ssh/sshd_config /etc/ssh/sshd_config.d/ 2>/dev/null\n\n# Members of a group\ngetent group sshusers\n\n# All users with a valid shell (can SSH in if no restrictions)\ngrep -v '/nologin\\|/false' /etc/passwd"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,38 @@
|
||||
{
|
||||
"id": "ssh-keys",
|
||||
"title": "SSH Key Authentication",
|
||||
"category": "access",
|
||||
"tags": ["ssh", "authorized_keys", "keys", "permissions"],
|
||||
"updated": "2025-11-03",
|
||||
"summary": "How SSH key auth works and how to set it up correctly.",
|
||||
"sections": [
|
||||
{
|
||||
"heading": "How It Works",
|
||||
"body": "<p>SSH key authentication replaces passwords with a cryptographic key pair. The <strong>private key</strong> stays on your machine. The <strong>public key</strong> goes into <code>~/.ssh/authorized_keys</code> on the target host. When you connect, your client proves it holds the private key by signing a challenge, and the server checks that signature against the public keys it trusts.</p><p>No password is transmitted, and the private key never leaves your machine. If none of the offered keys match, key authentication fails.</p>"
|
||||
},
|
||||
{
|
||||
"heading": "Generating a Key Pair",
|
||||
"body": "<p>Use <code>ed25519</code> unless something forces you onto RSA. Its keys are much shorter and its signatures faster than RSA at comparable security.</p>",
|
||||
"code": "ssh-keygen -t ed25519 -C \"your-comment-here\"\n# Accept the default path (~/.ssh/id_ed25519) or specify one.\n# Passphrase is optional but recommended for keys that leave your machine."
|
||||
},
|
||||
{
|
||||
"heading": "Installing the Public Key",
|
||||
"body": "<p>Copy the public key to the remote host:</p>",
|
||||
"code": "# Option 1: if password auth is still working\nssh-copy-id -i ~/.ssh/id_ed25519.pub user@host\n\n# Option 2: manually (run this on the target host, where the .pub file has been copied)\ncat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys"
|
||||
},
|
||||
{
|
||||
"heading": "File and Directory Permissions",
|
||||
"body": "<p>This is the most common reason key auth fails. SSH will silently reject keys if the permissions are too open.</p>",
|
||||
"code": "chmod 700 ~/.ssh\nchmod 600 ~/.ssh/authorized_keys\nchown -R youruser:youruser ~/.ssh"
|
||||
},
|
||||
{
|
||||
"heading": "Troubleshooting",
|
||||
"body": "<p>Run <code>ssh -v user@host</code> for verbose output. It shows which keys the client offered and how the server responded. The server-side reason for a rejection is recorded in the sshd log on the target host.</p><p>Common causes:</p><ul><li><code>authorized_keys</code> file has wrong permissions (see above)</li><li><code>~/.ssh</code> directory is group- or world-writable</li><li><code>authorized_keys</code> file does not exist</li><li>The file exists but is empty or the key was pasted with a line break in the middle</li><li><code>sshd_config</code> has <code>PubkeyAuthentication no</code></li></ul>"
|
||||
},
|
||||
{
|
||||
"heading": "Checking the sshd Config",
|
||||
"body": "<p>Relevant lines in <code>/etc/ssh/sshd_config</code>:</p>",
|
||||
"code": "PubkeyAuthentication yes\nAuthorizedKeysFile .ssh/authorized_keys\n\n# After editing sshd_config, test before reloading:\nsshd -t\nsystemctl reload ssh"
|
||||
}
|
||||
]
|
||||
}
|
||||
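The install and permission steps from the ssh-keys entry above can be combined into one repeatable sequence. The sketch below is an illustration, not the game's provisioning logic: it uses a scratch directory in place of the real home directory, and the public key string is a placeholder, not a usable key. It assumes GNU `stat`.

```shell
#!/bin/sh
# Sketch only: a scratch directory stands in for the user's home.
fake_home=$(mktemp -d)
# Placeholder public key (hypothetical, not a real key).
pubkey='ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPLACEHOLDERKEYDATA player@example'

install_key() {
    mkdir -p "$fake_home/.ssh"
    chmod 700 "$fake_home/.ssh"
    touch "$fake_home/.ssh/authorized_keys"
    # Append the key only if this exact line is not already present.
    grep -qxF "$pubkey" "$fake_home/.ssh/authorized_keys" ||
        printf '%s\n' "$pubkey" >> "$fake_home/.ssh/authorized_keys"
    chmod 600 "$fake_home/.ssh/authorized_keys"
}

# Running it twice shows that the key is not duplicated.
install_key
install_key

dir_mode=$(stat -c '%a' "$fake_home/.ssh")
file_mode=$(stat -c '%a' "$fake_home/.ssh/authorized_keys")
key_count=$(grep -c . "$fake_home/.ssh/authorized_keys")
echo "dir=$dir_mode file=$file_mode keys=$key_count"

rm -rf "$fake_home"
```

Using `printf '%s\n'` keeps the key on a single line, which avoids the "line break in the middle" failure listed under troubleshooting.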
@@ -0,0 +1,44 @@
|
||||
{
|
||||
"id": "time-sync",
|
||||
"title": "System Time & NTP",
|
||||
"category": "sysadmin",
|
||||
"tags": ["ntp", "time", "timedatectl", "timesyncd", "chrony", "drift"],
|
||||
"updated": "2025-07-14",
|
||||
"summary": "Keeping system clocks accurate and diagnosing time drift.",
|
||||
"sections": [
|
||||
{
|
||||
"heading": "Why System Time Matters",
|
||||
"body": "<p>Clocks that drift cause more problems than you expect: SSL certificate validation failures, log timestamps that do not correlate across machines, cron jobs that fire at the wrong time, authentication tokens that expire prematurely, and package signature checks that fail.</p><p>On a server, time should be correct to within a second. Most NTP implementations keep it within milliseconds.</p>"
|
||||
},
|
||||
{
|
||||
"heading": "Checking Current Time Status",
|
||||
"body": "",
|
||||
"code": "timedatectl\n# Shows: local time, UTC time, timezone, NTP sync status, RTC time\n\ntimedatectl show\n# Machine-readable version of the same"
|
||||
},
|
||||
{
|
||||
"heading": "systemd-timesyncd",
|
||||
"body": "<p>Most Debian/Ubuntu systems ship with <code>systemd-timesyncd</code> as the default NTP client. It is a lightweight SNTP implementation—adequate for most servers.</p>",
|
||||
"code": "# Enable and start\nsystemctl enable --now systemd-timesyncd\n\n# Check sync status\ntimedatectl timesync-status\n\n# Force a resync\nsystemctl restart systemd-timesyncd\n\n# Config file (NTP servers, fallback)\ncat /etc/systemd/timesyncd.conf"
|
||||
},
|
||||
{
|
||||
"heading": "NTP Server Configuration",
|
||||
"body": "<p>The default NTP servers are usually fine. If you need to change them—for example, to use an internal NTP server:</p>",
|
||||
"code": "# /etc/systemd/timesyncd.conf\n[Time]\nNTP=ntp.internal.example.com\nFallbackNTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org"
|
||||
},
|
||||
{
|
||||
"heading": "chrony (alternative)",
|
||||
"body": "<p>chrony is a more capable NTP implementation. It handles intermittent network connections and large initial offsets better than timesyncd. On systems where accuracy matters:</p>",
|
||||
"code": "apt install chrony\nsystemctl enable --now chrony\n\nchronyc tracking # current sync status\nchronyc sources -v # configured time sources and their offsets"
|
||||
},
|
||||
{
|
||||
"heading": "Diagnosing Time Problems",
|
||||
"body": "",
|
||||
"code": "# Is NTP enabled?\ntimedatectl | grep NTP\n\n# Is timesyncd active?\nsystemctl status systemd-timesyncd\n\n# Did a sync happen recently?\njournalctl -u systemd-timesyncd --since \"1 hour ago\"\n\n# What is the current offset?\ntimedatectl timesync-status | grep Offset"
|
||||
},
|
||||
{
|
||||
"heading": "Setting Timezone",
|
||||
"body": "",
|
||||
"code": "timedatectl list-timezones | grep Europe\ntimedatectl set-timezone Europe/London"
|
||||
}
|
||||
]
|
||||
}
|
||||
@@ -0,0 +1,13 @@
|
||||
{
|
||||
"id": "T001",
|
||||
"from": "Marcus Webb <m.webb@axiomworks.internal>",
|
||||
"subject": "Your workstation access",
|
||||
"body": "Hey, welcome to the team. HR said you started today so I got you set up with an account on ares. The provisioning script runs automatically but it does not handle SSH keys — you will need to add yours manually. Your public key should be in the onboarding doc. Let me know if you get stuck.\n\n— Marcus",
|
||||
"initial_priority": "low",
|
||||
"current_priority": "low",
|
||||
"target_vm": "workstation",
|
||||
"linked_quest": "Q001",
|
||||
"tags": ["onboarding", "ssh", "workstation"],
|
||||
"deadline_behavior": "none",
|
||||
"attachments": ["docs/onboarding.json"]
|
||||
}
|
||||
@@ -0,0 +1,13 @@
|
||||
{
|
||||
"id": "T002",
|
||||
"from": "Sarah Chen <s.chen@axiomworks.internal>",
|
||||
"subject": "[prod-web] site is down",
|
||||
"body": "Getting connection refused on the main site. Started about 20 minutes ago. Nothing changed on our end as far as I know.",
|
||||
"initial_priority": "high",
|
||||
"current_priority": "high",
|
||||
"target_vm": "web_server",
|
||||
"linked_quest": "Q002",
|
||||
"tags": ["services", "web", "nginx"],
|
||||
"deadline_behavior": "escalates",
|
||||
"follow_up_ticket_ids": ["T002-followup"]
|
||||
}
|
||||
@@ -0,0 +1,13 @@
|
||||
{
|
||||
"id": "T003-recurrence",
|
||||
"from": "Monitoring <alerts@axiomworks.internal>",
|
||||
"subject": "disk pressure returned on hermes",
|
||||
"body": "Disk pressure has returned on hermes. /var/log/nginx/access.log is growing again and the host is trending back toward saturation.",
|
||||
"initial_priority": "high",
|
||||
"current_priority": "high",
|
||||
"target_vm": "web_server",
|
||||
"linked_quest": "Q003",
|
||||
"tags": ["web", "disk", "nginx", "recurrence"],
|
||||
"deadline_behavior": "escalates",
|
||||
"_note": "Recurrence ticket emitted by I001 when the earlier partial fix allows log pressure to return."
|
||||
}
|
||||
@@ -0,0 +1,13 @@
|
||||
{
|
||||
"id": "T003",
|
||||
"from": "Dave Okonkwo <d.okonkwo@axiomworks.internal>",
|
||||
"subject": "is the website slow for anyone else",
|
||||
"body": "Pages are loading really slowly for me. Sometimes they time out. I rebooted my laptop but it did not help. Is something wrong on the server side?",
|
||||
"initial_priority": "medium",
|
||||
"current_priority": "medium",
|
||||
"target_vm": "web_server",
|
||||
"linked_quest": "Q003",
|
||||
"tags": ["web", "disk", "nginx"],
|
||||
"deadline_behavior": "escalates",
|
||||
"_note": "Dave is reporting symptoms of the disk being nearly full causing nginx write failures and slowdowns. He thinks it's a network issue. He is wrong but his symptom report is accurate."
|
||||
}
|
||||
@@ -0,0 +1,13 @@
|
||||
{
|
||||
"id": "T004",
|
||||
"from": "Sarah Chen <s.chen@axiomworks.internal>",
|
||||
"subject": "deployment not applying",
|
||||
"body": "I pushed a change this morning and the site is still showing the old version. I confirmed the deploy script ran and it said it completed successfully. But the file timestamp on the server doesn't match what I deployed. Did something change in how deploys work?",
|
||||
"initial_priority": "medium",
|
||||
"current_priority": "medium",
|
||||
"target_vm": "web_server",
|
||||
"linked_quest": "Q004",
|
||||
"tags": ["deploy", "permissions", "web_server"],
|
||||
"deadline_behavior": "none",
|
||||
"_note": "Sarah correctly identifies the symptom but assumes the script is at fault. The script is fine. The permissions are the problem. Her description of the deploy 'completing successfully' is accurate — the script ran, it just could not overwrite root-owned files and silently skipped them."
|
||||
}
|
||||
@@ -0,0 +1,16 @@
|
||||
{
|
||||
"id": "T005",
|
||||
"from": "Dave Okonkwo <d.okonkwo@axiomworks.internal>",
|
||||
"subject": "disk warning on hermes again",
|
||||
"body": "Got an alert that /var/backups is at 85%. I don't know if this is related to what was going on before. Probably fine but figured you should know.",
|
||||
"initial_priority": "low",
|
||||
"current_priority": "low",
|
||||
"target_vm": "web_server",
|
||||
"linked_quest": "Q005",
|
||||
"tags": [
|
||||
"disk",
|
||||
"backup",
|
||||
"web_server"
|
||||
],
|
||||
"deadline_behavior": "escalates"
|
||||
}
|
||||
@@ -0,0 +1,16 @@
|
||||
{
|
||||
"id": "T006",
|
||||
"from": "Dave Okonkwo <d.okonkwo@axiomworks.internal>",
|
||||
"subject": "builds failing on vulcan",
|
||||
"body": "Getting signature errors every time I try to install anything on the build machine. Tried pacman -Syu and it fails partway through. I didn't change anything. It was working yesterday.",
|
||||
"initial_priority": "medium",
|
||||
"current_priority": "medium",
|
||||
"target_vm": "build_machine",
|
||||
"linked_quest": "Q006",
|
||||
"tags": [
|
||||
"pacman",
|
||||
"build_machine",
|
||||
"packages"
|
||||
],
|
||||
"deadline_behavior": "none"
|
||||
}
|
||||
@@ -0,0 +1,17 @@
|
||||
{
|
||||
"id": "T007",
|
||||
"from": "Priya Nair <p.nair@axiomworks.internal>",
|
||||
"subject": "locked out of hermes",
|
||||
"body": "I cannot SSH into hermes. Permission denied immediately. I was in the middle of something. Who ran a hardening script without telling anyone.",
|
||||
"initial_priority": "critical",
|
||||
"current_priority": "critical",
|
||||
"target_vm": "web_server",
|
||||
"linked_quest": "Q007",
|
||||
"tags": [
|
||||
"ssh",
|
||||
"access",
|
||||
"web_server",
|
||||
"security"
|
||||
],
|
||||
"deadline_behavior": "escalates"
|
||||
}
|
||||
@@ -0,0 +1,18 @@
|
||||
{
|
||||
"id": "T008",
|
||||
"from": "Sarah Chen <s.chen@axiomworks.internal>",
|
||||
"subject": "app is down after update",
|
||||
"body": "The deploy ran this morning and now the app won't start. It's returning nothing on 8080. The update pulled in a new package version. I don't know if that's the problem but the timing is suspicious.",
|
||||
"initial_priority": "high",
|
||||
"current_priority": "high",
|
||||
"target_vm": "web_server",
|
||||
"linked_quest": "Q008",
|
||||
"tags": [
|
||||
"app",
|
||||
"deploy",
|
||||
"packages",
|
||||
"web_server"
|
||||
],
|
||||
"deadline_behavior": "escalates",
|
||||
"_note": "Sarah correctly suspects the package update. She doesn't know the build machine is involved."
|
||||
}
|
||||
@@ -0,0 +1,41 @@
|
||||
{
|
||||
"id": "build_machine",
|
||||
"domain": "sc-build-machine",
|
||||
"hostname": "vulcan",
|
||||
"distro": "arch",
|
||||
"role": "Build/package/update quest target VM",
|
||||
"display_name": "Build Machine (vulcan)",
|
||||
"profile_type": "headless_server",
|
||||
"resource_budget": {
|
||||
"ram_mb": 384,
|
||||
"vcpus": 2,
|
||||
"disk_gb": 10,
|
||||
"note": "Slightly more CPU for build tasks. Still headless."
|
||||
},
|
||||
"network": {
|
||||
"mode": "quest",
|
||||
"libvirt_network": "sc-internal",
|
||||
"optional_outbound": "sc-pkg-mirror",
|
||||
"note": "Selective outbound access to package mirror for update quests."
|
||||
},
|
||||
"ssh_user": "player",
|
||||
"ssh_key": "~/.ssh/sc_host_key",
|
||||
"snapshots": {
|
||||
"baseline": "baseline.clean",
|
||||
"recovery": "baseline.recovery",
|
||||
"checkpoint_prefix": "checkpoint.shift-",
|
||||
"max_checkpoints": 5
|
||||
},
|
||||
"guest_helper": {
|
||||
"name": "ops-telemetry-cache",
|
||||
"path": "/usr/local/bin/ops-telemetry-cache",
|
||||
"trusted": false
|
||||
},
|
||||
"display": {
|
||||
"type": "vnc",
|
||||
"fallback": "spice"
|
||||
},
|
||||
"always_live": false,
|
||||
"quests": ["Q006", "Q008"],
|
||||
"note": "Arch Linux build machine. Named vulcan — the forge. Handles package/build/update quests."
|
||||
}
|
||||
@@ -0,0 +1,40 @@
|
||||
{
|
||||
"id": "web_server",
|
||||
"domain": "sc-web-server",
|
||||
"hostname": "hermes",
|
||||
"distro": "debian",
|
||||
"role": "Web/service quest target VM",
|
||||
"display_name": "Web Server (hermes)",
|
||||
"profile_type": "headless_server",
|
||||
"resource_budget": {
|
||||
"ram_mb": 256,
|
||||
"vcpus": 1,
|
||||
"disk_gb": 6,
|
||||
"note": "Lightweight headless Debian server. No desktop, no graphical tools needed."
|
||||
},
|
||||
"network": {
|
||||
"mode": "quest",
|
||||
"libvirt_network": "sc-internal"
|
||||
},
|
||||
"ssh_user": "player",
|
||||
"ssh_key": "~/.ssh/sc_host_key",
|
||||
"snapshots": {
|
||||
"baseline": "baseline.clean",
|
||||
"recovery": "baseline.recovery",
|
||||
"checkpoint_prefix": "checkpoint.shift-",
|
||||
"max_checkpoints": 5
|
||||
},
|
||||
"guest_helper": {
|
||||
"name": "yardd",
|
||||
"path": "/usr/local/bin/yardd",
|
||||
"trusted": false
|
||||
},
|
||||
"display": {
|
||||
"type": "vnc",
|
||||
"fallback": "spice",
|
||||
"note": "VNC preferred for headless terminal. Fallback to SPICE if VNC unavailable."
|
||||
},
|
||||
"always_live": false,
|
||||
"quests": ["Q002", "Q003", "Q004", "Q005", "Q007"],
|
||||
"note": "Primary target VM for web service quests. Hosted on Debian. Named hermes after the messenger."
|
||||
}
|
||||
@@ -0,0 +1,40 @@
|
||||
{
|
||||
"id": "workstation",
|
||||
"domain": "sc-workstation",
|
||||
"hostname": "ares",
|
||||
"distro": "debian",
|
||||
"role": "Player desktop workstation with browser HUD, terminal, and SSH entry point",
|
||||
"display_name": "Workstation (ares)",
|
||||
"profile_type": "desktop_xfce",
|
||||
"resource_budget": {
|
||||
"ram_mb": 768,
|
||||
"vcpus": 1,
|
||||
"disk_gb": 12,
|
||||
"note": "Lightweight XFCE desktop with Chromium HUD and Tilix terminal."
|
||||
},
|
||||
"network": {
|
||||
"mode": "quest",
|
||||
"libvirt_network": "sc-internal"
|
||||
},
|
||||
"ssh_user": "player",
|
||||
"management_user": "opsbridge",
|
||||
"ssh_key": "~/.ssh/sc_host_key",
|
||||
"snapshots": {
|
||||
"baseline": "baseline.day-one",
|
||||
"recovery": "baseline.recovery",
|
||||
"checkpoint_prefix": "checkpoint.shift-",
|
||||
"max_checkpoints": 5
|
||||
},
|
||||
"guest_helper": {
|
||||
"name": "atlas-index",
|
||||
"path": "/usr/local/bin/atlas-index",
|
||||
"trusted": false
|
||||
},
|
||||
"display": {
|
||||
"type": "spice",
|
||||
"video": "virtio",
|
||||
"note": "Player uses the real XFCE workstation desktop through SPICE with virtio video. QXL is available as the spice-qxl build mode for compatibility testing."
|
||||
},
|
||||
"always_live": true,
|
||||
"note": "The workstation VM stays live during gameplay. The browser opens the host-served HUD and Tilix provides real terminal access to the lab VMs."
|
||||
}
|
||||
@@ -0,0 +1,233 @@
|
||||
{
|
||||
"_schema_version": "1.1",
|
||||
"_description": "Central registry of all world flags. Every flag used in any quest, incident, or dialogue must be declared here. Flags not in this registry will fail content validation.",
|
||||
|
||||
"flags": [
|
||||
{
|
||||
"id": "player_ssh_configured",
|
||||
"description": "Player has added their public key to ~/.ssh/authorized_keys on the workstation with correct permissions.",
|
||||
"set_by": ["Q001"],
|
||||
"read_by": ["Q002", "Q003", "Q004", "Q005", "Q006", "Q007", "Q008"],
|
||||
"gates": ["quest_unlock:Q002", "quest_unlock:Q003", "quest_unlock:Q004"],
|
||||
"persists": true
|
||||
},
|
||||
{
|
||||
"id": "player_loose_permissions",
|
||||
"description": "Player set up authorized_keys but with overly permissive file or directory permissions.",
|
||||
"set_by": ["Q001"],
|
||||
"read_by": ["marcus-Q001"],
|
||||
"gates": [],
|
||||
"persists": true
|
||||
},
|
||||
{
|
||||
"id": "nginx_stable",
|
||||
"description": "Nginx is correctly configured, running, and enabled on hermes.",
|
||||
"set_by": ["Q002"],
|
||||
"read_by": ["Q003"],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["nginx_unstable"]
|
||||
},
|
||||
{
|
||||
"id": "nginx_unstable",
|
||||
"description": "Nginx is running but has a known fragility — not enabled on boot, or a quick-fix config.",
|
||||
"set_by": ["Q002"],
|
||||
"read_by": ["Q003"],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["nginx_stable"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_web_healthy",
|
||||
"description": "The web server on hermes is responding to requests normally.",
|
||||
"set_by": ["Q002"],
|
||||
"read_by": ["Q003", "Q004"],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_web_down"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_web_down",
|
||||
"description": "Nginx on hermes is inactive.",
|
||||
"set_by": ["Q002", "Q003"],
|
||||
"read_by": ["sarah-Q003-angry"],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_web_healthy"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_logrotate_healthy",
|
||||
"description": "Nginx logrotate config exists and is correctly configured on hermes.",
|
||||
"set_by": ["Q003"],
|
||||
"read_by": ["I001"],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_log_pressure_pending"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_disk_healthy",
|
||||
"description": "Disk utilization on hermes is below the alert threshold.",
|
||||
"set_by": ["Q003"],
|
||||
"read_by": ["I001"],
|
||||
"gates": [],
|
||||
"persists": false
|
||||
},
|
||||
{
|
||||
"id": "hermes_log_pressure_pending",
|
||||
"description": "Disk was cleared on hermes but logrotate is not configured, so the log will grow again.",
|
||||
"set_by": ["Q003"],
|
||||
"read_by": ["I001"],
|
||||
"gates": ["incident_trigger:I001"],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_logrotate_healthy"]
|
||||
},
|
||||
{
|
||||
"id": "web_disk_pressure_active",
|
||||
"description": "Disk pressure on hermes is actively worsening due to unrotated logs.",
|
||||
"set_by": ["I001"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": false
|
||||
},
|
||||
{
|
||||
"id": "hermes_deploy_healthy",
|
||||
"description": "Web root ownership on hermes is correct and the deploy service can run without errors.",
|
||||
"set_by": ["Q004"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_deploy_partial"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_deploy_partial",
|
||||
"description": "Web root top-level ownership is corrected but child files are still root-owned.",
|
||||
"set_by": ["Q004"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_deploy_healthy"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_backup_healthy",
|
||||
"description": "Backup cron job runs as backup-agent, old files are cleaned up, and disk usage is below the threshold.",
|
||||
"set_by": ["Q005"],
|
||||
"read_by": ["I002"],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_backup_partial", "hermes_backup_root_running"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_backup_partial",
|
||||
"description": "Cron job user is corrected but old root-owned backup files are not cleaned up.",
|
||||
"set_by": ["Q005"],
|
||||
"read_by": ["I002"],
|
||||
"gates": ["incident_trigger:I002"],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_backup_healthy"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_backup_root_running",
|
||||
"description": "Disk was cleared but the cron job is still running as root, so the problem will recur.",
|
||||
"set_by": ["Q005"],
|
||||
"read_by": ["I002"],
|
||||
"gates": ["incident_trigger:I002"],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_backup_healthy"]
|
||||
},
|
||||
{
|
||||
"id": "vulcan_ntp_healthy",
|
||||
"description": "Time synchronization is active and enabled at boot on vulcan.",
|
||||
"set_by": ["Q006"],
|
||||
"read_by": ["Q008"],
|
||||
"gates": ["quest_unlock:Q008"],
|
||||
"persists": true,
|
||||
"conflicts_with": ["vulcan_ntp_fragile"]
|
||||
},
|
||||
{
|
||||
"id": "vulcan_ntp_fragile",
|
||||
"description": "NTP is running on vulcan but not enabled at boot.",
|
||||
"set_by": ["Q006"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["vulcan_ntp_healthy"]
|
||||
},
|
||||
{
|
||||
"id": "vulcan_builds_healthy",
|
||||
"description": "Package management on vulcan works without signature errors.",
|
||||
"set_by": ["Q006"],
|
||||
"read_by": ["Q008"],
|
||||
"gates": [],
|
||||
"persists": true
|
||||
},
|
||||
{
|
||||
"id": "hermes_ssh_hardened_correct",
|
||||
"description": "sshd on hermes uses AllowGroups with web-admin, correctly restricting access.",
|
||||
"set_by": ["Q007"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_ssh_allowusers_fragile", "hermes_ssh_unrestricted"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_ssh_allowusers_fragile",
|
||||
"description": "sshd on hermes uses AllowUsers, which works but requires a manual update for each new user.",
|
||||
"set_by": ["Q007"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_ssh_hardened_correct", "hermes_ssh_unrestricted"]
|
||||
},
|
||||
{
|
||||
"id": "hermes_ssh_unrestricted",
|
||||
"description": "SSH hardening was removed entirely from hermes.",
|
||||
"set_by": ["Q007"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["hermes_ssh_hardened_correct", "hermes_ssh_allowusers_fragile"]
|
||||
},
|
||||
{
|
||||
"id": "priya_access_restored",
|
||||
"description": "Priya Nair can SSH to hermes again.",
|
||||
"set_by": ["Q007"],
|
||||
"read_by": ["priya-Q007"],
|
||||
"gates": [],
|
||||
"persists": true
|
||||
},
|
||||
{
|
||||
"id": "hermes_app_running",
|
||||
"description": "axiomworks-app is active and serving on hermes.",
|
||||
"set_by": ["Q008"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true
|
||||
},
|
||||
{
|
||||
"id": "hermes_app_pinned_2-1-0",
|
||||
"description": "axiomworks-app is pinned to version 2.1.0 on hermes to avoid the broken 2.1.1.",
|
||||
"set_by": ["Q008"],
|
||||
"read_by": ["I003"],
|
||||
"gates": [],
|
||||
"persists": true
|
||||
},
|
||||
{
|
||||
"id": "vulcan_bad_build_known",
|
||||
"description": "The broken 2.1.1 build on vulcan has been identified but not yet fixed.",
|
||||
"set_by": ["Q008"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["vulcan_build_fixed"]
|
||||
},
|
||||
{
|
||||
"id": "vulcan_build_fixed",
|
||||
"description": "The broken 2.1.1 build was rebuilt correctly on vulcan and republished.",
|
||||
"set_by": ["Q008"],
|
||||
"read_by": [],
|
||||
"gates": [],
|
||||
"persists": true,
|
||||
"conflicts_with": ["vulcan_bad_build_known"]
|
||||
}
|
||||
]
|
||||
}
|
||||
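The registry's `_description` says that flags missing from this file fail content validation, and every `conflicts_with` entry above is mirrored on the flag it names. The repo's actual validator is not shown here, so the following is only a minimal sketch of what such checks could look like. The function names `check_registry` and `undeclared_flags` are hypothetical, and treating a one-sided `conflicts_with` as an error is an assumption drawn from the data, not a documented rule.

```python
import json


def check_registry(registry):
    """Return a list of problems found in a flag registry dict.

    Hypothetical helper: checks for duplicate ids and for conflicts_with
    entries that are unknown or not mirrored on the other flag.
    """
    problems = []
    flags = registry.get("flags", [])
    by_id = {}
    for flag in flags:
        if flag["id"] in by_id:
            problems.append(f"duplicate flag id: {flag['id']}")
        by_id[flag["id"]] = flag
    for flag in flags:
        for other in flag.get("conflicts_with", []):
            if other not in by_id:
                problems.append(f"{flag['id']}: conflicts_with unknown flag {other}")
            elif flag["id"] not in by_id[other].get("conflicts_with", []):
                # Assumption: conflicts are expected to be declared on both sides.
                problems.append(f"{flag['id']}: conflict with {other} is not mirrored")
    return problems


def undeclared_flags(registry, used_ids):
    """Return the sorted flag ids from used_ids that the registry does not declare."""
    declared = {flag["id"] for flag in registry.get("flags", [])}
    return sorted(set(used_ids) - declared)


# A trimmed-down registry using three ids from the file above.
sample = json.loads("""
{
  "flags": [
    {"id": "nginx_stable", "conflicts_with": ["nginx_unstable"]},
    {"id": "nginx_unstable", "conflicts_with": ["nginx_stable"]},
    {"id": "player_ssh_configured"}
  ]
}
""")

print(check_registry(sample))  # []
print(undeclared_flags(sample, ["nginx_stable", "not_in_registry"]))  # ['not_in_registry']
```

In practice the `used_ids` argument would be collected from the `world_flag:` triggers in dialogue files and from quest and incident definitions; how those are gathered depends on the repo's own loaders and is left out here.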