chore: bootstrap lean sysadmin-chronicles repo
Import the runnable game code, content, docs, scripts, and repo guidance while leaving local agent state, dependency installs, build output, and backup copies out of the published tree.
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q001",
  "character": "marcus",
  "quest_id": "Q001",
  "series_id": "marcus-main",
  "series_position": 1,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "The onboarding doc has your key and the path you need. It's in /etc/axiom/onboarding on ares once you're in. Or ask me and I'll paste it here. Either way."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "Start in your home directory. You need a .ssh folder if it does not exist yet. Then authorized_keys inside it."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "The permissions matter more than people expect. SSH will silently refuse a key if the file or the directory is group-writable. 700 on the folder, 600 on the file."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "mkdir -p ~/.ssh && chmod 700 ~/.ssh. Then echo your public key into ~/.ssh/authorized_keys and chmod 600 that file. That is the whole thing."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:player_ssh_configured",
      "body": "Good. You're in. I'll send you the next thing shortly. The coffee machine on this floor is broken, heads up."
    },
    {
      "stage": "complete-permissive",
      "trigger": "world_flag:player_loose_permissions",
      "body": "Key's in there. One thing though — check the permissions on that file. SSH is picky about it. Might not bite you today but it will eventually."
    }
  ]
}
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q002",
  "character": "marcus",
  "quest_id": "Q002",
  "series_id": "marcus-main",
  "series_position": 2,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Sarah's ticket is real. The site's down. Hermes is the web server — you can SSH from ares. Have a look at what nginx is doing."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "If nginx won't start, it usually tells you why. Try nginx -t before you touch anything else."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "Whatever the error says, it will include a file path and a line number. Go look at that exact spot."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "Config syntax errors are usually small. Missing semicolons, wrong brackets, typos on directive names. Read it carefully."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:nginx_stable",
      "body": "Good. Sarah will see it come back up. Worth checking systemctl is-enabled nginx while you're there — if someone broke the config they may have been poking around other things too."
    },
    {
      "stage": "complete-not-enabled",
      "trigger": "world_flag:nginx_unstable",
      "body": "It's running. But if that machine reboots for any reason nginx won't come back up automatically. You might want to fix that before Sarah notices."
    }
  ]
}
@@ -0,0 +1,44 @@
{
  "id": "marcus-Q003",
  "character": "marcus",
  "quest_id": "Q003",
  "series_id": "marcus-main",
  "series_position": 3,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Dave's report is vague but something is wrong on hermes. I'd start by looking at resource utilization before assuming it's the application."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "Check disk. df -h is your friend. Web servers write logs constantly and people don't always remember to set up rotation."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "If you find a big file, don't just delete it — figure out why it got that big. Is logrotate configured for nginx? Check /etc/logrotate.d/."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "The default nginx logrotate config is in the nginx package. dpkg -L nginx | grep logrotate might give you somewhere to start. Or just write a correct one — it's about ten lines."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_logrotate_healthy",
      "body": "Nice. That was the right call — clearing the space and fixing what caused it. Logrotate problems have a way of coming back if you don't actually fix them."
    },
    {
      "stage": "complete-norotate",
      "trigger": "world_flag:hermes_log_pressure_pending",
      "body": "Space is back. But if you didn't fix the rotation config that log is going to grow again. Something to keep an eye on."
    },
    {
      "stage": "complete-down",
      "trigger": "world_flag:hermes_web_down",
      "body": "nginx is inactive now? That's worse than the disk problem. Restarting it without fixing why it died isn't a fix, it's a delay. Check what happened before you start it again."
    }
  ]
}
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q004",
  "character": "marcus",
  "quest_id": "Q004",
  "series_id": "marcus-main",
  "series_position": 4,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Sarah's deploy thing is interesting. If the script said it ran fine but the files didn't change, something is blocking the write. I'd look at ownership before I touch the script."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "ls -la on the web root. If those files are owned by root and the deploy runs as www-data, that's your problem."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "chown. And use -R unless you enjoy doing it twice."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "chown -R www-data:www-data /var/www/axiomworks. Then you can trigger the deploy service to confirm it takes."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_deploy_healthy",
      "body": "Good. Someone ran that deploy as root at some point. Worth figuring out who has sudo on hermes and whether they should."
    },
    {
      "stage": "complete-partial",
      "trigger": "world_flag:hermes_deploy_partial",
      "body": "Ownership is fixed on the directory but I'm not sure the files inside are correct. Sarah might still hit issues on the next deploy."
    }
  ]
}
@@ -0,0 +1,44 @@
{
  "id": "marcus-Q005",
  "character": "marcus",
  "quest_id": "Q005",
  "series_id": "marcus-main",
  "series_position": 5,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Dave's disk alert is on /var/backups this time, not /var/log. That's a different problem. Something to do with the backup job probably."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "Look at what owns the files in that directory. If it's root and the backup agent is supposed to manage them, someone ran something as the wrong user."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "Check /etc/cron.d/. Jobs in there name a user on the line, between the schedule and the command. If that field is missing, the job won't run as the account you expect."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "The line format is: schedule user command. If yours is just: schedule command — that's the problem. Add the user field."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_backup_healthy",
      "body": "Good catch on the ownership cleanup too. A lot of people would have just fixed the cron line and left the old root-owned files sitting there."
    },
    {
      "stage": "complete-partial",
      "trigger": "world_flag:hermes_backup_partial",
      "body": "Cron's correct now. The old files are still owned by root though — the retention script won't be able to clean them up. Worth sorting that out before the disk fills again."
    },
    {
      "stage": "complete-wrong",
      "trigger": "world_flag:hermes_backup_root_running",
      "body": "Disk's clear. But what was actually running that job? If root is still running it that directory is going to fill up again."
    }
  ]
}
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q006",
  "character": "marcus",
  "quest_id": "Q006",
  "series_id": "marcus-main",
  "series_position": 6,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Vulcan is Arch. Different from what you've been working on. Package manager is pacman, not apt. Same concepts, different commands. Signature errors usually mean keyring or clock problems."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "Check what time that machine thinks it is. timedatectl. If NTP isn't running the clock drifts and GPG signatures start looking like they're from the future."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "systemctl enable --now systemd-timesyncd. Then wait a moment for sync, and try pacman again. You may also need to refresh the keyring."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "pacman -S archlinux-keyring to refresh. Then pacman -Syu should work."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:vulcan_builds_healthy",
      "body": "Clock drift breaking pacman is one of those things that seems unrelated until you've seen it twice. You'll spot it immediately next time."
    },
    {
      "stage": "complete-fragile",
      "trigger": "world_flag:vulcan_ntp_fragile",
      "body": "Timesyncd is running and builds work. It's not enabled at boot though — worth fixing that so the next reboot doesn't put you back here."
    }
  ]
}
@@ -0,0 +1,39 @@
{
  "id": "marcus-Q007",
  "character": "marcus",
  "quest_id": "Q007",
  "series_id": "marcus-main",
  "series_position": 7,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "Priya can't get into hermes. Something in the SSH config changed. Figure out what it was and restore her access without creating a new problem."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "sshd_config is where SSH restrictions live. Look for AllowUsers or AllowGroups. One of those is either missing her or was set wrong."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "AllowGroups is the right pattern — it scales. AllowUsers is a list you have to maintain manually. Either works, but think about which one you want to be maintaining in six months."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_ssh_hardened_correct",
      "body": "AllowGroups with web-admin. That's the correct way to do it. Users in the group get access, users not in the group don't. No list to maintain."
    },
    {
      "stage": "complete-fragile",
      "trigger": "world_flag:hermes_ssh_allowusers_fragile",
      "body": "Priya's back in. That AllowUsers list is going to need a line added every time someone new needs access. Worth switching to group-based before it becomes a problem."
    },
    {
      "stage": "complete-regression",
      "trigger": "world_flag:hermes_ssh_unrestricted",
      "body": "Access is restored but the hardening is gone. That restriction was there for a reason — SSH open to everyone on hermes isn't a great position to be in."
    }
  ]
}
@@ -0,0 +1,44 @@
{
  "id": "marcus-Q008",
  "character": "marcus",
  "quest_id": "Q008",
  "series_id": "marcus-main",
  "series_position": 8,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "App's down after an update. First question is always: what changed. Sarah says a new package version came in. I'd start by looking at whether the binary actually runs."
    },
    {
      "stage": "hint_1",
      "trigger": "player_requested_help",
      "body": "journalctl -u axiomworks-app. If it's failing immediately, it's probably the binary itself, not config. Try running it directly and see what the error is."
    },
    {
      "stage": "hint_2",
      "trigger": "player_requested_help_again",
      "body": "If the binary is bad, figure out where the package came from. pacman -Qi axiomworks-app will show you the repo. If it's coming from vulcan, go look at what they built."
    },
    {
      "stage": "hint_3",
      "trigger": "player_requested_help_again",
      "body": "You can roll back with pacman -U /var/cache/pacman/pkg/ if the old package is still cached. Or go to the repo on vulcan and look for an older version."
    },
    {
      "stage": "complete-rollback",
      "trigger": "world_flag:hermes_app_pinned_2-1-0",
      "body": "Solid. Pinning the version means the next update cycle won't pull the broken one back in. Someone needs to fix that build on vulcan at some point though."
    },
    {
      "stage": "complete-unpinned",
      "trigger": "world_flag:hermes_app_running",
      "body": "App's running again. Is the version pinned? If not the next pacman -Syu is going to pull 2.1.1 back in and you'll be back here."
    },
    {
      "stage": "complete-rebuild",
      "trigger": "world_flag:vulcan_build_fixed",
      "body": "You fixed it at the source. That's the right call if you have time for it. What was wrong with the build?"
    }
  ]
}
@@ -0,0 +1,19 @@
{
  "id": "marcus-day-one",
  "character": "marcus",
  "quest_id": "",
  "series_id": "marcus-main",
  "series_position": 0,
  "messages": [
    {
      "stage": "welcome",
      "trigger": "immediate",
      "body": "Welcome. You're replacing Dale. Nobody will tell you what Dale did because it's complicated. Your badge number is pending — Dave from Finance has your temp credentials. He's on three today."
    },
    {
      "stage": "setup",
      "trigger": "immediate",
      "body": "Your machine is ares. You'll need to set up SSH keys before anything else will work. I'll send you the first ticket once provisioning clears. Probably this morning."
    }
  ]
}
@@ -0,0 +1,14 @@
{
  "id": "priya-Q007-followup",
  "character": "priya",
  "quest_id": "Q007",
  "series_id": "priya-ops",
  "series_position": 2,
  "messages": [
    {
      "stage": "after-action",
      "trigger": "world_flag:priya_access_restored",
      "body": "Access is back. Thank you. I can finish the incident review now without SSH getting in the way."
    }
  ]
}
@@ -0,0 +1,29 @@
{
  "id": "priya-Q007",
  "character": "priya",
  "quest_id": "Q007",
  "series_id": "priya-ops",
  "series_position": 1,
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "I need access to hermes restored. I was in the middle of investigating an error and now I can't get back in. Find out what changed and fix it."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_ssh_hardened_correct",
      "body": "Back in. AllowGroups is the right way to do it — using AllowUsers was going to be a maintenance problem. Good call."
    },
    {
      "stage": "complete-fragile",
      "trigger": "world_flag:hermes_ssh_allowusers_fragile",
      "body": "Access restored. That AllowUsers list is going to need updating every time someone new needs access. Might want to switch to group-based at some point."
    },
    {
      "stage": "complete-regression",
      "trigger": "world_flag:hermes_ssh_unrestricted",
      "body": "I'm back in. But it looks like all SSH restrictions are gone now. That hardening was probably there for a reason."
    }
  ]
}
@@ -0,0 +1,21 @@
{
  "id": "priya-shift-review",
  "character": "priya",
  "messages": [
    {
      "stage": "excellent",
      "trigger": "shift_review",
      "body": "Strong shift. You handled the queue cleanly and did not create extra work for anyone else."
    },
    {
      "stage": "ok",
      "trigger": "shift_review",
      "body": "Acceptable shift. The important thing is that the work moved forward and the environment stayed stable."
    },
    {
      "stage": "poor",
      "trigger": "shift_review",
      "body": "This shift needs review. Resolve the backlog cleanly next time and stop leaving avoidable mess behind."
    }
  ]
}
@@ -0,0 +1,12 @@
{
  "id": "sarah-Q003-angry",
  "character": "sarah",
  "quest_id": "Q003",
  "messages": [
    {
      "stage": "nginx-killed",
      "trigger": "world_flag:hermes_web_down",
      "body": "The site is completely down now. It was slow before — now it's returning nothing. What happened?"
    }
  ]
}
@@ -0,0 +1,22 @@
{
  "id": "sarah-Q004",
  "character": "sarah",
  "quest_id": "Q004",
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "My last deploy ran without errors but nothing changed on the site. The script didn't fail, it just... didn't do anything. Files in /var/www are owned by root for some reason."
    },
    {
      "stage": "complete-clean",
      "trigger": "world_flag:hermes_deploy_healthy",
      "body": "Deploy's working again. I pushed a test change and it applied. Thanks for sorting the ownership — not sure how that happened but it's fixed now."
    },
    {
      "stage": "complete-partial",
      "trigger": "world_flag:hermes_deploy_partial",
      "body": "The top-level directory is writable now but the files inside it still aren't. Next deploy is going to fail on the individual files. Can you finish the ownership fix?"
    }
  ]
}
@@ -0,0 +1,27 @@
{
  "id": "sarah-Q008",
  "character": "sarah",
  "quest_id": "Q008",
  "messages": [
    {
      "stage": "intro",
      "trigger": "quest_activated",
      "body": "The app is crashing immediately after the last update. I didn't push any config changes. It was the package — axiomworks-app 2.1.1 is broken. Whatever vulcan built, it doesn't work."
    },
    {
      "stage": "complete-pinned",
      "trigger": "world_flag:hermes_app_pinned_2-1-0",
      "body": "App's running. The apt pin means we won't accidentally pull 2.1.1 in again. Someone needs to sort out what went wrong on vulcan before we can upgrade properly."
    },
    {
      "stage": "complete-rebuilt",
      "trigger": "world_flag:vulcan_build_fixed",
      "body": "App's running and the build is fixed. That's the right fix. I was hoping someone would trace it back to the source rather than just rolling back and leaving it."
    },
    {
      "stage": "complete-unpinned",
      "trigger": "world_flag:hermes_app_running",
      "body": "App's running again. Is 2.1.0 pinned in apt preferences? If not the next update cycle is going to pull 2.1.1 back in and we'll be here again."
    }
  ]
}