pi-kit/pikit-prep-spec.md
2025-12-10 18:51:31 -05:00

# Pi-Kit DietPi Image Prep Spec
This file defines how to design a **prep script** for a DietPi-based Pi-Kit image.

The script's job:

Prepare a running Pi-Kit system to be cloned as a “golden image” **without** removing any intentional software, configs, hostname, or passwords.

---
## 0. Context & Goals
**Starting point**

- OS: DietPi (Debian-based), already installed.
- Extra software: web stack, Pi-Kit dashboard, DNS/ad-blocker, DBs, monitoring, etc.
- System has been used for testing (logs, histories, test data, junk).

**Goal**

- Prepare system for cloning as a product image.
- **KEEP**:
  - All intentionally installed packages/software.
  - All custom configs (web, apps, DietPi configs, firewall).
  - Current hostname.
  - Existing passwords (system + services) as shipped defaults.
- **RESET/CLEAR**:
  - Host-unique identity data (machine-id, SSH host keys, etc.).
  - Logs, histories, caches.
  - Test/personal accounts and data.
---
## 1. Discovery Phase (MUST HAPPEN BEFORE SCRIPT DESIGN)
Before writing any code, inspect the system and external docs.

The AI MUST:

1. **Detect installed components**
   - Determine which key packages/services are present, e.g.:
     - Web server (nginx, lighttpd, apache2, etc.).
     - DNS/ad-blocker (Pi-hole or similar).
     - DB engines (MariaDB, PostgreSQL, SQLite usage).
     - Monitoring/metrics (Netdata, Uptime Kuma, etc.).
   - Use this to decide which cleanup sections apply.
2. **Verify paths/layouts**
   - For each service or category:
     - Confirm relevant paths/directories actually exist.
     - Do not assume standard paths without checking.
   - Example: Only treat `/var/log/nginx` as Nginx logs if:
     - Nginx is installed, AND
     - That directory exists.
3. **Consult upstream docs (online)**
   - Check current:
     - DietPi docs and/or DietPi GitHub.
     - Docs for major services (e.g. Pi-hole, Nginx, MariaDB, etc.).
   - Use docs to confirm:
     - Data vs config locations.
     - Safe cache/log cleanup methods.
   - Prefer documented behavior over guesses.
4. **Classify actions**
   - For each potential cleanup:
     - Mark as **safe** if clearly understood and documented.
     - Mark as **uncertain** if layout deviates or docs are unclear.
   - Plan to:
     - Perform safe actions.
     - Skip uncertain actions and surface them for manual review.
5. **Fail safe**
   - If something doesn't match expectations:
     - Do NOT plan a destructive operation on it.
     - Flag it as “needs manual review” in the confirmation phase.
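
The detect, verify, and classify steps above can be sketched as a small read-only check. This is a minimal illustration in Python (the spec does not fix the script's language). `Candidate`, `classify`, and the `documented` flag are hypothetical names, and treating "binary found on PATH" as "installed" is an assumption; a package-manager query may be more reliable on a real system.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Candidate:
    """One potential cleanup target (hypothetical structure)."""
    label: str         # human-readable description, e.g. "nginx logs"
    binary: str        # executable whose presence signals the service is installed
    path: Path         # path the cleanup would touch
    documented: bool   # True only if upstream docs confirm the cleanup method


def classify(c: Candidate,
             which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return 'not-installed', 'uncertain', or 'safe' without touching anything."""
    if which(c.binary) is None:
        return "not-installed"      # this cleanup section does not apply
    if not c.path.exists() or not c.documented:
        return "uncertain"          # layout deviates or docs unclear: manual review
    return "safe"
```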
---
## 2. Identity & Host-Specific Secrets
**DO NOT CHANGE:**

- Hostname (whatever it currently is).
- Any existing passwords (system or service-level) that are part of the appliance defaults.

**RESET/CLEAR:**

1. **Machine identity**
   - Clear:
     - `/etc/machine-id`
     - `/var/lib/dbus/machine-id` (if present)
   - Rely on OS to recreate them on next boot.
2. **Random seed**
   - Clear persisted random seed (e.g. `/var/lib/systemd/random-seed`) so each clone gets unique entropy.
3. **SSH host keys**
   - Remove all SSH **host key** files (server keys only).
   - Leave user SSH keypairs unless explicitly identified as dev/test and safe to remove.
4. **SSH known_hosts**
   - Clear `known_hosts` for:
     - `root`
     - `dietpi` (or primary DietPi user)
     - Any other persistent users
5. **VPN keys (conditional)**
   - If keys are meant to be unique per device:
     - Remove WireGuard/OpenVPN private keys and per-device configs embedding them.
   - If the design requires fixed server keys:
     - KEEP server keys.
     - REMOVE test/client keys/profiles that are tied to dev use.
6. **TLS certificates (conditional)**
   - REMOVE:
     - Let's Encrypt/ACME certs tied to personal domains.
     - Per-device self-signed certs that should regenerate.
   - KEEP:
     - Shared CAs/certs only if explicitly part of product design.
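
As a rough sketch of the unconditional identity resets (machine-id, random seed, SSH host keys), the Python function here works against a `root` prefix so it can be tried on a scratch directory first. `reset_identity` is a hypothetical name. It assumes that the OS regenerates an empty `/etc/machine-id` on boot and that something on the image recreates SSH host keys at first start; both should be confirmed on the target DietPi version before relying on this.

```python
from pathlib import Path
from typing import List


def reset_identity(root: Path) -> List[str]:
    """Clear host-unique identity files under `root`; return a log of actions."""
    actions = []

    machine_id = root / "etc/machine-id"
    if machine_id.is_file():
        machine_id.write_text("")           # assumption: emptied, not deleted
        actions.append("emptied etc/machine-id")

    # Often a symlink to /etc/machine-id; only remove a real, separate file.
    dbus_id = root / "var/lib/dbus/machine-id"
    if dbus_id.is_file() and not dbus_id.is_symlink():
        dbus_id.unlink()
        actions.append("removed var/lib/dbus/machine-id")

    seed = root / "var/lib/systemd/random-seed"
    if seed.is_file():
        seed.unlink()
        actions.append("removed var/lib/systemd/random-seed")

    # Server host keys only: ssh_host_<type>_key and its .pub counterpart.
    for key in sorted((root / "etc/ssh").glob("ssh_host_*_key*")):
        key.unlink()
        actions.append(f"removed etc/ssh/{key.name}")

    return actions
```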
---
## 3. Users & Personal Traces
1. **Accounts**
   - KEEP:
     - Accounts that are part of the product.
   - REMOVE:
     - Test-only accounts (users created for dev/debug).
2. **Shell histories**
   - Clear shell histories for all remaining users:
     - `root`, `dietpi`, others that stay.
3. **Home directories**
   - For users that remain:
     - KEEP:
       - Intentional config/dotfiles (shell rc, app config, etc.).
     - REMOVE:
       - Downloads, random files, scratch notes.
       - Editor backup/swap files, stray temp files.
       - Debug dumps, one-off scripts not part of product.
   - For users that are removed:
     - Delete their home dirs entirely.
4. **SSH client keys**
   - REMOVE:
     - Clearly personal/test keys (e.g. with your email in comments).
   - KEEP:
     - Only keys explicitly required by product design.
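
The shell-history step is the most mechanical item in this section. A minimal Python sketch, assuming the list of history file names in `HISTORY_FILES` (an illustrative, non-exhaustive list) and a hypothetical `clear_histories` helper that receives the home directories of the users that remain:

```python
from pathlib import Path
from typing import Iterable, List

# Assumed, non-exhaustive list of history files; extend after discovery.
HISTORY_FILES = (".bash_history", ".zsh_history", ".python_history", ".lesshst")


def clear_histories(homes: Iterable[Path]) -> List[Path]:
    """Delete known history files in each home directory; return what was removed."""
    removed = []
    for home in homes:
        for name in HISTORY_FILES:
            target = home / name
            # Regular files only: a symlink in a home dir is left for manual review.
            if target.is_file() and not target.is_symlink():
                target.unlink()
                removed.append(target)
    return removed
```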
---
## 4. Logs & Telemetry
1. **System logs**
   - Clear:
     - Systemd journal (persistent logs).
     - `/var/log` files + rotated/compressed variants, where safe.
2. **Service logs**
   - For installed services (web servers, DNS/ad-blockers, DBs, etc.):
     - Clear their log files and rotated versions.
3. **Monitoring/metrics**
   - For tools like Netdata, Uptime Kuma, etc.:
     - KEEP:
       - Config, target definitions.
     - CLEAR:
       - Historical metric/alert data (TSDBs, history files, etc.).
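
One way to clear plain-text logs is to truncate live files in place (so ownership and permissions survive) and delete only the rotated or compressed variants. The Python sketch here illustrates that split; `clear_logs` and the `ROTATED` suffix pattern are assumptions, not a documented convention. It is not meant for the systemd journal directory, which is better handled through `journalctl`'s own rotate and vacuum options.

```python
import re
from pathlib import Path
from typing import Tuple

# Assumed suffixes for rotated/compressed variants: syslog.1, syslog.2.gz, ...
ROTATED = re.compile(r"\.(\d+|gz|xz|bz2|old)$")


def clear_logs(log_dir: Path) -> Tuple[int, int]:
    """Truncate live logs and delete rotated ones; return (truncated, deleted)."""
    truncated = deleted = 0
    if not log_dir.is_dir():
        return truncated, deleted          # missing path: skip, do not fail
    for path in sorted(log_dir.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue                       # skip directories and symlinks
        if ROTATED.search(path.name):
            path.unlink()
            deleted += 1
        else:
            # Truncate in place so ownership and permissions are preserved.
            with open(path, "w"):
                pass
            truncated += 1
    return truncated, deleted
```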
---
## 5. Package Manager & Caches
1. **APT**
   - Clear:
     - Downloaded `.deb` archives.
     - Safe APT caches (as per documentation).
2. **Other caches**
   - Under `/var/cache` and `~/.cache`:
     - CLEAR:
       - Caches known to be safe and auto-regenerated.
     - DO NOT CLEAR:
       - Caches that are required for correct functioning or very expensive to rebuild, unless docs confirm safety.
3. **Temp directories**
   - Empty:
     - `/tmp`
     - `/var/tmp`
4. **Crash dumps**
   - Remove crash dumps and core files (e.g. `/var/crash` and similar locations).
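
"Empty" for the temp directories is read here as removing the contents while keeping the directory itself, so its mode (including the sticky bit on `/tmp`) is untouched. A minimal Python sketch under that reading, with `empty_dir` as a hypothetical helper:

```python
import shutil
from pathlib import Path


def empty_dir(directory: Path) -> int:
    """Remove everything inside `directory` but keep the directory itself."""
    if directory.is_symlink() or not directory.is_dir():
        return 0                            # missing or unexpected: skip
    count = 0
    for entry in list(directory.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)            # real subdirectory: remove the tree
        else:
            entry.unlink()                  # file or symlink (link is not followed)
        count += 1
    return count
```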
---
## 6. Service Data vs Config (Per-App Logic)
General rule:

> Keep configuration & structure. Remove dev/test data, history, and personal content.

The AI must apply this using detected services + docs.

### 6.1 Web Servers (nginx / lighttpd / apache2)

- KEEP:
  - Main config and site configs that define Pi-Kit behavior.
  - App code in `/var/www/...` (or equivalent Pi-Kit web root).
- CLEAR:
  - Access/error logs.
  - Non-critical caches if docs confirm they're safe to recreate.
### 6.2 DNS / Ad-blockers (Pi-hole or similar)
- KEEP:
  - Upstream DNS settings.
  - Blocklists / adlists / local DNS overrides.
  - DHCP config if it is part of the product's behavior.
- CLEAR:
  - Query history / statistics DB.
  - Log files.
- DO NOT:
  - Change the current admin password (it is the product default).
### 6.3 Databases (MariaDB, PostgreSQL, SQLite, etc.)
- KEEP:
  - DB schema.
  - Seed/default data required for every user.
- REMOVE/RESET:
  - Dev/test user accounts (with your email, etc.).
  - Test content/records not meant for production image.
  - Access tokens, session records, API keys tied to dev use.
- For SQLite-based apps:
  - Decide per app (based on docs) whether to:
    - Ship a pre-seeded “clean” DB, OR
    - Let it auto-create DB on first run.
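
For the pre-seeded “clean” DB option, one possible approach is to empty only the tables that hold dev data and leave the schema and seed tables alone. The Python sketch here uses the standard `sqlite3` module; `purge_tables` is a hypothetical helper, and which tables count as dev data (session or token tables, for example) is a per-app decision that the function takes as input rather than guesses.

```python
import sqlite3
from pathlib import Path
from typing import Dict, Iterable


def purge_tables(db_path: Path, tables: Iterable[str]) -> Dict[str, int]:
    """Delete all rows from the named tables, keeping the schema intact."""
    deleted = {}
    conn = sqlite3.connect(str(db_path))
    try:
        existing = {row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}
        for table in tables:
            if table not in existing:
                continue                    # layout differs: leave for manual review
            quoted = '"' + table.replace('"', '""') + '"'
            count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
            conn.execute(f"DELETE FROM {quoted}")
            deleted[table] = count
        conn.commit()
        conn.execute("VACUUM")              # rebuild the file so freed pages are dropped
    finally:
        conn.close()
    return deleted
```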
### 6.4 Other services (Nextcloud, Jellyfin, Gotify, Uptime Kuma, etc.)
For each detected service:

- KEEP:
  - Global config, ports, base URLs, application settings needed for Pi-Kit.
- CLEAR:
  - Personal/dev user accounts.
  - Your media/content (unless intentionally shipping sample content).
  - Notification endpoints tied to your own email / Gotify / Telegram, unless explicitly desired.

If docs or structure are unclear, mark the cleanup as **uncertain** and surface it in the confirmation phase instead of guessing.

---
## 7. Networking & Firewall
**HARD CONSTRAINTS:**

- Do NOT modify hostname.
- Do NOT weaken/remove the product firewall rules.

1. **Firewall**
   - Detect firewall system in use (iptables, nftables, UFW, etc.).
   - KEEP:
     - All persistent firewall configs that define Pi-Kit's security behavior.
   - DO NOT:
     - Flush or reset firewall rules unless it's clearly a dev-only configuration (and that's confirmed).
2. **Other networking state**
   - Safe to CLEAR:
     - DHCP lease files.
     - DNS caches.
   - DO NOT ALTER:
     - Static IP/bridge/VLAN config that appears to be part of the intended appliance setup.
---
## 8. DietPi-Specific State & First-Boot Behavior
1. **DietPi automation/config**
   - Identify DietPi automation configuration (e.g. `dietpi.txt`, related files).
   - KEEP:
     - The intended defaults (locale, timezone, etc.).
     - Any automation that is part of Pi-Kit behavior.
   - AVOID:
     - Re-triggering DietPi's generic first-boot flow unless that is intentionally desired.
2. **DietPi logs/temp**
   - CLEAR:
     - DietPi-specific logs and temp files.
   - KEEP:
     - All DietPi configuration and automation files.
3. **Pi-Kit first-boot logic**
   - Ensure any Pi-Kit specific first-run services/hooks are:
     - Enabled.
     - Not dependent on data being cleaned (e.g., they must not require removed dev tokens/paths).
---
## 9. Shell & Tooling State
1. **Tool caches**
   - For root and main user(s), CLEAR:
     - Safe caches in `~/.cache` (pip, npm, cargo, etc.), if not needed at runtime.
   - Avoid clearing caches that are critical or painful to rebuild unless doc-backed.
2. **Build artifacts**
   - REMOVE:
     - Source trees, build directories, and other dev artifacts that are not part of final product.
3. **Cronjobs / timers**
   - Audit:
     - User crontabs.
     - System crontabs.
     - Systemd timers.
   - KEEP:
     - Jobs that are part of Pi-Kit behavior.
   - REMOVE:
     - Jobs/timers clearly used for dev/testing only.
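
The audit step is read-only: it only lists what exists so that a person can decide which jobs belong to Pi-Kit. A minimal Python sketch for the cron part, assuming the conventional Debian locations in `CRON_LOCATIONS` (to be verified during discovery) and a hypothetical `audit_cron` helper; systemd timers are not covered here and could be listed separately with `systemctl list-timers`.

```python
from pathlib import Path
from typing import Dict, List

# Assumed conventional Debian locations; verify they exist on the target system.
CRON_LOCATIONS = ("etc/crontab", "etc/cron.d", "var/spool/cron/crontabs")


def _cron_files(location: Path) -> List[Path]:
    """Return the location itself if it is a file, or the files directly inside it."""
    if location.is_file():
        return [location]
    if location.is_dir():
        return sorted(p for p in location.iterdir() if p.is_file())
    return []


def audit_cron(root: Path) -> Dict[str, List[str]]:
    """Map each cron file under `root` to its non-blank, non-comment lines."""
    report = {}
    for rel in CRON_LOCATIONS:
        for f in _cron_files(root / rel):
            lines = [ln.strip() for ln in f.read_text(errors="replace").splitlines()]
            active = [ln for ln in lines if ln and not ln.startswith("#")]
            if active:
                report[f.relative_to(root).as_posix()] = active
    return report
```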
---
## 10. Implementation Requirements (For the Future Script)
When generating the actual script, the AI MUST:

1. **Error handling**
   - Check exit statuses where relevant.
   - Handle missing paths/directories gracefully:
     - If a path doesn't exist, skip and log; do not fail hard.
   - Avoid wide-destructive operations without validation:
     - No “blind” deletions on unverified globs.
2. **Idempotency**
   - Script can run multiple times without progressively breaking the system.
   - After repeated runs, image should remain valid and “clean”.
3. **Conservative behavior**
   - If uncertain about an operation:
     - Do NOT perform it.
     - Log a warning and mark for manual review.
4. **Logging**
   - For each major category (identity, logs, caches, per-service cleanup, etc.):
     - Log what was targeted and outcome:
       - `cleaned`
       - `skipped (not installed/not found)`
       - `skipped (uncertain; manual review)`
   - Provide a summary at the end.
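
These four requirements fit together in a small pattern: every action goes through one helper that checks the path, refuses unverified targets, and records one of the three outcomes. The Python sketch here is one possible shape; `PrepLog`, `remove_file`, and the `verified` flag are hypothetical names, with `verified` standing for "this target was confirmed during discovery".

```python
from collections import Counter
from pathlib import Path
from typing import List, Tuple

CLEANED = "cleaned"
NOT_FOUND = "skipped (not installed/not found)"
UNCERTAIN = "skipped (uncertain; manual review)"


class PrepLog:
    """Collects one outcome per target and prints a summary at the end."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, str, str]] = []

    def record(self, category: str, target: str, outcome: str) -> None:
        self.entries.append((category, target, outcome))
        print(f"[{category}] {target}: {outcome}")

    def summary(self) -> Counter:
        counts = Counter(outcome for _, _, outcome in self.entries)
        for outcome, n in sorted(counts.items()):
            print(f"{n:4d}  {outcome}")
        return counts


def remove_file(log: PrepLog, category: str, path: Path, verified: bool) -> None:
    """Delete one file and log the outcome; a missing path is skipped, not an error."""
    if not verified:
        log.record(category, str(path), UNCERTAIN)    # conservative: do nothing
    elif not path.is_file():
        log.record(category, str(path), NOT_FOUND)    # a repeated run lands here
    else:
        path.unlink()
        log.record(category, str(path), CLEANED)
```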
---
## 11. Mandatory Pre-Script Confirmation Step
**Before writing any script, the AI MUST:**

1. **Present a system-specific plan**
   - Based on discovery + docs, list:
     - Exactly which paths, files, DBs, and data types it intends to:
       - Remove
       - Reset
       - Leave untouched
     - For each item or group: a short explanation of **why**.
2. **Highlight conflicts / ambiguities**
   - If any cleanup might:
     - Affect passwords,
     - Affect hostname,
     - Affect firewall rules,
     - Or contradict this spec in any way,
   - The AI must:
     - Call it out explicitly.
     - Explain tradeoffs and propose a safe option.
3. **Highlight extra opportunities**
   - If the AI finds additional cleanup opportunities not explicitly listed here (e.g., new DietPi features, new log paths):
     - Describe them clearly.
     - Explain pros/cons of adding them.
     - Ask whether to include them.
4. **Wait for explicit approval**
   - Do NOT generate the script until:
     - The user (me) has reviewed the plan.
     - Conflicts and extra opportunities have been discussed.
     - Explicit approval (with any modifications) has been given.
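
The plan-then-approve flow can be pictured as a simple data structure plus a gate. This Python sketch is illustrative only; `PlanItem`, `render_plan`, and `approved` are hypothetical names, and the conflict and extra-opportunity discussion from steps 2 and 3 is assumed to happen in conversation before the final prompt.

```python
from dataclasses import dataclass
from typing import Callable, List

ACTIONS = ("remove", "reset", "leave untouched")


@dataclass
class PlanItem:
    action: str    # one of ACTIONS
    target: str    # path, file, DB, or data type
    reason: str    # short explanation of why


def render_plan(items: List[PlanItem]) -> str:
    """Group plan items by action so the reviewer sees each list separately."""
    lines = []
    for action in ACTIONS:
        group = [i for i in items if i.action == action]
        if group:
            lines.append(f"{action.upper()}:")
            lines.extend(f"  - {i.target}: {i.reason}" for i in group)
    return "\n".join(lines)


def approved(items: List[PlanItem], ask: Callable[[str], str] = input) -> bool:
    """Show the plan and return True only on an explicit 'yes'."""
    print(render_plan(items))
    answer = ask("Generate the prep script? Type 'yes' to approve: ")
    return answer.strip().lower() == "yes"
```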

Only after that confirmation may the AI produce the actual prep script.

---