# Pi-Kit DietPi Image Prep Spec

This file defines how to design a **prep script** for a DietPi-based Pi-Kit image.

The script’s job:

Prepare a running Pi-Kit system to be cloned as a “golden image” **without** removing any intentional software, configs, hostname, or passwords.

---

## 0. Context & Goals

**Starting point**

- OS: DietPi (Debian-based), already installed.
- Extra software: web stack, Pi-Kit dashboard, DNS/ad-blocker, DBs, monitoring, etc.
- System has been used for testing (logs, histories, test data, junk).

**Goal**

- Prepare system for cloning as a product image.
- **KEEP**:
  - All intentionally installed packages/software.
  - All custom configs (web, apps, DietPi configs, firewall).
  - Current hostname.
  - Existing passwords (system + services) as shipped defaults.
- **RESET/CLEAR**:
  - Host-unique identity data (machine-id, SSH host keys, etc.).
  - Logs, histories, caches.
  - Test/personal accounts and data.

---

## 1. Discovery Phase (MUST HAPPEN BEFORE SCRIPT DESIGN)

Before writing any code, inspect the system and external docs. The AI MUST:

1. **Detect installed components**
   - Determine which key packages/services are present, e.g.:
     - Web server (nginx, lighttpd, apache2, etc.).
     - DNS/ad-blocker (Pi-hole or similar).
     - DB engines (MariaDB, PostgreSQL, SQLite usage).
     - Monitoring/metrics (Netdata, Uptime Kuma, etc.).
   - Use this to decide which cleanup sections apply.

2. **Verify paths/layouts**
   - For each service or category:
     - Confirm relevant paths/directories actually exist.
     - Do not assume standard paths without checking.
   - Example: Only treat `/var/log/nginx` as Nginx logs if:
     - Nginx is installed, AND
     - That directory exists.

3. **Consult upstream docs (online)**
   - Check current:
     - DietPi docs and/or DietPi GitHub.
     - Docs for major services (e.g. Pi-hole, Nginx, MariaDB, etc.).
   - Use docs to confirm:
     - Data vs config locations.
     - Safe cache/log cleanup methods.
   - Prefer documented behavior over guesses.

4. **Classify actions**
   - For each potential cleanup:
     - Mark as **safe** if clearly understood and documented.
     - Mark as **uncertain** if layout deviates or docs are unclear.
   - Plan to:
     - Perform safe actions.
     - Skip uncertain actions and surface them for manual review.

5. **Fail safe**
   - If something doesn’t match expectations:
     - Do NOT plan a destructive operation on it.
     - Flag it as “needs manual review” in the confirmation phase.

---

## 2. Identity & Host-Specific Secrets

**DO NOT CHANGE:**

- Hostname (whatever it currently is).
- Any existing passwords (system or service-level) that are part of the appliance defaults.

**RESET/CLEAR:**

1. **Machine identity**
   - Clear:
     - `/etc/machine-id`
     - `/var/lib/dbus/machine-id` (if present)
   - Rely on OS to recreate them on next boot.

2. **Random seed**
   - Clear persisted random seed (e.g. `/var/lib/systemd/random-seed`) so each clone gets unique entropy.

3. **SSH host keys**
   - Remove all SSH **host key** files (server keys only).
   - Leave user SSH keypairs unless explicitly identified as dev/test and safe to remove.

4. **SSH known_hosts**
   - Clear `known_hosts` for:
     - `root`
     - `dietpi` (or primary DietPi user)
     - Any other persistent users

5. **VPN keys (conditional)**
   - If keys are meant to be unique per device:
     - Remove WireGuard/OpenVPN private keys and per-device configs embedding them.
   - If the design requires fixed server keys:
     - KEEP server keys.
     - REMOVE test/client keys/profiles that are tied to dev use.

6. **TLS certificates (conditional)**
   - REMOVE:
     - Let’s Encrypt/ACME certs tied to personal domains.
     - Per-device self-signed certs that should regenerate.
   - KEEP:
     - Shared CAs/certs only if explicitly part of product design.

---

## 3. Users & Personal Traces

1. **Accounts**
   - KEEP:
     - Accounts that are part of the product.
   - REMOVE:
     - Test-only accounts (users created for dev/debug).

2. **Shell histories**
   - Clear shell histories for all remaining users:
     - `root`, `dietpi`, others that stay.

3. **Home directories**
   - For users that remain:
     - KEEP:
       - Intentional config/dotfiles (shell rc, app config, etc.).
     - REMOVE:
       - Downloads, random files, scratch notes.
       - Editor backup/swap files, stray temp files.
       - Debug dumps, one-off scripts not part of product.
   - For users that are removed:
     - Delete their home dirs entirely.

4. **SSH client keys**
   - REMOVE:
     - Clearly personal/test keys (e.g. with your email in comments).
   - KEEP:
     - Only keys explicitly required by product design.

---

## 4. Logs & Telemetry

1. **System logs**
   - Clear:
     - Systemd journal (persistent logs).
     - `/var/log` files + rotated/compressed variants, where safe.

2. **Service logs**
   - For installed services (web servers, DNS/ad-blockers, DBs, etc.):
     - Clear their log files and rotated versions.

3. **Monitoring/metrics**
   - For tools like Netdata, Uptime Kuma, etc.:
     - KEEP:
       - Config, target definitions.
     - CLEAR:
       - Historical metric/alert data (TSDBs, history files, etc.).

---

## 5. Package Manager & Caches

1. **APT**
   - Clear:
     - Downloaded `.deb` archives.
     - Safe APT caches (as per documentation).

2. **Other caches**
   - Under `/var/cache` and `~/.cache`:
     - CLEAR:
       - Caches known to be safe and auto-regenerated.
     - DO NOT CLEAR:
       - Caches that are required for correct functioning or very expensive to rebuild, unless docs confirm safety.

3. **Temp directories**
   - Empty:
     - `/tmp`
     - `/var/tmp`

4. **Crash dumps**
   - Remove crash dumps and core files (e.g. `/var/crash` and similar locations).

---

## 6. Service Data vs Config (Per-App Logic)

General rule:

> Keep configuration & structure. Remove dev/test data, history, and personal content.

The AI must apply this using detected services + docs.

### 6.1 Web Servers (nginx / lighttpd / apache2)

- KEEP:
  - Main config and site configs that define Pi-Kit behavior.
  - App code in `/var/www/...` (or equivalent Pi-Kit web root).
- CLEAR:
  - Access/error logs.
  - Non-critical caches if docs confirm they’re safe to recreate.
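
As a rough illustration of how the log-clearing rule for web servers could be combined with the path checks from Section 1, the Python sketch below only touches a log directory when the service binary is found and the directory exists. Everything named here is hypothetical: the function `clear_service_logs`, the `planned (dry run)` status, and the file-matching rule (`*.log` is truncated, names containing `.log.` are treated as rotated copies and deleted). Treating an installed service with a missing log directory as "uncertain" is one reading of the spec, and the spec does not prescribe a language for the real script.

```python
import shutil
from pathlib import Path


def clear_service_logs(binary: str, log_dir: Path, dry_run: bool = True) -> tuple[str, list[str]]:
    """Clear one service's logs; return (status, names of affected files)."""
    # Precondition 1: the service must actually be installed.
    if shutil.which(binary) is None:
        return "skipped (not installed/not found)", []
    # Precondition 2: the expected directory must exist; otherwise the
    # layout deviates from expectations and is left for manual review.
    if not log_dir.is_dir():
        return "skipped (uncertain; manual review)", []

    affected: list[str] = []
    for path in sorted(log_dir.iterdir()):
        # Only regular files are considered; symlinks and subdirectories are left alone.
        if path.is_symlink() or not path.is_file():
            continue
        if path.suffix == ".log":
            # Live log: truncate in place so the file and its permissions survive.
            if not dry_run:
                path.write_text("")
            affected.append(path.name)
        elif ".log." in path.name:
            # Rotated or compressed variant (e.g. access.log.1, error.log.2.gz): delete.
            if not dry_run:
                path.unlink()
            affected.append(path.name)
    return ("planned (dry run)" if dry_run else "cleaned"), affected
```

Calling `clear_service_logs("nginx", Path("/var/log/nginx"))` with the default `dry_run=True` only reports what would be touched, which fits the confirmation-first workflow of this spec. The matching rule is deliberately narrow, so files with other naming schemes are skipped rather than guessed at.
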
### 6.2 DNS / Ad-blockers (Pi-hole or similar)

- KEEP:
  - Upstream DNS settings.
  - Blocklists / adlists / local DNS overrides.
  - DHCP config if it is part of the product’s behavior.
- CLEAR:
  - Query history / statistics DB.
  - Log files.
- DO NOT:
  - Change the current admin password (it is the product default).

### 6.3 Databases (MariaDB, PostgreSQL, SQLite, etc.)

- KEEP:
  - DB schema.
  - Seed/default data required for every user.
- REMOVE/RESET:
  - Dev/test user accounts (with your email, etc.).
  - Test content/records not meant for production image.
  - Access tokens, session records, API keys tied to dev use.
- For SQLite-based apps:
  - Decide per app (based on docs) whether to:
    - Ship a pre-seeded “clean” DB, OR
    - Let it auto-create DB on first run.

### 6.4 Other services (Nextcloud, Jellyfin, Gotify, Uptime Kuma, etc.)

For each detected service:

- KEEP:
  - Global config, ports, base URLs, application settings needed for Pi-Kit.
- CLEAR:
  - Personal/dev user accounts.
  - Your media/content (unless intentionally shipping sample content).
  - Notification endpoints tied to your own email / Gotify / Telegram, unless explicitly desired.

If docs or structure are unclear, mark cleanup as **uncertain** and surface in confirmation instead of guessing.

---

## 7. Networking & Firewall

**HARD CONSTRAINTS:**

- Do NOT modify hostname.
- Do NOT weaken/remove the product firewall rules.

1. **Firewall**
   - Detect firewall system in use (iptables, nftables, UFW, etc.).
   - KEEP:
     - All persistent firewall configs that define Pi-Kit’s security behavior.
   - DO NOT:
     - Flush or reset firewall rules unless it’s clearly a dev-only configuration (and that’s confirmed).

2. **Other networking state**
   - Safe to CLEAR:
     - DHCP lease files.
     - DNS caches.
   - DO NOT ALTER:
     - Static IP/bridge/VLAN config that appears to be part of the intended appliance setup.

---

## 8. DietPi-Specific State & First-Boot Behavior

1. **DietPi automation/config**
   - Identify DietPi automation configuration (e.g. `dietpi.txt`, related files).
   - KEEP:
     - The intended defaults (locale, timezone, etc.).
     - Any automation that is part of Pi-Kit behavior.
   - AVOID:
     - Re-triggering DietPi’s generic first-boot flow unless that is intentionally desired.

2. **DietPi logs/temp**
   - CLEAR:
     - DietPi-specific logs and temp files.
   - KEEP:
     - All DietPi configuration and automation files.

3. **Pi-Kit first-boot logic**
   - Ensure any Pi-Kit specific first-run services/hooks are:
     - Enabled.
     - Not dependent on data being cleaned (e.g., they must not require removed dev tokens/paths).

---

## 9. Shell & Tooling State

1. **Tool caches**
   - For root and main user(s), CLEAR:
     - Safe caches in `~/.cache` (pip, npm, cargo, etc.), if not needed at runtime.
   - Avoid clearing caches that are critical or painful to rebuild unless doc-backed.

2. **Build artifacts**
   - REMOVE:
     - Source trees, build directories, and other dev artifacts that are not part of final product.

3. **Cronjobs / timers**
   - Audit:
     - User crontabs.
     - System crontabs.
     - Systemd timers.
   - KEEP:
     - Jobs that are part of Pi-Kit behavior.
   - REMOVE:
     - Jobs/timers clearly used for dev/testing only.

---

## 10. Implementation Requirements (For the Future Script)

When generating the actual script, the AI MUST:

1. **Error handling**
   - Check exit statuses where relevant.
   - Handle missing paths/directories gracefully:
     - If a path doesn’t exist, skip and log; do not fail hard.
   - Avoid wide-destructive operations without validation:
     - No “blind” deletions on unverified globs.

2. **Idempotency**
   - Script can run multiple times without progressively breaking the system.
   - After repeated runs, image should remain valid and “clean”.

3. **Conservative behavior**
   - If uncertain about an operation:
     - Do NOT perform it.
     - Log a warning and mark for manual review.

4. **Logging**
   - For each major category (identity, logs, caches, per-service cleanup, etc.):
     - Log what was targeted and outcome:
       - `cleaned`
       - `skipped (not installed/not found)`
       - `skipped (uncertain; manual review)`
   - Provide a summary at the end.

---

## 11. Mandatory Pre-Script Confirmation Step

**Before writing any script, the AI MUST:**

1. **Present a system-specific plan**
   - Based on discovery + docs, list:
     - Exactly which paths, files, DBs, and data types it intends to:
       - Remove
       - Reset
       - Leave untouched
     - For each item or group: a short explanation of **why**.

2. **Highlight conflicts / ambiguities**
   - If any cleanup might:
     - Affect passwords,
     - Affect hostname,
     - Affect firewall rules,
     - Or contradict this spec in any way,
   - The AI must:
     - Call it out explicitly.
     - Explain tradeoffs and propose a safe option.

3. **Highlight extra opportunities**
   - If the AI finds additional cleanup opportunities not explicitly listed here (e.g., new DietPi features, new log paths):
     - Describe them clearly.
     - Explain pros/cons of adding them.
     - Ask whether to include them.

4. **Wait for explicit approval**
   - Do NOT generate the script until:
     - The user (me) has reviewed the plan.
     - Conflicts and extra opportunities have been discussed.
     - Explicit approval (with any modifications) has been given.

Only after that confirmation may the AI produce the actual prep script.

---
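
The confirmation step in Section 11 can be pictured as a small data structure plus a gate. The Python sketch below is only one possible shape, under stated assumptions: `PlanItem`, `render_plan`, and `may_generate_script` are hypothetical names, the three actions mirror the Remove / Reset / Leave untouched grouping, and the rule that any item still flagged for review blocks script generation is an interpretation of "conflicts ... have been discussed", not something the spec states verbatim.

```python
from dataclasses import dataclass

# The three intentions the plan must list for every target.
ACTIONS = ("remove", "reset", "leave untouched")


@dataclass(frozen=True)
class PlanItem:
    target: str                 # path, file, DB, or data type
    action: str                 # one of ACTIONS
    reason: str                 # short explanation of why
    needs_review: bool = False  # conflict or ambiguity still to be discussed

    def __post_init__(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


def render_plan(items: list[PlanItem]) -> str:
    """Group items by action and render a plain-text plan for the user."""
    lines: list[str] = []
    for action in ACTIONS:
        group = [item for item in items if item.action == action]
        if not group:
            continue
        lines.append(f"{action.upper()}:")
        for item in group:
            flag = " [NEEDS REVIEW]" if item.needs_review else ""
            lines.append(f"  - {item.target}: {item.reason}{flag}")
    return "\n".join(lines)


def may_generate_script(items: list[PlanItem], approved: bool) -> bool:
    """Allow script generation only with explicit approval and no open review flags."""
    return approved and not any(item.needs_review for item in items)
```

In use, the plan would be rendered and shown first, and `may_generate_script` would be consulted only after the user has responded, so that an unresolved ambiguity (for example a path that might affect firewall rules) keeps the gate closed even if approval was given for the rest.
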