# Pi-Kit DietPi Image Prep Spec

This file defines how to design a **prep script** for a DietPi-based Pi-Kit image.

The script’s job:
Prepare a running Pi-Kit system to be cloned as a “golden image” **without** removing any intentional software, configs, hostname, or passwords.

---

## 0. Context & Goals

**Starting point**

- OS: DietPi (Debian-based), already installed.
- Extra software: web stack, Pi-Kit dashboard, DNS/ad-blocker, DBs, monitoring, etc.
- System has been used for testing (logs, histories, test data, junk).

**Goal**

- Prepare system for cloning as a product image.
- **KEEP**:
  - All intentionally installed packages/software.
  - All custom configs (web, apps, DietPi configs, firewall).
  - Current hostname.
  - Existing passwords (system + services) as shipped defaults.
- **RESET/CLEAR**:
  - Host-unique identity data (machine-id, SSH host keys, etc.).
  - Logs, histories, caches.
  - Test/personal accounts and data.

---

## 1. Discovery Phase (MUST HAPPEN BEFORE SCRIPT DESIGN)

Before writing any code, inspect the system and external docs.

The AI MUST:

1. **Detect installed components**
   - Determine which key packages/services are present, e.g.:
     - Web server (nginx, lighttpd, apache2, etc.).
     - DNS/ad-blocker (Pi-hole or similar).
     - DB engines (MariaDB, PostgreSQL, SQLite usage).
     - Monitoring/metrics (Netdata, Uptime Kuma, etc.).
   - Use this to decide which cleanup sections apply.

2. **Verify paths/layouts**
   - For each service or category:
     - Confirm relevant paths/directories actually exist.
     - Do not assume standard paths without checking.
   - Example: Only treat `/var/log/nginx` as Nginx logs if:
     - Nginx is installed, AND
     - That directory exists.

3. **Consult upstream docs (online)**
   - Check current:
     - DietPi docs and/or DietPi GitHub.
     - Docs for major services (e.g. Pi-hole, Nginx, MariaDB, etc.).
   - Use docs to confirm:
     - Data vs config locations.
     - Safe cache/log cleanup methods.
   - Prefer documented behavior over guesses.

4. **Classify actions**
   - For each potential cleanup:
     - Mark as **safe** if clearly understood and documented.
     - Mark as **uncertain** if layout deviates or docs are unclear.
   - Plan to:
     - Perform safe actions.
     - Skip uncertain actions and surface them for manual review.

5. **Fail safe**
   - If something doesn’t match expectations:
     - Do NOT plan a destructive operation on it.
     - Flag it as “needs manual review” in the confirmation phase.
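
The discovery rules above (installed AND path exists AND documented, otherwise fail safe) can be sketched as a small classifier. This is only an illustration in Python, run against a throwaway directory tree so it is harmless to execute. The `Candidate` type, the `classify` function, and the marker paths such as `/usr/sbin/nginx` and `/usr/local/bin/pihole` are assumptions made for the example, not paths confirmed for any particular Pi-Kit system.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Candidate:
    service: str      # e.g. "nginx"
    marker: str       # file whose presence is taken to mean "installed" (assumption)
    target: str       # directory the cleanup would touch
    documented: bool  # True only if upstream docs confirm clearing it is safe


def classify(root: Path, cand: Candidate) -> str:
    """Return 'safe', 'uncertain', or 'not-installed' for one cleanup candidate."""
    if not (root / cand.marker.lstrip("/")).exists():
        return "not-installed"
    if (root / cand.target.lstrip("/")).is_dir() and cand.documented:
        return "safe"
    # Installed, but the layout deviates or the docs are unclear: fail safe.
    return "uncertain"


# Demo against a temporary tree standing in for the real filesystem.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "usr/sbin").mkdir(parents=True)
    (root / "usr/sbin/nginx").touch()
    (root / "var/log/nginx").mkdir(parents=True)

    nginx_logs = Candidate("nginx", "/usr/sbin/nginx", "/var/log/nginx", documented=True)
    nginx_cache = Candidate("nginx", "/usr/sbin/nginx", "/var/cache/nginx", documented=True)
    pihole_logs = Candidate("pihole", "/usr/local/bin/pihole", "/var/log/pihole", documented=True)

    results = {
        "nginx_logs": classify(root, nginx_logs),    # installed, directory present
        "nginx_cache": classify(root, nginx_cache),  # installed, directory missing
        "pihole_logs": classify(root, pihole_logs),  # marker file missing
    }
```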

---

## 2. Identity & Host-Specific Secrets

**DO NOT CHANGE:**

- Hostname (whatever it currently is).
- Any existing passwords (system or service-level) that are part of the appliance defaults.

**RESET/CLEAR:**

1. **Machine identity**
   - Clear:
     - `/etc/machine-id`
     - `/var/lib/dbus/machine-id` (if present)
   - Rely on the OS to recreate them on next boot.

2. **Random seed**
   - Clear the persisted random seed (e.g. `/var/lib/systemd/random-seed`) so clones do not all boot from the same seed.

3. **SSH host keys**
   - Remove all SSH **host key** files (server keys only).
   - Confirm during discovery that host keys are regenerated on first boot; an SSH server with no host keys cannot accept connections.
   - Leave user SSH keypairs unless explicitly identified as dev/test and safe to remove.

4. **SSH known_hosts**
   - Clear `known_hosts` for:
     - `root`
     - `dietpi` (or primary DietPi user)
     - Any other persistent users

5. **VPN keys (conditional)**
   - If keys are meant to be unique per device:
     - Remove WireGuard/OpenVPN private keys and per-device configs embedding them.
   - If the design requires fixed server keys:
     - KEEP server keys.
     - REMOVE test/client keys/profiles that are tied to dev use.

6. **TLS certificates (conditional)**
   - REMOVE:
     - Let’s Encrypt/ACME certs tied to personal domains.
     - Per-device self-signed certs that should regenerate.
   - KEEP:
     - Shared CAs/certs only if explicitly part of product design.
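
The unconditional items above (machine-id, random seed, SSH host keys) can be sketched as follows. This is a Python illustration that works on a directory passed in as `root`, so it can be tried on a scratch tree. The function name `reset_identity` is hypothetical. Two choices are assumptions rather than requirements of this spec: `/etc/machine-id` is truncated instead of deleted (on systemd-based systems an empty file is expected to be repopulated at boot), and the D-Bus copy is only removed when it is a regular file, since it is often a symlink to `/etc/machine-id`. The demo runs the function twice to show that a second pass finds nothing left to do.

```python
import tempfile
from pathlib import Path


def reset_identity(root: Path) -> list[str]:
    """Reset host-unique identity files under `root`; return a log of actions."""
    actions = []

    # Truncate rather than delete (assumption: the OS refills an empty file).
    machine_id = root / "etc/machine-id"
    if machine_id.is_file() and machine_id.stat().st_size > 0:
        machine_id.write_text("")
        actions.append("truncated etc/machine-id")

    # Only remove the D-Bus copy when it is a real file, not a symlink.
    dbus_id = root / "var/lib/dbus/machine-id"
    if dbus_id.is_file() and not dbus_id.is_symlink():
        dbus_id.unlink()
        actions.append("removed var/lib/dbus/machine-id")

    seed = root / "var/lib/systemd/random-seed"
    if seed.is_file():
        seed.unlink()
        actions.append("removed var/lib/systemd/random-seed")

    # Host keys only: this pattern never matches user keypairs, which live
    # under each user's ~/.ssh directory, and it leaves sshd_config alone.
    ssh_dir = root / "etc/ssh"
    if ssh_dir.is_dir():
        for key in sorted(ssh_dir.glob("ssh_host_*_key*")):
            key.unlink()
            actions.append(f"removed etc/ssh/{key.name}")

    return actions


# Demo on a temporary tree; nothing outside it is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for d in ("etc/ssh", "var/lib/systemd", "home/dietpi/.ssh"):
        (root / d).mkdir(parents=True)
    (root / "etc/machine-id").write_text("0123456789abcdef0123456789abcdef\n")
    (root / "var/lib/systemd/random-seed").write_bytes(b"\x00" * 32)
    for name in ("ssh_host_ed25519_key", "ssh_host_ed25519_key.pub", "sshd_config"):
        (root / "etc/ssh" / name).write_text("x")
    (root / "home/dietpi/.ssh/id_ed25519").write_text("user key")

    first_run = reset_identity(root)
    second_run = reset_identity(root)  # idempotent: nothing left to do
    machine_id_size = (root / "etc/machine-id").stat().st_size
    sshd_config_kept = (root / "etc/ssh/sshd_config").exists()
    user_key_kept = (root / "home/dietpi/.ssh/id_ed25519").exists()
```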

---

## 3. Users & Personal Traces

1. **Accounts**
   - KEEP:
     - Accounts that are part of the product.
   - REMOVE:
     - Test-only accounts (users created for dev/debug).

2. **Shell histories**
   - Clear shell histories for all remaining users:
     - `root`, `dietpi`, others that stay.

3. **Home directories**
   - For users that remain:
     - KEEP:
       - Intentional config/dotfiles (shell rc, app config, etc.).
     - REMOVE:
       - Downloads, random files, scratch notes.
       - Editor backup/swap files, stray temp files.
       - Debug dumps, one-off scripts not part of product.
   - For users that are removed:
     - Delete their home dirs entirely.

4. **SSH client keys**
   - REMOVE:
     - Clearly personal/test keys (e.g. with your email in comments).
   - KEEP:
     - Only keys explicitly required by product design.
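
For the home directories of users that remain, the history and editor-leftover cleanup could look like the sketch below. It is written in Python against a temporary directory. The function name `clear_personal_traces` and the two name lists are assumptions for illustration; the real lists should come from the discovery phase. The sketch only looks at the top level of the home directory and never touches dotfiles that are not on the lists, which is how it keeps intentional config such as `.bashrc`.

```python
import tempfile
from pathlib import Path

# Assumed history-like files; extend per discovery results.
HISTORY_FILES = (".bash_history", ".zsh_history", ".python_history", ".lesshst", ".viminfo")
# Assumed editor backup/swap patterns (top level of the home directory only).
EDITOR_JUNK = ("*~", ".*.swp", ".*.swo")


def clear_personal_traces(home: Path) -> list[str]:
    """Remove histories and editor leftovers from one home directory."""
    removed = []
    if not home.is_dir():
        return removed  # missing home: skip, do not fail
    for name in HISTORY_FILES:
        f = home / name
        if f.is_file() and not f.is_symlink():
            f.unlink()
            removed.append(name)
    for pattern in EDITOR_JUNK:
        for f in sorted(home.glob(pattern)):
            if f.is_file() and not f.is_symlink():
                f.unlink()
                removed.append(f.name)
    return removed


# Demo on a temporary home directory.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    for name in (".bashrc", ".bash_history", "notes.txt~", ".notes.txt.swp"):
        (home / name).write_text("x")

    removed = clear_personal_traces(home)
    remaining = sorted(p.name for p in home.iterdir())
    removed_again = clear_personal_traces(home)  # second run finds nothing
```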

---

## 4. Logs & Telemetry

1. **System logs**
   - Clear:
     - Systemd journal (persistent logs).
     - `/var/log` files + rotated/compressed variants, where safe.

2. **Service logs**
   - For installed services (web servers, DNS/ad-blockers, DBs, etc.):
     - Clear their log files and rotated versions.

3. **Monitoring/metrics**
   - For tools like Netdata, Uptime Kuma, etc.:
     - KEEP:
       - Config, target definitions.
     - CLEAR:
       - Historical metric/alert data (TSDBs, history files, etc.).
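
One way to handle plain log files and their rotated variants is sketched below in Python. It makes two assumptions that are not stated in this spec: rotated files are recognized by the suffixes in `ROTATED` (adjust to the logrotate setup actually found), and active log files are truncated in place rather than deleted, so the file, its owner, and its mode survive for the service that writes to it. The function name `clean_log_dir` is hypothetical, and the systemd journal is not covered here.

```python
import re
import tempfile
from pathlib import Path

# Suffixes treated as rotated/compressed variants (an assumption).
ROTATED = re.compile(r"\.(\d+|gz|xz|bz2|old)$")


def clean_log_dir(log_dir: Path) -> dict[str, list[str]]:
    """Remove rotated logs and truncate active ones under `log_dir`."""
    result = {"removed": [], "truncated": []}
    if not log_dir.is_dir():
        return result  # path not found: skip, do not fail
    for f in sorted(log_dir.rglob("*")):
        if f.is_symlink() or not f.is_file():
            continue
        rel = f.relative_to(log_dir).as_posix()
        if ROTATED.search(f.name):
            f.unlink()
            result["removed"].append(rel)
        elif f.stat().st_size > 0:
            # Truncate in place: keeps the file, its owner, and its mode.
            with open(f, "w"):
                pass
            result["truncated"].append(rel)
    return result


# Demo on a temporary directory standing in for /var/log.
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp)
    (logs / "nginx").mkdir()
    for name in ("syslog", "syslog.1", "syslog.2.gz", "nginx/access.log", "nginx/access.log.1"):
        (logs / name).write_text("line\n")

    first = clean_log_dir(logs)
    second = clean_log_dir(logs)  # nothing left to remove or truncate
    syslog_exists = (logs / "syslog").exists()
    syslog_size = (logs / "syslog").stat().st_size
```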

---

## 5. Package Manager & Caches

1. **APT**
   - Clear:
     - Downloaded `.deb` archives.
     - Safe APT caches (as per documentation).

2. **Other caches**
   - Under `/var/cache` and `~/.cache`:
     - CLEAR:
       - Caches known to be safe and auto-regenerated.
     - DO NOT CLEAR:
       - Caches that are required for correct functioning or very expensive to rebuild, unless docs confirm safety.

3. **Temp directories**
   - Empty:
     - `/tmp`
     - `/var/tmp`

4. **Crash dumps**
   - Remove crash dumps and core files (e.g. `/var/crash` and similar locations).
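
"Empty" for the temp directories is read here as: delete the contents but keep the directory itself, so its ownership and mode are not disturbed. A minimal Python sketch of that reading follows; `empty_dir` is a hypothetical helper, and it deliberately refuses to follow a symlinked directory. APT's own archive cache is not handled by this helper, since `apt-get clean` exists for that purpose.

```python
import shutil
import tempfile
from pathlib import Path


def empty_dir(directory: Path) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    if directory.is_symlink() or not directory.is_dir():
        return 0  # missing or symlinked: skip rather than guess
    count = 0
    for entry in list(directory.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        count += 1
    return count


# Demo on a temporary directory standing in for /var/tmp.
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "var-tmp"
    (scratch / "build/obj").mkdir(parents=True)
    (scratch / "build/obj/a.o").write_text("x")
    (scratch / "leftover.txt").write_text("x")

    removed_first = empty_dir(scratch)
    still_there = scratch.is_dir()
    removed_second = empty_dir(scratch)            # already empty
    removed_missing = empty_dir(Path(tmp) / "nope")  # path does not exist
```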

---

## 6. Service Data vs Config (Per-App Logic)

General rule:

> Keep configuration & structure. Remove dev/test data, history, and personal content.

The AI must apply this using detected services + docs.

### 6.1 Web Servers (nginx / lighttpd / apache2)

- KEEP:
  - Main config and site configs that define Pi-Kit behavior.
  - App code in `/var/www/...` (or equivalent Pi-Kit web root).
- CLEAR:
  - Access/error logs.
  - Non-critical caches if docs confirm they’re safe to recreate.

### 6.2 DNS / Ad-blockers (Pi-hole or similar)

- KEEP:
  - Upstream DNS settings.
  - Blocklists / adlists / local DNS overrides.
  - DHCP config if it is part of the product’s behavior.
- CLEAR:
  - Query history / statistics DB.
  - Log files.
- DO NOT:
  - Change the current admin password (it is the product default).

### 6.3 Databases (MariaDB, PostgreSQL, SQLite, etc.)

- KEEP:
  - DB schema.
  - Seed/default data required for every user.
- REMOVE/RESET:
  - Dev/test user accounts (with your email, etc.).
  - Test content/records not meant for production image.
  - Access tokens, session records, API keys tied to dev use.
- For SQLite-based apps:
  - Decide per app (based on docs) whether to:
    - Ship a pre-seeded “clean” DB, OR
    - Let it auto-create DB on first run.
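
For the "ship a pre-seeded clean DB" option, the keep-schema, drop-dev-rows rule can be sketched with Python's standard `sqlite3` module. The table names `settings`, `sessions`, and `api_tokens` are invented for the demo and do not refer to any real application; `purge_tables` is a hypothetical helper. It skips tables that do not exist instead of failing, and runs `VACUUM` afterwards so that deleted rows do not linger in free pages of the file.

```python
import os
import sqlite3
import tempfile


def purge_tables(db_path: str, tables: list[str]) -> dict[str, int]:
    """Delete all rows from the named tables, keeping the schema.

    Returns the number of rows removed per table that actually existed.
    """
    removed: dict[str, int] = {}
    if not os.path.isfile(db_path):
        return removed  # do not let sqlite3 create a new, empty database
    conn = sqlite3.connect(db_path)
    try:
        existing = {row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")}
        for table in tables:
            if table not in existing:
                continue  # unknown table: skip rather than guess
            quoted = '"' + table.replace('"', '""') + '"'
            count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
            conn.execute(f"DELETE FROM {quoted}")
            removed[table] = count
        conn.commit()
        conn.execute("VACUUM")  # rebuild the file without the deleted rows
    finally:
        conn.close()
    return removed


# Demo with a throwaway database (table names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "app.db")
    conn = sqlite3.connect(db)
    conn.executescript("""
        CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT);
        CREATE TABLE sessions (token TEXT, user TEXT);
        INSERT INTO settings VALUES ('theme', 'dark');
        INSERT INTO sessions VALUES ('abc', 'dev'), ('def', 'dev');
    """)
    conn.commit()
    conn.close()

    removed = purge_tables(db, ["sessions", "api_tokens"])

    conn = sqlite3.connect(db)
    settings_rows = conn.execute("SELECT COUNT(*) FROM settings").fetchone()[0]
    session_rows = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
    conn.close()
    missing_db = purge_tables(os.path.join(tmp, "absent.db"), ["sessions"])
    absent_created = os.path.exists(os.path.join(tmp, "absent.db"))
```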

### 6.4 Other services (Nextcloud, Jellyfin, Gotify, Uptime Kuma, etc.)

For each detected service:

- KEEP:
  - Global config, ports, base URLs, application settings needed for Pi-Kit.
- CLEAR:
  - Personal/dev user accounts.
  - Your media/content (unless intentionally shipping sample content).
  - Notification endpoints tied to your own email / Gotify / Telegram, unless explicitly desired.

If docs or structure are unclear, mark cleanup as **uncertain** and surface in confirmation instead of guessing.

---

## 7. Networking & Firewall

**HARD CONSTRAINTS:**

- Do NOT modify hostname.
- Do NOT weaken/remove the product firewall rules.

1. **Firewall**
   - Detect firewall system in use (iptables, nftables, UFW, etc.).
   - KEEP:
     - All persistent firewall configs that define Pi-Kit’s security behavior.
   - DO NOT:
     - Flush or reset firewall rules unless it’s clearly a dev-only configuration (and that’s confirmed).

2. **Other networking state**
   - Safe to CLEAR:
     - DHCP lease files.
     - DNS caches.
   - DO NOT ALTER:
     - Static IP/bridge/VLAN config that appears to be part of the intended appliance setup.
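
The hard constraints can be backed by a guard that every deletion target must pass before anything is removed. The Python sketch below is one possible shape for such a guard. The `PROTECTED` list is an assumption for illustration (typical Debian locations for the hostname, password hashes, and persistent firewall rules) and would need to be replaced with the paths found during discovery. A target is rejected if it is a protected path, lies inside one, or is a parent directory of one.

```python
import posixpath
from pathlib import PurePosixPath

# Assumed protected locations; replace with what discovery actually finds.
PROTECTED = (
    "/etc/hostname",
    "/etc/hosts",
    "/etc/shadow",
    "/etc/nftables.conf",
    "/etc/iptables",
    "/etc/ufw",
)


def is_protected(path: str) -> bool:
    """Return True if deleting `path` would touch a protected location."""
    target = PurePosixPath(posixpath.normpath(path))
    if not target.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    for entry in PROTECTED:
        guard = PurePosixPath(entry)
        if target == guard:
            return True   # the protected path itself
        if guard in target.parents:
            return True   # something inside a protected directory
        if target in guard.parents:
            return True   # a parent directory that contains a protected path
    return False


# Demo: which candidate targets would be refused.
checks = {
    "/etc/hostname": is_protected("/etc/hostname"),
    "/etc/ufw/user.rules": is_protected("/etc/ufw/user.rules"),
    "/etc": is_protected("/etc"),
    "/var/lib/dhcp/dhclient.leases": is_protected("/var/lib/dhcp/dhclient.leases"),
    "/var/log/../../etc/hostname": is_protected("/var/log/../../etc/hostname"),
}
```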

---

## 8. DietPi-Specific State & First-Boot Behavior

1. **DietPi automation/config**
   - Identify DietPi automation configuration (e.g. `dietpi.txt`, related files).
   - KEEP:
     - The intended defaults (locale, timezone, etc.).
     - Any automation that is part of Pi-Kit behavior.
   - AVOID:
     - Re-triggering DietPi’s generic first-boot flow unless that is intentionally desired.

2. **DietPi logs/temp**
   - CLEAR:
     - DietPi-specific logs and temp files.
   - KEEP:
     - All DietPi configuration and automation files.

3. **Pi-Kit first-boot logic**
   - Ensure any Pi-Kit specific first-run services/hooks are:
     - Enabled.
     - Not dependent on data being cleaned (e.g., they must not require removed dev tokens/paths).

---

## 9. Shell & Tooling State

1. **Tool caches**
   - For root and main user(s), CLEAR:
     - Safe caches in `~/.cache` (pip, npm, cargo, etc.), if not needed at runtime.
   - Avoid clearing caches that are critical or painful to rebuild unless doc-backed.

2. **Build artifacts**
   - REMOVE:
     - Source trees, build directories, and other dev artifacts that are not part of final product.

3. **Cronjobs / timers**
   - Audit:
     - User crontabs.
     - System crontabs.
     - Systemd timers.
   - KEEP:
     - Jobs that are part of Pi-Kit behavior.
   - REMOVE:
     - Jobs/timers clearly used for dev/testing only.

---

## 10. Implementation Requirements (For the Future Script)

When generating the actual script, the AI MUST:

1. **Error handling**
   - Check exit statuses where relevant.
   - Handle missing paths/directories gracefully:
     - If a path doesn’t exist, skip and log; do not fail hard.
   - Avoid wide-destructive operations without validation:
     - No “blind” deletions on unverified globs.

2. **Idempotency**
   - Script can run multiple times without progressively breaking the system.
   - After repeated runs, image should remain valid and “clean”.

3. **Conservative behavior**
   - If uncertain about an operation:
     - Do NOT perform it.
     - Log a warning and mark for manual review.

4. **Logging**
   - For each major category (identity, logs, caches, per-service cleanup, etc.):
     - Log what was targeted and outcome:
       - `cleaned`
       - `skipped (not installed/not found)`
       - `skipped (uncertain; manual review)`
   - Provide a summary at the end.
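
The requirements above fit together naturally in a single step wrapper: it skips uncertain operations, skips missing paths, records one of the three outcomes, and is safe to run again. The Python sketch below shows one possible arrangement. `Outcome`, `RunLog`, and `run_step` are hypothetical names, and the summary format is an assumption; only the three outcome strings come from this spec.

```python
import tempfile
from collections import Counter
from enum import Enum
from pathlib import Path
from typing import Callable


class Outcome(Enum):
    CLEANED = "cleaned"
    SKIPPED_MISSING = "skipped (not installed/not found)"
    SKIPPED_UNCERTAIN = "skipped (uncertain; manual review)"


class RunLog:
    def __init__(self) -> None:
        self.entries: list[tuple[str, str, Outcome]] = []

    def record(self, category: str, target: str, outcome: Outcome) -> None:
        self.entries.append((category, target, outcome))

    def summary(self) -> str:
        """One line per outcome with its count, in a fixed order."""
        counts = Counter(outcome for _, _, outcome in self.entries)
        return "\n".join(f"{o.value}: {counts[o]}" for o in Outcome)


def run_step(log: RunLog, category: str, target: Path,
             action: Callable[[Path], None], certain: bool = True) -> None:
    """Run one cleanup action conservatively and record what happened."""
    if not certain:
        log.record(category, str(target), Outcome.SKIPPED_UNCERTAIN)
    elif not target.exists():
        log.record(category, str(target), Outcome.SKIPPED_MISSING)
    else:
        action(target)
        log.record(category, str(target), Outcome.CLEANED)


# Demo: the same step run twice, plus one uncertain step.
with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / ".bash_history"
    history.write_text("ls\n")
    odd_cache = Path(tmp) / "odd-cache"
    odd_cache.mkdir()

    log = RunLog()
    run_step(log, "histories", history, Path.unlink)
    run_step(log, "histories", history, Path.unlink)   # second run: already gone
    run_step(log, "caches", odd_cache, Path.rmdir, certain=False)

    outcomes = [outcome for _, _, outcome in log.entries]
    odd_cache_kept = odd_cache.exists()
    summary_text = log.summary()
```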

---

## 11. Mandatory Pre-Script Confirmation Step

**Before writing any script, the AI MUST:**

1. **Present a system-specific plan**
   - Based on discovery + docs, list:
     - Exactly which paths, files, DBs, and data types it intends to:
       - Remove
       - Reset
       - Leave untouched
     - For each item or group: a short explanation of **why**.

2. **Highlight conflicts / ambiguities**
   - If any cleanup might:
     - Affect passwords,
     - Affect hostname,
     - Affect firewall rules,
     - Or contradict this spec in any way,
   - The AI must:
     - Call it out explicitly.
     - Explain tradeoffs and propose a safe option.

3. **Highlight extra opportunities**
   - If the AI finds additional cleanup opportunities not explicitly listed here (e.g., new DietPi features, new log paths):
     - Describe them clearly.
     - Explain pros/cons of adding them.
     - Ask whether to include them.

4. **Wait for explicit approval**
   - Do NOT generate the script until:
     - The user (me) has reviewed the plan.
     - Conflicts and extra opportunities have been discussed.
     - Explicit approval (with any modifications) has been given.

Only after that confirmation may the AI produce the actual prep script.
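
The confirmation step can be pictured as a plan data structure plus a gate that stays closed until approval is given. The Python sketch below is only one way to model it. `PlanItem`, `render_plan`, and `may_generate_script` are hypothetical names, the rendered layout is an assumption, and the example targets are illustrative rather than taken from a real system. The gate returns `True` only when the plan is approved and any flagged conflicts have been discussed.

```python
from dataclasses import dataclass

ACTIONS = ("remove", "reset", "leave")


@dataclass(frozen=True)
class PlanItem:
    target: str             # path, file, DB, or data type
    action: str             # one of ACTIONS
    reason: str             # the short "why" required by the spec
    conflict: bool = False  # touches passwords, hostname, firewall, or the spec

    def __post_init__(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")


def render_plan(items: list[PlanItem]) -> str:
    """Group plan items by action so the reviewer sees remove/reset/leave."""
    lines = []
    for action in ACTIONS:
        lines.append(f"{action.upper()}:")
        for item in items:
            if item.action == action:
                flag = " [CONFLICT: needs discussion]" if item.conflict else ""
                lines.append(f"  - {item.target}: {item.reason}{flag}")
    return "\n".join(lines)


def may_generate_script(items: list[PlanItem], approved: bool,
                        conflicts_discussed: bool) -> bool:
    """Gate: explicit approval, and no undiscussed conflicts."""
    has_conflicts = any(item.conflict for item in items)
    return approved and (conflicts_discussed or not has_conflicts)


# Demo plan (targets are illustrative only).
plan = [
    PlanItem("/etc/machine-id", "reset", "host-unique identity"),
    PlanItem("/etc/hostname", "leave", "hard constraint"),
    PlanItem("/etc/ufw/user.rules", "remove", "looks dev-only", conflict=True),
]
plan_text = render_plan(plan)
gate_not_approved = may_generate_script(plan, approved=False, conflicts_discussed=True)
gate_undiscussed = may_generate_script(plan, approved=True, conflicts_discussed=False)
gate_open = may_generate_script(plan, approved=True, conflicts_discussed=True)
```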

---