Operations Runbooks
This page describes how to make common changes safely and how to perform routine maintenance without drifting from the documented architecture.
Change Principles
- Change one layer at a time.
- Confirm the current state before editing anything.
- Update the matching config file first, then the service.
- Keep a redacted export or diff of the change.
- Verify the new state after the restart or rollout.
- Update the docs immediately after the change is confirmed.
Standard Change Workflow
Use this sequence for Proxmox, pfSense, Docker, and DNS changes:
- Identify the source of truth for the component.
- Capture a read-only snapshot of the current config or runtime state.
- Edit only the relevant config file or UI setting.
- Apply the change in the smallest possible scope.
- Confirm health, routes, listeners, and logs.
- Roll back if the expected state does not appear.
- Update the relevant inventory, dependency, and service pages if the change affects architecture.
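The snapshot and verify steps above can be sketched as a before/after comparison. This is a minimal illustration, not part of the documented tooling: the helper name config_diff and the sample contents are hypothetical, and it assumes both snapshots are already available as text.

```python
import difflib


def config_diff(before: str, after: str, name: str = "config") -> str:
    """Return a unified diff of two config snapshots ('' if they are identical)."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{name}.before",
        tofile=f"{name}.after",
    )
    return "".join(lines)


if __name__ == "__main__":
    # Hypothetical snapshot contents, for illustration only.
    before = "image: example/app:1.0\nport: 8080\n"
    after = "image: example/app:1.1\nport: 8080\n"
    print(config_diff(before, after, "docker-compose.yml"))
```

An empty result means the change did not land, which is itself a signal to stop and recheck before rolling anything out further.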
pfSense Changes
Safe edits
- Edit the pfSense config only through the UI or a backup/restore cycle.
- Prefer changing aliases, rules, or NAT entries individually.
- Keep a fresh config.redacted.xml after major updates.
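One way to produce a redacted export is to blank out sensitive element text before the file is stored. The sketch below is an assumption about how that could be done: the tag names in SENSITIVE_TAGS are illustrative guesses and must be checked against a real export, and redact_config is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: illustrative tag names only; review a real export to build the actual list.
SENSITIVE_TAGS = {"password", "bcrypt-hash", "privatekey", "presharedkey"}


def redact_config(xml_text: str, tags=SENSITIVE_TAGS, placeholder: str = "REDACTED") -> str:
    """Replace the text of every element whose tag is in `tags` with a placeholder."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in tags and elem.text and elem.text.strip():
            elem.text = placeholder
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<pfsense><user><name>admin</name><password>secret</password></user></pfsense>"
    print(redact_config(sample))
```

Matching on tag names alone will miss secrets stored in unexpected fields, so a manual review of the output is still needed before committing it.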
Before you touch pfSense
- Confirm which interface, alias, or rule needs to change.
- Confirm whether the change affects WAN failover, DNS redirection, proxy redirection, or WireGuard.
- Take a backup export before making the change.
After the change
- Validate interface status and gateway health.
- Confirm LAN clients still resolve through Pi-hole.
- Confirm HTTP/HTTPS redirection still matches the intended proxy behavior.
- Confirm WireGuard, if relevant, still starts on boot.
Docker Changes
Safe edits
- Edit the matching /root/<project>/docker-compose.yml and .env.
- Keep secret values out of the docs and out of the repo.
- Restart only the stack you changed.
Before you touch Docker
- Confirm the stack is current and not orphaned.
- Check docker ps, the compose file, and the mount layout.
- Check whether the container uses Traefik labels or direct port publishing.
After the change
- Check docker ps for restart loops.
- Check docker logs for the changed service.
- Confirm the expected Traefik route, if the service is public.
- Confirm any dependent database or backend service is healthy.
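The restart-loop check above can be sketched as a small parser over docker ps output. This assumes the output was captured with a tab-separated name and status per line, for example via docker ps --format '{{.Names}}\t{{.Status}}'; the helper name and the sample container names are hypothetical.

```python
def restarting_containers(ps_output: str) -> list[str]:
    """Return names of containers whose status starts with 'Restarting'.

    Expects one 'name<TAB>status' pair per line.
    """
    names = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if status.startswith("Restarting"):
            names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical captured output, for illustration only.
    sample = "traefik\tUp 3 days\napp\tRestarting (1) 4 seconds ago\n"
    print(restarting_containers(sample))
```

A container in a crash loop can briefly report an Up status between restarts, so a single snapshot may miss it. Running the check a few times, or reading docker logs for the service, covers that gap.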
Common Docker updates
- Update image tag
- Change environment values
- Adjust a bind mount or volume
- Add or remove a route in Traefik
- Recreate only the affected stack
Proxmox Changes
Safe edits
- Use qm config <VMID> and pct config <CTID> as the authoritative read-only view.
- Make one guest or one storage change at a time.
- Avoid changing the host network if the pfSense VM is the boundary.
Before you touch Proxmox
- Confirm which guest or storage target is involved.
- Confirm whether the guest is running or stopped.
- Confirm whether the change affects a bridge, a VLAN, or a storage mount.
After the change
- Confirm the guest still starts cleanly.
- Confirm the expected bridge or storage is still present.
- Confirm the guest has the expected IP and role.
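The bridge check above can be sketched by parsing saved qm config or pct config output, which is printed as key: value lines. This is a rough illustration: the helper names and sample values are hypothetical, and it assumes network devices appear as netN keys with a bridge=... option.

```python
def parse_guest_config(text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            config[key.strip()] = value.strip()
    return config


def guest_bridges(config: dict[str, str]) -> set[str]:
    """Collect the bridge names referenced by netN entries."""
    found = set()
    for key, value in config.items():
        if key.startswith("net") and key[3:].isdigit():
            for part in value.split(","):
                option, _, setting = part.partition("=")
                if option == "bridge":
                    found.add(setting)
    return found


if __name__ == "__main__":
    # Hypothetical config text, for illustration only.
    sample = "cores: 2\nnet0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0\n"
    print(guest_bridges(parse_guest_config(sample)))
```

Comparing the set of bridges from the snapshot taken before the change with the set taken after it shows quickly whether a guest lost or gained an attachment.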
DNS Changes
Safe edits
- Treat the Pi-hole LXC as the client-facing DNS source.
- If a new static record is needed, add it in the authoritative DNS source, not ad hoc on clients.
- Keep the DNS redirect target and the resolver address handed out by DHCP identical.
Before you touch DNS
- Confirm whether the change belongs on Pi-hole, pfSense, or both.
- Confirm the target host/IP is stable.
- Confirm whether the change affects internal-only or DMZ hosts.
After the change
- Verify resolution from a client on LAN and from a host in DMZ.
- Verify any firewall DNS redirect rules still catch the intended traffic.
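The resolution check above can be sketched with the system resolver, run once from a LAN client and once from a DMZ host. The helper names are hypothetical and the host name and address in the usage example are placeholders, not values from this environment.

```python
import socket


def resolve_ipv4(name: str) -> list[str]:
    """Return the sorted IPv4 addresses the system resolver gives for `name`."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def resolves_to(name: str, expected_ip: str) -> bool:
    """True if `name` resolves and `expected_ip` is among the answers."""
    try:
        return expected_ip in resolve_ipv4(name)
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Placeholder name and documentation-range address, for illustration only.
    print(resolves_to("service.example.internal", "192.0.2.10"))
```

Because this uses whatever resolver the host was given, it tests the path a real client takes, including any DHCP-assigned resolver and firewall redirect in between.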
Headscale Client Onboarding
Use the Tailscale and Headscale Client Onboarding page
when adding infrastructure hosts to the mesh. The standard pattern is official
Tailscale packages, Headscale pre-auth keys, the infra Headscale user for
servers, and --accept-dns=false on Proxmox or other infrastructure hosts
unless DNS behavior is intentionally being changed.
Subnet-router changes must advertise only validated narrow routes. The current
approved DMZ route is 192.168.2.0/24, served by CT 105 ts-router; do not
advertise 192.168.100.0/24 or broad private ranges. Keep Tailscale SNAT
enabled unless OPNsense static routes and firewall rules for 100.64.0.0/10
are explicitly approved and validated.
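The route rule above can be sketched as an allowlist check run before a subnet router advertises anything. This is an illustrative guard, not existing tooling: the helper name is hypothetical, and the approved set contains only the DMZ route named above.

```python
import ipaddress

# The only route approved in the text above.
APPROVED_ROUTES = {ipaddress.ip_network("192.168.2.0/24")}


def unapproved_routes(advertised, approved=APPROVED_ROUTES):
    """Return the advertised routes that are not an exact match for an approved network."""
    rejected = []
    for route in advertised:
        network = ipaddress.ip_network(route, strict=True)
        if network not in approved:
            rejected.append(route)
    return rejected


if __name__ == "__main__":
    print(unapproved_routes(["192.168.2.0/24", "192.168.100.0/24"]))
```

Exact matching is deliberately strict: a broader range such as 192.168.0.0/16 is rejected even though it contains the approved /24.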
Maintenance Checklist
Daily or as-needed
- Check whether any Docker stack is in a restart loop.
- Check whether pfSense gateways are healthy.
- Check whether the Proxmox host and key guests are up.
Weekly
- Review container health and logs for the core stack.
- Check the backup storage target for available space.
- Confirm the docs repo still builds after any stack changes.
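The backup storage check in the weekly list can be sketched with the standard library. The 15% threshold is an assumed example value, not a documented policy, and the helper names are hypothetical; the path is whatever mount point holds the backup target.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of the filesystem at `path` that is free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total if usage.total else 0.0


def is_low_on_space(path: str, min_free: float = 0.15) -> bool:
    """True if the free fraction is below `min_free` (assumed threshold)."""
    return free_fraction(path) < min_free


if __name__ == "__main__":
    # "." stands in for the backup mount point in this illustration.
    print(f"free: {free_fraction('.'):.1%}, low: {is_low_on_space('.')}")
```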
Monthly
- Review orphaned Docker stacks again.
- Review whether any stale hardware or legacy pages need additional cleanup.
- Verify the redacted pfSense export still reflects the live firewall config.
Rollback Notes
- For pfSense, restore the previous exported config if a rule or interface change breaks connectivity.
- For Docker, revert the compose file and recreate the affected stack.
- For Proxmox, reverse the guest or storage change and recheck the guest state.
- For DNS, revert the DNS record or redirect rule and confirm clients resolve again.
Documentation Rule
Any change that affects routing, publishing, auth, storage, or backups should be reflected in:
- the relevant page under docs/current/
- the relevant page with a clearly marked To be verified gap
- the commit history, so future operators can trace when the state changed