Restoration Runbook
This is the current disaster recovery restore path for the KH3 infrastructure. Use it together with the script usage guide and the validation checklist.
The active rebuild path for CT 101 is now the rootless Podman path documented
in Rootless Podman Restore Runbook. The Docker
restore notes on this page remain useful because they record the verified backup
artifact facts and the errors discovered during the first restore exercise.
The restore path is not only documentation. The scripts validate SSH access, stage the real NAS backup by tar over SSH, validate backup artifacts, copy the staged backup into the target LXC through Proxmox, restore the selected services, and verify the restored service state.
Current Evidence
Read-only checks on June 16, 2026 confirmed:
- ssh pve reaches the Proxmox host, and pct, qm, and pvesm are present.
- The live short hostname returned by that alias was pve. Older notes call the host pve02, so the restore answer file accepts both by default.
- ssh -p 2242 nas can see /volume1/vm_backup/proxmox-rebuild-20260614-191305.
- The NAS backup directory displays broad mode bits through the NAS/CIFS view; Synology ACLs must be checked before treating staged files as protected.
- The websites directory exists but was empty in the verified backup set.
- Current Proxmox guest listing did not match older notes: qm list showed pfSense as VM 100, and no LXCs were listed. Treat this as live-state drift to investigate before running a destructive rebuild step.
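The hostname drift noted above can be checked by hand before any script runs. The following is a minimal sketch, not the logic of the restore scripts: the helper name hostname_is_accepted is hypothetical, and the accepted names pve and pve02 are taken from the evidence list.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the restore scripts).
# Returns 0 when the live short hostname matches one of the accepted names.
hostname_is_accepted() {
  local live="$1"
  shift
  local candidate
  for candidate in "$@"; do
    if [ "$live" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# Read-only usage from the administration workstation:
#   live="$(ssh pve 'hostname -s')"
#   hostname_is_accepted "$live" pve pve02 \
#     || echo "unexpected Proxmox hostname: $live" >&2
```

Any name outside the accepted list is a reason to stop and investigate before continuing.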
Backup Source and Staging
| Item | Value |
|---|---|
| Backup source | nas:/volume1/vm_backup/proxmox-rebuild-20260614-191305 |
| NAS access | ssh -p 2242 nas |
| Workstation stage | /tmp/proxmox-rebuild-20260614-191305-staged by default |
| Workstation marker | .copy-complete inside the staged directory |
| Docker LXC stage | /opt/docker/recovery/proxmox-rebuild-20260614-191305 |
| Docker LXC marker | .copy-complete inside the LXC staged directory |
| Docker compose target | /opt/docker/compose |
| Docker data target | /opt/docker/volumes |
| Podman LXC stage | /opt/podman/restore/proxmox-rebuild-20260614-191305 |
| Podman data target | /opt/podman/volumes |
| Podman env target | /opt/podman/env |
The stage copy deliberately uses tar over SSH, not SCP or SFTP:
ssh -p 2242 nas 'tar -C /volume1/vm_backup/proxmox-rebuild-20260614-191305 -cf - .' |
tar -C /tmp/proxmox-rebuild-20260614-191305-staged -xf -
Use scripts/restore/01-stage-backup.sh instead of typing this manually. The
script writes .copy-complete only after tar extraction succeeds. A normal
rerun reuses the staged copy. Set FORCE_BACKUP_COPY=1 only when replacing a
known bad or obsolete staged copy.
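The marker behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of 01-stage-backup.sh: the function name stage_backup is hypothetical, and the producer command is passed as arguments so the same logic works with tar over SSH or with a local tar.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of marker-guarded staging (bash is assumed for pipefail).
# Usage: stage_backup STAGE_DIR PRODUCER_COMMAND [ARGS...]
# The producer must write a tar stream to stdout.
stage_backup() {
  local stage_dir="$1"
  shift
  local marker="$stage_dir/.copy-complete"

  # A normal rerun reuses the staged copy when the marker exists.
  if [ -f "$marker" ] && [ "${FORCE_BACKUP_COPY:-0}" != "1" ]; then
    echo "using existing staged backup"
    return 0
  fi

  # Drop any old marker first so an interrupted copy is never mistaken
  # for a complete one. Note: this sketch does not clear stale files.
  rm -f "$marker"
  mkdir -p "$stage_dir" || return 1

  # Subshell keeps pipefail local; a failure on either side of the pipe
  # prevents the marker from being written.
  ( set -o pipefail; "$@" | tar -C "$stage_dir" -xf - ) || return 1

  touch "$marker"
  echo "staged backup complete"
}

# Example with the real source and stage paths from the table above:
#   stage_backup /tmp/proxmox-rebuild-20260614-191305-staged \
#     ssh -p 2242 nas 'tar -C /volume1/vm_backup/proxmox-rebuild-20260614-191305 -cf - .'
```

Writing the marker last is the design choice that matters: a reader of the staged directory can trust .copy-complete only because nothing creates it before extraction has finished.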
Active Podman Script Order
Run from the administration workstation unless noted otherwise:
cp scripts/podman/podman-answer.env.example scripts/podman/podman-answer.env
chmod 600 scripts/podman/podman-answer.env
Edit the answer file and keep it out of Git. For the current restored test
target, CT 101 is podman-lxc at 192.168.2.100/24 with gateway
192.168.2.1. The nameserver may temporarily be 1.1.1.1 only while internal
DNS at 192.168.2.2 is unavailable.
Create or converge the Podman LXC from the Proxmox host:
ssh pve 'mkdir -p /root/kh3-podman-restore'
scp scripts/podman/common.sh scripts/podman/00-create-podman-lxc-101.sh scripts/podman/podman-answer.env pve:/root/kh3-podman-restore/
ssh pve 'bash /root/kh3-podman-restore/00-create-podman-lxc-101.sh --answer-file /root/kh3-podman-restore/podman-answer.env'
Bootstrap, generate Quadlets, stage the backup, restore data, and validate:
ssh pve 'pct push 101 /root/kh3-podman-restore/podman-answer.env /root/podman-answer.env --perms 0600'
ssh pve 'pct exec 101 -- bash -s -- --answer-file /root/podman-answer.env' < scripts/podman/01-bootstrap-rootless-podman-lxc.sh
ssh pve 'pct exec 101 -- bash -s -- --answer-file /root/podman-answer.env' < scripts/podman/03-generate-rootless-quadlets.sh
scripts/podman/02-stage-backup-to-podman-lxc.sh --answer-file scripts/podman/podman-answer.env
Only after staging is complete and the operator is ready to restore service data, set:
CONFIRM_PODMAN_RESTORE=restore-podman-services
Push the updated answer file, then run:
scp scripts/podman/podman-answer.env pve:/root/kh3-podman-restore/podman-answer.env
ssh pve 'pct push 101 /root/kh3-podman-restore/podman-answer.env /root/podman-answer.env --perms 0600'
ssh pve 'pct exec 101 -- bash -s -- --answer-file /root/podman-answer.env' < scripts/podman/04-restore-rootless-services.sh
ssh pve 'pct exec 101 -- bash -s -- --answer-file /root/podman-answer.env' < scripts/podman/05-validate-rootless-services.sh
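The confirmation value acts as a deliberate gate in front of the destructive restore step. A minimal sketch of such a gate is shown below; the function name require_confirmation and the error message are assumptions, and the real scripts may implement the check differently.

```shell
#!/usr/bin/env bash
# Hypothetical confirmation gate (illustrative only).
# Succeeds only when the supplied value matches the expected token exactly.
require_confirmation() {
  local actual="$1"
  local expected="$2"
  if [ "$actual" != "$expected" ]; then
    echo "refusing to restore: confirmation must be exactly '$expected'" >&2
    return 1
  fi
  return 0
}

# Example, after loading the answer file:
#   require_confirmation "${CONFIRM_PODMAN_RESTORE:-}" restore-podman-services || exit 1
```

An exact string match means an empty, misspelled, or leftover value fails closed, so the restore does not start by accident.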
Historical Docker Script Order
Run from the administration workstation unless noted otherwise:
cp scripts/restore/restore-answer.env.example scripts/restore/restore-answer.env
chmod 600 scripts/restore/restore-answer.env
Edit the answer file and keep it out of Git. Then run:
scripts/restore/00-preflight-access.sh --answer-file scripts/restore/restore-answer.env
scripts/restore/01-stage-backup.sh --answer-file scripts/restore/restore-answer.env
scripts/restore/02-validate-artifacts.sh --answer-file scripts/restore/restore-answer.env
If CT 101 docker-host has just been created and
scripts/bootstrap-debian13-docker-lxc.sh has printed
Docker installation verified, this is the next checkpoint. Continue from the
three restore wrapper commands above. They run from the administration
workstation and use ssh pve plus pct exec; direct SSH to docker-host is
not required.
Only after the new Docker LXC exists, Docker is installed, artifact validation passes, and the operator is ready to start services, set:
CONFIRM_RESTORE=restore-docker-services
Then run:
scripts/restore/03-restore-docker-services.sh --answer-file scripts/restore/restore-answer.env
scripts/restore/04-validate-services.sh --answer-file scripts/restore/restore-answer.env
scripts/restore/run-restore.sh runs all stages in order, but do not use it for
the first exercise. Step through the individual scripts so failures are easier
to understand.
Podman Step Results
| Step | Expected result | Safe rerun behavior |
|---|---|---|
| 00-create-podman-lxc-101.sh | CT 101 podman-lxc exists, remains unprivileged, and has the narrow /dev/net/tun passthrough needed for rootless Podman networking | Accepts the existing correct CT and converges settings |
| 01-bootstrap-rootless-podman-lxc.sh | podsvc exists, rootless Podman works, /etc/subuid and /etc/subgid use podsvc:10000:50000, and /opt/podman exists | Keeps the existing service user and directory tree |
| 02-stage-backup-to-podman-lxc.sh | Local and LXC staged backups contain .copy-complete | Does not recopy when markers exist unless FORCE_BACKUP_COPY=1 |
| 03-generate-rootless-quadlets.sh | Rootless Quadlets exist under /home/podsvc/.config/containers/systemd and env files exist under /opt/podman/env | Replaces Quadlet files but does not overwrite edited env files |
| 04-restore-rootless-services.sh | Application archives are restored, PostgreSQL dumps are imported, ownership is mapped with podman unshare, and services start | Skips marked restores unless FORCE_RESTORE=1 |
| 05-validate-rootless-services.sh | Required containers run, PostgreSQL is healthy, databases exist, direct HTTP ports answer, and no user unit is failed | Read-only; safe to rerun |
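The subordinate ID expectation for the bootstrap step can be spot-checked by hand. This is a sketch assuming a hypothetical helper named has_subid_range; it only looks for the exact podsvc:10000:50000 entry named in the table and is not taken from the validation script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when FILE contains the exact line USER:START:COUNT.
has_subid_range() {
  local file="$1" user="$2" start="$3" count="$4"
  # -F fixed string, -x whole-line match, -q quiet (exit status only).
  grep -Fqx -- "${user}:${start}:${count}" "$file"
}

# Example, run inside CT 101:
#   has_subid_range /etc/subuid podsvc 10000 50000 \
#     && has_subid_range /etc/subgid podsvc 10000 50000 \
#     || echo "podsvc subordinate ID range is missing or different" >&2
```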
Docker Step Results
| Step | Expected result | Safe rerun behavior |
|---|---|---|
| 00-preflight-access.sh | Proxmox hostname, paths for pct and qm, active storage output, NAS file list | Read-only; safe to rerun |
| 01-stage-backup.sh | staged backup complete or using existing staged backup | Does not recopy when .copy-complete exists unless forced |
| 02-validate-artifacts.sh | OK lines for archives, dumps, key app files, and a warning if websites are empty | Read-only against staged backup; safe to rerun |
| 03-restore-docker-services.sh | LXC identity check, staged copy to LXC, service restore logs | Does not recopy or re-import data when markers exist unless forced |
| 04-validate-services.sh | Network, PostgreSQL, database, and container OK lines | Read-only; safe to rerun |
Destructive or Potentially Destructive Actions
- Creating or reconfiguring CT 101 can affect an existing container if the ID is reused incorrectly. The creation script refuses a wrong hostname.
- The Podman CT creation path intentionally keeps CT 101 unprivileged. The /dev/net/tun passthrough is for rootless Podman networking and is not a reason to switch the application containers to rootful mode.
- 04-restore-rootless-services.sh writes application archives into /opt/podman/volumes, rewrites recovered env files under /opt/podman/env, and imports PostgreSQL dumps. Normal reruns use markers; FORCE_RESTORE=1 intentionally reapplies data and should be treated as destructive.
- 03-restore-docker-services.sh starts and recreates Docker containers in the target LXC.
- PostgreSQL logical restore uses pg_restore --clean --if-exists inside the restored database. It is guarded by .logical-restore-complete; set FORCE_RESTORE=1 only when intentionally reimporting.
- Application archive restore writes into /opt/docker/volumes/gitea and /opt/docker/volumes/vaultwarden. Existing data causes a skip marker on normal reruns.
- Never run restore scripts against the old production Docker host unless the answer file explicitly points to the intended test target and the operator accepts the risk.
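The interaction between completion markers and FORCE_RESTORE can be sketched as a single decision. The helper name should_restore and the example marker location are hypothetical; only the marker file name .logical-restore-complete and the FORCE_RESTORE variable come from this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical decision helper: returns 0 when a restore step should run.
# A step runs when it is forced, or when its completion marker is absent.
should_restore() {
  local marker="$1"
  if [ "${FORCE_RESTORE:-0}" = "1" ]; then
    return 0
  fi
  [ ! -e "$marker" ]
}

# Example (the marker directory here is an assumption):
#   marker=/path/to/restore-state/.logical-restore-complete
#   if should_restore "$marker"; then
#     echo "running pg_restore --clean --if-exists"
#   else
#     echo "skipping: $marker exists"
#   fi
```

Because FORCE_RESTORE=1 bypasses the marker entirely, it should be reset to 0 in the answer file as soon as the intentional reimport is done.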
Never Do This
- Do not restore from redacted pfSense XML.
- Do not edit known_hosts blindly if SSH reports a host key mismatch. Confirm the host identity from console access or a trusted administrator, then remove only the obsolete key for that host.
- Do not use ssh docker as a required path. It is optional and guarded by a timeout because it previously hung.
- Do not start services while .env files still contain replace-with-restored-secret.
- Do not start Podman services while /opt/podman/env/*.env files still contain CHANGE_ME_BEFORE_START.
- Do not use StrictHostKeyChecking=no after the NAS IP or host key changes. Confirm the fingerprint, then accept the new key deliberately.
- Do not copy /var/lib/docker as the recovery source.
- Do not delete the NAS backup or /tmp staged copy until restore validation is complete and a second backup exists.
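The two placeholder rules above lend themselves to a quick pre-start scan. The following is a sketch with a hypothetical function name, env_files_with_placeholders; it assumes a grep that supports --include (GNU and BSD grep both do) and is not part of the restore scripts.

```shell
#!/usr/bin/env bash
# Hypothetical scan: prints env files under DIR that still hold a placeholder.
# Exit status: 0 if at least one file matches, 1 if none, 2 if DIR is missing.
env_files_with_placeholders() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 2
  fi
  grep -rlE --include='*.env' --include='.env' \
    'replace-with-restored-secret|CHANGE_ME_BEFORE_START' "$dir"
}

# Example, run inside CT 101 before starting Podman services:
#   if env_files_with_placeholders /opt/podman/env; then
#     echo "placeholders remain; do not start services" >&2
#     exit 1
#   fi
```

The scan prints file names only, so secret values that have already been restored are not echoed to the terminal.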
Resume After Interruption
- Rerun 00-preflight-access.sh.
- Rerun 01-stage-backup.sh. If .copy-complete exists, it reuses the staged copy.
- Rerun 02-validate-artifacts.sh and review the report path it prints.
- For service restore, leave FORCE_RESTORE=0 unless a failed import must be intentionally replaced.
- Rerun 03-restore-docker-services.sh; it reuses the LXC staged copy and skips marked data/database restores.
- Rerun 04-validate-services.sh.
If a marker exists but the preceding output shows an incomplete operation, preserve the failed staged directory for inspection, create a new stage path in the answer file, and rerun from staging. Do not remove evidence during a real incident until an administrator has reviewed it.