Recovery Script Usage

This page lists the current recovery scripts and states where each one must be run.

Current Restore Scripts

Script | Run from | Purpose
scripts/restore/00-preflight-access.sh | Administration workstation | Validate ssh pve, ssh -p 2242 nas, NAS path, optional guarded ssh docker, and target CT identity
scripts/restore/01-stage-backup.sh | Administration workstation | Copy the NAS backup to local /tmp with tar over SSH and write .copy-complete
scripts/restore/02-validate-artifacts.sh | Administration workstation | Validate archive shape, database dump shape, and required service artifacts
scripts/restore/03-restore-docker-services.sh | Administration workstation via ssh pve and pct exec | Copy staged backup into CT 101 and run the inner Docker restore
scripts/restore/04-validate-services.sh | Administration workstation via ssh pve and pct exec | Validate networks, PostgreSQL, restored databases, containers, and Traefik availability
scripts/recover-docker-services.sh | Inside target Docker LXC, normally invoked by stage 03 | Restore Docker networks, compose files, env files, archives, PostgreSQL dumps, and services

The older root-level helper scripts are still relevant:

Script | Run from | Purpose
scripts/create-docker-host-lxc-101.sh | Proxmox host as root | Create or update CT 101 docker-host; refuses an existing wrong hostname
scripts/bootstrap-debian13-docker-lxc.sh | Inside the new Docker LXC as root | Install Docker Engine and create /opt/docker/compose and /opt/docker/volumes
scripts/proxmox-host-maintenance.sh | Proxmox host as root | Optional host maintenance; not part of a restore unless planned

Create the Docker LXC

scripts/create-docker-host-lxc-101.sh must run on the Proxmox host as root. From the administration workstation, the simplest safe pattern is to stream the local script over the documented SSH alias:

ssh pve 'bash -s' < scripts/create-docker-host-lxc-101.sh

Before streaming the script, run this read-only check to confirm the host identity and the current state of CT 101:

ssh pve 'hostname; command -v pct; command -v pveam; pct status 101 || true; pct config 101 || true'

If CT 101 already exists, the script accepts it only when its hostname is the expected docker-host. A different hostname causes the script to stop before changing that container.
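The hostname guard described above can be sketched in shell as follows. This is a minimal illustration, not the create script's actual code: the function names ct_hostname_from_config and ct_hostname_ok are hypothetical, and the sketch assumes pct config prints a line of the form hostname: <name>.

```shell
# Hypothetical helpers; the real create script may implement this differently.

# Read `pct config <vmid>` output on stdin and print the hostname value.
ct_hostname_from_config() {
  awk -F': ' '$1 == "hostname" { print $2; exit }'
}

# Succeed only when the config text in $1 carries the expected hostname in $2.
ct_hostname_ok() {
  actual=$(printf '%s\n' "$1" | ct_hostname_from_config)
  [ "$actual" = "$2" ]
}

# Example use on the Proxmox host (not executed here):
#   cfg=$(pct config 101)
#   ct_hostname_ok "$cfg" docker-host || { echo "refusing: wrong hostname" >&2; exit 1; }
```

Checking the hostname before any pct set call keeps the guard read-only until the identity is confirmed.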

To run with explicit settings, copy the answer file to Proxmox, edit it there, then stream the script:

scp scripts/recovery-answer.env.example pve:/root/recovery-answer.env
ssh pve 'nano /root/recovery-answer.env'
ssh pve 'bash -s -- --answer-file /root/recovery-answer.env' < scripts/create-docker-host-lxc-101.sh

The answer file path in that command is a path on the Proxmox host, not on the workstation.

The create script downloads the Debian LXC template when needed. It queries Proxmox template metadata with pveam available --section system, selects the latest debian-13-standard_*_amd64.tar.zst, checks the configured TEMPLATE_STORAGE, and runs pveam download only if the template is missing.
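The template selection step could look roughly like the sketch below. It assumes that pveam available --section system prints one template per line with the file name in the second column, and that version ordering via sort -V (GNU coreutils, present on Proxmox) is an acceptable definition of "latest". The function name latest_debian13_template is hypothetical.

```shell
# Hypothetical helper: pick the newest Debian 13 standard template name
# from `pveam available --section system` output supplied on stdin.
latest_debian13_template() {
  awk '$2 ~ /^debian-13-standard_.*_amd64\.tar\.zst$/ { print $2 }' \
    | sort -V \
    | tail -n 1
}

# Example use on the Proxmox host (not executed here):
#   tmpl=$(pveam available --section system | latest_debian13_template)
#   pveam list "$TEMPLATE_STORAGE" | grep -qF "$tmpl" || pveam download "$TEMPLATE_STORAGE" "$tmpl"
```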

The created container is unprivileged:

--unprivileged 1

The script always enables the standard Docker-in-unprivileged-LXC features:

nesting=1,keyctl=1

Both features are set at creation and reapplied with pct set, so rerunning the script converges the container on the same configuration.

Broader LXC relaxations are available only when explicitly requested:

RELAXED_LXC_SECURITY=1

That setting appends these Proxmox LXC config lines if they are missing:

lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
lxc.cap.drop:

Leave RELAXED_LXC_SECURITY=0 unless Docker workloads require those broader permissions. The script does not currently set a low-port sysctl such as net.ipv4.ip_unprivileged_port_start. For rootful Docker inside the LXC, published ports such as 80 and 443 are normally handled by Docker running as root inside the container. If a restored workload still cannot bind low ports, record the failure and decide whether RELAXED_LXC_SECURITY=1 is justified.
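The "append only if missing" behaviour for the relaxed config lines can be illustrated with a small idempotent helper. This is a sketch under assumptions: append_line_if_missing is a hypothetical name, and the config path in the usage comment is the standard Proxmox location for CT 101 rather than something taken from the script.

```shell
# Hypothetical helper: append an exact line to a file unless it is already there.
# grep -x matches whole lines and -F treats the pattern as a fixed string,
# so "lxc.cap.drop:" is not confused with a longer line.
append_line_if_missing() {
  file=$1
  line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example use on the Proxmox host when RELAXED_LXC_SECURITY=1 (not executed here):
#   conf=/etc/pve/lxc/101.conf
#   append_line_if_missing "$conf" 'lxc.apparmor.profile: unconfined'
#   append_line_if_missing "$conf" 'lxc.cgroup2.devices.allow: a'
#   append_line_if_missing "$conf" 'lxc.cap.drop:'
```

Because each call is a no-op when the line exists, repeated runs leave the file unchanged.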

After CT creation succeeds, install Docker from inside the new LXC with scripts/bootstrap-debian13-docker-lxc.sh. That bootstrap script does not download the LXC image; it installs Docker Engine in an already-created Debian container.

Bootstrap Docker in CT 101

After the create script reports CT 101 is ready for bootstrap, run the Docker bootstrap from the administration workstation through Proxmox:

ssh pve 'pct exec 101 -- bash -s' < scripts/bootstrap-debian13-docker-lxc.sh

If the bootstrap needs values from /root/recovery-answer.env, remember that pct exec runs inside CT 101. A file at /root/recovery-answer.env on the Proxmox host is not automatically visible inside the LXC. Push it into the LXC first:

ssh pve 'pct push 101 /root/recovery-answer.env /root/recovery-answer.env --perms 0600'
ssh pve 'pct exec 101 -- bash -s -- --answer-file /root/recovery-answer.env' < scripts/bootstrap-debian13-docker-lxc.sh

For the current script, the answer file is optional because TIMEZONE defaults to Africa/Accra.

The bootstrap script sets LANG=C.UTF-8 and LC_ALL=C.UTF-8 to avoid locale warnings inherited from the workstation. Earlier runs showed unsupported LC_CTYPE=UTF-8 warnings during pct, Perl, and APT operations; those warnings were noisy but not the cause of the Docker install failure.

The Debian 13 template used during the June 17, 2026 restore did not provide the previously listed software-properties-common package from the configured repositories. The bootstrap script no longer installs that package because Docker's official repository only needs ca-certificates, curl, gnupg, and the keyring-scoped source file the script writes.

Successful bootstrap output includes:

Docker Engine - Community
Docker Compose version
Docker installation verified

Verify the completed bootstrap with:

ssh pve 'pct exec 101 -- docker version; pct exec 101 -- docker compose version; pct exec 101 -- ls -ld /opt/docker/compose /opt/docker/volumes'

If direct SSH to docker-host asks for a password, do not assume one exists. The create script sets a root password only when PASSWORD_FILE is configured before CT creation. Use pct exec or pct enter through Proxmox for recovery work unless direct SSH access is intentionally configured later.

Template Download and IPv6

The template download may prefer IPv6 if the Proxmox host resolves both A and AAAA records for download.proxmox.com. During the June 17, 2026 restore, IPv6 connections repeatedly timed out. The immediate workaround was to stop the download and force IPv4 for the template:

ssh pve 'wget -4 -c -O /var/lib/vz/template/cache/debian-13-standard_13.1-2_amd64.tar.zst http://download.proxmox.com/images/system/debian-13-standard_13.1-2_amd64.tar.zst'

After the file downloads and verifies, rerun the create script. It will see the template in local storage and continue.

To make Proxmox prefer IPv4 generally without fully disabling IPv6, review and then uncomment this line in /etc/gai.conf:

precedence ::ffff:0:0/96  100

For wget only, inet4_only = on in /etc/wgetrc forces IPv4. Prefer the narrowest change that solves the download problem.

Current Backup Scripts

Script | Run from | Purpose
scripts/backup/00-prepare-backup-root.sh | Proxmox host as root | Create the timestamped backup root on the CIFS backup mount
scripts/backup/01-capture-proxmox.sh | Proxmox host as root | Capture host, storage, VM, and LXC definitions
scripts/backup/02-capture-dns.sh | Proxmox host as root | Capture DNS reference from CT 107 and optional Teleporter file
scripts/backup/03-capture-docker-definitions.sh | Proxmox host as root | Capture Docker compose/env/runtime reference from CT 100
scripts/backup/04-export-databases.sh | Proxmox host as root | Export PostgreSQL, MariaDB, and MongoDB logical backups
scripts/backup/05-archive-applications.sh | Proxmox host as root | Archive retained application directories with volatile exclusions
scripts/backup/06-archive-websites.sh | Proxmox host as root | Archive website working trees when repository state is not enough
scripts/backup/07-verify-backup.sh | Proxmox host as root | Verify archives, dumps, and checksums
scripts/backup/08-encrypt-and-copy-backup.sh | Proxmox host as root | Create encrypted archive when age is configured and copy to a second destination

Answer Files

Restore:

cp scripts/restore/restore-answer.env.example scripts/restore/restore-answer.env
chmod 600 scripts/restore/restore-answer.env

Backup, on Proxmox:

cp scripts/backup/backup-answer.env.example /root/backup-answer.env
chmod 600 /root/backup-answer.env

The answer files are shell syntax. Do not commit filled copies.
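Because the answer files are shell syntax, a script can load one by sourcing it and then filling in defaults. The sketch below shows that pattern only; load_answer_file is a hypothetical name, and the two defaults shown are the ones mentioned on this page (TIMEZONE and RELAXED_LXC_SECURITY). The real scripts may read other keys.

```shell
# Hypothetical loader: source an optional answer file, then apply defaults.
# Sourcing executes the file as shell code, so only load files you control.
load_answer_file() {
  if [ -n "${1:-}" ]; then
    if [ ! -r "$1" ]; then
      echo "answer file not readable: $1" >&2
      return 1
    fi
    . "$1"
  fi
  # Defaults apply only when the answer file left the key unset or empty.
  TIMEZONE=${TIMEZONE:-Africa/Accra}
  RELAXED_LXC_SECURITY=${RELAXED_LXC_SECURITY:-0}
}

# Example (not executed here):
#   load_answer_file /root/recovery-answer.env
```

This is also why filled copies must stay out of version control: they are executable and usually carry secrets.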

Dry Runs

Most wrapper scripts support:

--dry-run

Dry-run mode checks arguments and prints actions that would write data. It does not prove that every remote command will succeed, so use it before a real run, not instead of validation.
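A common way to implement this kind of dry run is a wrapper that prints write actions instead of executing them. The sketch below is illustrative only; the DRY_RUN variable and the run function are assumptions about the pattern, not the scripts' actual internals.

```shell
# Hypothetical dry-run wrapper. When DRY_RUN=1 the command is printed, not run.
DRY_RUN=${DRY_RUN:-0}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'DRY-RUN:'
    printf ' %s' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Example (not executed here):
#   DRY_RUN=1
#   run mkdir -p /tmp/staged-backup    # prints: DRY-RUN: mkdir -p /tmp/staged-backup
```

The wrapper never contacts the remote side, which is why a clean dry run says nothing about whether the real command would succeed.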

Host Identity Rules

  • Proxmox commands require pct, qm, and pvesm.
  • Accepted Proxmox short hostnames default to pve and pve02, because the live SSH alias returned pve on June 16, 2026 while older docs said pve02.
  • Docker restore commands inside the LXC require docker and docker compose.
  • NAS validation requires the configured backup directory to exist.
  • Direct ssh docker is never required.
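The identity rules above amount to two kinds of check: required commands exist, and the short hostname is on an accepted list. A minimal sketch, with hypothetical function names and the accepted list passed as a space-separated string:

```shell
# Hypothetical preflight helpers; not the scripts' actual code.

# Fail on the first command that is not available on PATH.
require_commands() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      return 1
    fi
  done
}

# Succeed when hostname $1 appears in the space-separated list $2.
hostname_accepted() {
  for h in $2; do
    [ "$1" = "$h" ] && return 0
  done
  return 1
}

# Example use on the Proxmox host (not executed here):
#   require_commands pct qm pvesm
#   hostname_accepted "$(hostname -s)" "pve pve02" || exit 1
```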

Marker Files

Marker | Meaning
Workstation .copy-complete | NAS backup was fully extracted into the local staged directory
LXC .copy-complete | Local staged backup was fully streamed into the Docker LXC
/opt/docker/volumes/postgresql/.logical-restore-complete | PostgreSQL logical dumps were restored
/opt/docker/volumes/gitea/.archive-restore-complete | Forgejo archive data was applied or safely skipped because target data existed
/opt/docker/volumes/vaultwarden/.archive-restore-complete | Vaultwarden archive data was applied or safely skipped because target data existed

Markers make reruns converge. Remove or bypass them only after preserving the failed state and deciding that a forced restore is correct.
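The marker pattern can be sketched as a guard that skips a step when its marker exists and writes the marker only after the step succeeds. The function name run_once is hypothetical, and treating FORCE_RESTORE=1 as "ignore the marker" is an assumption about how the scripts interpret that variable.

```shell
# Hypothetical marker guard. $1 is the marker path; the rest is the command.
run_once() {
  marker=$1
  shift
  if [ -e "$marker" ] && [ "${FORCE_RESTORE:-0}" != "1" ]; then
    echo "skip: $marker exists"
    return 0
  fi
  # The marker is written only when the command exits successfully,
  # so a failed step is retried on the next run.
  "$@" && : > "$marker"
}

# Example (not executed here; restore_postgresql_dumps is a placeholder):
#   run_once /opt/docker/volumes/postgresql/.logical-restore-complete restore_postgresql_dumps
```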

Restore Troubleshooting Notes

When the staged backup is streamed from macOS into the LXC, GNU tar in Debian may print messages like:

tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.provenance'

Those are macOS extended attributes added by the workstation tar implementation. They are not recovery data and can be ignored when tar exits successfully.

During PostgreSQL restore, globals.sql is applied before database dumps. A fresh official PostgreSQL container already has the postgres role, and a rerun after a partial restore may already have application roles. The restore script skips only CREATE ROLE lines for roles that already exist and keeps the later ALTER ROLE statements so restored attributes and password hashes are still applied.
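One way to skip only the CREATE ROLE lines for existing roles is to filter globals.sql before feeding it to psql. The sketch below assumes the dump contains statements of the form CREATE ROLE name; on their own lines, as pg_dumpall --globals-only produces, and that existing role names are supplied one per line in a file. The function name filter_existing_roles is hypothetical and this is not the restore script's actual code.

```shell
# Hypothetical filter: read globals.sql on stdin, drop CREATE ROLE lines for
# roles listed (one per line) in the file named by $1, pass everything else.
filter_existing_roles() {
  awk -v roles_file="$1" '
    BEGIN {
      while ((getline r < roles_file) > 0) existing[r] = 1
    }
    /^CREATE ROLE / {
      name = $3
      sub(/;$/, "", name)     # strip the trailing semicolon
      gsub(/"/, "", name)     # strip optional identifier quotes
      if (name in existing) next
    }
    { print }                 # ALTER ROLE lines are always kept
  '
}

# Example inside the Docker LXC (not executed here):
#   docker exec postgresql psql -U postgres -At -c 'SELECT rolname FROM pg_roles' > /tmp/roles.txt
#   filter_existing_roles /tmp/roles.txt < globals.sql | docker exec -i postgresql psql -U postgres
```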

Recovered compose files often rely on a sibling .env file for values such as DOMAIN, FQDN, database hostnames, and service ports. Running docker compose -f /path/to/compose.yml from another directory can use the wrong environment source. The restore script therefore runs Compose with --project-directory set to the compose file directory, so each recovered stack loads its own .env.

Shell environment variables also override values from a Compose project's .env. During the June 17, 2026 restore, the wrapper-level DOMAIN=kh3group.com overrode Vaultwarden's recovered DOMAIN=https://pass.kh3group.com and caused Vaultwarden to restart with:

DOMAIN variable needs to contain the protocol

The restore script now unsets known recovered-project keys before invoking Docker Compose, so the per-project .env values are used. Avoid running docker compose config in shared terminals or logs because it expands secret values from .env.
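The two Compose safeguards described above (a fixed project directory and a cleaned shell environment) can be combined in one wrapper. This is a sketch, not the restore script itself: compose_isolated is a hypothetical name, and the list of unset keys is an assumption based on the variables this page mentions.

```shell
# Hypothetical wrapper: run docker compose for one recovered stack so that
# its own sibling .env is used and wrapper-level variables cannot override it.
compose_isolated() {
  compose_file=$1
  shift
  project_dir=$(dirname -- "$compose_file")
  (
    # Unset inside a subshell so the caller's environment is left untouched.
    unset DOMAIN FQDN DB_HOST
    docker compose --project-directory "$project_dir" -f "$compose_file" "$@"
  )
}

# Example inside the Docker LXC (path below /opt/docker/compose is illustrative):
#   compose_isolated /opt/docker/compose/vaultwarden/compose.yml up -d
```

Running the unset in a subshell keeps wrapper variables such as DOMAIN available to later steps that legitimately need them.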

The historical Docker host used database hostnames that may not exist in the rebuilt Docker network. During the June 17, 2026 restore, Forgejo and Vaultwarden were recovered with DB_HOST=db2, while the rebuilt PostgreSQL container was named postgresql. The restore script now normalizes these application .env values to the rebuilt service name:

Forgejo DB_HOST=postgresql:5432
Vaultwarden DB_HOST=postgresql
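Normalizing a single key in a recovered .env file can be sketched as below. The helper name set_env_value is hypothetical, and the sketch assumes simple KEY=value lines without quoting or backslashes in the new value.

```shell
# Hypothetical helper: set KEY=value in an env file, replacing an existing
# line for that key or appending one if the key is absent.
set_env_value() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "="; found = 0 }
    $1 == k { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file mode and owner
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Example inside the Docker LXC (paths are illustrative):
#   set_env_value /opt/docker/compose/forgejo/.env DB_HOST postgresql:5432
#   set_env_value /opt/docker/compose/vaultwarden/.env DB_HOST postgresql
```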

PostgreSQL custom-format dumps can restore tables with the expected application owner while leaving the database itself owned by postgres, because the fresh database was created by the restore process before pg_restore. Vaultwarden then fails during startup migrations with:

permission denied for schema public

After restoring each application dump, the restore script now sets the database and public schema ownership/grants for the matching application role.
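The ownership fix can be expressed as three SQL statements run while connected to the restored database. The sketch below only generates the SQL; ownership_sql is a hypothetical name, the exact grants the restore script applies may differ, and the database and role names in the usage comment are assumptions.

```shell
# Hypothetical generator: print SQL that hands a restored database and its
# public schema to the application role. $1 = database name, $2 = role name.
ownership_sql() {
  db=$1
  role=$2
  cat <<SQL
ALTER DATABASE "$db" OWNER TO "$role";
ALTER SCHEMA public OWNER TO "$role";
GRANT ALL ON SCHEMA public TO "$role";
SQL
}

# Example inside the Docker LXC (not executed here). The schema statements
# affect the database psql is connected to, hence -d "$db":
#   ownership_sql vaultwarden vaultwarden | docker exec -i postgresql psql -U postgres -d vaultwarden
```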

The Forgejo runner runs as UID/GID 1000:996 in the captured compose file, but the rebuilt host may assign a different GID to the docker group. During the June 17, 2026 restore, /var/run/docker.sock was root:docker with GID 991, so the runner could read .runner but could not use the Docker socket:

permission denied while trying to connect to the Docker daemon socket

The restore script now rewrites the runner compose user group to the current Docker socket GID and normalizes runner-data ownership to 1000:<docker-gid> with mode 0750.
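The GID rewrite can be sketched as a filter over the runner compose file. It assumes the service declares its user on a single line such as user: 1000:996, optionally quoted, and that stat -c is the GNU form available in Debian. The function names are hypothetical and the runner-data path in the usage comment is a placeholder.

```shell
# Hypothetical helper: print the numeric group ID that owns the Docker socket.
docker_socket_gid() {
  stat -c '%g' /var/run/docker.sock
}

# Hypothetical filter: read compose YAML on stdin and rewrite a
# "user: 1000:<gid>" line so the group becomes $1. Other lines pass through.
rewrite_runner_gid() {
  sed -E "s/^([[:space:]]*user:[[:space:]]*)[\"']?1000:[0-9]+[\"']?[[:space:]]*\$/\\1\"1000:$1\"/"
}

# Example inside the Docker LXC (not executed here; paths are placeholders):
#   gid=$(docker_socket_gid)
#   rewrite_runner_gid "$gid" < compose.yml > compose.yml.new && mv compose.yml.new compose.yml
#   chown -R "1000:$gid" runner-data && chmod 0750 runner-data
```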

The restored runner registration file can also point at the public Forgejo URL, for example https://git.kh3group.com. Before Traefik is restored, that public route may return 502 Bad Gateway even while Forgejo is healthy on the Docker backend network. The restore script normalizes the runner registration address to the internal backend URL:

http://forgejo:3000

This lets the runner start without depending on external route validation. Validate the public Forgejo route separately after Traefik is restored.

If 03-restore-docker-services.sh fails after PostgreSQL has started but before .logical-restore-complete is written, leave FORCE_RESTORE=0, keep the staged backup markers, and rerun the same script after fixing the cause. The script should reuse the staged backup and converge.