pez-infra/ansible
Rasmus "Pez" Wejlgaard 9815f44b84
fix: stop masking failed service deploys; trim dead config (#119)
The docker_services and systemd_services roles ran their "start the
service" tasks with `failed_when: false`, so a container or unit that
failed to come up still reported the deploy as green. Drop it from both
start tasks so a broken deploy actually fails CI. The compose/unit *copy*
tasks keep `failed_when: false` — that's load-bearing for the
`item is not failed` filter that skips services without a compose/unit file.

Also:
- Remove a duplicate "Template service .env files" task in docker_services
  (second copy used a hardcoded path and didn't register; first one is the
  one the start task reads).
- Don't trigger a full fleet deploy on docs/markdown/workflow-only pushes
  to main — add docs/**, **/*.md and .github/** to paths-ignore.
- Drop the dangling `update-freebsd` Make target (playbook doesn't exist;
  fleet has no FreeBSD hosts).
2026-06-04 18:41:24 +01:00
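The commit above removes `failed_when: false` from the start tasks so that a service which fails to come up actually fails the deploy. Below is a minimal sketch of a compose start task written that way. It is hypothetical and is not the docker_services role's real task. The `compose_services` list and the `/opt/<service>` project directory are illustrative names only.

```yaml
# Hypothetical sketch, not the docker_services role's actual task.
# Assumes: compose_services (list of services whose compose file was copied)
# and /opt/<service>/ as each project directory; both names are illustrative.
- name: Start compose stacks
  ansible.builtin.command:
    cmd: docker compose up --detach --wait
    chdir: "/opt/{{ item }}"
  loop: "{{ compose_services }}"
  # The command gives no cheap idempotence signal, so report every run as a change.
  changed_when: true
  # Deliberately no `failed_when: false`: if compose exits non-zero (a container
  # exits, or never becomes running/healthy under --wait), the task fails and
  # so does the deploy.
```

`--wait` makes `docker compose up` block until services are running, or healthy when they define a healthcheck. This is what turns a crash-looping container into a non-zero exit that CI can see.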

Ansible — Deploy & Maintain

One-command deploy playbook for rebuilding hosts from repo state.

Quick Start

cd ansible/

# Install dependencies
make deps

# Dry run — see what would change
make deploy-check

# Deploy everything
make deploy

# Deploy a single host
make deploy-host HOST=helsinki-a

Playbooks

| Playbook | Purpose | Usage |
|---|---|---|
| deploy.yml | Full host rebuild from repo | make deploy (single host: make deploy-host HOST=<host>) |
| playbooks/update-all.yml | OS package updates (all hosts, apt) | make update-all |
| playbooks/update-linux.yml | Alias for update-all (apt) | make update-linux |
| playbooks/docker-status.yml | Show running containers | make docker-status |
| playbooks/reboot.yml | Safe reboot with pre-flight checks | make reboot HOST=<host> |
| playbooks/zfs.yml | ZFS scrub scheduling (london-b) | ansible-playbook playbooks/zfs.yml |
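The update playbooks are described above only by their purpose. Here is a minimal sketch of an apt update play in that spirit, assuming Debian/Ubuntu hosts. The `upgrade: safe` mode and the cache-validity window are illustrative choices, not settings taken from playbooks/update-all.yml.

```yaml
# Hypothetical sketch of an apt update play; not the repo's update-all.yml.
- name: OS package updates
  hosts: all
  ignore_unreachable: true        # same fleet-wide setting as in Safety Notes
  tasks:
    - name: Refresh the apt cache and apply upgrades
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600    # skip the cache refresh if it ran within the last hour
        upgrade: safe             # assumed mode; "dist" may also remove packages
```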

Deploy Stages

The deploy playbook runs in stages, each independently taggable (see deploy.yml):

  1. common / baseline — Baseline packages, SSH hardening, fish shell, dotfiles
  2. docker — Docker engine on container hosts (docker_hosts group)
  3. services — Per-host service deployment:
    • helsinki-a: Caddy + status-page + custom systemd units
    • docker_hosts: Docker Compose stacks from services/
    • nuremberg-a: poste.io mail (Docker)
    • london-b: media_stack + backup (rclone to B2)
    • copenhagen-a: MaNGOS systemd units + MariaDB
    • london-a: proxmox_ve (apt repo, nag patch, CIFS storage)
    • zfs_hosts: ZFS scrub scheduling

Observability (node_exporter, systemd_exporter, Grafana Alloy) is part of the common baseline — every host gets it.

Run a single stage: ansible-playbook deploy.yml --tags docker
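Below is one way to lay out staged, independently taggable plays. It is a hypothetical sketch that reuses the host, group and role names from this README. The real deploy.yml may split plays and tags differently. For example, the nuremberg-a mail stack is left out because its role is not named here.

```yaml
# Hypothetical layout, not the repo's actual deploy.yml.
# Play-level tags are inherited by every role and task in the play, so
# `--tags docker` runs only the Docker play's tasks (fact gathering aside).
- name: Baseline
  hosts: all
  tags: [common, baseline]
  roles:
    - common
    - dotfiles

- name: Docker engine
  hosts: docker_hosts
  tags: [docker]
  roles:
    - docker

- name: Services (helsinki-a)
  hosts: helsinki-a
  tags: [services]
  roles:
    - caddy
    - status_page
    - systemd_services

- name: Services (compose stacks)
  hosts: docker_hosts
  tags: [services]
  roles:
    - docker_services

- name: Services (london-b)
  hosts: london-b
  tags: [services]
  roles:
    - media_stack
    - backup

- name: Services (copenhagen-a)
  hosts: copenhagen-a
  tags: [services]
  roles:
    - mariadb
    - systemd_services

- name: Services (london-a)
  hosts: london-a
  tags: [services]
  roles:
    - proxmox_ve

- name: ZFS scrub scheduling
  hosts: zfs_hosts
  tags: [services, zfs]
  roles:
    - zfs
```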

Roles

| Role | Description |
|---|---|
| common | Base packages, SSH hardening, fish shell, exporters, Alloy |
| dotfiles | Shell config from dotfiles/ |
| docker | Docker engine install and setup |
| docker_services | Deploy compose files from services/ |
| caddy | Caddy reverse proxy (helsinki-a) |
| status_page | status.pez.sh generator script + cron |
| systemd_services | Custom systemd units from services/ |
| media_stack | *Arr stack, Plex/Jellyfin, Samba, Syncthing on london-b |
| backup | rclone-to-B2 cron job on london-b |
| mariadb | Native MariaDB (used by MaNGOS on copenhagen-a) |
| proxmox_ve | PVE no-subscription repo, UI lockdown, CIFS storage |
| zfs | Weekly scrub cron on ZFS hosts |
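The zfs role is described as a weekly scrub cron. Below is a minimal sketch of such a task using ansible.builtin.cron. It assumes zpool lives at /usr/sbin/zpool, which is true on current Debian/Ubuntu but worth checking on your hosts. It also assumes every imported pool should be scrubbed. The job name and schedule are illustrative.

```yaml
# Hypothetical sketch of a weekly scrub task; not the zfs role's actual code.
- name: Schedule a weekly scrub of every imported pool
  ansible.builtin.cron:
    name: zfs weekly scrub        # marker the cron module uses to find and update this entry
    user: root
    special_time: weekly
    # cron's default PATH usually excludes /usr/sbin, so use full paths.
    job: /usr/sbin/zpool list -H -o name | xargs -r -n1 /usr/sbin/zpool scrub
```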

Inventory

Hosts are grouped by OS and by role. Every host is reached over its Tailscale IP, and Ansible connects via SSH as root. Per-host variables live in inventory/host_vars/<hostname>.yml.
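Below is a sketch of how that layout can look, assuming a YAML inventory at a hypothetical inventory/hosts.yml. Group membership beyond what this README states is a guess. nuremberg-a is placed in docker_hosts only because it runs poste.io in Docker. The IP is a placeholder from Tailscale's 100.64.0.0/10 range.

```yaml
# inventory/hosts.yml (hypothetical file name and grouping)
all:
  vars:
    ansible_user: root            # every host is managed over SSH as root
  children:
    docker_hosts:
      hosts:
        nuremberg-a:              # assumed member; runs poste.io in Docker
    zfs_hosts:
      hosts:
        london-b:

# inventory/host_vars/london-b.yml is loaded automatically for london-b:
# ansible_host: 100.64.0.12       # placeholder Tailscale IP
```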

Safety Notes

  • london-b: Reboot playbook requires interactive confirmation (critical storage)
  • copenhagen-a: Reboot includes netplan pre-flight check (static IP verification)
  • All playbooks use ignore_unreachable: true for fleet operations
  • --check --diff is your friend — always dry-run first on production
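The reboot guards above can be sketched as follows. This is a hypothetical play, not the repo's playbooks/reboot.yml. The `target` variable, the `expected_static_ip` host variable and the timeout are all assumptions. The copenhagen-a check here compares the live default IPv4 address against the expected one rather than inspecting netplan itself.

```yaml
# Hypothetical sketch; run with e.g. -e target=london-b (assumed variable name).
- name: Safe reboot
  hosts: "{{ target }}"           # no default, so the play errors out instead of hitting the whole fleet
  serial: 1                       # one host per batch; pause prompts once per batch
  ignore_unreachable: true
  tasks:
    - name: Ask for confirmation before rebooting critical storage
      ansible.builtin.pause:
        prompt: "Reboot {{ inventory_hostname }}? Type 'yes' to continue"
      register: confirm
      when: inventory_hostname == 'london-b'

    - name: Abort unless the reboot was confirmed
      ansible.builtin.fail:
        msg: Reboot of {{ inventory_hostname }} cancelled
      when:
        - inventory_hostname == 'london-b'
        - confirm.user_input | default('') != 'yes'

    - name: Verify the static IP is live before rebooting
      ansible.builtin.assert:
        that:
          - ansible_default_ipv4.address == expected_static_ip   # hypothetical host var
        fail_msg: Address does not match expected_static_ip; fix netplan first
      when: inventory_hostname == 'copenhagen-a'

    - name: Reboot and wait for SSH to return
      ansible.builtin.reboot:
        reboot_timeout: 600
```

Running it with `--check --diff` first still exercises the confirmation and IP checks. The reboot module does not reboot in check mode.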