The docs still described Cloudflare as DNS + CDN in front of helsinki-a, but that was dropped in #90 - pez.sh lives on Hetzner DNS via Terraform now and records point straight at the origin. Updated README, architecture, networking, getting-started and the nuremberg-a host doc to match, and noted that pez.solutions still resolves via Cloudflare outside Terraform.

Also fixed while I was in there:
- terraform/README: PagerDuty provider is ~> 3.32 (table said ~> 2.2), and the B2 secret keys are backblaze_keyID/backblaze_applicationKey
- secrets docs: group_vars secrets file is .enc.yaml, dropped the FreeBSD install steps, the long-gone .sops.yaml placeholder note and the ANSIBLE_VAULT_PASS migration note, swapped the cloudflare_record example for hcloud
- getting-started referenced ansible/scripts/sops-setup.sh which doesn't exist
- added naveen.pez.sh to the subdomain tables and a note about the DNS-only records (mail, minecraft, wow, public)
Getting Started
How to work with this repo, deploy changes, and not break things.
Prerequisites
You'll need:
- Tailscale — installed and connected to the tailnet. All SSH access goes through Tailscale. No servers have SSH exposed on the public internet.
- SSH keys — set up for each host you need to access
- Ansible — for configuration management and deployments (`make deps` from `ansible/` installs collections)
- OpenTofu (or Terraform) — for Hetzner (servers + DNS), Grafana Cloud, and PagerDuty
- Docker — helpful to understand, since most services are containerised
- SOPS + age — for secrets encryption/decryption (see Secrets for setup)
- Git — obviously
- gh CLI — for GitHub operations (PRs, issues, etc.)
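A quick way to confirm the prerequisites are in place is to look for each tool on `PATH`. The sketch below is not part of the repo, and the binary names are assumptions about how each tool is installed (for example, it looks for `tofu`, so a Terraform-only setup will show `tofu` as missing).

```shell
#!/bin/sh
# Print each tool that is not found on PATH, one per line.
# The binary names passed in are assumptions, not something the repo defines.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools tailscale ssh ansible-playbook tofu docker sops age git gh)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All prerequisites found."
fi
```

This only checks that the binaries exist. It does not verify that Tailscale is connected or that your SSH keys are accepted by any host.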
Clone the repo
git clone git@github.com:RWejlgaard/pez-infra.git
cd pez-infra
Repo Structure
pez-infra/
├── docs/ # You are here
├── ansible/ # Ansible playbooks, roles, inventory, and all managed files
│ ├── roles/ # Ansible roles (common, caddy, docker, media_stack, proxmox_ve, etc.)
│ ├── services/ # Docker Compose definitions and service configs
│ ├── dotfiles/ # Shell config (fish, nvim, tmux, git, etc.)
│ ├── playbooks/ # One-off playbooks (updates, reboots, status)
│ └── scripts/ # Utility and maintenance scripts
└── terraform/ # Terraform/OpenTofu for Hetzner (servers + DNS), Grafana Cloud, PagerDuty
Connecting to hosts
All access is via Tailscale, as root. Once you're on the tailnet, SSH using the Tailscale IP or hostname:
ssh root@helsinki-a # or ssh root@100.67.6.27
ssh root@london-a # Proxmox VE host
ssh root@london-b # storage / media
ssh root@london-c # Raspberry Pi
ssh root@copenhagen-a
ssh root@copenhagen-c # Raspberry Pi
ssh root@nuremberg-a
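To see at a glance which hosts are reachable over the tailnet, something like the following loop can help. It is a sketch under assumptions: the function names and the `PROBE_CMD` override are hypothetical, and the default probe simply attempts a non-interactive SSH login as root using standard OpenSSH options.

```shell
#!/bin/sh
# Host list copied from the SSH examples above.
HOSTS="helsinki-a london-a london-b london-c copenhagen-a copenhagen-c nuremberg-a"

# Default probe: non-interactive SSH login that runs `true`, giving up after 5 seconds.
probe() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Print "<host> ok" or "<host> unreachable" for each argument.
# PROBE_CMD (hypothetical) swaps in another probe command, e.g. for testing.
check_hosts() {
    for host in "$@"; do
        if "${PROBE_CMD:-probe}" "$host" >/dev/null 2>&1; then
            echo "$host ok"
        else
            echo "$host unreachable"
        fi
    done
}

# Usage (word splitting on $HOSTS is intentional):
#   check_hosts $HOSTS
```

`BatchMode=yes` makes SSH fail instead of prompting, so a host whose key is not set up shows as unreachable rather than hanging the loop.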
Common Tasks
Deploying configuration changes
Ansible handles deployments. The unified deploy.yml rebuilds a host from bare-metal-with-Tailscale to fully configured.
cd ansible/
# Install collections
make deps
# Dry run — see what would change
make deploy-check
# Deploy everything
make deploy
# Deploy a single host
make deploy-host HOST=london-b
# Or run a single stage
ansible-playbook deploy.yml --tags docker
Ansible also runs automatically via GitHub Actions on commits to the main branch — so a quick commit from your phone can fix a misconfiguration when you're out.
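When you want a dry run for a single host rather than the whole fleet, Ansible's own `--check`, `--diff`, `--limit` and `--tags` flags can be combined directly. The wrapper below is a hypothetical sketch, not a target that exists in the Makefile. It assumes you are in `ansible/` and that the inventory is picked up from the repo's Ansible configuration.

```shell
#!/bin/sh
# Echo the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Check-mode preview of deploy.yml for one host, optionally limited to one tag.
preview_host() {
    host=$1
    tag=${2:-}
    if [ -n "$tag" ]; then
        run ansible-playbook deploy.yml --limit "$host" --check --diff --tags "$tag"
    else
        run ansible-playbook deploy.yml --limit "$host" --check --diff
    fi
}

# Prints the command it would run; set DRY_RUN=0 to actually execute it.
preview_host london-b docker
```

Check mode reports what would change without applying it, though tasks that shell out to external commands may be skipped or report inaccurately, so treat the output as a guide.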
Other playbooks live under ansible/playbooks/:
| Playbook | Purpose |
|---|---|
| `update-all.yml` | OS package updates (all hosts) |
| `update-linux.yml` | Linux-only updates (apt) |
| `docker-status.yml` | Show running containers per host |
| `reboot.yml` | Safe reboot with pre-flight (interactive confirm for london-b) |
| `zfs.yml` | ZFS scrub scheduling |
Managing cloud + DNS + observability
Terraform manages Hetzner servers + DNS, Grafana Cloud (stack, fleet, dashboards, synthetic checks), and PagerDuty:
cd terraform
make init # initialize providers and B2 backend
make plan # preview changes
make apply # apply the changes
State lives in a Backblaze B2 bucket (pez-infra-tfstate) via the S3-compatible backend. Don't click around in the Hetzner or Grafana Cloud dashboards — if it's not in Terraform, it doesn't exist.
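If you want the apply step to execute exactly the plan you reviewed, OpenTofu can write the plan to a file and apply that file later. This is a sketch of that general pattern, not a description of what the repo's `make` targets do. The function names and the `tfplan` filename are hypothetical, and the `make` targets may also set up backend credentials that a bare `tofu` call would lack.

```shell
#!/bin/sh
# Echo the command when DRY_RUN=1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Write the plan to a file, then print it in human-readable form for review.
plan_to_file() {
    plan_file=${1:-tfplan}
    run tofu plan -out="$plan_file" && run tofu show "$plan_file"
}

# Apply a previously saved plan file without re-planning.
apply_plan() {
    run tofu apply "${1:-tfplan}"
}

plan_to_file
apply_plan
```

Applying a saved plan avoids the case where the infrastructure drifts between `plan` and `apply` and the second command does something you never looked at.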
Adding a new service
- Create a Docker Compose file in `ansible/services/<service-name>/docker-compose.yml` (or a systemd unit if it's native)
- Add the host_var — list the service under `docker_services` (or `systemd_services`) in `ansible/inventory/host_vars/<host>.yml`
- Add the Caddy route — if it needs a public subdomain, add a block to `ansible/services/caddy/Caddyfile`
- Add a DNS record — add the subdomain to `terraform/hetzner/dns.tf` and run `tofu apply`
- Add monitoring — if the service has a metrics endpoint, scrape it via Alloy (`terraform/grafana/fleet_pipelines/`)
- Update docs — add the service to `docs/services.md` (and the relevant `docs/hosts/<host>.md` page)
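The first step above can be scaffolded with a small script. This is a minimal sketch: the function name is hypothetical, the generated Compose file is a bare starting point rather than the layout the existing services use, and the remaining steps (host_vars, Caddy, DNS, monitoring, docs) still have to be done by hand.

```shell
#!/bin/sh
# Create ansible/services/<name>/docker-compose.yml under the given repo root
# and print its path. Refuses to overwrite an existing file.
scaffold_service() {
    root=$1
    name=$2
    image=$3
    dir="$root/ansible/services/$name"
    file="$dir/docker-compose.yml"
    if [ -e "$file" ]; then
        echo "refusing to overwrite $file" >&2
        return 1
    fi
    mkdir -p "$dir"
    cat > "$file" <<EOF
services:
  $name:
    image: $image
    restart: unless-stopped
EOF
    echo "$file"
}

# Usage, from the repo root (service name and image are placeholders):
#   scaffold_service . whoami example/whoami:latest
```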
Adding a new server
- Install the OS (Debian 13 or Ubuntu LTS preferred — see below)
- Set up SSH keys for `root`
- Install Tailscale and join the tailnet
- Add the host to `ansible/inventory/hosts.ini` and create `ansible/inventory/host_vars/<host>.yml`
- Run `make deploy-host HOST=<new-host>` from `ansible/`
- Register the host as a Grafana Fleet collector in `terraform/grafana/fleet_collectors.tf` and `tofu apply`
- Add a doc at `docs/hosts/<host>.md` and update `docs/services.md` + `docs/architecture.md`
That's it. The common role installs node_exporter, systemd_exporter, and Alloy as part of the baseline, so observability is automatic.
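Before running `make deploy-host` for a new machine, it can be worth checking that the inventory entries from the list above actually exist. The check below is a hypothetical helper, not something in `ansible/scripts/`. It assumes hosts appear at the start of a line in `hosts.ini`, which is the usual INI inventory layout but is not confirmed for this repo.

```shell
#!/bin/sh
# Succeed only if <host> is listed in hosts.ini and has a host_vars file.
# Prints one "missing: ..." line per problem found.
preflight_host() {
    root=$1
    host=$2
    inv="$root/ansible/inventory/hosts.ini"
    vars="$root/ansible/inventory/host_vars/$host.yml"
    status=0
    # Match the hostname at the start of a line, followed by whitespace or end of line.
    if ! grep -qE "^$host([[:space:]]|\$)" "$inv" 2>/dev/null; then
        echo "missing: $host in $inv"
        status=1
    fi
    if [ ! -f "$vars" ]; then
        echo "missing: $vars"
        status=1
    fi
    return "$status"
}

# Usage, from the repo root (<new-host> as in the steps above):
#   preflight_host . <new-host> && (cd ansible && make deploy-host HOST=<new-host>)
```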
Working with ZFS (london-b)
# Check pool status
zpool status hdd
# Check usage
zfs list
# Scrub status (runs weekly on Sundays at 12:00)
zpool status hdd | grep scan
ZFS is set up with 3× RAIDZ1 vdevs of 4 drives each (12 drives total) on the hdd pool. Tolerates one drive failure per vdev. The long-term plan is to replace the 8 TB drives with 24 TB drives and grow the pool toward 24 drives / ~0.5 PB raw.
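The capacity figures follow from simple arithmetic: raw capacity counts every drive, while RAIDZ1 gives up roughly one drive per vdev to parity. The sketch below works this through. It assumes all current drives are 8 TB and that the grown pool keeps 4-drive vdevs (6 vdevs for 24 drives), neither of which is stated outright above. It also ignores ZFS metadata and padding overhead, so real usable space will be somewhat lower.

```shell
#!/bin/sh
# Raw capacity in TB: vdevs * drives_per_vdev * tb_per_drive.
raw_tb() {
    echo $(( $1 * $2 * $3 ))
}

# Approximate RAIDZ1 usable capacity in TB: one drive per vdev goes to parity.
raidz1_usable_tb() {
    echo $(( $1 * ($2 - 1) * $3 ))
}

echo "current raw:    $(raw_tb 3 4 8) TB"             # 96
echo "current usable: $(raidz1_usable_tb 3 4 8) TB"   # 72
echo "target raw:     $(raw_tb 6 4 24) TB"            # 576
echo "target usable:  $(raidz1_usable_tb 6 4 24) TB"  # 432
```

576 TB raw is about 0.58 PB in decimal units, or about 0.51 PiB, which lines up with the "~0.5 PB raw" target.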
OS Choice
- Debian (12 or 13) is the default for new hosts — including the Raspberry Pis. Stable, well-supported by Ansible, predictable.
- Ubuntu LTS is on london-b and copenhagen-a (historical — both came up before the Debian standard).
- Proxmox VE (Debian Bookworm under the hood) on london-a.
- No more FreeBSD. london-a used to run FreeBSD for Prometheus/Grafana; that's all on Grafana Cloud now and london-a is Linux/Proxmox.
Alpine has been tried and rejected — the missing GNU binaries / systemd caused enough Ansible headaches to not be worth the size savings.
Secrets
Secrets are encrypted in-repo using SOPS + age. Encrypted files have .enc. in their filename, just before the extension (e.g. secrets.enc.yaml).
# Edit an encrypted file
sops ansible/services/authelia/config.enc.yml
# Decrypt to stdout
sops -d ansible/services/authelia/config.enc.yml
Full documentation: docs/secrets.md
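Because the `.enc.` naming is only a convention, a file can carry the name without having been encrypted. One way to catch that before committing is to look for the top-level `sops:` metadata key that SOPS writes into encrypted YAML. The function below is a hypothetical sketch, not an existing repo script, and it only covers `.enc.yml` and `.enc.yaml` files.

```shell
#!/bin/sh
# List *.enc.yml / *.enc.yaml files under a directory that lack a top-level
# "sops:" key, i.e. files that are named as encrypted but probably are not.
find_unencrypted() {
    find "$1" -type f \( -name '*.enc.yml' -o -name '*.enc.yaml' \) |
        while IFS= read -r f; do
            grep -q '^sops:' "$f" || printf '%s\n' "$f"
        done
}

# Usage, from the repo root:
#   find_unencrypted ansible
```

An empty result means every matching file has SOPS metadata. It does not prove the values inside are encrypted correctly, only that SOPS has processed the file.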
Branching
- `main` is the production branch. Ansible runs from main via GitHub Actions.
- Feature branches for changes, PRs for review.
- Branch naming: `<author>/PESO-<number>-<description>` for Jira-tracked work.
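The naming pattern for Jira-tracked branches can be checked mechanically. The regex below is an assumption about what counts as a valid author and description (lowercase letters, digits, and hyphens in the description); the repo does not spell out the allowed characters, and the branch names in the usage comment are made up.

```shell
#!/bin/sh
# Succeed if the branch name looks like <author>/PESO-<number>-<description>.
# The allowed character sets are assumptions, not a documented rule.
valid_branch() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+/PESO-[0-9]+-[a-z0-9][a-z0-9-]*$'
}

# Usage:
#   valid_branch "alice/PESO-42-hetzner-dns" && echo "ok"
#   valid_branch "$(git rev-parse --abbrev-ref HEAD)" || echo "branch name does not match"
```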
Consolidated Repos
This monorepo replaces several standalone repos:
| Old repo | Now lives in |
|---|---|
| pez-ansible | ansible/ |
| pez-terraform | terraform/ |
| pez-grafana | terraform/grafana/ |
| pez-proxy | ansible/services/caddy/ |
| pez-docs | docs/ |
| server-scripts | ansible/scripts/ and ansible/roles/ |