From 0a357fc69abcb5ef9bda7f4aae6a0d6ab835551c Mon Sep 17 00:00:00 2001
From: "Rasmus \"Pez\" Wejlgaard"
Date: Wed, 10 Jun 2026 20:59:23 +0100
Subject: [PATCH] docs: catch up with the Cloudflare to Hetzner DNS move, fix secrets/terraform drift (#130)

The docs still described Cloudflare as DNS + CDN in front of helsinki-a, but that was dropped in #90 - pez.sh lives on Hetzner DNS via Terraform now and records point straight at the origin. Updated README, architecture, networking, getting-started and the nuremberg-a host doc to match, and noted that pez.solutions still resolves via Cloudflare outside Terraform.

Also fixed while I was in there:
- terraform/README: PagerDuty provider is ~> 3.32 (table said ~> 2.2), and the B2 secret keys are backblaze_keyID/backblaze_applicationKey
- secrets docs: group_vars secrets file is .enc.yaml, dropped the FreeBSD install steps, the long-gone .sops.yaml placeholder note and the ANSIBLE_VAULT_PASS migration note, swapped the cloudflare_record example for hcloud
- getting-started referenced ansible/scripts/sops-setup.sh which doesn't exist
- added naveen.pez.sh to the subdomain tables and a note about the DNS-only records (mail, minecraft, wow, public)
---
 README.md                 | 14 +++++++-------
 docs/architecture.md      | 16 ++++++++--------
 docs/getting-started.md   | 15 ++++++---------
 docs/hosts/helsinki-a.md  |  1 +
 docs/hosts/nuremberg-a.md |  2 +-
 docs/networking.md        | 26 ++++++++++++++------------
 docs/secrets.md           | 19 +++++++------------
 terraform/README.md       |  6 +++---
 8 files changed, 47 insertions(+), 52 deletions(-)

diff --git a/README.md b/README.md
index 1bb80a3..f4d2946 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ Infrastructure-as-code monorepo for managing my homelab and cloud server fleet.
 ## What's in this repo

 - **Ansible** — Playbooks, roles, and inventory for configuring servers, deploying Docker-based services, and managing dotfiles
-- **Terraform** — OpenTofu/Terraform configs for cloud resources (Hetzner Cloud, Cloudflare DNS, Grafana Cloud, PagerDuty)
+- **Terraform** — OpenTofu/Terraform configs for cloud resources (Hetzner Cloud + DNS, Grafana Cloud, PagerDuty)
 - **Services** — Docker Compose definitions and config files for each self-hosted service
 - **Documentation** — Architecture decisions, networking topology, and operational guides

@@ -13,7 +13,7 @@ Infrastructure-as-code monorepo for managing my homelab and cloud server fleet.

 ```mermaid
 graph TD
-    CF[Cloudflare<br/>DNS + CDN] --> HEL[helsinki-a<br/>Caddy proxy + SSO<br/>Hetzner Cloud]
+    DNS[Hetzner DNS<br/>pez.sh] --> HEL[helsinki-a<br/>Caddy proxy + SSO<br/>Hetzner Cloud]
     HEL --> TS{Tailscale mesh}
     TS --> LB[london-b<br/>Storage, media<br/>Docker + systemd]
     TS --> LA[london-a<br/>Proxmox VE hypervisor]
@@ -24,7 +24,7 @@ graph TD
     TS -.-> GC[Grafana Cloud<br/>metrics, logs, traces]
 ```

-Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud instance, and is forwarded to backend services running on various hosts connected over a Tailscale mesh network. Authentication for protected services is handled by Authelia with an LLDAP backend. Observability is shipped from every host to Grafana Cloud via Grafana Alloy.
+DNS (Hetzner DNS for `pez.sh`, managed via Terraform) points directly at a Caddy reverse proxy on a Hetzner cloud instance, which terminates TLS and forwards to backend services running on various hosts connected over a Tailscale mesh network. Authentication for protected services is handled by Authelia with an LLDAP backend. Observability is shipped from every host to Grafana Cloud via Grafana Alloy.

 ### Hosts

@@ -47,7 +47,7 @@ Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud
 │ ├── dotfiles/ # Shell config (fish, nvim, tmux, git, etc.)
 │ ├── playbooks/ # One-off playbooks (updates, reboots, status)
 │ └── scripts/ # Utility and maintenance scripts
-├── terraform/ # Terraform/OpenTofu for Hetzner, Cloudflare, Grafana Cloud, PagerDuty
+├── terraform/ # Terraform/OpenTofu for Hetzner (servers + DNS), Grafana Cloud, PagerDuty
 └── docs/ # Architecture, networking, services, monitoring, and per-host docs
 ```

@@ -65,18 +65,18 @@ Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud
 1. **Clone:** `git clone git@github.com:RWejlgaard/pez-infra.git`
 2. **Services:** Each service has its own directory under `ansible/services/` with a `docker-compose.yml` and config files
 3. **Deploy:** `cd ansible && make deploy` runs the unified `deploy.yml` against the whole fleet (or `make deploy-host HOST=`)
-4. **Infrastructure:** Terraform configs in `terraform/` manage Hetzner servers, Cloudflare DNS, Grafana Cloud, and PagerDuty
+4. **Infrastructure:** Terraform configs in `terraform/` manage Hetzner servers + DNS, Grafana Cloud, and PagerDuty

 ### Secrets

-Secrets are encrypted in-repo using [SOPS](https://github.com/getsops/sops) + [age](https://github.com/FiloSottile/age). Encrypted files use `.enc.` in their extension (e.g. `secrets.enc.yml`). See **[Secrets Management](docs/secrets.md)** for full setup and usage instructions.
+Secrets are encrypted in-repo using [SOPS](https://github.com/getsops/sops) + [age](https://github.com/FiloSottile/age). Encrypted files use `.enc.` in their extension (e.g. `secrets.enc.yaml`). See **[Secrets Management](docs/secrets.md)** for full setup and usage instructions.

 ## Documentation

 Detailed documentation lives in [`docs/`](docs/):

 - **[Architecture](docs/architecture.md)** — Network topology, traffic flow, design principles
-- **[Networking](docs/networking.md)** — Tailscale mesh, DNS flow, physical networking
+- **[Networking](docs/networking.md)** — Tailscale mesh, DNS flow (Hetzner DNS), physical networking
 - **[Services](docs/services.md)** — Complete service map with ports, auth, and deployment info
 - **[Monitoring](docs/monitoring.md)** — Grafana Cloud, Alloy, synthetic checks, PagerDuty
 - **[Hosts](docs/hosts/)** — Per-host detail (hardware, services, quirks)
diff --git a/docs/architecture.md b/docs/architecture.md
index f8ef102..65f3a86 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -4,14 +4,14 @@

 The infrastructure spans three physical locations (London, Copenhagen, Hetzner Cloud) connected by a Tailscale mesh network. All public traffic enters through a single Hetzner Cloud VPS (helsinki-a) running Caddy as a reverse proxy, which forwards requests over Tailscale to backend services running on physical servers in London and Copenhagen.

-The setup is entirely self-hosted (with the exception of Hetzner Cloud VPSs, Cloudflare for DNS/CDN, and Grafana Cloud for observability). Most physical servers are old personal computers repurposed into server duty — cheaper than cloud, and I get a rack cabinet that doubles as a bedroom white noise machine.
+The setup is entirely self-hosted (with the exception of Hetzner Cloud VPSs, Hetzner DNS, and Grafana Cloud for observability). Most physical servers are old personal computers repurposed into server duty — cheaper than cloud, and I get a rack cabinet that doubles as a bedroom white noise machine.

 ## Network Topology

 ```mermaid
 graph TD
-    CF["Cloudflare<br/>DNS + CDN<br/>*.pez.sh, *.pez.solutions"]
-    CF -->|HTTPS| HEL
+    DNS["DNS<br/>Hetzner DNS: *.pez.sh<br/>Cloudflare: *.pez.solutions"]
+    DNS -->|HTTPS| HEL

     HEL["helsinki-a<br/>Hetzner Cloud VPS<br/><br/>Caddy (reverse proxy)<br/>Authelia (SSO)<br/>LLDAP (Authelia backend)<br/>Bitwarden (Vaultwarden)<br/>Forgejo"]

@@ -34,12 +34,12 @@ graph TD
 All public-facing services follow the same pattern:

 ```
-User → Cloudflare (DNS + TLS) → helsinki-a (Caddy) → Backend (over Tailscale)
+User → DNS (Hetzner DNS) → helsinki-a (Caddy, TLS) → Backend (over Tailscale)
 ```

-1. DNS for `pez.sh` and `pez.solutions` is managed by Cloudflare (provisioned via Terraform)
-2. Cloudflare proxies traffic to helsinki-a
-3. Caddy on helsinki-a terminates TLS and routes to the correct backend
+1. DNS for `pez.sh` is managed by Hetzner DNS (provisioned via Terraform, `terraform/hetzner/dns.tf`); `pez.solutions` still resolves via Cloudflare (dashboard-managed)
+2. Records point directly at helsinki-a's public IP — no CDN or proxying in front
+3. Caddy on helsinki-a terminates TLS (Let's Encrypt) and routes to the correct backend
 4. For protected services, Caddy calls Authelia first (`forward_auth`)
 5. If authenticated (or no auth required), traffic is proxied over Tailscale to the backend

@@ -80,5 +80,5 @@ Metrics, logs, and traces ship to **Grafana Cloud** from every host via **Grafan
 - **Self-hosted first.** Cloud VPSs only where it makes sense (public gateway, mail with clean IP reputation). Everything else runs on physical hardware I own.
 - **Tailscale as the backbone.** No ports exposed on residential IPs. All inter-server communication goes over the mesh.
 - **Ansible for everything.** If a server dies, reinstall the OS, install Tailscale, run `make deploy`. Roughly 30 minutes to full recovery.
-- **Terraform for cloud + DNS.** Hetzner servers, Cloudflare records, Grafana Cloud configuration, and PagerDuty are all in code. No clicking around in dashboards.
+- **Terraform for cloud + DNS.** Hetzner servers, DNS records, Grafana Cloud configuration, and PagerDuty are all in code. No clicking around in dashboards.
 - **Cattle, not pets (as much as possible).** The servers are technically pets — old hardware in specific locations — but the configs are cattle. Everything is reproducible from this repo.
diff --git a/docs/getting-started.md b/docs/getting-started.md
index 7a74b4c..510123e 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -9,9 +9,9 @@ You'll need:
 - **Tailscale** — installed and connected to the tailnet. All SSH access goes through Tailscale. No servers have SSH exposed on the public internet.
 - **SSH keys** — set up for each host you need to access
 - **Ansible** — for configuration management and deployments (`make deps` from `ansible/` installs collections)
-- **OpenTofu** (or Terraform) — for Hetzner, Cloudflare, Grafana Cloud, and PagerDuty
+- **OpenTofu** (or Terraform) — for Hetzner (servers + DNS), Grafana Cloud, and PagerDuty
 - **Docker** — helpful to understand, since most services are containerised
-- **SOPS + age** — for secrets encryption/decryption (run `./ansible/scripts/sops-setup.sh`)
+- **SOPS + age** — for secrets encryption/decryption (see [Secrets](secrets.md) for setup)
 - **Git** — obviously
 - **gh CLI** — for GitHub operations (PRs, issues, etc.)

@@ -33,7 +33,7 @@ pez-infra/
 │ ├── dotfiles/ # Shell config (fish, nvim, tmux, git, etc.)
 │ ├── playbooks/ # One-off playbooks (updates, reboots, status)
 │ └── scripts/ # Utility and maintenance scripts
-└── terraform/ # Terraform/OpenTofu for Hetzner, Cloudflare, Grafana Cloud, PagerDuty
+└── terraform/ # Terraform/OpenTofu for Hetzner (servers + DNS), Grafana Cloud, PagerDuty
 ```

 ## Connecting to hosts
@@ -89,7 +89,7 @@ Other playbooks live under `ansible/playbooks/`:

 ### Managing cloud + DNS + observability

-Terraform manages Hetzner servers, Cloudflare DNS, Grafana Cloud (stack, fleet, dashboards, synthetic checks), and PagerDuty:
+Terraform manages Hetzner servers + DNS, Grafana Cloud (stack, fleet, dashboards, synthetic checks), and PagerDuty:

 ```bash
 cd terraform
@@ -98,7 +98,7 @@ make plan # preview changes
 make apply # apply the changes
 ```

-State lives in a Backblaze B2 bucket (`pez-infra-tfstate`) via the S3-compatible backend. Don't click around in the Cloudflare or Grafana Cloud dashboards — if it's not in Terraform, it doesn't exist.
+State lives in a Backblaze B2 bucket (`pez-infra-tfstate`) via the S3-compatible backend. Don't click around in the Hetzner or Grafana Cloud dashboards — if it's not in Terraform, it doesn't exist.

 ### Adding a new service

@@ -147,12 +147,9 @@ Alpine has been tried and rejected — the missing GNU binaries / systemd caused

 ## Secrets

-Secrets are encrypted in-repo using **SOPS + age**. Encrypted files have `.enc.` in their extension (e.g. `secrets.enc.yml`).
+Secrets are encrypted in-repo using **SOPS + age**. Encrypted files have `.enc.` in their extension (e.g. `secrets.enc.yaml`).
 
 ```bash
-# First-time setup
-./ansible/scripts/sops-setup.sh
-
 # Edit an encrypted file
 sops ansible/services/authelia/config.enc.yml

diff --git a/docs/hosts/helsinki-a.md b/docs/hosts/helsinki-a.md
index 5d85e3a..55995de 100644
--- a/docs/hosts/helsinki-a.md
+++ b/docs/hosts/helsinki-a.md
@@ -45,6 +45,7 @@ Caddy also serves static content from `/srv/`:
 | `/srv/pez.sh` | pez.sh | — |
 | `/srv/pez.solutions` | pez.solutions | — |
 | `/srv/pez-signup` | signup.pez.solutions | — |
+| `/srv/naveen` | naveen.pez.sh | — |

 ## Why Hetzner Cloud

diff --git a/docs/hosts/nuremberg-a.md b/docs/hosts/nuremberg-a.md
index 62e71fb..2ee9bde 100644
--- a/docs/hosts/nuremberg-a.md
+++ b/docs/hosts/nuremberg-a.md
@@ -28,7 +28,7 @@ Mail lives on its own VPS to isolate its IP reputation. If the IP gets flagged f

 ## DNS

-Mail-related DNS records are managed via Cloudflare (Terraform):
+Mail-related DNS records are managed in Hetzner DNS (Terraform, `terraform/hetzner/dns.tf`):

 - **MX** record for inbound mail routing
 - **SPF** for sender verification
diff --git a/docs/networking.md b/docs/networking.md
index 35bd65b..bff55aa 100644
--- a/docs/networking.md
+++ b/docs/networking.md
@@ -54,34 +54,33 @@ A stack of servers at my dad's place — acts as an off-site location.

 ## DNS Flow

-All DNS is managed by Cloudflare, provisioned via Terraform.
+DNS for `pez.sh` is managed by **Hetzner DNS**, provisioned via Terraform (`terraform/hetzner/dns.tf`). Cloudflare was dropped as DNS provider / CDN in April 2026 (PR #90) — records now point directly at the origin, with no proxying in front.

 ### Domains

-- **pez.sh** — primary domain. Registered on Hover.com with nameservers pointed to Cloudflare.
-- **pez.solutions** — alternate domain. Most services that have a `*.pez.sh` host also accept the matching `*.pez.solutions` host, so apps remain reachable if one TLD has trouble.
+- **pez.sh** — primary domain. Registered on Hover.com with nameservers pointed to Hetzner DNS. All records in Terraform.
+- **pez.solutions** — alternate domain. Still resolves via Cloudflare nameservers (dashboard-managed, not in Terraform). Most services that have a `*.pez.sh` host also accept the matching `*.pez.solutions` host, so apps remain reachable if one TLD has trouble.

 ### How a request reaches a service

 ```mermaid
 graph TD
-    Browser["1. Browser requests radarr.pez.sh"] --> CF
-    CF["2. Cloudflare resolves DNS<br/>(proxied record)"] --> TLS
-    TLS["3. Cloudflare terminates TLS,<br/>forwards to helsinki-a"] --> Caddy
-    Caddy["4. Caddy receives request"] --> AuthCheck{"5. Requires auth?"}
+    Browser["1. Browser requests radarr.pez.sh"] --> DNS
+    DNS["2. Hetzner DNS resolves<br/>to helsinki-a's public IP"] --> Caddy
+    Caddy["3. Caddy terminates TLS,<br/>receives request"] --> AuthCheck{"4. Requires auth?"}
     AuthCheck -->|YES| Authelia["forward_auth → Authelia<br/>(localhost:9091)"]
     AuthCheck -->|NO| Proxy
-    Authelia -->|Authenticated| Proxy["6. Reverse-proxy to backend<br/>over Tailscale<br/>(e.g. london-b:7878)"]
+    Authelia -->|Authenticated| Proxy["5. Reverse-proxy to backend<br/>over Tailscale<br/>(e.g. london-b:7878)"]
     Authelia -->|Not authenticated| Redirect["Redirect to auth.pez.sh"]
-    Proxy --> Response["7. Response flows back:<br/>backend → Caddy → Cloudflare → browser"]
+    Proxy --> Response["6. Response flows back:<br/>backend → Caddy → browser"]
 ```

 ### Public Subdomains

-All subdomains are Cloudflare-proxied and terminate at helsinki-a. Hosts marked with both `pez.sh` and `pez.solutions` are reachable on either TLD.
+All subdomains resolve directly to helsinki-a, where Caddy terminates TLS. Hosts marked with both `pez.sh` and `pez.solutions` are reachable on either TLD.

 | Subdomain | Backend | Auth |
 |---|---|---|
@@ -94,6 +93,7 @@ All subdomains are Cloudflare-proxied and terminate at helsinki-a. Hosts marked
 | pez.sh | helsinki-a:/srv/pez.sh (static) | — |
 | pez.solutions | helsinki-a:/srv/pez.solutions (static) | — |
 | signup.pez.solutions | helsinki-a:/srv/pez-signup (static) | — |
+| naveen.pez.sh | helsinki-a:/srv/naveen (static) | — |
 | london-a.pez.sh | london-a:8006 (Proxmox UI) | Proxmox login |
 | jellyfin.pez.sh / .solutions | london-b:8096 | Own auth |
 | plex.pez.sh / .solutions | london-b:32400 | Own auth |
@@ -108,9 +108,11 @@ All subdomains are Cloudflare-proxied and terminate at helsinki-a. Hosts marked
 | soulseek.pez.sh / .solutions | london-b:5030 (slskd) | Authelia |
 | download.pez.sh / .solutions | london-b:9091 (Transmission) | Authelia |

+A few `pez.sh` records bypass Caddy entirely: `mail` points at nuremberg-a, `minecraft` and `wow` point at copenhagen-a's public IP (game clients connect directly), and `public` is a CNAME to a Cloudflare R2 public bucket (`public.r2.dev`).
+
 ### Mail DNS

-nuremberg-a handles mail for pez.sh. DNS records managed via Cloudflare:
+nuremberg-a handles mail for pez.sh. DNS records managed in Hetzner DNS (Terraform):

 - **MX** record pointing to nuremberg-a
 - **SPF** record for sender verification
@@ -119,7 +121,7 @@ nuremberg-a handles mail for pez.sh. DNS records managed via Cloudflare:

 ### Caddy TLS

-Caddy handles TLS termination for the Cloudflare-to-origin connection. Certificates are obtained and renewed automatically via ACME (Let's Encrypt). No manual cert management, no cron jobs, no renewals to think about.
+Caddy terminates TLS for all public traffic. Certificates are obtained and renewed automatically via ACME (Let's Encrypt). No manual cert management, no cron jobs, no renewals to think about.

 Example Caddyfile block for a protected service:

diff --git a/docs/secrets.md b/docs/secrets.md
index 0a9177b..14aa0d2 100644
--- a/docs/secrets.md
+++ b/docs/secrets.md
@@ -16,7 +16,7 @@ Encrypted files use `.enc.` in their extension:
 services/authelia/config.enc.yml # encrypted YAML
 services//.enc.env # encrypted env file (convention)
 terraform/secrets.enc.yaml # encrypted Terraform vars
-ansible/group_vars/all/secrets.enc.yml
+ansible/group_vars/all/secrets.enc.yaml
 ```

 Plaintext files MUST NOT contain secrets. The `.gitignore` blocks common secret filenames (`secrets.yml`, `vault.yml`, `secret.env`, etc.) as a safety net.
@@ -34,9 +34,6 @@ apt install age
 # SOPS: download from https://github.com/getsops/sops/releases
 wget https://github.com/getsops/sops/releases/download/v3.9.4/sops_3.9.4_amd64.deb
 dpkg -i sops_3.9.4_amd64.deb
-
-# FreeBSD
-pkg install age sops
 ```

 ### Generate your age key
@@ -52,7 +49,7 @@ SOPS automatically looks for keys in `~/.config/sops/age/keys.txt` (Linux/macOS)

 ### Add your public key to `.sops.yaml`

-Replace the `age1TODO_PEZ_PUBLIC_KEY` placeholder in `.sops.yaml` with your actual public key. Commit the updated `.sops.yaml`.
+Add your public key to the `creation_rules` in `.sops.yaml`, re-encrypt the affected files (see "Add a new recipient" below), and commit the updated `.sops.yaml`.

 ## Day-to-day usage

@@ -111,11 +108,9 @@ In the workflow:
 env:
   SOPS_AGE_KEY: ${{ secrets.AGE_SECRET_KEY }}
 run: |
-  sops -d ansible/group_vars/all/secrets.enc.yml > ansible/group_vars/all/secrets.yml
+  sops -d ansible/group_vars/all/secrets.enc.yaml > ansible/group_vars/all/secrets.yml
 ```

-The existing `ANSIBLE_VAULT_PASS` secret can be retired once migration to SOPS is complete.
-
 ## Terraform integration

 Use the [terraform-provider-sops](https://github.com/carlpett/terraform-provider-sops) to read encrypted values directly:
@@ -128,8 +123,8 @@ data "sops_file" "secrets" {
 }

 # Use decrypted values
-resource "cloudflare_record" "example" {
-  value = data.sops_file.secrets.data["cloudflare_api_token"]
+provider "hcloud" {
+  token = data.sops_file.secrets.data["hetzner_token"]
 }
 ```

@@ -139,9 +134,9 @@ These are the types of secrets expected in this repo:

 | Category | Example | Location |
 |----------|---------|----------|
-| Ansible vault vars | SSH keys, API tokens, passwords | `ansible/group_vars/*/secrets.enc.yml` |
+| Ansible group vars | SSH keys, API tokens, passwords | `ansible/group_vars/all/secrets.enc.yaml` |
 | Docker env files | DB passwords, app secrets | `services/*/service.enc.env` |
-| Terraform vars | Cloudflare API token, Azure creds | `terraform/secrets.enc.yaml` |
+| Terraform vars | Hetzner token, Grafana Cloud tokens, B2 keys | `terraform/secrets.enc.yaml` |
 | Service configs | Authelia JWT secret, LLDAP admin pass | `services/*/config.enc.yml` |

 ## Security notes
diff --git a/terraform/README.md b/terraform/README.md
index 327a624..98dd68f 100644
--- a/terraform/README.md
+++ b/terraform/README.md
@@ -12,11 +12,11 @@ Infrastructure-as-code for cloud and edge services. Uses [OpenTofu](https://open

 Secrets are stored encrypted in `secrets.enc.yaml` via [SOPS](https://github.com/getsops/sops) and decrypted at plan/apply time into `secrets.yaml`. The Makefile handles decryption automatically.

-Required secret keys: `hetzner_token`, `grafana_cloud_access_policy`, `grafana_synthetic_monitoring_access_token`, `grafana_fleet_management_auth`, `grafana_service_account_token`, `pagerduty_token`, `plex_token`, `backblaze_key_id`.
+Required secret keys: `hetzner_token`, `grafana_cloud_access_policy`, `grafana_synthetic_monitoring_access_token`, `grafana_fleet_management_auth`, `grafana_service_account_token`, `pagerduty_token`, `plex_token`, `backblaze_keyID`, `backblaze_applicationKey`.

 ## State

-State is stored in a Backblaze B2 bucket (`pez-infra-tfstate`) using an S3-compatible backend. Credentials are read from `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables.
+State is stored in a Backblaze B2 bucket (`pez-infra-tfstate`) using an S3-compatible backend. Credentials are read from `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables — the Makefile exports them from the `backblaze_keyID` / `backblaze_applicationKey` secrets automatically.

 ## Usage

@@ -33,5 +33,5 @@ make fmt # format all .tf files
 |----------|--------|---------|
 | Hetzner Cloud | `hetznercloud/hcloud` | `~> 1.45` |
 | Grafana | `grafana/grafana` | `~> 4.35` |
-| PagerDuty | `pagerduty/pagerduty` | `~> 2.2` |
+| PagerDuty | `pagerduty/pagerduty` | `~> 3.32` |
 | OpenTofu | — | `>= 1.6.0` |
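
Reviewer note on the renamed B2 keys: since the required key list in `terraform/README.md` changes from `backblaze_key_id` to `backblaze_keyID` / `backblaze_applicationKey`, a small check on the decrypted secrets could catch a stale file before `make plan`. The sketch below is hypothetical and not part of this patch. The key names come from the README text above, while the function names and the pairing of `backblaze_keyID` with `AWS_ACCESS_KEY_ID` are assumptions about what the Makefile does.

```python
# Hypothetical helper, not part of the repo. Key names are copied from the
# "Required secret keys" line in terraform/README.md as changed by this patch.
REQUIRED_SECRET_KEYS = (
    "hetzner_token",
    "grafana_cloud_access_policy",
    "grafana_synthetic_monitoring_access_token",
    "grafana_fleet_management_auth",
    "grafana_service_account_token",
    "pagerduty_token",
    "plex_token",
    "backblaze_keyID",
    "backblaze_applicationKey",
)


def missing_secret_keys(secrets):
    """Return the required keys that are absent or empty in a decrypted secrets mapping."""
    return [key for key in REQUIRED_SECRET_KEYS if not secrets.get(key)]


def backend_env(secrets):
    """Build the S3-backend credentials from the B2 secrets.

    Assumption: keyID feeds AWS_ACCESS_KEY_ID and applicationKey feeds
    AWS_SECRET_ACCESS_KEY; the real Makefile may wire this differently.
    """
    return {
        "AWS_ACCESS_KEY_ID": secrets["backblaze_keyID"],
        "AWS_SECRET_ACCESS_KEY": secrets["backblaze_applicationKey"],
    }
```

Run against a secrets file that still uses the old `backblaze_key_id` name, `missing_secret_keys` would report both new B2 keys as missing, which is the drift this patch documents.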