docs: catch up with the Cloudflare to Hetzner DNS move, fix secrets/terraform drift

The docs still described Cloudflare as DNS + CDN in front of helsinki-a,
but that was dropped in #90 - pez.sh lives on Hetzner DNS via Terraform
now and records point straight at the origin. Updated README,
architecture, networking, getting-started and the nuremberg-a host doc
to match, and noted that pez.solutions still resolves via Cloudflare
outside Terraform.

Also fixed while I was in there:
- terraform/README: PagerDuty provider is ~> 3.32 (table said ~> 2.2),
  and the B2 secret keys are backblaze_keyID/backblaze_applicationKey
- secrets docs: group_vars secrets file is .enc.yaml, dropped the
  FreeBSD install steps, the long-gone .sops.yaml placeholder note and
  the ANSIBLE_VAULT_PASS migration note, swapped the cloudflare_record
  example for hcloud
- getting-started referenced ansible/scripts/sops-setup.sh which
  doesn't exist
- added naveen.pez.sh to the subdomain tables and a note about the
  DNS-only records (mail, minecraft, wow, public)
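
A rough way to sanity-check the split described above (pez.sh on Hetzner DNS, pez.solutions still on Cloudflare) is to look at which nameservers each zone reports. The helper below is only a sketch: `dns_provider` is a made-up name and not part of this repo, and it assumes the usual `*.ns.hetzner.com` / `*.ns.hetzner.de` and `*.ns.cloudflare.com` nameserver naming.

```bash
#!/bin/sh
# Hypothetical helper (not in the repo): classify a nameserver hostname
# by provider, based on its domain suffix.
dns_provider() {
    # dig prints fully-qualified names, so strip one trailing dot if present.
    ns="${1%.}"
    case "$ns" in
        *.ns.hetzner.com|*.ns.hetzner.de) echo "hetzner" ;;
        *.ns.cloudflare.com)              echo "cloudflare" ;;
        *)                                echo "unknown" ;;
    esac
}

# Usage (needs network access and dig):
#   for zone in pez.sh pez.solutions; do
#       dig +short NS "$zone" | while read -r ns; do
#           echo "$zone $ns $(dns_provider "$ns")"
#       done
#   done
```

If the docs in this commit are accurate, the commented loop should report `hetzner` for pez.sh and `cloudflare` for pez.solutions.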
Rasmus Wejlgaard 2026-06-10 19:35:53 +01:00
parent 0c00a3cb4d
commit 361133ec7e
8 changed files with 47 additions and 52 deletions


@ -5,7 +5,7 @@ Infrastructure-as-code monorepo for managing my homelab and cloud server fleet.
## What's in this repo
- **Ansible** — Playbooks, roles, and inventory for configuring servers, deploying Docker-based services, and managing dotfiles
- **Terraform** — OpenTofu/Terraform configs for cloud resources (Hetzner Cloud, Cloudflare DNS, Grafana Cloud, PagerDuty)
- **Terraform** — OpenTofu/Terraform configs for cloud resources (Hetzner Cloud + DNS, Grafana Cloud, PagerDuty)
- **Services** — Docker Compose definitions and config files for each self-hosted service
- **Documentation** — Architecture decisions, networking topology, and operational guides
@ -13,7 +13,7 @@ Infrastructure-as-code monorepo for managing my homelab and cloud server fleet.
```mermaid
graph TD
CF[Cloudflare<br/>DNS + CDN] --> HEL[helsinki-a<br/>Caddy proxy + SSO<br/><i>Hetzner Cloud</i>]
DNS[Hetzner DNS<br/>pez.sh] --> HEL[helsinki-a<br/>Caddy proxy + SSO<br/><i>Hetzner Cloud</i>]
HEL --> TS{Tailscale mesh}
TS --> LB[london-b<br/>Storage, media<br/>Docker + systemd]
TS --> LA[london-a<br/>Proxmox VE hypervisor]
@ -24,7 +24,7 @@ graph TD
TS -.-> GC[Grafana Cloud<br/>metrics, logs, traces]
```
Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud instance, and is forwarded to backend services running on various hosts connected over a Tailscale mesh network. Authentication for protected services is handled by Authelia with an LLDAP backend. Observability is shipped from every host to Grafana Cloud via Grafana Alloy.
DNS (Hetzner DNS for `pez.sh`, managed via Terraform) points directly at a Caddy reverse proxy on a Hetzner cloud instance, which terminates TLS and forwards to backend services running on various hosts connected over a Tailscale mesh network. Authentication for protected services is handled by Authelia with an LLDAP backend. Observability is shipped from every host to Grafana Cloud via Grafana Alloy.
### Hosts
@ -47,7 +47,7 @@ Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud
│ ├── dotfiles/ # Shell config (fish, nvim, tmux, git, etc.)
│ ├── playbooks/ # One-off playbooks (updates, reboots, status)
│ └── scripts/ # Utility and maintenance scripts
├── terraform/ # Terraform/OpenTofu for Hetzner, Cloudflare, Grafana Cloud, PagerDuty
├── terraform/ # Terraform/OpenTofu for Hetzner (servers + DNS), Grafana Cloud, PagerDuty
└── docs/ # Architecture, networking, services, monitoring, and per-host docs
```
@ -65,18 +65,18 @@ Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud
1. **Clone:** `git clone git@github.com:RWejlgaard/pez-infra.git`
2. **Services:** Each service has its own directory under `ansible/services/` with a `docker-compose.yml` and config files
3. **Deploy:** `cd ansible && make deploy` runs the unified `deploy.yml` against the whole fleet (or `make deploy-host HOST=<name>`)
4. **Infrastructure:** Terraform configs in `terraform/` manage Hetzner servers, Cloudflare DNS, Grafana Cloud, and PagerDuty
4. **Infrastructure:** Terraform configs in `terraform/` manage Hetzner servers + DNS, Grafana Cloud, and PagerDuty
### Secrets
Secrets are encrypted in-repo using [SOPS](https://github.com/getsops/sops) + [age](https://github.com/FiloSottile/age). Encrypted files use `.enc.` in their extension (e.g. `secrets.enc.yml`). See **[Secrets Management](docs/secrets.md)** for full setup and usage instructions.
Secrets are encrypted in-repo using [SOPS](https://github.com/getsops/sops) + [age](https://github.com/FiloSottile/age). Encrypted files use `.enc.` in their extension (e.g. `secrets.enc.yaml`). See **[Secrets Management](docs/secrets.md)** for full setup and usage instructions.
## Documentation
Detailed documentation lives in [`docs/`](docs/):
- **[Architecture](docs/architecture.md)** — Network topology, traffic flow, design principles
- **[Networking](docs/networking.md)** — Tailscale mesh, DNS flow, physical networking
- **[Networking](docs/networking.md)** — Tailscale mesh, DNS flow (Hetzner DNS), physical networking
- **[Services](docs/services.md)** — Complete service map with ports, auth, and deployment info
- **[Monitoring](docs/monitoring.md)** — Grafana Cloud, Alloy, synthetic checks, PagerDuty
- **[Hosts](docs/hosts/)** — Per-host detail (hardware, services, quirks)


@ -4,14 +4,14 @@
The infrastructure spans three physical locations (London, Copenhagen, Hetzner Cloud) connected by a Tailscale mesh network. All public traffic enters through a single Hetzner Cloud VPS (helsinki-a) running Caddy as a reverse proxy, which forwards requests over Tailscale to backend services running on physical servers in London and Copenhagen.
The setup is entirely self-hosted (with the exception of Hetzner Cloud VPSs, Cloudflare for DNS/CDN, and Grafana Cloud for observability). Most physical servers are old personal computers repurposed into server duty — cheaper than cloud, and I get a rack cabinet that doubles as a bedroom white noise machine.
The setup is entirely self-hosted (with the exception of Hetzner Cloud VPSs, Hetzner DNS, and Grafana Cloud for observability). Most physical servers are old personal computers repurposed into server duty — cheaper than cloud, and I get a rack cabinet that doubles as a bedroom white noise machine.
## Network Topology
```mermaid
graph TD
CF["<b>Cloudflare</b><br/>DNS + CDN<br/>*.pez.sh, *.pez.solutions"]
CF -->|HTTPS| HEL
DNS["<b>DNS</b><br/>Hetzner DNS: *.pez.sh<br/>Cloudflare: *.pez.solutions"]
DNS -->|HTTPS| HEL
HEL["<b>helsinki-a</b><br/>Hetzner Cloud VPS<br/><br/>Caddy (reverse proxy)<br/>Authelia (SSO)<br/>LLDAP (Authelia backend)<br/>Bitwarden (Vaultwarden)<br/>Forgejo"]
@ -34,12 +34,12 @@ graph TD
All public-facing services follow the same pattern:
```
User → Cloudflare (DNS + TLS) → helsinki-a (Caddy) → Backend (over Tailscale)
User → DNS (Hetzner DNS) → helsinki-a (Caddy, TLS) → Backend (over Tailscale)
```
1. DNS for `pez.sh` and `pez.solutions` is managed by Cloudflare (provisioned via Terraform)
2. Cloudflare proxies traffic to helsinki-a
3. Caddy on helsinki-a terminates TLS and routes to the correct backend
1. DNS for `pez.sh` is managed by Hetzner DNS (provisioned via Terraform, `terraform/hetzner/dns.tf`); `pez.solutions` still resolves via Cloudflare (dashboard-managed)
2. Records point directly at helsinki-a's public IP — no CDN or proxying in front
3. Caddy on helsinki-a terminates TLS (Let's Encrypt) and routes to the correct backend
4. For protected services, Caddy calls Authelia first (`forward_auth`)
5. If authenticated (or no auth required), traffic is proxied over Tailscale to the backend
@ -80,5 +80,5 @@ Metrics, logs, and traces ship to **Grafana Cloud** from every host via **Grafan
- **Self-hosted first.** Cloud VPSs only where it makes sense (public gateway, mail with clean IP reputation). Everything else runs on physical hardware I own.
- **Tailscale as the backbone.** No ports exposed on residential IPs. All inter-server communication goes over the mesh.
- **Ansible for everything.** If a server dies, reinstall the OS, install Tailscale, run `make deploy`. Roughly 30 minutes to full recovery.
- **Terraform for cloud + DNS.** Hetzner servers, Cloudflare records, Grafana Cloud configuration, and PagerDuty are all in code. No clicking around in dashboards.
- **Terraform for cloud + DNS.** Hetzner servers, DNS records, Grafana Cloud configuration, and PagerDuty are all in code. No clicking around in dashboards.
- **Cattle, not pets (as much as possible).** The servers are technically pets — old hardware in specific locations — but the configs are cattle. Everything is reproducible from this repo.


@ -9,9 +9,9 @@ You'll need:
- **Tailscale** — installed and connected to the tailnet. All SSH access goes through Tailscale. No servers have SSH exposed on the public internet.
- **SSH keys** — set up for each host you need to access
- **Ansible** — for configuration management and deployments (`make deps` from `ansible/` installs collections)
- **OpenTofu** (or Terraform) — for Hetzner, Cloudflare, Grafana Cloud, and PagerDuty
- **OpenTofu** (or Terraform) — for Hetzner (servers + DNS), Grafana Cloud, and PagerDuty
- **Docker** — helpful to understand, since most services are containerised
- **SOPS + age** — for secrets encryption/decryption (run `./ansible/scripts/sops-setup.sh`)
- **SOPS + age** — for secrets encryption/decryption (see [Secrets](secrets.md) for setup)
- **Git** — obviously
- **gh CLI** — for GitHub operations (PRs, issues, etc.)
@ -33,7 +33,7 @@ pez-infra/
│ ├── dotfiles/ # Shell config (fish, nvim, tmux, git, etc.)
│ ├── playbooks/ # One-off playbooks (updates, reboots, status)
│ └── scripts/ # Utility and maintenance scripts
└── terraform/ # Terraform/OpenTofu for Hetzner, Cloudflare, Grafana Cloud, PagerDuty
└── terraform/ # Terraform/OpenTofu for Hetzner (servers + DNS), Grafana Cloud, PagerDuty
```
## Connecting to hosts
@ -89,7 +89,7 @@ Other playbooks live under `ansible/playbooks/`:
### Managing cloud + DNS + observability
Terraform manages Hetzner servers, Cloudflare DNS, Grafana Cloud (stack, fleet, dashboards, synthetic checks), and PagerDuty:
Terraform manages Hetzner servers + DNS, Grafana Cloud (stack, fleet, dashboards, synthetic checks), and PagerDuty:
```bash
cd terraform
@ -98,7 +98,7 @@ make plan # preview changes
make apply # apply the changes
```
State lives in a Backblaze B2 bucket (`pez-infra-tfstate`) via the S3-compatible backend. Don't click around in the Cloudflare or Grafana Cloud dashboards — if it's not in Terraform, it doesn't exist.
State lives in a Backblaze B2 bucket (`pez-infra-tfstate`) via the S3-compatible backend. Don't click around in the Hetzner or Grafana Cloud dashboards — if it's not in Terraform, it doesn't exist.
### Adding a new service
@ -147,12 +147,9 @@ Alpine has been tried and rejected — the missing GNU binaries / systemd caused
## Secrets
Secrets are encrypted in-repo using **SOPS + age**. Encrypted files have `.enc.` in their extension (e.g. `secrets.enc.yml`).
Secrets are encrypted in-repo using **SOPS + age**. Encrypted files have `.enc.` in their extension (e.g. `secrets.enc.yaml`).
```bash
# First-time setup
./ansible/scripts/sops-setup.sh
# Edit an encrypted file
sops ansible/services/authelia/config.enc.yml


@ -45,6 +45,7 @@ Caddy also serves static content from `/srv/`:
| `/srv/pez.sh` | pez.sh | — |
| `/srv/pez.solutions` | pez.solutions | — |
| `/srv/pez-signup` | signup.pez.solutions | — |
| `/srv/naveen` | naveen.pez.sh | — |
## Why Hetzner Cloud


@ -28,7 +28,7 @@ Mail lives on its own VPS to isolate its IP reputation. If the IP gets flagged f
## DNS
Mail-related DNS records are managed via Cloudflare (Terraform):
Mail-related DNS records are managed in Hetzner DNS (Terraform, `terraform/hetzner/dns.tf`):
- **MX** record for inbound mail routing
- **SPF** for sender verification


@ -54,34 +54,33 @@ A stack of servers at my dad's place — acts as an off-site location.
## DNS Flow
All DNS is managed by Cloudflare, provisioned via Terraform.
DNS for `pez.sh` is managed by **Hetzner DNS**, provisioned via Terraform (`terraform/hetzner/dns.tf`). Cloudflare was dropped as DNS provider / CDN in April 2026 (PR #90) — records now point directly at the origin, with no proxying in front.
### Domains
- **pez.sh** — primary domain. Registered on Hover.com with nameservers pointed to Cloudflare.
- **pez.solutions** — alternate domain. Most services that have a `*.pez.sh` host also accept the matching `*.pez.solutions` host, so apps remain reachable if one TLD has trouble.
- **pez.sh** — primary domain. Registered on Hover.com with nameservers pointed to Hetzner DNS. All records in Terraform.
- **pez.solutions** — alternate domain. Still resolves via Cloudflare nameservers (dashboard-managed, not in Terraform). Most services that have a `*.pez.sh` host also accept the matching `*.pez.solutions` host, so apps remain reachable if one TLD has trouble.
### How a request reaches a service
```mermaid
graph TD
Browser["1. Browser requests radarr.pez.sh"] --> CF
CF["2. Cloudflare resolves DNS<br/>(proxied record)"] --> TLS
TLS["3. Cloudflare terminates TLS,<br/>forwards to helsinki-a"] --> Caddy
Caddy["4. Caddy receives request"] --> AuthCheck{"5. Requires auth?"}
Browser["1. Browser requests radarr.pez.sh"] --> DNS
DNS["2. Hetzner DNS resolves<br/>to helsinki-a's public IP"] --> Caddy
Caddy["3. Caddy terminates TLS,<br/>receives request"] --> AuthCheck{"4. Requires auth?"}
AuthCheck -->|YES| Authelia["forward_auth → Authelia<br/>(localhost:9091)"]
AuthCheck -->|NO| Proxy
Authelia -->|Authenticated| Proxy["6. Reverse-proxy to backend<br/>over Tailscale<br/>(e.g. london-b:7878)"]
Authelia -->|Authenticated| Proxy["5. Reverse-proxy to backend<br/>over Tailscale<br/>(e.g. london-b:7878)"]
Authelia -->|Not authenticated| Redirect["Redirect to auth.pez.sh"]
Proxy --> Response["7. Response flows back:<br/>backend → Caddy → Cloudflare → browser"]
Proxy --> Response["6. Response flows back:<br/>backend → Caddy → browser"]
```
### Public Subdomains
All subdomains are Cloudflare-proxied and terminate at helsinki-a. Hosts marked with both `pez.sh` and `pez.solutions` are reachable on either TLD.
All subdomains resolve directly to helsinki-a, where Caddy terminates TLS. Hosts marked with both `pez.sh` and `pez.solutions` are reachable on either TLD.
| Subdomain | Backend | Auth |
|---|---|---|
@ -94,6 +93,7 @@ All subdomains are Cloudflare-proxied and terminate at helsinki-a. Hosts marked
| pez.sh | helsinki-a:/srv/pez.sh (static) | — |
| pez.solutions | helsinki-a:/srv/pez.solutions (static) | — |
| signup.pez.solutions | helsinki-a:/srv/pez-signup (static) | — |
| naveen.pez.sh | helsinki-a:/srv/naveen (static) | — |
| london-a.pez.sh | london-a:8006 (Proxmox UI) | Proxmox login |
| jellyfin.pez.sh / .solutions | london-b:8096 | Own auth |
| plex.pez.sh / .solutions | london-b:32400 | Own auth |
@ -108,9 +108,11 @@ All subdomains are Cloudflare-proxied and terminate at helsinki-a. Hosts marked
| soulseek.pez.sh / .solutions | london-b:5030 (slskd) | Authelia |
| download.pez.sh / .solutions | london-b:9091 (Transmission) | Authelia |
A few `pez.sh` records bypass Caddy entirely: `mail` points at nuremberg-a, `minecraft` and `wow` point at copenhagen-a's public IP (game clients connect directly), and `public` is a CNAME to a Cloudflare R2 public bucket (`public.r2.dev`).
### Mail DNS
nuremberg-a handles mail for pez.sh. DNS records managed via Cloudflare:
nuremberg-a handles mail for pez.sh. DNS records managed in Hetzner DNS (Terraform):
- **MX** record pointing to nuremberg-a
- **SPF** record for sender verification
@ -119,7 +121,7 @@ nuremberg-a handles mail for pez.sh. DNS records managed via Cloudflare:
### Caddy TLS
Caddy handles TLS termination for the Cloudflare-to-origin connection. Certificates are obtained and renewed automatically via ACME (Let's Encrypt). No manual cert management, no cron jobs, no renewals to think about.
Caddy terminates TLS for all public traffic. Certificates are obtained and renewed automatically via ACME (Let's Encrypt). No manual cert management, no cron jobs, no renewals to think about.
Example Caddyfile block for a protected service:


@ -16,7 +16,7 @@ Encrypted files use `.enc.` in their extension:
services/authelia/config.enc.yml # encrypted YAML
services/<service>/<file>.enc.env # encrypted env file (convention)
terraform/secrets.enc.yaml # encrypted Terraform vars
ansible/group_vars/all/secrets.enc.yml
ansible/group_vars/all/secrets.enc.yaml
```
Plaintext files MUST NOT contain secrets. The `.gitignore` blocks common secret filenames (`secrets.yml`, `vault.yml`, `secret.env`, etc.) as a safety net.
@ -34,9 +34,6 @@ apt install age
# SOPS: download from https://github.com/getsops/sops/releases
wget https://github.com/getsops/sops/releases/download/v3.9.4/sops_3.9.4_amd64.deb
dpkg -i sops_3.9.4_amd64.deb
# FreeBSD
pkg install age sops
```
### Generate your age key
@ -52,7 +49,7 @@ SOPS automatically looks for keys in `~/.config/sops/age/keys.txt` (Linux/macOS)
### Add your public key to `.sops.yaml`
Replace the `age1TODO_PEZ_PUBLIC_KEY` placeholder in `.sops.yaml` with your actual public key. Commit the updated `.sops.yaml`.
Add your public key to the `creation_rules` in `.sops.yaml`, re-encrypt the affected files (see "Add a new recipient" below), and commit the updated `.sops.yaml`.
## Day-to-day usage
@ -111,11 +108,9 @@ In the workflow:
env:
  SOPS_AGE_KEY: ${{ secrets.AGE_SECRET_KEY }}
run: |
  sops -d ansible/group_vars/all/secrets.enc.yml > ansible/group_vars/all/secrets.yml
  sops -d ansible/group_vars/all/secrets.enc.yaml > ansible/group_vars/all/secrets.yml
```
The existing `ANSIBLE_VAULT_PASS` secret can be retired once migration to SOPS is complete.
## Terraform integration
Use the [terraform-provider-sops](https://github.com/carlpett/terraform-provider-sops) to read encrypted values directly:
@ -128,8 +123,8 @@ data "sops_file" "secrets" {
}
# Use decrypted values
resource "cloudflare_record" "example" {
  value = data.sops_file.secrets.data["cloudflare_api_token"]
provider "hcloud" {
  token = data.sops_file.secrets.data["hetzner_token"]
}
```
@ -139,9 +134,9 @@ These are the types of secrets expected in this repo:
| Category | Example | Location |
|----------|---------|----------|
| Ansible vault vars | SSH keys, API tokens, passwords | `ansible/group_vars/*/secrets.enc.yml` |
| Ansible group vars | SSH keys, API tokens, passwords | `ansible/group_vars/all/secrets.enc.yaml` |
| Docker env files | DB passwords, app secrets | `services/*/service.enc.env` |
| Terraform vars | Cloudflare API token, Azure creds | `terraform/secrets.enc.yaml` |
| Terraform vars | Hetzner token, Grafana Cloud tokens, B2 keys | `terraform/secrets.enc.yaml` |
| Service configs | Authelia JWT secret, LLDAP admin pass | `services/*/config.enc.yml` |
## Security notes


@ -12,11 +12,11 @@ Infrastructure-as-code for cloud and edge services. Uses [OpenTofu](https://open
Secrets are stored encrypted in `secrets.enc.yaml` via [SOPS](https://github.com/getsops/sops) and decrypted at plan/apply time into `secrets.yaml`. The Makefile handles decryption automatically.
Required secret keys: `hetzner_token`, `grafana_cloud_access_policy`, `grafana_synthetic_monitoring_access_token`, `grafana_fleet_management_auth`, `grafana_service_account_token`, `pagerduty_token`, `plex_token`, `backblaze_key_id`.
Required secret keys: `hetzner_token`, `grafana_cloud_access_policy`, `grafana_synthetic_monitoring_access_token`, `grafana_fleet_management_auth`, `grafana_service_account_token`, `pagerduty_token`, `plex_token`, `backblaze_keyID`, `backblaze_applicationKey`.
## State
State is stored in a Backblaze B2 bucket (`pez-infra-tfstate`) using an S3-compatible backend. Credentials are read from `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables.
State is stored in a Backblaze B2 bucket (`pez-infra-tfstate`) using an S3-compatible backend. Credentials are read from `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables — the Makefile exports them from the `backblaze_keyID` / `backblaze_applicationKey` secrets automatically.
## Usage
@ -33,5 +33,5 @@ make fmt # format all .tf files
|----------|--------|---------|
| Hetzner Cloud | `hetznercloud/hcloud` | `~> 1.45` |
| Grafana | `grafana/grafana` | `~> 4.35` |
| PagerDuty | `pagerduty/pagerduty` | `~> 2.2` |
| PagerDuty | `pagerduty/pagerduty` | `~> 3.32` |
| OpenTofu | — | `>= 1.6.0` |