The docs still described Cloudflare as DNS + CDN in front of helsinki-a, but that was dropped in #90: pez.sh lives on Hetzner DNS via Terraform now and records point straight at the origin. Updated README, architecture, networking, getting-started and the nuremberg-a host doc to match, and noted that pez.solutions still resolves via Cloudflare outside Terraform.

Also fixed while I was in there:
- terraform/README: PagerDuty provider is ~> 3.32 (table said ~> 2.2), and the B2 secret keys are backblaze_keyID/backblaze_applicationKey
- secrets docs: group_vars secrets file is .enc.yaml, dropped the FreeBSD install steps, the long-gone .sops.yaml placeholder note and the ANSIBLE_VAULT_PASS migration note, swapped the cloudflare_record example for hcloud
- getting-started referenced ansible/scripts/sops-setup.sh, which doesn't exist
- added naveen.pez.sh to the subdomain tables and a note about the DNS-only records (mail, minecraft, wow, public)
Architecture
Overview
The infrastructure spans three physical locations (London, Copenhagen, Hetzner Cloud) connected by a Tailscale mesh network. All public traffic enters through a single Hetzner Cloud VPS (helsinki-a) running Caddy as a reverse proxy, which forwards requests over Tailscale to backend services running on physical servers in London and Copenhagen.
The setup is entirely self-hosted (with the exception of Hetzner Cloud VPSs, Hetzner DNS, and Grafana Cloud for observability). Most physical servers are old personal computers repurposed for server duty: cheaper than cloud, and I get a rack cabinet that doubles as a bedroom white noise machine.
Network Topology
graph TD
DNS["<b>DNS</b><br/>Hetzner DNS: *.pez.sh<br/>Cloudflare: *.pez.solutions"]
DNS -->|HTTPS| HEL
HEL["<b>helsinki-a</b><br/>Hetzner Cloud VPS<br/><br/>Caddy (reverse proxy)<br/>Authelia (SSO)<br/>LLDAP (Authelia backend)<br/>Bitwarden (Vaultwarden)<br/>Forgejo"]
HEL --> TS["<b>Tailscale Mesh</b><br/>WireGuard-based VPN"]
TS --> LB["<b>london-b</b><br/>Storage / Media<br/>*arr stack, Plex, Jellyfin<br/>(Threadripper, 87T ZFS)"]
TS --> LA["<b>london-a</b><br/>Proxmox VE hypervisor<br/>(Debian 13)"]
TS --> LC["<b>london-c</b><br/>Raspberry Pi<br/>Octopus Energy exporter"]
TS --> NA["<b>nuremberg-a</b><br/>Mail<br/>poste.io"]
TS --> CA["<b>copenhagen-a</b><br/>Gaming<br/>Minecraft / WoW (MaNGOS)"]
TS --> CC["<b>copenhagen-c</b><br/>Raspberry Pi<br/>cloudflared, idle"]
TS -.->|Alloy| GC["<b>Grafana Cloud</b><br/>metrics, logs, traces<br/>synthetic checks"]
style CC stroke-dasharray: 5 5
Traffic Flow
All public-facing services follow the same pattern:
User → DNS (Hetzner DNS) → helsinki-a (Caddy, TLS) → Backend (over Tailscale)
- DNS for pez.sh is managed by Hetzner DNS (provisioned via Terraform, terraform/hetzner/dns.tf); pez.solutions still resolves via Cloudflare (dashboard-managed)
- Records point directly at helsinki-a's public IP, with no CDN or proxying in front
- Caddy on helsinki-a terminates TLS (Let's Encrypt) and routes to the correct backend
- For protected services, Caddy calls Authelia first (forward_auth)
- If authenticated (or no auth required), traffic is proxied over Tailscale to the backend
graph LR
subgraph "helsinki-a (Caddy)"
A1["forward_auth → Authelia"]
A2["(no auth)"]
A3["forward_auth → Authelia"]
A4["(local)"]
end
R["radarr.pez.sh"] --> A1 --> LB1["london-b:7878"]
J["jellyfin.pez.sh"] --> A2 --> LB2["london-b:8096"]
G["git.pez.sh"] --> A3 --> LO3["localhost:3000 (Forgejo)"]
AU["auth.pez.sh"] --> A4 --> LO["localhost:9091 (Authelia)"]
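The routing in the diagram above can be read as a lookup table from hostname to backend, with a flag for whether Authelia is consulted first. The following is a minimal Python sketch of that idea, not the real configuration (which lives in Caddy). The names Route, ROUTES and plan are hypothetical and exist only for illustration; the hostnames, backends and auth modes are the four shown in the diagram.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Route:
    backend: str      # upstream that Caddy proxies to
    needs_auth: bool  # True when forward_auth to Authelia runs first


# Hostnames, auth modes and backends are copied from the diagram above.
ROUTES: Dict[str, Route] = {
    "radarr.pez.sh": Route("london-b:7878", needs_auth=True),
    "jellyfin.pez.sh": Route("london-b:8096", needs_auth=False),
    "git.pez.sh": Route("localhost:3000", needs_auth=True),
    "auth.pez.sh": Route("localhost:9091", needs_auth=False),
}


def plan(host: str) -> List[str]:
    """List the hops a request for `host` takes, in order."""
    route = ROUTES[host]  # raises KeyError for hosts this sketch does not know
    hops = ["helsinki-a (Caddy, TLS)"]
    if route.needs_auth:
        hops.append("Authelia (forward_auth)")
    hops.append(route.backend)
    return hops
```

For example, plan("radarr.pez.sh") lists three hops (Caddy, Authelia, london-b:7878), while plan("jellyfin.pez.sh") skips the Authelia hop because Jellyfin handles its own login.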
Auth Architecture
graph TD
Caddy["<b>Caddy</b><br/>forward_auth"] --> Authelia["<b>Authelia</b><br/>SSO<br/>auth.pez.sh"]
Authelia --> LLDAP["<b>LLDAP</b><br/>User directory<br/>(Authelia backend only)"]
Authelia --> MariaDB["<b>MariaDB</b><br/>Authelia session/state"]
Authelia authenticates against LLDAP and uses MariaDB for session/state. All three run as Docker containers on helsinki-a. LLDAP is not wired into other apps; it is purely Authelia's user backend. Services that sit behind Authelia inherit users from LLDAP via the Caddy forward_auth flow; services with their own auth (Bitwarden, Plex, Jellyfin, Navidrome, Jellyseerr, Forgejo, poste.io) maintain their own user databases.
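To make the forward_auth flow concrete, here is a small Python sketch of the decision the proxy makes after asking the auth service about a request. It models the general behaviour of Caddy's forward_auth directive (a 2xx reply lets the request continue; any other reply is sent back to the client), not this repo's actual Caddyfile. AuthReply and decide are hypothetical names, and the example URL and backend are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AuthReply:
    """Hypothetical stand-in for the auth service's answer to the subrequest."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def decide(reply: AuthReply, backend: str) -> str:
    """Describe what the proxy does next, given the auth reply."""
    if 200 <= reply.status < 300:
        # Access granted: the original request continues to the backend.
        return f"proxy to {backend}"
    # Any other status: the auth service's own response goes back to the client.
    location = reply.headers.get("Location")
    if location:
        return f"redirect to {location}"
    return f"deny with {reply.status}"
```

A reply of AuthReply(200) leads to the backend, while AuthReply(302, {"Location": "https://auth.pez.sh"}) sends the browser to the login portal instead. Services with their own auth never enter this function because no forward_auth step is configured for them.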
Observability
Metrics, logs, and traces ship to Grafana Cloud from every host via Grafana Alloy. The Alloy collectors are registered in Grafana Fleet Management (configured in terraform/grafana/). Synthetic uptime checks for the public sites run from Grafana Cloud probes, and PagerDuty handles alert delivery.
History: Monitoring used to run locally on london-a (FreeBSD, with Prometheus + Grafana). london-a has since been wiped and reinstalled as Proxmox VE; the local stack was retired in favour of Grafana Cloud. See monitoring.md for the current setup.
Design Principles
- Self-hosted first. Cloud VPSs only where it makes sense (public gateway, mail with clean IP reputation). Everything else runs on physical hardware I own.
- Tailscale as the backbone. No ports exposed on residential IPs. All inter-server communication goes over the mesh.
- Ansible for everything. If a server dies, reinstall the OS, install Tailscale, run make deploy. Roughly 30 minutes to full recovery.
- Terraform for cloud + DNS. Hetzner servers, DNS records, Grafana Cloud configuration, and PagerDuty are all in code. No clicking around in dashboards.
- Cattle, not pets (as much as possible). The servers are technically pets — old hardware in specific locations — but the configs are cattle. Everything is reproducible from this repo.