pez-infra/docs/architecture.md
Rasmus Wejlgaard 361133ec7e docs: catch up with the Cloudflare to Hetzner DNS move, fix secrets/terraform drift
The docs still described Cloudflare as DNS + CDN in front of helsinki-a,
but that was dropped in #90 - pez.sh lives on Hetzner DNS via Terraform
now and records point straight at the origin. Updated README,
architecture, networking, getting-started and the nuremberg-a host doc
to match, and noted that pez.solutions still resolves via Cloudflare
outside Terraform.

Also fixed while I was in there:
- terraform/README: PagerDuty provider is ~> 3.32 (table said ~> 2.2),
  and the B2 secret keys are backblaze_keyID/backblaze_applicationKey
- secrets docs: group_vars secrets file is .enc.yaml, dropped the
  FreeBSD install steps, the long-gone .sops.yaml placeholder note and
  the ANSIBLE_VAULT_PASS migration note, swapped the cloudflare_record
  example for hcloud
- getting-started referenced ansible/scripts/sops-setup.sh which
  doesn't exist
- added naveen.pez.sh to the subdomain tables and a note about the
  DNS-only records (mail, minecraft, wow, public)
2026-06-10 19:35:53 +01:00


Architecture

Overview

The infrastructure spans three locations (London, Copenhagen, and Hetzner Cloud) connected by a Tailscale mesh network. All public web traffic enters through a single Hetzner Cloud VPS (helsinki-a) running Caddy as a reverse proxy. Caddy serves a few services that run locally on helsinki-a and forwards everything else over Tailscale to backend services on physical servers in London and Copenhagen.

The setup is entirely self-hosted (with the exception of Hetzner Cloud VPSs, Hetzner DNS, and Grafana Cloud for observability). Most physical servers are old personal computers repurposed into server duty — cheaper than cloud, and I get a rack cabinet that doubles as a bedroom white noise machine.

Network Topology

```mermaid
graph TD
    DNS["<b>DNS</b><br/>Hetzner DNS: *.pez.sh<br/>Cloudflare: *.pez.solutions"]
    DNS -->|HTTPS| HEL

    HEL["<b>helsinki-a</b><br/>Hetzner Cloud VPS<br/><br/>Caddy (reverse proxy)<br/>Authelia (SSO)<br/>LLDAP (Authelia backend)<br/>Bitwarden (Vaultwarden)<br/>Forgejo"]

    HEL --> TS["<b>Tailscale Mesh</b><br/>WireGuard-based VPN"]

    TS --> LB["<b>london-b</b><br/>Storage / Media<br/>*arr stack, Plex, Jellyfin<br/>(Threadripper, 87T ZFS)"]
    TS --> LA["<b>london-a</b><br/>Proxmox VE hypervisor<br/>(Debian 13)"]
    TS --> LC["<b>london-c</b><br/>Raspberry Pi<br/>Octopus Energy exporter"]
    TS --> NA["<b>nuremberg-a</b><br/>Mail<br/>poste.io"]
    TS --> CA["<b>copenhagen-a</b><br/>Gaming<br/>Minecraft / WoW (MaNGOS)"]
    TS --> CC["<b>copenhagen-c</b><br/>Raspberry Pi<br/>cloudflared, idle"]

    TS -.->|Alloy| GC["<b>Grafana Cloud</b><br/>metrics, logs, traces<br/>synthetic checks"]

    style CC stroke-dasharray: 5 5
```

Traffic Flow

All public-facing services follow the same pattern:

User → DNS (Hetzner DNS) → helsinki-a (Caddy, TLS) → Backend (over Tailscale)
  1. DNS for pez.sh is managed by Hetzner DNS (provisioned via Terraform, terraform/hetzner/dns.tf); pez.solutions still resolves via Cloudflare (dashboard-managed)
  2. Records point directly at helsinki-a's public IP — no CDN or proxying in front
  3. Caddy on helsinki-a terminates TLS (Let's Encrypt) and routes to the correct backend
  4. For protected services, Caddy calls Authelia first (forward_auth)
  5. If authenticated (or no auth required), traffic is proxied over Tailscale to the backend
```mermaid
graph LR
    subgraph "helsinki-a (Caddy)"
        A1["forward_auth → Authelia"]
        A2["(no auth)"]
        A3["forward_auth → Authelia"]
        A4["(local)"]
    end

    R["radarr.pez.sh"] --> A1 --> LB1["london-b:7878"]
    J["jellyfin.pez.sh"] --> A2 --> LB2["london-b:8096"]
    G["git.pez.sh"] --> A3 --> LO3["localhost:3000 (Forgejo)"]
    AU["auth.pez.sh"] --> A4 --> LO["localhost:9091 (Authelia)"]
```
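
The routing shown in the diagram can be read as a lookup table from hostname to upstream, plus a flag for whether Authelia is consulted first. Here is a minimal sketch of that model in Python, assuming only the four hostnames and upstreams from the diagram. The `Route` type and the `resolve` and `is_local` helpers are hypothetical names for illustration; the real routing lives in Caddy's configuration, not in Python.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    upstream: str     # host:port that Caddy proxies to
    needs_auth: bool  # True if Caddy calls Authelia (forward_auth) first


# Hostnames and upstreams copied from the diagram; illustrative only.
ROUTES = {
    "radarr.pez.sh": Route("london-b:7878", needs_auth=True),
    "jellyfin.pez.sh": Route("london-b:8096", needs_auth=False),
    "git.pez.sh": Route("localhost:3000", needs_auth=True),
    "auth.pez.sh": Route("localhost:9091", needs_auth=False),
}


def resolve(host: str) -> Optional[Route]:
    """Return the route for a hostname, or None if it is not served."""
    return ROUTES.get(host.lower())


def is_local(route: Route) -> bool:
    """True when the backend runs on helsinki-a itself rather than over Tailscale."""
    return route.upstream.startswith("localhost:")
```

In this model, `resolve("git.pez.sh")` yields a local, auth-protected route, while `resolve("jellyfin.pez.sh")` yields an unprotected route that crosses the Tailscale mesh to london-b.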

Auth Architecture

```mermaid
graph TD
    Caddy["<b>Caddy</b><br/>forward_auth"] --> Authelia["<b>Authelia</b><br/>SSO<br/>auth.pez.sh"]
    Authelia --> LLDAP["<b>LLDAP</b><br/>User directory<br/>(Authelia backend only)"]
    Authelia --> MariaDB["<b>MariaDB</b><br/>Authelia session/state"]
```

Authelia authenticates users against LLDAP and stores session and state data in MariaDB. All three run as Docker containers on helsinki-a. LLDAP is not wired into any other app; it is purely Authelia's user backend. Services that sit behind Authelia inherit users from LLDAP via the Caddy forward_auth flow, while services with their own auth (Bitwarden, Plex, Jellyfin, Navidrome, Jellyseerr, Forgejo, poste.io) maintain their own user databases.
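
The forward_auth flow itself is simple: Caddy sends a copy of the incoming request to Authelia, proxies to the backend if Authelia answers with a 2xx status, and otherwise returns Authelia's response (typically a redirect to the login page) to the client. The Python sketch below models that decision under some assumptions: the `Response` type, the callables standing in for Authelia and the backend, and the `copy_headers` default are all hypothetical, and the header names this repo actually copies are not stated in this document.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def forward_auth(
    request_headers: Dict[str, str],
    authelia: Callable[[Dict[str, str]], Response],
    backend: Callable[[Dict[str, str]], Response],
    copy_headers: Tuple[str, ...] = ("Remote-User", "Remote-Groups"),  # assumed
) -> Response:
    """Model of the forward_auth decision: ask the auth service, then proxy or reject."""
    auth = authelia(request_headers)
    if 200 <= auth.status < 300:
        # Access granted: pass selected identity headers on to the backend.
        upstream_headers = dict(request_headers)
        for name in copy_headers:
            if name in auth.headers:
                upstream_headers[name] = auth.headers[name]
        return backend(upstream_headers)
    # Access denied: the auth service's own response goes back to the client.
    return auth
```

Because the backend only ever sees requests that Authelia has approved, a service behind this flow needs no login logic of its own, which is why such services can rely on LLDAP users without being connected to LLDAP directly.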

Observability

Metrics, logs, and traces ship to Grafana Cloud from every host via Grafana Alloy. The Alloy collectors are registered in Grafana Fleet Management (configured in terraform/grafana/). Synthetic uptime checks for the public sites run from Grafana Cloud probes, and PagerDuty handles alert delivery.
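
The real synthetic checks run from Grafana Cloud probes. As a rough local stand-in for what such a check does, the sketch below fetches each URL and records whether it answered with a 2xx or 3xx status. The `http_status` and `run_checks` helpers are hypothetical and not part of the repo, and the pass criterion is an assumption rather than the configured check definition.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch a URL and return its HTTP status code (0 if the connection fails)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is still useful.
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def run_checks(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each URL to True if it answered with a 2xx or 3xx status."""
    return {url: 200 <= fetch(url) < 400 for url in urls}
```

Calling `run_checks(["https://jellyfin.pez.sh", "https://git.pez.sh"])` would probe two of the public sites named earlier. The `fetch` parameter is injectable so the logic can be exercised without network access.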

History: Monitoring used to run locally on london-a (FreeBSD, with Prometheus + Grafana). london-a has since been wiped and reinstalled as Proxmox VE; the local stack was retired in favour of Grafana Cloud. See monitoring.md for the current setup.

Design Principles

  • Self-hosted first. Cloud VPSs only where it makes sense (public gateway, mail with clean IP reputation). Everything else runs on physical hardware I own.
  • Tailscale as the backbone. No ports exposed on residential IPs. All inter-server communication goes over the mesh.
  • Ansible for everything. If a server dies, reinstall the OS, install Tailscale, run make deploy. Roughly 30 minutes to full recovery.
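  • Terraform for cloud + DNS. Hetzner servers, pez.sh DNS records, Grafana Cloud configuration, and PagerDuty are all in code. No clicking around in dashboards, with one exception: pez.solutions is still managed in the Cloudflare dashboard, outside Terraform.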
  • Cattle, not pets (as much as possible). The servers are technically pets — old hardware in specific locations — but the configs are cattle. Everything is reproducible from this repo.