fix: Documentation overhaul

2026-07-04 15:46:16 +00:00 · 2026-05-19 18:38:36 +01:00 · 2026-05-19 18:38:36 +01:00 · 9ee9802441
commit 9ee9802441
parent 1ec4e10eb1
16 changed files with 512 additions and 361 deletions
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ Infrastructure-as-code monorepo for managing my homelab and cloud server fleet.
 ## What's in this repo
 - **Ansible** — Playbooks, roles, and inventory for configuring servers, deploying Docker-based services, and managing dotfiles
-- **Terraform** — OpenTofu/Terraform configs for cloud resources (Cloudflare DNS, Hetzner servers)
+- **Terraform** — OpenTofu/Terraform configs for cloud resources (Hetzner Cloud, Cloudflare DNS, Grafana Cloud, PagerDuty)
 - **Services** — Docker Compose definitions and config files for each self-hosted service
 - **Documentation** — Architecture decisions, networking topology, and operational guides
@@ -13,54 +13,59 @@ Infrastructure-as-code monorepo for managing my homelab and cloud server fleet.
 ```mermaid
 graph TD
-    CF[Cloudflare<br/>DNS + CDN] --> HEL[helsinki-a<br/>Caddy proxy<br/><i>Hetzner Cloud</i>]
+    CF[Cloudflare<br/>DNS + CDN] --> HEL[helsinki-a<br/>Caddy proxy + SSO<br/><i>Hetzner Cloud</i>]
    HEL --> TS{Tailscale mesh}
-    TS --> LB[london-b<br/>Storage, Docker services]
+    TS --> LB[london-b<br/>Storage, media<br/>Docker + systemd]
-    TS --> LA[london-a<br/>Monitoring<br/>Prometheus, Grafana]
+    TS --> LA[london-a<br/>Proxmox VE hypervisor]
    TS --> LC[london-c<br/>Raspberry Pi<br/>Octopus Energy exporter]
    TS --> CA[copenhagen-a<br/>Gaming<br/>Minecraft, WoW MaNGOS]
    TS --> NUR[nuremberg-a<br/>Mail, poste.io]
-    TS --> CC[copenhagen-c<br/>idle]
+    TS --> CC[copenhagen-c<br/>Raspberry Pi<br/>cloudflared, idle]
    TS -.-> GC[Grafana Cloud<br/>metrics, logs, traces]
 ```
-Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud instance, and is forwarded to backend services running on various hosts connected over a Tailscale mesh network. Authentication is handled by Authelia with an LLDAP backend.
+Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud instance, and is forwarded to backend services running on various hosts connected over a Tailscale mesh network. Authentication for protected services is handled by Authelia with an LLDAP backend. Observability is shipped from every host to Grafana Cloud via Grafana Alloy.
 ### Hosts
 | Host | Location | OS | Role |
 |------|----------|-----|------|
-| helsinki-a | Hetzner Cloud | Linux | Reverse proxy (Caddy), main traffic gateway |
+| helsinki-a | Hetzner Cloud (Helsinki) | Debian 13 | Reverse proxy (Caddy), SSO (Authelia + LLDAP), Bitwarden, Forgejo |
-| london-b | London | Linux | Primary storage (ZFS), Docker services |
+| london-b | London | Ubuntu 24.04 | Primary storage (ZFS), media servers, *arr stack |
-| london-a | London | FreeBSD | Monitoring (Prometheus, Grafana) |
+| london-a | London | Debian 13 / Proxmox VE | Hypervisor (currently runs a Mac VM; platform for future VMs) |
-| nuremberg-a | Hetzner Cloud | Alpine Linux | Mail server (poste.io) |
+| london-c | London | Debian 13 (Raspberry Pi) | Octopus Energy exporter, edge utility box |
-| copenhagen-a | Copenhagen | Linux | Gaming servers (Minecraft, WoW/MaNGOS) |
+| nuremberg-a | Hetzner Cloud (Nuremberg) | Debian 13 | Mail server (poste.io) |
-| copenhagen-c | Copenhagen | Linux | Idle/available |
+| copenhagen-a | Copenhagen | Ubuntu 22.04 | Gaming servers (Minecraft, WoW/MaNGOS) |
 | copenhagen-c | Copenhagen | Debian 12 (Raspberry Pi) | cloudflared tunnel, idle/available |
 ## Directory Structure
 ```
 ├── ansible/        # Ansible playbooks, roles, inventory, and all managed files
-│   ├── roles/      # Ansible roles (caddy, docker, dotfiles, etc.)
+│   ├── roles/      # Ansible roles (caddy, docker, media_stack, proxmox_ve, etc.)
 │   ├── services/   # Docker Compose definitions and service configs
 │   ├── dotfiles/   # Shell config (fish, nvim, tmux, git, etc.)
 │   ├── playbooks/  # One-off playbooks (updates, reboots, status)
 │   └── scripts/    # Utility and maintenance scripts
-├── terraform/      # Terraform/OpenTofu for Cloudflare DNS, Hetzner servers
+├── terraform/      # Terraform/OpenTofu for Hetzner, Cloudflare, Grafana Cloud, PagerDuty
-└── docs/           # Architecture, networking, services, and monitoring docs
+└── docs/           # Architecture, networking, services, monitoring, and per-host docs
 ```
 ## Getting Started
 ### Prerequisites
-- SSH access to hosts via Tailscale
+- SSH access to hosts via Tailscale (all hosts SSH as `root`)
 - `ansible` for configuration management
 - `tofu` (OpenTofu) or `terraform` for infrastructure provisioning
 - `sops` + `age` for editing encrypted secrets
 ### Usage
 1. **Clone:** `git clone git@github.com:RWejlgaard/pez-infra.git`
 2. **Services:** Each service has its own directory under `ansible/services/` with a `docker-compose.yml` and config files
-3. **Deploy:** Ansible playbooks in `ansible/` handle deployment (see individual playbook docs)
+3. **Deploy:** `cd ansible && make deploy` runs the unified `deploy.yml` against the whole fleet (or `make deploy-host HOST=<name>`)
-4. **Infrastructure:** Terraform configs in `terraform/` manage DNS and cloud resources
+4. **Infrastructure:** Terraform configs in `terraform/` manage Hetzner servers, Cloudflare DNS, Grafana Cloud, and PagerDuty
 ### Secrets
@@ -73,5 +78,6 @@ Detailed documentation lives in [`docs/`](docs/):
 - **[Architecture](docs/architecture.md)** — Network topology, traffic flow, design principles
 - **[Networking](docs/networking.md)** — Tailscale mesh, DNS flow, physical networking
 - **[Services](docs/services.md)** — Complete service map with ports, auth, and deployment info
-- **[Monitoring](docs/monitoring.md)** — Prometheus, Grafana, exporters, status page
+- **[Monitoring](docs/monitoring.md)** — Grafana Cloud, Alloy, synthetic checks, PagerDuty
 - **[Hosts](docs/hosts/)** — Per-host detail (hardware, services, quirks)
 - **[Getting Started](docs/getting-started.md)** — How to work with this repo
--- a/ansible/README.md
+++ b/ansible/README.md
@@ -25,26 +25,28 @@ make deploy-host HOST=helsinki-a
 | Playbook | Purpose | Usage |
 |----------|---------|-------|
 | `deploy.yml` | Full host rebuild from repo | `make deploy` or `--limit <host>` |
-| `playbooks/update-all.yml` | OS package updates (all hosts) | `make update-all` |
+| `playbooks/update-all.yml` | OS package updates (all hosts, apt) | `make update-all` |
-| `playbooks/update-linux.yml` | Linux-only updates (apt + apk) | `make update-linux` |
+| `playbooks/update-linux.yml` | Alias for update-all (apt) | `make update-linux` |
 | `playbooks/update-freebsd.yml` | FreeBSD-only updates (pkg) | `make update-freebsd` |
 | `playbooks/docker-status.yml` | Show running containers | `make docker-status` |
 | `playbooks/reboot.yml` | Safe reboot with pre-flight | `make reboot HOST=<host>` |
 | `playbooks/zfs.yml` | ZFS scrub scheduling (london-b) | `ansible-playbook playbooks/zfs.yml` |
 ## Deploy Stages
-The deploy playbook runs in stages, each independently taggable:
+The deploy playbook runs in stages, each independently taggable (see `deploy.yml`):
-1. **common** — Baseline packages, SSH hardening, fish shell
+1. **common / baseline** — Baseline packages, SSH hardening, fish shell, dotfiles
-2. **docker** — Docker engine on container hosts
+2. **docker** — Docker engine on container hosts (`docker_hosts` group)
-3. **node-exporter** — Prometheus monitoring agent on all hosts
+3. **services** — Per-host service deployment:
-4. **services** — Per-host service deployment:
+   - `helsinki-a`: Caddy + status-page + custom systemd units
-   - `helsinki-a`: Caddy reverse proxy
+   - `docker_hosts`: Docker Compose stacks from `services/`
-   - `london-b`: Docker Compose services (Jellyseer, etc.)
+   - `nuremberg-a`: poste.io mail (Docker)
-   - `nuremberg-a`: poste.io mail
+   - `london-b`: `media_stack` + `backup` (rclone to B2)
-   - `copenhagen-a`: Minecraft + MaNGOS systemd services
+   - `copenhagen-a`: MaNGOS systemd units + MariaDB
-   - `london-a`: Prometheus + Grafana (FreeBSD)
+   - `london-a`: `proxmox_ve` (apt repo, nag patch, CIFS storage)
-5. **verify** — Post-deploy health check
+   - `zfs_hosts`: ZFS scrub scheduling
 Observability (node_exporter, systemd_exporter, Grafana Alloy) is part of the `common` baseline — every host gets it.
 Run a single stage: `ansible-playbook deploy.yml --tags docker`
@@ -52,13 +54,18 @@ Run a single stage: `ansible-playbook deploy.yml --tags docker`
 | Role | Description |
 |------|-------------|
-| `common` | Base packages, SSH hardening, fish shell |
+| `common` | Base packages, SSH hardening, fish shell, exporters, Alloy |
 | `docker` | Docker engine install and setup |
-| `docker-services` | Deploy compose files from `services/` |
 | `dotfiles` | Shell config from `dotfiles/` |
+| `docker_services` | Deploy compose files from `services/` |
 | `caddy` | Caddy reverse proxy (helsinki-a) |
-| `node-exporter` | Prometheus node_exporter |
+| `status_page` | status.pez.sh generator script + cron |
-| `systemd-services` | Custom systemd units from `services/` |
+| `systemd_services` | Custom systemd units from `services/` |
 | `media_stack` | *Arr stack, Plex/Jellyfin, Samba, Syncthing on london-b |
 | `backup` | rclone-to-B2 cron job on london-b |
 | `mariadb` | Native MariaDB (used by MaNGOS on copenhagen-a) |
 | `proxmox_ve` | PVE no-subscription repo, UI lockdown, CIFS storage |
 | `zfs` | Weekly scrub cron on ZFS hosts |
 ## Inventory
--- a/ansible/services/README.md
+++ b/ansible/services/README.md
@@ -1,45 +1,53 @@
 # Services
-Version-controlled service definitions across the fleet.
+Version-controlled service definitions across the fleet. Each subdirectory is a single deployable unit — either a Docker Compose stack, a systemd unit, or a static config file set — that the Ansible roles in `ansible/roles/` pick up and deploy.
-## Directory Structure
+## Layout
 ```
 services/
-├── systemd/              # systemd unit files (Linux hosts)
+├── <service-name>/
-│   ├── copenhagen-a/
+│   ├── docker-compose.yml      # Docker services
-│   │   ├── mangos-realmd.service   # MaNGOS Zero realm server
+│   ├── <service>.service       # Native systemd unit (when applicable)
-│   │   └── mangos-world.service    # MaNGOS Zero world server
+│   ├── config/                 # Mounted/copied config files
-│   └── helsinki-a/
+│   ├── *.enc.{yml,yaml,env}    # SOPS-encrypted secrets
-│       ├── caddy.service                    # Caddy reverse proxy (stock unit)
+│   └── README.md               # Service-specific notes (where relevant)
-│       └── thiswebsitedoesnotexist.service  # Node.js app on port 3721
 └── rc.d/                 # FreeBSD rc.conf and rc.d scripts
    └── london-a/
        └── rc.conf       # /etc/rc.conf — all enabled services
 ```
-## Notes
+There is **no** per-host subdirectory — services are named by what they are, and the host they land on is decided by `docker_services` / `systemd_services` lists in `ansible/inventory/host_vars/<host>.yml`.
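Because placement lives only in host_vars, answering "which host runs X?" means searching those files. A minimal sketch, assuming each service appears as a plain YAML list item (for example `- miniflux`) under `docker_services` or `systemd_services`; `hosts_running` is a hypothetical helper, not something in the repo, and a line-based `grep` is not a YAML parser.

```bash
#!/usr/bin/env bash
# hosts_running SERVICE [HOST_VARS_DIR]
# Hypothetical helper: prints every host whose host_vars file lists SERVICE
# as a YAML list item. Crude line matching, not a real YAML parser.
hosts_running() {
  local service=$1
  local dir=${2:-ansible/inventory/host_vars}
  local f
  for f in "$dir"/*.yml; do
    [ -e "$f" ] || continue   # glob matched nothing
    # Match lines like "  - miniflux", optionally followed by trailing whitespace.
    if grep -Eq "^[[:space:]]*-[[:space:]]+${service}[[:space:]]*$" "$f"; then
      basename "$f" .yml
    fi
  done
}
```

Run from the repo root, `hosts_running miniflux` would print `london-b` if that host's file lists it. Note that the match ignores which key the list item sits under, so a service name reused in another list would also match.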
-### copenhagen-a (Linux)
+## Service inventory
-| Service | Unit | Status | Notes |
+| Service | Type | Host(s) | Notes |
-|---------|------|--------|-------|
+|---|---|---|---|
-| MaNGOS realmd | `mangos-realmd.service` | enabled, custom | Realm server for WoW private server. Depends on MariaDB. |
+| caddy | Native (apt) | helsinki-a | Reverse proxy. Caddyfile lives here. |
-| MaNGOS world | `mangos-world.service` | enabled, custom | World server. Depends on MariaDB and realmd. |
+| authelia | Docker | helsinki-a | SSO, plus MariaDB and LLDAP sidecars |
 | bitwarden | Docker | helsinki-a | Vaultwarden + MariaDB |
 | forgejo | Docker | helsinki-a | Git forge |
 | poste-io | Docker | nuremberg-a | Mail |
 | jellyseerr | Docker | london-b | Plex request manager |
 | navidrome | Docker | london-b | Music streaming |
 | slskd | Docker | london-b | Soulseek client |
 | miniflux | Docker | london-b | RSS reader (with postgres) |
 | smartctl-exporter | Docker | london-b, copenhagen-a | SMART metrics |
 | plex-exporter | Docker | london-b | Plex metrics |
 | octopus-exporter | Docker | london-c | Octopus Energy metrics |
 | minecraft | Docker | copenhagen-a | PaperMC server |
 | radarr / sonarr / lidarr / readarr / prowlarr / whisparr | systemd | london-b | *Arr stack (systemd unit files here) |
 | transmission | systemd | london-b | Config files (the daemon itself is apt) |
 | samba / vsftpd | systemd | london-b | File-sharing config |
 | ollama | systemd | london-b | Custom unit + binary install |
 | mangos-realmd / mangos-world / mangos-zero | systemd | copenhagen-a | MaNGOS WoW server |
 | promtail | systemd | (currently unused; historical) | Log shipper, replaced by Alloy |
 | status-page | Cron script | helsinki-a | `update-status.sh` writes `/srv/status` |
 | rc.d | FreeBSD rc.conf | (historical) | Snapshot of london-a's old FreeBSD setup |
-### helsinki-a (Linux)
+## Conventions
-| Service | Unit | Status | Notes |
+- **Compose stacks** live at `<service>/docker-compose.yml` and are deployed to `/opt/docker/<service>/` on the target host.
-|---------|------|--------|-------|
+- **Systemd units** are copied to `/etc/systemd/system/<service>.service` by the `media_stack` or `systemd_services` role.
-| Caddy | `caddy.service` | enabled, stock | Installed via package manager. Config at `/etc/caddy/Caddyfile`. |
+- **Secrets** are SOPS-encrypted (`*.enc.yml`) and decrypted into place at deploy time.
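The two path conventions above are mechanical enough to express as a lookup. A sketch for illustration only: `deploy_target` is a hypothetical helper, and it covers just the compose and systemd conventions stated here, not secrets or config files.

```bash
#!/usr/bin/env bash
# deploy_target TYPE SERVICE
# Hypothetical helper: prints where a service's artefact lands on the target
# host, following the compose and systemd conventions listed above.
deploy_target() {
  local type=$1 service=$2
  case "$type" in
    compose) echo "/opt/docker/${service}/" ;;                 # compose stack directory
    systemd) echo "/etc/systemd/system/${service}.service" ;;  # native unit file
    *) echo "deploy_target: unknown type '${type}'" >&2; return 1 ;;
  esac
}
```

For example, `deploy_target systemd ollama` prints `/etc/systemd/system/ollama.service`, matching the native ollama unit listed in the service inventory.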
-| thiswebsitedoesnotexist | `thiswebsitedoesnotexist.service` | enabled, custom | Node.js app. Env vars in `/opt/thiswebsitedoesnotexist/.env`. |
-### london-a (Linux)
-No custom rc.d scripts — all services installed via `pkg`. The `rc.conf` captures all enabled services:
-| Service | Unit | Notes |
-|---------|-----------------|-------|
-| libvirtd | `libvirtd.service` | Virtualisation daemon |
 ## Adding a new service
 See [docs/getting-started.md](../../docs/getting-started.md#adding-a-new-service) for the end-to-end flow (compose → host_vars → Caddy → DNS → docs).
--- a/docs/README.md
+++ b/docs/README.md
@@ -7,17 +7,19 @@ Everything you need to understand how this infrastructure works.
 - **[Architecture](architecture.md)** — High-level overview, network topology, traffic flow diagrams
 - **[Networking](networking.md)** — Tailscale mesh, physical networking, DNS and proxy flow
 - **[Services](services.md)** — Complete service map: what runs where, ports, auth
-- **[Monitoring](monitoring.md)** — Prometheus, Grafana, exporters, alerting, status page
+- **[Monitoring](monitoring.md)** — Grafana Cloud, Alloy, synthetic checks, alerting via PagerDuty
 - **[Secrets](secrets.md)** — SOPS + age encryption: setup, usage, CI integration
 - **[Getting Started](getting-started.md)** — How to work with this repo, deploy changes, add services
 - **[Hosts](hosts/)** — Per-host detail (hardware, services, quirks)
 ## Quick Reference
 | Host | Tailscale IP | Location | Role |
 |------|-------------|----------|------|
-| helsinki-a | 100.67.6.27 | Hetzner Cloud | Reverse proxy, SSO, Bitwarden |
+| helsinki-a | 100.67.6.27 | Hetzner Cloud (Helsinki) | Reverse proxy, SSO, Bitwarden, Forgejo |
 | london-a | 100.122.180.98 | London | Proxmox VE hypervisor |
 | london-b | 100.84.65.101 | London | Storage, media, Docker services |
-| london-a | 100.122.219.41 | London | Prometheus + Grafana |
+| london-c | 100.123.72.87 | London | Raspberry Pi, Octopus Energy exporter |
-| nuremberg-a | 100.117.235.28 | Hetzner Cloud | Mail (poste.io) |
+| nuremberg-a | 100.70.180.24 | Hetzner Cloud (Nuremberg) | Mail (poste.io) |
-| copenhagen-a | 100.89.206.60 | Copenhagen | Minecraft, WoW |
+| copenhagen-a | 100.89.206.60 | Copenhagen | Minecraft, WoW/MaNGOS |
-| copenhagen-c | 100.115.45.53 | Copenhagen | Idle |
+| copenhagen-c | 100.115.45.53 | Copenhagen | Raspberry Pi, cloudflared, idle |
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -2,26 +2,29 @@
 ## Overview
-The infrastructure spans four physical locations connected by a Tailscale mesh network. All public traffic enters through a single Hetzner Cloud VPS (helsinki-a) running Caddy as a reverse proxy, which forwards requests over Tailscale to backend services running on physical servers in London and Copenhagen.
+The infrastructure spans three physical locations (London, Copenhagen, Hetzner Cloud) connected by a Tailscale mesh network. All public traffic enters through a single Hetzner Cloud VPS (helsinki-a) running Caddy as a reverse proxy, which forwards requests over Tailscale to backend services running on physical servers in London and Copenhagen.
-The setup is entirely self-hosted (with the exception of Hetzner Cloud VPSs and Cloudflare for DNS/CDN). Servers are old personal computers repurposed into server duty — cheaper than cloud, and I get a rack cabinet that doubles as a bedroom white noise machine.
+The setup is entirely self-hosted (with the exception of Hetzner Cloud VPSs, Cloudflare for DNS/CDN, and Grafana Cloud for observability). Most physical servers are old personal computers repurposed into server duty — cheaper than cloud, and I get a rack cabinet that doubles as a bedroom white noise machine.
 ## Network Topology
 ```mermaid
 graph TD
-    CF["<b>Cloudflare</b><br/>DNS + CDN<br/>*.pez.sh"]
+    CF["<b>Cloudflare</b><br/>DNS + CDN<br/>*.pez.sh, *.pez.solutions"]
    CF -->|HTTPS| HEL
-    HEL["<b>helsinki-a</b><br/>Hetzner Cloud VPS<br/><br/>Caddy (reverse proxy)<br/>Authelia (SSO)<br/>Bitwarden<br/>LLDAP"]
+    HEL["<b>helsinki-a</b><br/>Hetzner Cloud VPS<br/><br/>Caddy (reverse proxy)<br/>Authelia (SSO)<br/>LLDAP (Authelia backend)<br/>Bitwarden (Vaultwarden)<br/>Forgejo"]
    HEL --> TS["<b>Tailscale Mesh</b><br/>WireGuard-based VPN"]
-    TS --> LB["<b>london-b</b><br/>Storage / Media<br/>Docker services<br/>(46T ZFS)"]
+    TS --> LB["<b>london-b</b><br/>Storage / Media<br/>*arr stack, Plex, Jellyfin<br/>(Threadripper, 87T ZFS)"]
-    TS --> LA["<b>london-a</b><br/>Monitoring<br/>Prometheus / Grafana<br/>(FreeBSD)"]
+    TS --> LA["<b>london-a</b><br/>Proxmox VE hypervisor<br/>(Debian 13)"]
-    TS --> NA["<b>nuremberg-a</b><br/>Mail<br/>poste.io<br/>(Alpine)"]
+    TS --> LC["<b>london-c</b><br/>Raspberry Pi<br/>Octopus Energy exporter"]
-    TS --> CA["<b>copenhagen-a</b><br/>Gaming<br/>Minecraft / WoW/MaNGOS<br/>(Ubuntu)"]
+    TS --> NA["<b>nuremberg-a</b><br/>Mail<br/>poste.io"]
-    TS --> CC["<b>copenhagen-c</b><br/>(idle)"]
+    TS --> CA["<b>copenhagen-a</b><br/>Gaming<br/>Minecraft / WoW (MaNGOS)"]
    TS --> CC["<b>copenhagen-c</b><br/>Raspberry Pi<br/>cloudflared, idle"]
    TS -.->|Alloy| GC["<b>Grafana Cloud</b><br/>metrics, logs, traces<br/>synthetic checks"]
    style CC stroke-dasharray: 5 5
 ```
@@ -34,7 +37,7 @@ All public-facing services follow the same pattern:
 User → Cloudflare (DNS + TLS) → helsinki-a (Caddy) → Backend (over Tailscale)
 ```
-1. DNS for `*.pez.sh` is managed by Cloudflare (provisioned via Terraform)
+1. DNS for `pez.sh` and `pez.solutions` is managed by Cloudflare (provisioned via Terraform)
 2. Cloudflare proxies traffic to helsinki-a
 3. Caddy on helsinki-a terminates TLS and routes to the correct backend
 4. For protected services, Caddy calls Authelia first (`forward_auth`)
@@ -51,8 +54,8 @@ graph LR
    R["radarr.pez.sh"] --> A1 --> LB1["london-b:7878"]
    J["jellyfin.pez.sh"] --> A2 --> LB2["london-b:8096"]
-    G["grafana.pez.sh"] --> A3 --> LA["london-a:3000"]
+    G["git.pez.sh"] --> A3 --> LO3["localhost:3000 (Forgejo)"]
-    AU["auth.pez.sh"] --> A4 --> LO["localhost:9091"]
+    AU["auth.pez.sh"] --> A4 --> LO["localhost:9091 (Authelia)"]
 ```
 ## Auth Architecture
@@ -60,17 +63,22 @@ graph LR
 ```mermaid
 graph TD
    Caddy["<b>Caddy</b><br/>forward_auth"] --> Authelia["<b>Authelia</b><br/>SSO<br/>auth.pez.sh"]
-    Authelia --> LLDAP["<b>LLDAP</b><br/>User directory"]
+    Authelia --> LLDAP["<b>LLDAP</b><br/>User directory<br/>(Authelia backend only)"]
    Authelia --> MariaDB["<b>MariaDB</b><br/>Authelia session/state"]
 ```
-Authelia authenticates against LLDAP (both on helsinki-a). One place to manage users — add or remove someone in LDAP and it propagates to all protected services.
+Authelia authenticates against LLDAP and uses a MariaDB database for session/state. All three run as Docker containers on helsinki-a. LLDAP is **not** wired into other apps; it's purely Authelia's user backend. Services that sit behind Authelia inherit users from LLDAP via the Caddy `forward_auth` flow; services with their own auth (Bitwarden, Plex, Jellyfin, Navidrome, Jellyseerr, Forgejo, poste.io) maintain their own user databases.
-Services with their own auth (Bitwarden, Jellyfin, Plex, Nextcloud, Navidrome, Jellyseerr) are not behind Authelia.
+## Observability
 Metrics, logs, and traces ship to **Grafana Cloud** from every host via **Grafana Alloy**. The Alloy collectors are registered in Grafana Fleet Management (configured in `terraform/grafana/`). Synthetic uptime checks for the public sites run from Grafana Cloud probes, and PagerDuty handles alert delivery.
 > **History:** Monitoring used to run locally on london-a (FreeBSD, with Prometheus + Grafana). london-a has since been wiped and reinstalled as Proxmox VE; the local stack was retired in favour of Grafana Cloud. See [monitoring.md](monitoring.md) for the current setup.
 ## Design Principles
 - **Self-hosted first.** Cloud VPSs only where it makes sense (public gateway, mail with clean IP reputation). Everything else runs on physical hardware I own.
 - **Tailscale as the backbone.** No ports exposed on residential IPs. All inter-server communication goes over the mesh.
-- **Ansible for everything.** If a server dies, reinstall the OS, install Tailscale, run Ansible. 30 minutes to full recovery.
+- **Ansible for everything.** If a server dies, reinstall the OS, install Tailscale, run `make deploy`. Roughly 30 minutes to full recovery.
-- **Terraform for DNS.** All Cloudflare records are in code. No clicking around in dashboards.
+- **Terraform for cloud + DNS.** Hetzner servers, Cloudflare records, Grafana Cloud configuration, and PagerDuty are all in code. No clicking around in dashboards.
 - **Cattle, not pets (as much as possible).** The servers are technically pets — old hardware in specific locations — but the configs are cattle. Everything is reproducible from this repo.
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -8,10 +8,10 @@ You'll need:
 - **Tailscale** — installed and connected to the tailnet. All SSH access goes through Tailscale. No servers have SSH exposed on the public internet.
 - **SSH keys** — set up for each host you need to access
-- **Ansible** — for configuration management and deployments
+- **Ansible** — for configuration management and deployments (`make deps` from `ansible/` installs collections)
-- **OpenTofu** (or Terraform) — for managing Cloudflare DNS and infrastructure
+- **OpenTofu** (or Terraform) — for Hetzner, Cloudflare, Grafana Cloud, and PagerDuty
 - **Docker** — helpful to understand, since most services are containerised
-- **SOPS + age** — for secrets encryption/decryption (run `./scripts/sops-setup.sh`)
+- **SOPS + age** — for secrets encryption/decryption (run `./ansible/scripts/sops-setup.sh`)
 - **Git** — obviously
 - **gh CLI** — for GitHub operations (PRs, issues, etc.)
@@ -28,76 +28,98 @@ cd pez-infra
 pez-infra/
 ├── docs/           # You are here
 ├── ansible/        # Ansible playbooks, roles, inventory, and all managed files
-│   ├── roles/      # Ansible roles (caddy, docker, dotfiles, etc.)
+│   ├── roles/      # Ansible roles (common, caddy, docker, media_stack, proxmox_ve, etc.)
 │   ├── services/   # Docker Compose definitions and service configs
 │   ├── dotfiles/   # Shell config (fish, nvim, tmux, git, etc.)
 │   ├── playbooks/  # One-off playbooks (updates, reboots, status)
 │   └── scripts/    # Utility and maintenance scripts
-└── terraform/      # Terraform/OpenTofu for Cloudflare, DNS, etc.
+└── terraform/      # Terraform/OpenTofu for Hetzner, Cloudflare, Grafana Cloud, PagerDuty
 ```
 ## Connecting to hosts
-All access is via Tailscale. Once you're on the tailnet, SSH using the Tailscale IP or hostname:
+All access is via Tailscale, as `root`. Once you're on the tailnet, SSH using the Tailscale IP or hostname:
 ```bash
 ssh root@helsinki-a        # or ssh root@100.67.6.27
-ssh root@london-b         # or ssh root@100.84.65.101
+ssh root@london-a          # Proxmox VE host
-ssh root@london-a         # FreeBSD — might need a different user
+ssh root@london-b          # storage / media
-ssh root@copenhagen-a     # or ssh root@100.89.206.60
+ssh root@london-c          # Raspberry Pi
 ssh root@copenhagen-a
 ssh root@copenhagen-c      # Raspberry Pi
 ssh root@nuremberg-a
 ```
 ## Common Tasks
 ### Deploying configuration changes
-Ansible handles deployments. Playbooks are in `ansible/` and are structured by host/role.
+Ansible handles deployments. The unified `deploy.yml` takes a host from a fresh OS install plus Tailscale to fully configured.
 ```bash
-# Run the full site playbook
+cd ansible/
-cd ansible
-ansible-playbook site.yml
-# Target a specific host
+# Install collections
-ansible-playbook site.yml --limit london-b
+make deps
-# Dry run first
+# Dry run — see what would change
-ansible-playbook site.yml --check --diff
+make deploy-check
 # Deploy everything
 make deploy
 # Deploy a single host
 make deploy-host HOST=london-b
 # Or run a single stage
 ansible-playbook deploy.yml --tags docker
 ```
 Ansible also runs automatically via GitHub Actions on commits to the main branch — so a quick commit from your phone can fix a misconfiguration when you're out.
-### Managing DNS
+Other playbooks live under `ansible/playbooks/`:
-DNS records are managed via Terraform in the `terraform/` directory:
+| Playbook | Purpose |
 |---|---|
 | `update-all.yml` | OS package updates (all hosts) |
 | `update-linux.yml` | Linux-only updates (apt) |
 | `docker-status.yml` | Show running containers per host |
 | `reboot.yml` | Safe reboot with pre-flight (interactive confirm for london-b) |
 | `zfs.yml` | ZFS scrub scheduling |
 ### Managing cloud + DNS + observability
 Terraform manages Hetzner servers, Cloudflare DNS, Grafana Cloud (stack, fleet, dashboards, synthetic checks), and PagerDuty:
 ```bash
 cd terraform
-tofu plan          # see what would change
+make init   # initialize providers and B2 backend
-tofu apply         # apply the changes
+make plan   # preview changes
 make apply  # apply the changes
 ```
-All Cloudflare DNS records, pages, and access policies are defined here. Don't click around in the Cloudflare dashboard — if it's not in Terraform, it doesn't exist.
+State lives in a Backblaze B2 bucket (`pez-infra-tfstate`) via the S3-compatible backend. Don't click around in the Cloudflare or Grafana Cloud dashboards — if it's not in Terraform, it doesn't exist.
 ### Adding a new service
-1. **Create a Docker Compose file** in `ansible/services/<service-name>/docker-compose.yml`
+1. **Create a Docker Compose file** in `ansible/services/<service-name>/docker-compose.yml` (or a systemd unit if it's native)
-2. **Add the Caddy route** — if it needs a public subdomain, add a block to the Caddyfile in `ansible/services/caddy/`
+2. **Add the host_var** — list the service under `docker_services` (or `systemd_services`) in `ansible/inventory/host_vars/<host>.yml`
-3. **Add a DNS record** — add the subdomain to `terraform/` and run `tofu apply`
+3. **Add the Caddy route** — if it needs a public subdomain, add a block to `ansible/services/caddy/Caddyfile`
-4. **Add Ansible deployment** — create or update the relevant role in `ansible/` so the service gets deployed automatically
+4. **Add a DNS record** — add the subdomain to `terraform/hetzner/dns.tf` and run `tofu apply`
-5. **Add monitoring** — if the service has a metrics endpoint, add it as a Prometheus scrape target
+5. **Add monitoring** — if the service has a metrics endpoint, scrape it via Alloy (`terraform/grafana/fleet_pipelines/`)
-6. **Update docs** — add the service to `docs/services.md`
+6. **Update docs** — add the service to `docs/services.md` (and the relevant `docs/hosts/<host>.md` page)
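Step 1 is the only part of that list that is pure file creation, so it can be scripted. A minimal sketch, assuming it runs from the repo root; `new_service` is a hypothetical helper, the `image: CHANGEME` placeholder is made up, and the remaining steps (host_vars, Caddy, DNS, monitoring, docs) stay manual.

```bash
#!/usr/bin/env bash
# new_service NAME [REPO_ROOT]
# Hypothetical scaffold for step 1: creates ansible/services/NAME/docker-compose.yml
# with placeholder contents. Refuses to overwrite an existing file.
new_service() {
  local name=$1 root=${2:-.}
  local dir="$root/ansible/services/$name"
  local compose="$dir/docker-compose.yml"

  if [ -e "$compose" ]; then
    echo "new_service: $compose already exists" >&2
    return 1
  fi

  mkdir -p "$dir"
  # Placeholder stack: replace the image and add ports/volumes as needed.
  cat > "$compose" <<EOF
services:
  ${name}:
    image: CHANGEME
    restart: unless-stopped
EOF

  echo "created $compose"
  echo "next: list '$name' under docker_services in ansible/inventory/host_vars/<host>.yml"
}
```

The refusal to overwrite is deliberate: rerunning the scaffold against an existing service should never clobber a real compose file.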
 ### Adding a new server
-1. Install the OS (Ubuntu preferred — see below)
+1. Install the OS (Debian 13 or Ubuntu LTS preferred — see below)
-2. Set up SSH keys
+2. Set up SSH keys for `root`
 3. Install Tailscale and join the tailnet
-4. Add the host to the Ansible inventory in `ansible/`
+4. Add the host to `ansible/inventory/hosts.ini` and create `ansible/inventory/host_vars/<host>.yml`
-5. Assign roles (at minimum: node_exporter for monitoring)
+5. Run `make deploy-host HOST=<new-host>` from `ansible/`
-6. Run `ansible-playbook site.yml --limit <new-host>`
+6. Register the host as a Grafana Fleet collector in `terraform/grafana/fleet_collectors.tf` and run `tofu apply`
-7. Update `docs/services.md` and `docs/architecture.md`
+7. Add a doc at `docs/hosts/<host>.md` and update `docs/services.md` + `docs/architecture.md`
-That's it. Ansible takes care of installing node_exporter, configuring the system, and deploying any assigned services.
+That's it. The common role installs node_exporter, systemd_exporter, and Alloy as part of the baseline, so observability is automatic.
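Steps 4 and 7 leave files behind that are easy to forget. A sketch of a pre-flight check, assuming the paths named in those steps and that the host appears at the start of its own line (optionally followed by variables) in `hosts.ini`; `check_new_host` is hypothetical and does not look at Terraform or the Grafana fleet registration.

```bash
#!/usr/bin/env bash
# check_new_host HOST [REPO_ROOT]
# Hypothetical pre-flight: reports which repo files a new host still needs.
# Returns the number of missing items (0 means everything is present).
check_new_host() {
  local host=$1 root=${2:-.}
  local missing=0

  # Step 4: inventory entry. Matches "host" at line start, alone or followed by vars.
  if ! grep -Eq "^${host}([[:space:]]|$)" "$root/ansible/inventory/hosts.ini" 2>/dev/null; then
    echo "missing: $host in ansible/inventory/hosts.ini"
    missing=$((missing + 1))
  fi

  # Step 4: host_vars file.
  if [ ! -f "$root/ansible/inventory/host_vars/$host.yml" ]; then
    echo "missing: ansible/inventory/host_vars/$host.yml"
    missing=$((missing + 1))
  fi

  # Step 7: per-host doc page.
  if [ ! -f "$root/docs/hosts/$host.md" ]; then
    echo "missing: docs/hosts/$host.md"
    missing=$((missing + 1))
  fi

  return "$missing"
}
```

Running `check_new_host <new-host>` before `make deploy-host` prints one line per gap; a zero exit status means the repo side is ready.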
 ### Working with ZFS (london-b)
@@ -108,17 +130,20 @@ zpool status hdd
 # Check usage
 zfs list
-# Scrub status (runs weekly on Sundays)
+# Scrub status (runs weekly on Sundays at 12:00)
 zpool status hdd | grep scan
 ```
-ZFS is set up with 3× RAIDZ1 vdevs across 8 drives. Tolerates one drive failure per vdev.
+ZFS is set up with 3× RAIDZ1 vdevs of 4 drives each (12 drives total) on the `hdd` pool. Tolerates one drive failure per vdev. The long-term plan is to replace the 8 TB drives with 24 TB drives and grow the pool toward 24 drives / ~0.5 PB raw.
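The capacity figures follow directly from the vdev layout. A back-of-the-envelope sketch, assuming all drives in the pool are the same size and ignoring ZFS metadata, slop space, and RAIDZ padding (so real `zfs list` numbers come out somewhat lower); the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope RAIDZ capacity maths. Hypothetical helpers.
# Assumes equal-sized drives; ignores metadata, slop space, and RAIDZ padding.

# raidz_raw_tb VDEVS WIDTH DRIVE_TB -> raw TB across all drives (parity included)
raidz_raw_tb() {
  local vdevs=$1 width=$2 drive_tb=$3
  echo $(( vdevs * width * drive_tb ))
}

# raidz_usable_tb VDEVS WIDTH DRIVE_TB PARITY -> TB left for data
# (each vdev gives up PARITY drives' worth of space: 1 for RAIDZ1)
raidz_usable_tb() {
  local vdevs=$1 width=$2 drive_tb=$3 parity=$4
  echo $(( vdevs * (width - parity) * drive_tb ))
}

# tb_to_tib TB -> TiB to one decimal place (1 TB = 10^12 bytes, 1 TiB = 2^40 bytes)
tb_to_tib() {
  awk -v tb="$1" 'BEGIN { printf "%.1f\n", tb * 1e12 / 2^40 }'
}
```

For the current pool, `raidz_raw_tb 3 4 8` gives 96 TB, which `tb_to_tib 96` turns into about 87.3 TiB (consistent with the 87T quoted for london-b, if that figure is the raw pool size), and `raidz_usable_tb 3 4 8 1` gives 72 TB (about 65.5 TiB) before overhead. For the 24-drive plan, raw capacity is 24 × 24 TB = 576 TB regardless of layout (`raidz_raw_tb 6 4 24`, assuming the same 4-wide vdevs); usable space depends on the final vdev arrangement.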
 ## OS Choice
-Ubuntu is the preferred OS for new servers. Not because I love it — Alpine is faster and leaner — but because Ansible support is vastly better. The lack of GNU binaries and systemd on Alpine caused enough headaches that the switch to Ubuntu was worth it.
+- **Debian (12 or 13)** is the default for new hosts — including the Raspberry Pis. Stable, well-supported by Ansible, predictable.
 - **Ubuntu LTS** is on london-b and copenhagen-a (historical — both came up before the Debian standard).
 - **Proxmox VE** (Debian 13 Trixie under the hood) on london-a.
 - **No more FreeBSD.** london-a used to run FreeBSD for Prometheus/Grafana; that's all on Grafana Cloud now and london-a is Linux/Proxmox.
-FreeBSD is used on london-a (monitoring) and works well for that single-purpose role.
+Alpine has been tried and rejected: the missing GNU binaries and systemd caused enough Ansible headaches that the size savings weren't worth it.
 ## Secrets
@@ -151,7 +176,7 @@ This monorepo replaces several standalone repos:
 |----------|-------------|
 | pez-ansible | `ansible/` |
 | pez-terraform | `terraform/` |
-| pez-grafana | `services/grafana/` |
+| pez-grafana | `terraform/grafana/` |
-| pez-proxy | `services/caddy/` |
+| pez-proxy | `ansible/services/caddy/` |
 | pez-docs | `docs/` |
-| server-scripts | `scripts/` and `ansible/` |
+| server-scripts | `ansible/scripts/` and `ansible/roles/` |
--- a/docs/hosts/copenhagen-a.md
+++ b/docs/hosts/copenhagen-a.md
@@ -7,7 +7,7 @@ Game servers. Located at my dad's place in Copenhagen as an off-site location.
 | | |
 |---|---|
 | **Location** | Copenhagen |
-| **OS** | Ubuntu 22.04 |
+| **OS** | Ubuntu 22.04 LTS |
 | **Tailscale IP** | 100.89.206.60 |
 | **Role** | Gaming servers (Minecraft, WoW) |
 | **Form factor** | Lenovo "tiny" desktop (lunchbox-sized) |
@@ -18,7 +18,7 @@ Game servers. Located at my dad's place in Copenhagen as an off-site location.
 |---|---|
 | CPU | Intel i5-4570T (4 threads) |
 | Memory | 16 GB |
-| Boot disk | 500 GB (26% used) |
+| Boot disk | 500 GB |
 Compact Lenovo desktop — powered by a standard ThinkPad charging brick. Small, quiet, and draws minimal power.
@ -28,11 +28,11 @@ Compact Lenovo desktop — powered by a standard ThinkPad charging brick. Small,
 | | |
 |---|---|
-| Image | `marctv/minecraft-papermc-server` |
+| Image | `itzg/minecraft-server` |
 | Port | 25565 |
 | Deployment | Docker |
-PaperMC for better performance than vanilla. Not proxied through Caddy — accessed directly via Tailscale or the host's IP.
+Not proxied through Caddy — accessed directly via Tailscale or the host's public IP.
 ### World of Warcraft (MaNGOS Zero)
@ -47,29 +47,18 @@ WoW 1.12 (Vanilla) private server using the MaNGOS Zero emulator. Runs natively
 - Runs as the `mangos` user
 - Install path: `/home/mangos/mangos/zero/`
 - MariaDB hosts the character, world, and auth databases locally
 - Both `mangos-realmd` and `mangos-world` start automatically on boot via systemd
 - The `mariadb` Ansible role manages package + secrets; the `systemd_services` role drops the unit files (`ansible/services/mangos-realmd/`, `ansible/services/mangos-world/`)
-Both `mangos-realmd` and `mangos-world` start automatically on boot via systemd.
+### Other
-### Monitoring
+| Service | Port | Deployment | Notes |
-
+|---------|------|-----------|-------|
-| Service | Port | Managed by |
+| smartctl_exporter | 9633 | Docker | Disk SMART metrics scraped by Alloy |
-|---------|------|-----------|
+| node_exporter | 9100 | Native | Host metrics |
-| node_exporter | 9100 | systemd (Ansible-managed) |
+| systemd_exporter | — | Native | systemd unit metrics |
-
+| Alloy | — | Native | Ships everything to Grafana Cloud |
-Prometheus Node Exporter for host metrics. Installed and managed via the Ansible `node_exporter` role. Scraped by Prometheus on london-a via Tailscale.
+| Tailscale | — | Native | Mesh networking |
 > **Note:** Stale Docker images for `prom/node-exporter` and `quay.io/prometheus/node-exporter` exist on the host from a previous Docker-based deployment. These should be cleaned up — the systemd service is the active one.
 ### Potentially Unused Services
 The following services are running but have no known active consumers:
 | Service | Notes |
 |---------|-------|
 | PostgreSQL 14 | Only default databases (template0, template1, postgres). Likely leftover. |
 | Redis 6.0 | Running but no known application depends on it. |
 These are candidates for removal or investigation.
 ## Networking
@ -77,4 +66,4 @@ Connected directly to the ISP router's built-in switch. Symmetrical 500 Mbit con
 ## Notes
-Copenhagen-a has a static IP, which is needed for game servers that require direct client connections (WoW realm list, Minecraft server list).
+copenhagen-a has a static public IP, which is needed for game servers that require direct client connections (WoW realm list, Minecraft server list). The reboot playbook (`ansible/playbooks/reboot.yml`) does a netplan pre-flight check before rebooting to make sure the static IP config will come back up cleanly.
--- a/docs/hosts/copenhagen-c.md
+++ b/docs/hosts/copenhagen-c.md
@ -1,21 +1,29 @@
 # copenhagen-c
-General purpose box. Currently idle.
+Raspberry Pi at the Copenhagen site. General-purpose / off-site utility box.
 ## Overview
 | | |
 |---|---|
 | **Location** | Copenhagen |
-| **OS** | Debian 12 |
+| **OS** | Debian 12 (Bookworm), aarch64 |
 | **Tailscale IP** | 100.115.45.53 |
-| **Role** | Idle / available |
+| **Role** | Idle / cloudflared tunnel |
-| **Disk** | 117 GB (15% used) |
+| **Form factor** | Raspberry Pi (ARM64) |
-## Status
+## Services
-No active workloads. Connected to Tailscale and available for future use. Has node_exporter running for monitoring.
+| Service | Deployment | Notes |
 |---------|-----------|-------|
 | cloudflared | Native (systemd) | Cloudflare-managed tunnel for ad-hoc exposure of services from this site |
 | Tailscale | Native | Mesh networking |
 | Alloy | Native | Ships metrics/logs to Grafana Cloud |
 | node_exporter | Native | Host metrics |
 | Docker / containerd | Native | Available, but no compose services currently scheduled here |
 The cloudflared token is stored directly in the systemd unit (`/etc/systemd/system/cloudflared.service`); the tunnel itself is configured in the Cloudflare dashboard.
 ## Notes
-Part of the Copenhagen off-site setup at my dad's place. Available if I need to spin up something that benefits from a Copenhagen location or just need another box.
+Part of the Copenhagen off-site setup at my dad's place. Otherwise idle — available if I need to spin up something that benefits from a Copenhagen location or just need another always-on box.
--- a/docs/hosts/helsinki-a.md
+++ b/docs/hosts/helsinki-a.md
@ -7,31 +7,44 @@ Public-facing traffic gateway. Everything exposed to the internet goes through t
 | | |
 |---|---|
 | **Location** | Hetzner Cloud (Helsinki) |
-| **OS** | Linux (Ubuntu/Debian) |
+| **OS** | Debian 13 (Trixie) |
 | **Tailscale IP** | 100.67.6.27 |
-| **Role** | Reverse proxy, SSO, Bitwarden, LDAP |
+| **Role** | Reverse proxy, SSO, Bitwarden, Forgejo |
 | **Provider** | Hetzner Cloud VPS |
 ## What it does
-This is the front door. All public subdomains (*.pez.sh) terminate here via Caddy, which proxies traffic to the appropriate backend over Tailscale.
+This is the front door. All public subdomains under `pez.sh` and `pez.solutions` terminate here via Caddy, which proxies traffic to the appropriate backend over Tailscale.
-It also runs the auth stack — Authelia for SSO and LLDAP for user management. Having auth on the same box as the proxy keeps latency low for the `forward_auth` check.
+It also runs the auth stack — Authelia for SSO and LLDAP as Authelia's user backend. Having auth on the same box as the proxy keeps latency low for the `forward_auth` check.
-Bitwarden (Vaultwarden) lives here too, because password management needs to be available even if the London servers are having a moment.
+Bitwarden (Vaultwarden) and Forgejo also live here. Both expose their own login and don't go through Authelia. Bitwarden is on helsinki-a for availability — password management needs to be reachable even if the London servers are having a moment. Forgejo is colocated for the same reason and to keep Git access independent of home internet.
 ## Services
 | Service | Port | Deployment | Notes |
 |---------|------|-----------|-------|
-| Caddy | 80, 443 | Docker | Reverse proxy + TLS termination |
+| Caddy | 80, 443 | Native (apt + systemd) | Reverse proxy + TLS termination. Config at `/etc/caddy/Caddyfile` |
 | Authelia | 9091 | Docker | SSO, accessible at auth.pez.sh |
-| Bitwarden (Vaultwarden) | 8443 | Docker | bitwarden.pez.sh, own auth |
+| Authelia MariaDB | (internal) | Docker | Authelia session/state |
-| LLDAP | 3890/17170 | Docker | User directory for Authelia |
+| LLDAP | 3890, 17170 | Docker | User directory for Authelia (UI at ldap.pez.sh) |
 | Bitwarden (Vaultwarden) | 8443, 8080 | Docker | bitwarden.pez.sh, own auth |
 | Bitwarden MariaDB | (internal) | Docker | Backing DB |
 | Forgejo | 3000 (HTTP), 2222 (SSH) | Docker | git.pez.sh, own auth; SSH on `git.pez.sh:2222` |
-Also serves static content:
+Caddy is the only service installed natively: it needs to bind 80/443 directly, and there's no benefit to wrapping it in Docker on the host whose main job is proxying. Everything else runs as Docker Compose stacks under `/opt/docker/<service>/` (managed by the `docker_services` Ansible role from `ansible/services/<service>/docker-compose.yml`).
- **status.pez.sh** → `/srv/status` (public status page)
+
- **apps.pez.sh** → `/srv/apps` (behind Authelia)
+### Static sites
 Caddy also serves static content from `/srv/`:
 | Path | URL | Auth |
 |---|---|---|
 | `/srv/status` | status.pez.sh | — |
 | `/srv/apps` | apps.pez.sh, apps.pez.solutions | Authelia |
 | `/srv/pez.sh` | pez.sh | — |
 | `/srv/pez.solutions` | pez.solutions | — |
 | `/srv/pez-signup` | signup.pez.solutions | — |
 ## Why Hetzner Cloud
--- a/docs/hosts/london-a.md
+++ b/docs/hosts/london-a.md
@ -1,13 +1,13 @@
 # london-a
-Proxmox VE hypervisor.
+Proxmox VE hypervisor. The platform for any VM workloads I want to run on owned hardware.
 ## Overview
 | | |
 |---|---|
 | **Location** | London (NW9) |
-| **OS** | Proxmox VE (Debian Bookworm) |
+| **OS** | Debian 13 (Trixie) with Proxmox VE 9.x |
 | **Tailscale IP** | 100.122.180.98 |
 | **Role** | Hypervisor (Proxmox VE) |
@ -25,9 +25,40 @@ Old gaming PC. Runs Proxmox VE on bare metal.
 | Service | Port | Status | Notes |
 |---------|------|--------|-------|
-| Proxmox VE | 8006 | Active | Web UI — Tailscale only |
+| Proxmox VE | 8006 | Active | Web UI — reachable via `london-a.pez.sh` (Caddy) or Tailscale IP |
 | Tailscale | — | Active | Mesh networking |
 | node_exporter, systemd_exporter, Alloy | — | Active | Observability baseline (Ansible-managed) |
 ### Storage
 Proxmox mounts a CIFS share from **london-b** (`//100.84.65.101/pve`) for ISO/template/backup storage. The mount is configured by the `proxmox_ve` Ansible role.
 | Storage ID | Type | Backing |
 |---|---|---|
 | `local-lvm` | LVM-Thin | Local boot disk |
 | `hdd` | CIFS | london-b `/pve` share |
 ### VMs
 | VMID | Name | Status | Notes |
 |---|---|---|---|
 | 100 | Mac-Server | Stopped | macOS Sequoia VM (OpenCore bootloader). Intended for occasional macOS workloads. |
 The VM list will grow over time — this is a general-purpose hypervisor, not a single-VM appliance.
 ## Ansible
 The `proxmox_ve` role:
 - Swaps the enterprise apt repo for `pve-no-subscription` so updates work without a paid subscription
 - Patches `proxmoxlib.js` to suppress the subscription nag dialog
 - Restricts the web UI to the `tailscale0` interface via UFW
 - Mounts the london-b CIFS storage
 ## Networking
-Connected via Cat 5 to the Ubiquiti switch alongside london-b.
+Connected via Cat 5 to the Ubiquiti switch alongside london-b and london-c.
 ## History
 london-a used to run **FreeBSD** as a single-purpose monitoring host (Prometheus + Grafana). Monitoring moved to Grafana Cloud, the box was repaved as Proxmox VE, and the FreeBSD-specific Ansible has been removed.
--- a/docs/hosts/london-b.md
+++ b/docs/hosts/london-b.md
@ -19,24 +19,26 @@ Primary storage and media server. The workhorse of the fleet.
 | Memory | 64 GB |
 | GPU | Nvidia GTX 980 |
 | Boot disk | 500 GB |
-| Storage pool | ~64 TB (ZFS) |
+| Storage pool | ~87 TB raw / ~64 TB usable (ZFS) |
 This machine is ridiculously overpowered as a media server. It's my old gaming/workstation PC repurposed into server duty. The GPU helps with Plex transcoding but the CPU can handle it fine on its own.
 ## Storage
-ZFS pool `hdd`: 3× RAIDZ1 vdevs, 8 drives total.
+ZFS pool `hdd`: 3× RAIDZ1 vdevs, 4 drives each (12 drives total).
 | Metric | Value |
 |---|---|
-| Used | 46 TB |
+| Used | ~61 TB |
-| Free | 18 TB |
+| Free | ~26 TB |
-| Total | ~64 TB |
+| Total | ~87 TB raw |
-| Usage | 72% |
+| Usage | ~70% |
-| Scrub | Weekly (Sundays) |
+| Scrub | Weekly (Sundays at 12:00, cron `/sbin/zpool scrub hdd`) |
 RAIDZ1 tolerates one drive failure per vdev. With this many drives and this much data, ZFS checksumming is essential — silent data corruption on spinning disks is real and you don't want to find out about it years later.
 **Roadmap:** the long-term plan is to gradually replace the 8 TB drives with 24 TB drives and grow the pool toward 24 drives / ~0.5 PB raw.
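As a rough cross-check of the capacity figures above, here is a small Python sketch. It assumes every drive in the pool is a marketed 8 TB (decimal) drive and that `zpool list` reports sizes in TiB; the helper name `raidz_capacity` is hypothetical. Under those assumptions 12 × 8 TB comes to about 87.3 TiB raw, and with one parity drive per 4-wide vdev about 65.5 TiB is left for data before ZFS overhead (metadata, slop space, RAIDZ padding), which lines up with the ~64 TB usable figure.

```python
TB = 10**12   # marketed (decimal) terabyte, as printed on drive labels
TIB = 2**40   # tebibyte; ZFS tools report sizes in powers of two

def raidz_capacity(vdevs: int, drives_per_vdev: int, drive_tb: int, parity: int = 1):
    """Return (raw_bytes, data_bytes) for a pool of identical RAIDZ vdevs.

    data_bytes only subtracts parity drives; it ignores ZFS metadata,
    slop space and RAIDZ allocation padding, so real usable space is lower.
    """
    drive_bytes = drive_tb * TB
    raw = vdevs * drives_per_vdev * drive_bytes
    data = vdevs * (drives_per_vdev - parity) * drive_bytes
    return raw, data

# Today: 3x RAIDZ1, 4 drives each, assuming all drives are 8 TB.
raw, data = raidz_capacity(vdevs=3, drives_per_vdev=4, drive_tb=8)
print(f"today:   {raw / TIB:.1f} TiB raw, {data / TIB:.1f} TiB data")
# prints: today:   87.3 TiB raw, 65.5 TiB data

# Roadmap: 24 x 24 TB, assuming the 4-wide RAIDZ1 layout is kept (6 vdevs).
raw, data = raidz_capacity(vdevs=6, drives_per_vdev=4, drive_tb=24)
print(f"roadmap: {raw / 10**15:.3f} PB raw, {data / 10**15:.3f} PB data")
# prints: roadmap: 0.576 PB raw, 0.432 PB data
```

The roadmap line is consistent with the ~0.5 PB raw target; passing `parity=2` shows what a RAIDZ2 layout would cost in capacity.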
 ## Services
 ### Media Servers
@ -58,15 +60,19 @@ RAIDZ1 tolerates one drive failure per vdev. With this many drives and this much
 | Prowlarr | 9696 | prowlarr.pez.sh |
 | Transmission | 9091 | download.pez.sh |
 | Jellyseerr | 5055 | request.pez.sh |
 | Overseerr (snap) | 5056 | jellyfin-requests.pez.sh |
 ### Other
 | Service | Port | URL |
 |---------|------|-----|
-| Nextcloud AIO | 11000 | cloud.pez.sh |
+| Nextcloud AIO | 11000 | cloud.pez.sh (internal) |
 | Miniflux | 8181 | rss.pez.sh |
 | slskd (Soulseek) | 5030 | soulseek.pez.sh |
-| smartctl_exporter | 9633 | (Prometheus scrape) |
+| Syncthing (`syncthing@pez`) | 8384 | (LAN / Tailscale) |
-| prom-plex-exporter | — | (Prometheus scrape) |
+| Ollama | 11434 | (Tailscale) |
 | smartctl_exporter | 9633 | (Alloy scrape) |
 | prom-plex-exporter | 9594 | (Alloy scrape) |
 ### Systemd Services (non-Docker)
@ -85,12 +91,15 @@ The media automation suite and several supporting services run as native systemd
 | Transmission | transmission-daemon | Package-managed |
 | Samba | smbd | Package-managed |
 | Ollama | ollama | /usr/local/bin, custom unit |
-| Promtail | promtail | Custom unit, ships logs to Loki |
+| Syncthing | syncthing@pez | Package-managed, user instance |
 | vsftpd | vsftpd | FTP server for /hdd/ftp |
 | systemd_exporter | systemd_exporter | Ansible-managed |
-| node_exporter | node_exporter | Ansible-managed |
+| node_exporter | prometheus-node-exporter | apt-managed |
 | Alloy | alloy | Grafana Alloy, fleet-managed config |
-Docker services: Nextcloud AIO, Jellyseerr, Navidrome, slskd, Miniflux, smartctl-exporter, plex-exporter.
+Docker services: Nextcloud AIO, Jellyseerr, Navidrome, slskd, Miniflux (with postgres sidecar), smartctl-exporter, plex-exporter.
 Snap: Overseerr (`latest/beta` channel).
 ### Cron Jobs
@ -99,7 +108,8 @@ Docker services: Nextcloud AIO, Jellyseerr, Navidrome, slskd, Miniflux, smartctl
 | Every hour | `/root/scripts/movie-rename-fix.fish` |
 | Midnight daily | `systemctl restart radarr` |
 | Midnight daily | `systemctl restart sonarr` |
-| 22:00 daily | `/root/scripts/backup.sh` (rclone to B2) |
+| 22:00 daily | `/root/scripts/backup.sh` (rclone to Backblaze B2) |
 | Sundays 12:00 | `/sbin/zpool scrub hdd` |
 ### Samba Shares
@ -108,8 +118,9 @@ Docker services: Nextcloud AIO, Jellyseerr, Navidrome, slskd, Miniflux, smartctl
 | HDD | /hdd | pez, root (rw) |
 | Movies | /hdd/movies | public (ro) |
 | TV Shows | /hdd/tv | public (ro) |
 | pve | /hdd/pve | london-a Proxmox (rw) — ISO/template/backup storage |
-Media is served directly from the ZFS pool.
+Media is served directly from the ZFS pool. Docker root (`/hdd/docker`) and PVE storage (`/hdd/pve`) live on the pool too.
 ## Networking
--- a/docs/hosts/london-c.md
+++ b/docs/hosts/london-c.md
@ -0,0 +1,36 @@
 # london-c
 Raspberry Pi at the London site. Edge utility box for lightweight workloads that don't justify spinning up the Threadripper.
 ## Overview
 | | |
 |---|---|
 | **Location** | London (NW9) |
 | **OS** | Debian 13 (Trixie), aarch64 |
 | **Tailscale IP** | 100.123.72.87 |
 | **Role** | Octopus Energy exporter, general-purpose Pi |
 | **Form factor** | Raspberry Pi (ARM64) |
 ## Services
 | Service | Port | Deployment | Notes |
 |---------|------|-----------|-------|
 | octopus_exporter | 9359 | Docker (`rwejlgaard/octopus_exporter`) | Pulls electricity-usage data from the Octopus Energy API; scraped by Alloy |
 | Tailscale | — | Native | Mesh networking |
 | Docker / containerd | — | Native | For octopus-exporter |
 | Alloy | — | Native (Ansible-managed) | Ships metrics/logs to Grafana Cloud |
 | node_exporter | 9100 | Native | Host metrics |
 | systemd_exporter | — | Native | systemd unit metrics |
 | fail2ban | — | Native | SSH brute-force protection |
 Compose file lives at `ansible/services/octopus-exporter/docker-compose.yml`. The `OCTOPUS_API_KEY` is templated in from a SOPS-encrypted variable.
 ## Networking
 Connected via Ethernet to the Ubiquiti switch alongside london-a and london-b.
 ## Notes
 - Single-board-computer form factor — runs cool, draws ~5 W, lives on the rack shelf without active cooling.
 - A natural place to park future "small but always-on" workloads (sensors, cron jobs, smart-home glue) that don't need to share fate with london-b.
--- a/docs/hosts/nuremberg-a.md
+++ b/docs/hosts/nuremberg-a.md
@ -7,7 +7,7 @@ Dedicated mail server. One job, does it well.
 | | |
 |---|---|
 | **Location** | Hetzner Cloud (Nuremberg) |
-| **OS** | Debian |
+| **OS** | Debian 13 (Trixie) |
 | **Tailscale IP** | 100.70.180.24 |
 | **Role** | Mail server (poste.io) |
 | **Provider** | Hetzner Cloud VPS |
@ -16,10 +16,12 @@ Dedicated mail server. One job, does it well.
 | Service | Ports | Deployment |
 |---------|-------|-----------|
-| poste.io | 25, 587, 993, 443 | Docker |
+| poste.io | 25, 80, 110, 143, 443, 465, 587, 993, 995 | Docker |
 poste.io is a batteries-included mail server that bundles postfix, dovecot, rspamd, and webmail into a single Docker container. No juggling separate containers for each mail component.
 The compose definition lives at `ansible/services/poste-io/docker-compose.yml` and is deployed via the `docker_services` Ansible role (see `ansible/inventory/host_vars/nuremberg-a.yml`).
 ## Why a separate server
 Mail lives on its own VPS to isolate its IP reputation. If the IP gets flagged for any reason, it doesn't affect the rest of the infrastructure. And if something else gets flagged, it doesn't affect mail deliverability.
@ -35,4 +37,4 @@ Mail-related DNS records are managed via Cloudflare (Terraform):
 ## Firewall
-Managed by Hetzner Cloud firewall rules (Terraform). Mail ports are exposed via Docker port mappings in `ansible/services/poste-io/docker-compose.yml`.
+Managed by Hetzner Cloud firewall rules (Terraform, `terraform/hetzner/firewall.tf`). Mail ports are exposed via Docker port mappings in `ansible/services/poste-io/docker-compose.yml`.
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@ -2,111 +2,82 @@
 ## Stack Overview
 Observability is a fully managed pipeline today: every host runs **Grafana Alloy** as the local collector, and everything ships to **Grafana Cloud**. Synthetic checks are also driven from Grafana Cloud, and alerts are routed to **PagerDuty**.
 ```mermaid
-graph TD
+graph LR
-    subgraph "london-a (FreeBSD)"
+    subgraph "Fleet (each host)"
-        Prometheus[":9090 Prometheus"] -->|query| Grafana[":3000 Grafana"]
+        NE["node_exporter :9100"]
        SE["systemd_exporter :9558"]
        XE["host-specific<br/>exporters<br/>(smartctl, plex,<br/>octopus...)"]
        Alloy["alloy.service<br/>(Grafana Alloy)"]
        NE --> Alloy
        SE --> Alloy
        XE --> Alloy
    end
-    Prometheus -->|scrape over Tailscale| NE["node_exporter<br/>(all hosts) :9100"]
+    Alloy -->|metrics, logs, traces| GC["<b>Grafana Cloud</b><br/>pez.grafana.net"]
-    Prometheus -->|scrape over Tailscale| SE["smartctl_exporter<br/>(london-b) :9633"]
+    SM["Synthetic Monitoring<br/>probes (London)"] -->|HTTPS GETs| Internet["https://*.pez.sh"]
-    Prometheus -->|scrape over Tailscale| PE["plex_exporter<br/>(london-b)"]
+    SM --> GC
    GC -->|alerts| PD["PagerDuty"]
 ```
-Both Prometheus and Grafana are accessible via:
+Everything in `terraform/grafana/` is the source of truth for the Grafana Cloud side: stack, Fleet Management collectors, fleet pipelines, dashboards, and synthetic checks. Everything in `terraform/pagerduty/` configures the on-call destination.
 - **grafana.pez.sh** (behind Authelia)
 - **prometheus.pez.sh** (behind Authelia)
-## Prometheus
+## Grafana Alloy (per-host collector)
-Prometheus runs on london-a and scrapes metrics from exporters across the fleet. All scrape targets are reached over Tailscale — no ports need to be exposed on the public internet.
+Alloy runs as `alloy.service` on every host in the inventory. Each host is registered as a Grafana Fleet Management collector in `terraform/grafana/fleet_collectors.tf`, tagged with a `location` attribute (`london`, `copenhagen`, `cloud`) so pipelines can target subsets of the fleet.
-### Scrape Targets
+Pipelines (what to scrape, how to relabel, where to ship) live in `terraform/grafana/fleet_pipelines/` and are pushed to Grafana Cloud via the `grafana_fleet_management_pipeline` resource. The Alloy daemons on each host pull their config from Fleet Management.
-| Target | Host | Port | What |
+### Local exporters scraped by Alloy
 |--------|------|------|------|
 | node_exporter | All hosts | 9100 | System metrics (CPU, memory, disk, network) |
 | smartctl_exporter | london-b | 9633 | Disk SMART health data |
 | prom-plex-exporter | london-b | (varies) | Plex streaming activity |
-node_exporter is deployed to every host via Ansible. It's one of the first things that gets installed on a new server.
+| Exporter | Hosts | What |
 |---|---|---|
 | node_exporter | All hosts | CPU, memory, disk, network, system uptime |
 | systemd_exporter | All hosts | Per-unit systemd state |
 | smartctl_exporter (Docker) | london-b, copenhagen-a | Disk SMART data |
 | prom-plex-exporter (Docker) | london-b | Plex streaming activity |
 | octopus_exporter (Docker) | london-c | Octopus Energy electricity usage |
 | Caddy `/metrics` | helsinki-a | HTTP request metrics, upstream health (per host) |
-### Adding a scrape target
+### Logs
-1. Deploy the exporter to the host (via Ansible or Docker)
+Alloy ships systemd journal entries from every host to Grafana Cloud Logs. Log-derived alerts (e.g. SSH brute-force, mail server errors) can be configured directly in Grafana Cloud.
 2. Add the target to the Prometheus config in `services/prometheus/`
 3. Deploy the updated config (Ansible or manual restart)
 4. Verify it shows up in Prometheus targets page
-## Grafana
+## Synthetic Monitoring
-Grafana reads from Prometheus and provides dashboards for everything worth watching.
+Grafana Cloud's Synthetic Monitoring service runs HTTPS probes against the public services every 10 minutes from the London region. The checks are configured in `terraform/grafana/synthetic_checks.tf`:
-### Dashboards
+| Check | URL |
 |---|---|
 | pez_sh | https://pez.sh |
 | pez_solutions | https://pez.solutions |
 | jellyfin | https://jellyfin.pez.sh |
 | plex | https://plex.pez.sh (auth header) |
 | request | https://request.pez.sh |
 | jellyfin_requests | https://jellyfin-requests.pez.sh |
 | git | https://git.pez.sh |
-| Dashboard | What it shows |
+Each check has a `ProbeFailedExecutionsTooHigh` alert wired up (3 failed executions in a 30-minute window).
 |-----------|--------------|
 | Server Health | CPU, memory, disk usage, network I/O across all hosts |
 | ZFS | Pool status, usage, scrub results for london-b |
 | SMART | Disk health metrics, temperature, error counts |
 | Plex | Active streams, transcoding status, library stats |
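With a single London probe running each check every 10 minutes, a 30-minute window holds about three executions, so the `ProbeFailedExecutionsTooHigh` condition as described effectively needs every run in the window to fail. The sketch below is a simplified model of that condition, not Grafana's actual rule expression; the `Execution` type and `should_alert` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Execution:
    minute: int    # when the check ran, in minutes since an arbitrary start
    success: bool

def should_alert(executions, now, window=30, threshold=3):
    """True if at least `threshold` executions failed in the window (now - window, now]."""
    failed = sum(
        1 for e in executions
        if now - window < e.minute <= now and not e.success
    )
    return failed >= threshold

# Check runs every 10 minutes; the site goes down at minute 5 and stays down.
runs = [Execution(m, success=(m < 5)) for m in range(0, 50, 10)]
for now in (10, 20, 30):
    print(now, should_alert(runs, now))
# prints: 10 False / 20 False / 30 True (one per line)
```

In this model the first page arrives at minute 30, about 25 minutes after the outage begins. A shorter check frequency or a lower threshold would page sooner, at the cost of more noise from one-off failures.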
-### Adding a dashboard
+## Alerting → PagerDuty
-Dashboards are defined in `services/grafana/`. Export as JSON from Grafana and commit to the repo to keep them in version control.
+PagerDuty is configured in `terraform/pagerduty/`:
-## Exporters
+- A single service (`pez-infra`) receives alerts
-
+- The escalation policy pages me directly
-### node_exporter
+- The Grafana Cloud → PagerDuty integration sends every fired alert (synthetic check failures today; can be extended to log/metric alerts)
 Standard Prometheus node exporter. Deployed on every host. Provides system-level metrics:
 - CPU usage and load averages
 - Memory usage
 - Disk space and I/O
 - Network traffic
 - System uptime
 Installed via Ansible as part of the base server setup.
 ### smartctl_exporter
 Runs on london-b (the ZFS storage server with 8 spinning disks). Exposes SMART data from all drives:
 - Temperature
 - Reallocated sectors
 - Read/write error rates
 - Power-on hours
 - Overall health assessment
 Critical for catching dying drives before they take out a RAIDZ1 vdev.
 ### prom-plex-exporter
 Runs on london-b. Scrapes the Plex API and exposes metrics about:
 - Active streams
 - Transcode sessions
 - Library size
 - User activity
 Mostly for fun — it's satisfying to see the Plex dashboard light up when people are streaming.
 ## Status Page
-**status.pez.sh** is a lightweight public status page that shows service availability.
+**status.pez.sh** is a public status page hosted on helsinki-a at `/srv/status`.
- Pulls availability data from Prometheus
+- Data is static JSON regenerated by cron (see `ansible/roles/status_page/`), so rendering doesn't depend on Grafana Cloud
- Shows 90-day uptime history
+- Hosted directly by Caddy as a `file_server`
- Hosted on helsinki-a at `/srv/status`
+- Public by design (no Authelia)
- Source: [RWejlgaard/pez-status](https://github.com/RWejlgaard/pez-status)
+- Source repo for the front-end: [RWejlgaard/pez-status](https://github.com/RWejlgaard/pez-status)
 - Not behind Authelia — it's public by design
-## Alerting
+## History
-Prometheus alerting rules can be configured in the Prometheus config. Alert conditions worth monitoring:
+Monitoring used to run locally on **london-a** (FreeBSD) with a self-hosted Prometheus + Grafana. When london-a was reinstalled as Proxmox VE, the local stack was retired and everything moved to Grafana Cloud + Alloy. Older docs (and a few legacy hard-coded IPs in helper scripts) may still reference `100.122.219.41:9090` — that endpoint no longer exists.
 - Host down (node_exporter unreachable)
 - Disk space critical (>90% usage)
 - ZFS scrub errors
 - SMART drive failures
 - High memory usage
 Grafana can also be configured with alert channels (email, webhooks, etc.) for dashboard-based alerts.
--- a/docs/networking.md
+++ b/docs/networking.md
@ -9,16 +9,17 @@ All inter-server communication uses Tailscale IPs:
 | Host | Tailscale IP |
 |------|-------------|
 | helsinki-a | 100.67.6.27 |
 | london-a | 100.122.180.98 |
 | london-b | 100.84.65.101 |
-| london-a | 100.122.219.41 |
+| london-c | 100.123.72.87 |
-| nuremberg-a | 100.117.235.28 |
+| nuremberg-a | 100.70.180.24 |
 | copenhagen-a | 100.89.206.60 |
 | copenhagen-c | 100.115.45.53 |
 ### What Tailscale is used for
 - **Reverse proxying:** Caddy on helsinki-a forwards traffic to backends via Tailscale IPs
- **Monitoring:** Prometheus on london-a scrapes exporters on all hosts via Tailscale
+- **Observability:** Grafana Alloy on each host pushes metrics/logs/traces to Grafana Cloud; intra-fleet probes (e.g. Proxmox UI) hop over Tailscale
 - **SSH access:** All SSH is done over Tailscale — no SSH ports exposed to the internet
 - **Ansible deployments:** GitHub Actions runs Ansible over Tailscale SSH connections
 - **Exit nodes:** Servers can act as VPN endpoints — useful for accessing UK content from Copenhagen or vice versa
@ -29,21 +30,22 @@ All inter-server communication uses Tailscale IPs:
 graph TD
    HEL["helsinki-a"] <--> LB["london-b"]
    HEL <--> LA["london-a"]
    HEL <--> LC["london-c"]
    HEL <--> NA["nuremberg-a"]
-    LB <--> LA
+    HEL <--> CA["copenhagen-a"]
-    LB <--> CA["copenhagen-a"]
+    HEL <--> CC["copenhagen-c"]
-    LA <--> CA
+    LA <--> LB
-    CA <--> CC["copenhagen-c"]
+    LA <--> LC
-    NA <--> CA
+    LB <--> LC
-    HEL <--> CA
+    LB <--> CA
    HEL <--> CC
    LB <--> CC
    NA <--> CA
    NA <--> LB
    NA <--> CC
    NA <--> LA
-    LA <--> CC
+    CA <--> CC
    style CC stroke-dasharray: 5 5
    style LC stroke-dasharray: 5 5
 ```
 > Every node can reach every other node directly. The mesh is fully connected.
@ -57,7 +59,7 @@ The London setup is in a rack cabinet in the bedroom (great white noise machine,
 - **Router:** Ubiquiti Dream Machine Special Edition — overkill for a home setup but gives excellent routing performance vs an ISP router
 - **ISP:** BT, 1 Gbit down / 300 Mbit up, ~£90/month
 - **Cabling:** Cat 5 in the walls, patch panel in the utility closet, connected to a Ubiquiti switch
- **Servers:** london-a and london-b connected via Ethernet to the switch
+- **Servers:** london-a, london-b, and london-c all wired into the Ubiquiti switch (london-c is a Raspberry Pi running over Ethernet)
 ### Copenhagen
@ -65,22 +67,23 @@ A stack of servers at my dad's place — acts as an off-site location.
 - **Router:** ISP-provided (not my house, can't exactly install a Ubiquiti rack)
 - **ISP:** Symmetrical 500 Mbit — plenty for what's running there
- **Servers:** copenhagen-a and copenhagen-c connected directly to the ISP router's built-in switch
+- **Servers:** copenhagen-a (Lenovo tiny desktop) and copenhagen-c (Raspberry Pi) connected directly to the ISP router's built-in switch
 ### Helsinki / Nuremberg (Hetzner Cloud)
 - Standard Hetzner Cloud VPS networking
- Public IPv4 addresses
+- Public IPv4 addresses, managed via the `terraform/hetzner/` module
- helsinki-a is the only server that receives traffic from the public internet
+- helsinki-a is the only server that receives general HTTP/HTTPS traffic from the public internet
- nuremberg-a receives mail (ports 25, 587, 993)
+- nuremberg-a receives mail (ports 25, 465, 587, 993, 995)
 ## DNS Flow
 All DNS is managed by Cloudflare, provisioned via Terraform.
-### Domain: pez.sh
+### Domains
-The domain is registered on Hover.com with nameservers pointed to Cloudflare.
+- **pez.sh** — primary domain. Registered on Hover.com with nameservers pointed to Cloudflare.
 - **pez.solutions** — alternate domain. Most services that have a `*.pez.sh` host also accept the matching `*.pez.solutions` host, so apps remain reachable if one TLD has trouble.
 ### How a request reaches a service
@ -102,28 +105,33 @@ graph TD
 ### Public Subdomains
-All subdomains are Cloudflare-proxied and terminate at helsinki-a:
+All subdomains (and the `pez.sh` / `pez.solutions` apex domains) are Cloudflare-proxied and terminate at helsinki-a. Rows listing both a `pez.sh` and a `.solutions` host are reachable on either TLD.
 | Subdomain | Backend | Auth |
 |---|---|---|
-| auth.pez.sh | helsinki-a:9091 | — |
+| auth.pez.sh / auth.pez.solutions | helsinki-a:9091 (Authelia) | — |
-| bitwarden.pez.sh | helsinki-a:8443 | — |
+| bitwarden.pez.sh | helsinki-a:8443 (Vaultwarden) | Own auth |
-| status.pez.sh | helsinki-a:/srv/status | — |
+| git.pez.sh | helsinki-a:3000 (Forgejo) | Own auth |
-| apps.pez.sh | helsinki-a:/srv/apps | Authelia |
+| ldap.pez.sh | helsinki-a:17170 (LLDAP web UI) | LLDAP login |
-| grafana.pez.sh | london-a:3000 | Authelia |
+| status.pez.sh | helsinki-a:/srv/status (static) | — |
-| prometheus.pez.sh | london-a:9090 | Authelia |
+| apps.pez.sh / apps.pez.solutions | helsinki-a:/srv/apps (static dashboard) | Authelia |
-| jellyfin.pez.sh | london-b:8096 | — |
+| pez.sh | helsinki-a:/srv/pez.sh (static) | — |
-| plex.pez.sh | london-b:32400 | — |
+| pez.solutions | helsinki-a:/srv/pez.solutions (static) | — |
-| request.pez.sh | london-b:5055 | — |
+| signup.pez.solutions | helsinki-a:/srv/pez-signup (static) | — |
-| cloud.pez.sh | london-b:11000 | — |
+| london-a.pez.sh | london-a:8006 (Proxmox UI) | Proxmox login |
-| music.pez.sh | london-b:4533 | — |
+| jellyfin.pez.sh / .solutions | london-b:8096 | Own auth |
-| radarr.pez.sh | london-b:7878 | Authelia |
+| plex.pez.sh / .solutions | london-b:32400 | Own auth |
-| sonarr.pez.sh | london-b:8989 | Authelia |
+| music.pez.sh | london-b:4533 (Navidrome) | Own auth |
-| lidarr.pez.sh | london-b:8686 | Authelia |
+| rss.pez.sh | london-b:8181 (Miniflux) | Authelia |
-| readarr.pez.sh | london-b:8787 | Authelia |
+| request.pez.sh / .solutions | london-b:5055 (Jellyseerr) | Own auth |
-| prowlarr.pez.sh | london-b:9696 | Authelia |
+| jellyfin-requests.pez.sh / .solutions | london-b:5056 (Overseerr) | Own auth |
-| soulseek.pez.sh | london-b:5030 | Authelia |
+| radarr.pez.sh / .solutions | london-b:7878 | Authelia |
-| download.pez.sh | london-b:9091 | Authelia |
+| sonarr.pez.sh / .solutions | london-b:8989 | Authelia |
 | lidarr.pez.sh / .solutions | london-b:8686 | Authelia |
 | readarr.pez.sh / .solutions | london-b:8787 | Authelia |
 | prowlarr.pez.sh / .solutions | london-b:9696 | Authelia |
 | soulseek.pez.sh / .solutions | london-b:5030 (slskd) | Authelia |
 | download.pez.sh / .solutions | london-b:9091 (Transmission) | Authelia |
 ### Mail DNS
@ -140,13 +148,13 @@ Caddy handles TLS termination for the Cloudflare-to-origin connection. Certifica
 Example Caddyfile block for a protected service:
-```
+```caddyfile
 radarr.pez.sh {
-    forward_auth helsinki-a:9091 {
+    forward_auth localhost:9091 {
-        uri /api/verify?rd=https://auth.pez.sh
+        uri /api/authz/forward-auth
        copy_headers Remote-User Remote-Groups Remote-Name Remote-Email
    }
-    reverse_proxy london-b:7878
+    reverse_proxy 100.84.65.101:7878
 }
 ```
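An "Own auth" service works differently. It skips `forward_auth` entirely, and Caddy only proxies it. Here is a minimal sketch for Jellyfin answering on both domains. It assumes `100.84.65.101` is london-b's Tailscale address, which is inferred from the Radarr example above, where `london-b:7878` becomes `100.84.65.101:7878`.

```caddyfile
# Sketch: no forward_auth, because Jellyfin handles its own login.
# Both hostnames share one site block via a comma-separated address list.
jellyfin.pez.sh, jellyfin.pez.solutions {
    # Assumed: 100.84.65.101 is london-b on the tailnet
    reverse_proxy 100.84.65.101:8096
}
```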
--- a/docs/services.md
+++ b/docs/services.md
@ -2,20 +2,25 @@
 Complete map of every service in the fleet — what it does, where it runs, how it's deployed, and whether it's behind auth.
-## helsinki-a — Gateway & Auth
+## helsinki-a — Gateway, Auth, Git
 | Service | Port | Deployment | Auth | URL |
 |---------|------|-----------|------|-----|
-| Caddy | 80, 443 | Native (apt) | — | (reverse proxy, no direct URL) |
+| Caddy | 80, 443 | Native (apt + systemd) | — | (reverse proxy, no direct URL) |
 | Authelia | 9091 | Docker | — | auth.pez.sh |
-| Bitwarden (Vaultwarden) | 8443 | Docker | Own auth | bitwarden.pez.sh |
+| Authelia MariaDB | 3306 (internal) | Docker | — | (Authelia session/state) |
-| LLDAP | 3890/17170 | Docker | — | (internal, used by Authelia) |
+| LLDAP | 3890, 17170 | Docker | — | ldap.pez.sh (UI) — used by Authelia |
 | Bitwarden (Vaultwarden) | 8443, 8080 | Docker | Own auth | bitwarden.pez.sh |
 | Bitwarden MariaDB | 3306 (internal) | Docker | — | (Vaultwarden backing DB) |
 | Forgejo | 3000 (HTTP), 2222 (SSH) | Docker | Own auth | git.pez.sh |
-Caddy is the single entry point for all public traffic. Authelia and LLDAP provide SSO. Bitwarden is on helsinki-a for availability — it needs to be reachable even if the London servers are down.
+Caddy is the single entry point for all public traffic and runs as a native apt-managed systemd service so it can bind 80/443 directly. Everything else on this host runs in Docker.
 Authelia provides SSO via Caddy `forward_auth`. LLDAP is Authelia's user backend; it is **not** wired into Forgejo or Bitwarden, both of which keep their own user databases. Bitwarden lives on helsinki-a so password management stays reachable even if the London servers are down. Forgejo hosts internal Git repositories: the web UI is at `git.pez.sh`, and Git over SSH listens on port 2222 (`git.pez.sh:2222`).
 ## london-b — Storage & Media
-The workhorse. Threadripper 3970X, 64GB RAM, 64TB ZFS storage. Everything media-related lives here.
+The workhorse. Threadripper 3970X, 64GB RAM. Everything media-related lives here.
 ### Media Servers
@ -31,35 +36,51 @@ I run both Plex and Jellyfin — some clients work better with one than the othe
 | Service | Port | Deployment | Auth | URL |
 |---------|------|-----------|------|-----|
-| Radarr | 7878 | Native (systemd) | Authelia | radarr.pez.sh |
+| Radarr | 7878 | Custom systemd unit (`/opt/Radarr`) | Authelia | radarr.pez.sh |
-| Sonarr | 8989 | Native (apt/systemd) | Authelia | sonarr.pez.sh |
+| Sonarr | 8989 | Native (apt/systemd, mono) | Authelia | sonarr.pez.sh |
-| Lidarr | 8686 | Native (systemd) | Authelia | lidarr.pez.sh |
+| Lidarr | 8686 | Custom systemd unit (`/opt/Lidarr`) | Authelia | lidarr.pez.sh |
-| Readarr | 8787 | Native (systemd) | Authelia | readarr.pez.sh |
+| Readarr | 8787 | Custom systemd unit (`/opt/Readarr`) | Authelia | readarr.pez.sh |
-| Prowlarr | 9696 | Native (systemd) | Authelia | prowlarr.pez.sh |
+| Prowlarr | 9696 | Custom systemd unit (`/opt/Prowlarr`) | Authelia | prowlarr.pez.sh |
 | Whisparr | — | Custom systemd unit (disabled) | — | — |
 | Transmission | 9091 | Native (apt/systemd) | Authelia | download.pez.sh |
 | Jellyseerr | 5055 | Docker | Own auth | request.pez.sh |
 | Overseerr | 5056 | Snap (`overseerr` from `latest/beta`) | Own auth | jellyfin-requests.pez.sh |
-The arr stack pipeline: Jellyseerr accepts requests → Radarr/Sonarr/Lidarr/Readarr search via Prowlarr → sends to Transmission → downloaded content is moved to the library → Plex and Jellyfin pick it up automatically.
+The arr stack pipeline: Jellyseerr/Overseerr accept requests → Radarr/Sonarr/Lidarr/Readarr search via Prowlarr → send to Transmission → downloaded content is moved to the library → Plex and Jellyfin pick it up automatically. There are two request front-ends because Overseerr is hooked into Jellyfin and Jellyseerr into Plex.
 ### Other
 | Service | Port | Deployment | Auth | URL |
 |---------|------|-----------|------|-----|
-| Nextcloud AIO | 11000 | Docker | Own auth | cloud.pez.sh |
+| Nextcloud AIO | 11000 | Docker | Own auth | cloud.pez.sh (internal/Tailscale) |
 | Miniflux | 8181 | Docker (with postgres sidecar) | Authelia | rss.pez.sh |
 | slskd (Soulseek) | 5030 | Docker | Authelia | soulseek.pez.sh |
-| smartctl exporter | 9633 | Docker | — | (scraped by Prometheus) |
+| Syncthing (`syncthing@pez`) | 8384 | Native (apt) | Own auth | (LAN/Tailscale only) |
-| prom-plex-exporter | — | Docker | — | (scraped by Prometheus) |
+| Samba (`smbd`) | 445 | Native (apt) | Local users | (LAN/Tailscale only) |
 | vsftpd | 21 | Native (apt) | Local users | (LAN/Tailscale only) |
 | Ollama | 11434 | Native (`/usr/local/bin`) | — | (Tailscale only) |
 | smartctl_exporter | 9633 | Docker | — | (scraped by Alloy → Grafana Cloud) |
 | prom-plex-exporter | 9594 | Docker | — | (scraped by Alloy → Grafana Cloud) |
-## london-a — Monitoring
+## london-a — Proxmox VE Hypervisor
-Dedicated monitoring host running FreeBSD. Very lightly loaded.
+Repurposed gaming PC (i7-4790K, 32 GB) running Proxmox VE on bare metal. Currently hosts a single Mac VM and is the landing zone for future virtual machines.
 | Service | Port | Deployment | Auth | URL |
 |---------|------|-----------|------|-----|
-| Prometheus | 9090 | Native | Authelia | prometheus.pez.sh |
+| Proxmox VE | 8006 | Native (Debian Bookworm-based PVE) | Proxmox login | london-a.pez.sh |
 | Grafana | 3000 | Native | Authelia | grafana.pez.sh |
-See [monitoring.md](monitoring.md) for details on scrape targets, dashboards, and exporters.
+The web UI is exposed via Caddy at `london-a.pez.sh` and is also reachable directly over Tailscale at `https://100.122.180.98:8006`. A CIFS share from london-b's `/hdd/pve` is mounted as additional Proxmox storage for ISOs, templates, and backups (configured by the `proxmox_ve` Ansible role).
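By default, Proxmox serves its UI over HTTPS with a self-signed certificate. Caddy therefore has to speak TLS to the upstream without verifying it. The block below sketches how the `london-a.pez.sh` site could look. It assumes the Proxmox certificate has not been replaced with a trusted one, so treat it as illustrative rather than the deployed config. There is no `forward_auth` here because the table lists "Proxmox login" as the auth layer.

```caddyfile
# Sketch: proxy the Proxmox UI over the tailnet.
# Assumes Proxmox still uses its default self-signed certificate.
london-a.pez.sh {
    reverse_proxy https://100.122.180.98:8006 {
        transport http {
            # Skip upstream certificate verification (self-signed PVE cert)
            tls_insecure_skip_verify
        }
    }
}
```

Caddy's `reverse_proxy` passes WebSocket upgrades through without extra configuration, which the Proxmox web console relies on.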
 ## london-c — Edge Utility (Raspberry Pi)
 Raspberry Pi running Debian 13. Houses helper services that don't need a beefy box.
 | Service | Port | Deployment | Auth | URL |
 |---------|------|-----------|------|-----|
 | octopus_exporter | 9359 | Docker | — | (scraped by Alloy → Grafana Cloud) |
 The `octopus_exporter` pulls electricity consumption data from the Octopus Energy API and exposes it as Prometheus-formatted metrics, which Alloy then ships to Grafana Cloud.
 ## nuremberg-a — Mail
@ -67,43 +88,48 @@ Dedicated mail server on Hetzner Cloud. Isolated to protect IP reputation.
 | Service | Port | Deployment | Auth | URL |
 |---------|------|-----------|------|-----|
-| poste.io | 25, 587, 993, 443 | Docker | Own auth | (webmail via direct access) |
+| poste.io | 25, 80, 110, 143, 443, 465, 587, 993, 995 | Docker | Own auth | (webmail via direct host access) |
 poste.io bundles everything — postfix, dovecot, rspamd, webmail — into a single container. Makes updates straightforward.
 ## copenhagen-a — Gaming
-Game servers. Not publicly exposed via Caddy — accessed directly or over Tailscale.
+Game servers. Not publicly exposed via Caddy; players connect directly via the public IP or over Tailscale.
 | Service | Port | Deployment | Auth | URL |
 |---------|------|-----------|------|-----|
-| Minecraft (PaperMC) | 25565 | Docker | — | (direct connection) |
+| Minecraft (`itzg/minecraft-server`) | 25565 | Docker | — | (direct connection) |
 | MaNGOS realmd | 3724 | Native (systemd) | — | (direct connection) |
 | MaNGOS world | 8085 | Native (systemd) | — | (direct connection) |
-| MariaDB | 3306 | Native | — | (local, used by MaNGOS) |
+| MariaDB | 3306 | Native (apt) | — | (local, used by MaNGOS) |
 | smartctl_exporter | 9633 | Docker | — | (scraped by Alloy → Grafana Cloud) |
 MaNGOS Zero is a WoW 1.12 (Vanilla) private server. Runs natively under systemd as the `mangos` user from `/home/mangos/mangos/zero/`. Not containerised — it predates the Docker setup on this host.
-## copenhagen-c — Idle
+## copenhagen-c — Idle (Raspberry Pi)
-No active services. Available for future use.
+Raspberry Pi running Debian 12 at the Copenhagen site. Mostly idle, but runs a cloudflared tunnel for one-off use.
-## Exporters (Monitoring)
+| Service | Port | Deployment | Auth | URL |
 |---------|------|-----------|------|-----|
 | cloudflared | — | Native (systemd) | — | (Cloudflare-managed tunnel) |
-These run on various hosts and are scraped by Prometheus:
+## Observability Agents
-| Exporter | Host | What it monitors |
+Every host runs:
-|----------|------|-----------------|
+
-| node_exporter | All hosts | CPU, memory, disk, network |
+- **Grafana Alloy** (`alloy.service`) — collects metrics/logs/traces and ships them to Grafana Cloud
-| smartctl_exporter | london-b | Disk SMART health data |
+- **node_exporter** (`prometheus-node-exporter.service`) — host metrics (CPU/memory/disk/network)
-| prom-plex-exporter | london-b | Plex activity metrics |
+- **systemd_exporter** (`systemd_exporter.service`) — per-unit systemd metrics
 Host-specific exporters (smartctl, plex, octopus) are listed in the per-host tables above. See [monitoring.md](monitoring.md) for details on what gets shipped and where.
 ## Auth Summary
 Services fall into two categories:
-**Behind Authelia** (SSO via Caddy forward_auth):
+**Behind Authelia** (SSO via Caddy `forward_auth`):
- Grafana, Prometheus, Radarr, Sonarr, Lidarr, Readarr, Prowlarr, Transmission, Soulseek, apps.pez.sh
+- Radarr, Sonarr, Lidarr, Readarr, Prowlarr, Transmission, Soulseek, Miniflux, apps.pez.sh
 **Own auth** (handle login themselves):
- Bitwarden, Plex, Jellyfin, Nextcloud, Navidrome, Jellyseerr, poste.io
+- Bitwarden, Forgejo, Plex, Jellyfin, Navidrome, Jellyseerr, Overseerr, Proxmox, poste.io
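Every "Behind Authelia" service repeats the same `forward_auth` block. A Caddyfile snippet that each site imports is one way to keep those blocks in sync. In the sketch below, the snippet name `authelia` is hypothetical. The upstream addresses assume london-b is `100.84.65.101`, as in the Radarr example earlier.

```caddyfile
# Reusable snippet; the name "authelia" is a hypothetical choice.
(authelia) {
    forward_auth localhost:9091 {
        uri /api/authz/forward-auth
        copy_headers Remote-User Remote-Groups Remote-Name Remote-Email
    }
}

radarr.pez.sh {
    import authelia
    reverse_proxy 100.84.65.101:7878
}

sonarr.pez.sh {
    import authelia
    reverse_proxy 100.84.65.101:8989
}
```

"Own auth" services simply leave out the `import authelia` line.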