pez-infra/docs
Rasmus "Pez" Wejlgaard 9ac179dbec
Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126)
copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient
TLS failure to fleet-management tripped systemd's default start rate-limit,
systemd gave up, and the host sat silently unmonitored for ~2.5 weeks.

Add a 10-resilience.conf systemd drop-in for alloy.service on every host
(StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary
upstream/TLS blip can no longer permanently kill the collector.
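
As a rough sketch, the drop-in described above could look like the following. The three settings and the file name are taken from this commit message. The directory is an assumption based on the standard systemd drop-in location, and the real file in the repo may differ.

```ini
# Assumed path: /etc/systemd/system/alloy.service.d/10-resilience.conf
# (file name from the commit message; directory is the usual drop-in location)

[Unit]
# Disable start rate-limiting so repeated failures never make systemd give up.
StartLimitIntervalSec=0

[Service]
# Restart the collector whenever it exits, for any reason.
Restart=always
# Wait 30 seconds between restart attempts to avoid a tight crash loop.
RestartSec=30
```

After installing a drop-in like this, `systemctl daemon-reload` makes systemd pick it up, and `systemctl cat alloy.service` shows the unit together with any drop-ins that apply to it.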

Also drop the old self-hosted Grafana package that was left enabled and
failing on copenhagen-c after the move to Grafana Cloud.
2026-06-07 14:30:08 +01:00
hosts               2026-06-06 15:50:37 +01:00  chore: retire readarr service, replaced by bookshelf (#123)
architecture.md     2026-05-19 18:49:21 +01:00  fix: Documentation overhaul (#112)
getting-started.md  2026-05-19 18:49:21 +01:00  fix: Documentation overhaul (#112)
monitoring.md       2026-06-07 14:30:08 +01:00  Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126)
networking.md       2026-05-19 18:49:21 +01:00  fix: Documentation overhaul (#112)
README.md           2026-05-19 18:49:21 +01:00  fix: Documentation overhaul (#112)
secrets.md          2026-03-28 12:39:41 +00:00  initial commit
services.md         2026-06-06 15:50:37 +01:00  chore: retire readarr service, replaced by bookshelf (#123)

Documentation

Everything you need to understand how this infrastructure works.

Contents

  • Architecture — High-level overview, network topology, traffic flow diagrams
  • Networking — Tailscale mesh, physical networking, DNS and proxy flow
  • Services — Complete service map: what runs where, ports, auth
  • Monitoring — Grafana Cloud, Alloy, synthetic checks, alerting via PagerDuty
  • Secrets — SOPS + age encryption: setup, usage, CI integration
  • Getting Started — How to work with this repo, deploy changes, add services
  • Hosts — Per-host detail (hardware, services, quirks)

Quick Reference

Host          Tailscale IP    Location                   Role
helsinki-a    100.67.6.27     Hetzner Cloud (Helsinki)   Reverse proxy, SSO, Bitwarden, Forgejo
london-a      100.122.180.98  London                     Proxmox VE hypervisor
london-b      100.84.65.101   London                     Storage, media, Docker services
london-c      100.123.72.87   London                     Raspberry Pi, Octopus Energy exporter
nuremberg-a   100.70.180.24   Hetzner Cloud (Nuremberg)  Mail (poste.io)
copenhagen-a  100.89.206.60   Copenhagen                 Minecraft, WoW/MaNGOS
copenhagen-c  100.115.45.53   Copenhagen                 Raspberry Pi, cloudflared, idle