mirror of
https://github.com/RWejlgaard/pez-infra.git
synced 2026-07-04 15:46:16 +00:00
copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient TLS failure to fleet-management tripped systemd's default start rate-limit, systemd gave up, and the host sat silently unmonitored for ~2.5 weeks. Add a 10-resilience.conf systemd drop-in for alloy.service on every host (StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary upstream/TLS blip can no longer permanently kill the collector. Also drop the old self-hosted Grafana package that was left enabled and failing on copenhagen-c after the move to Grafana Cloud. |
||
|---|---|---|
| .. | ||
| hosts | ||
| architecture.md | ||
| getting-started.md | ||
| monitoring.md | ||
| networking.md | ||
| README.md | ||
| secrets.md | ||
| services.md | ||
Documentation
Everything you need to understand how this infrastructure works.
Contents
- Architecture — High-level overview, network topology, traffic flow diagrams
- Networking — Tailscale mesh, physical networking, DNS and proxy flow
- Services — Complete service map: what runs where, ports, auth
- Monitoring — Grafana Cloud, Alloy, synthetic checks, alerting via PagerDuty
- Secrets — SOPS + age encryption: setup, usage, CI integration
- Getting Started — How to work with this repo, deploy changes, add services
- Hosts — Per-host detail (hardware, services, quirks)
Quick Reference
| Host | Tailscale IP | Location | Role |
|---|---|---|---|
| helsinki-a | 100.67.6.27 | Hetzner Cloud (Helsinki) | Reverse proxy, SSO, Bitwarden, Forgejo |
| london-a | 100.122.180.98 | London | Proxmox VE hypervisor |
| london-b | 100.84.65.101 | London | Storage, media, Docker services |
| london-c | 100.123.72.87 | London | Raspberry Pi, Octopus Energy exporter |
| nuremberg-a | 100.70.180.24 | Hetzner Cloud (Nuremberg) | Mail (poste.io) |
| copenhagen-a | 100.89.206.60 | Copenhagen | Minecraft, WoW/MaNGOS |
| copenhagen-c | 100.115.45.53 | Copenhagen | Raspberry Pi, cloudflared, idle |