Mono-repo for my server stack
Find a file
Rasmus "Pez" Wejlgaard 9ac179dbec
Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126)
copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient
TLS failure to fleet-management tripped systemd's default start rate-limit,
systemd gave up, and the host sat silently unmonitored for ~2.5 weeks.

Add a 10-resilience.conf systemd drop-in for alloy.service on every host
(StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary
upstream/TLS blip can no longer permanently kill the collector.

Also drop the old self-hosted Grafana package that was left enabled and
failing on copenhagen-c after the move to Grafana Cloud.
2026-06-07 14:30:08 +01:00
.github chore(deps): bump the github-actions group across 1 directory with 2 updates (#117) 2026-06-05 21:13:03 +01:00
ansible Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126) 2026-06-07 14:30:08 +01:00
docs Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126) 2026-06-07 14:30:08 +01:00
terraform chore: commit terraform lock file for reproducible provider versions (#121) 2026-06-06 13:19:08 +01:00
.gitignore chore: commit terraform lock file for reproducible provider versions (#121) 2026-06-06 13:19:08 +01:00
.sops.yaml initial commit 2026-03-28 12:39:41 +00:00
Makefile initial commit 2026-03-28 12:39:41 +00:00
README.md fix: Documentation overhaul (#112) 2026-05-19 18:49:21 +01:00

pez-infra

Infrastructure-as-code monorepo for managing my homelab and cloud server fleet. It contains everything needed to rebuild, configure, and maintain the entire infrastructure from scratch — including server provisioning, service deployment, DNS, monitoring, and secrets management.

What's in this repo

  • Ansible — Playbooks, roles, and inventory for configuring servers, deploying Docker-based services, and managing dotfiles
  • Terraform — OpenTofu/Terraform configs for cloud resources (Hetzner Cloud, Cloudflare DNS, Grafana Cloud, PagerDuty)
  • Services — Docker Compose definitions and config files for each self-hosted service
  • Documentation — Architecture decisions, networking topology, and operational guides

Architecture Overview

graph TD
    CF[Cloudflare<br/>DNS + CDN] --> HEL[helsinki-a<br/>Caddy proxy + SSO<br/><i>Hetzner Cloud</i>]
    HEL --> TS{Tailscale mesh}
    TS --> LB[london-b<br/>Storage, media<br/>Docker + systemd]
    TS --> LA[london-a<br/>Proxmox VE hypervisor]
    TS --> LC[london-c<br/>Raspberry Pi<br/>Octopus Energy exporter]
    TS --> CA[copenhagen-a<br/>Gaming<br/>Minecraft, WoW MaNGOS]
    TS --> NUR[nuremberg-a<br/>Mail, poste.io]
    TS --> CC[copenhagen-c<br/>Raspberry Pi<br/>cloudflared, idle]
    TS -.-> GC[Grafana Cloud<br/>metrics, logs, traces]

Traffic enters via Cloudflare DNS, hits a Caddy reverse proxy on a Hetzner cloud instance, and is forwarded to backend services running on various hosts connected over a Tailscale mesh network. Authentication for protected services is handled by Authelia with an LLDAP backend. Observability is shipped from every host to Grafana Cloud via Grafana Alloy.

Hosts

Host Location OS Role
helsinki-a Hetzner Cloud (Helsinki) Debian 13 Reverse proxy (Caddy), SSO (Authelia + LLDAP), Bitwarden, Forgejo
london-b London Ubuntu 24.04 Primary storage (ZFS), media servers, *arr stack
london-a London Debian 13 / Proxmox VE Hypervisor (currently runs a Mac VM; platform for future VMs)
london-c London Debian 13 (Raspberry Pi) Octopus Energy exporter, edge utility box
nuremberg-a Hetzner Cloud (Nuremberg) Debian 13 Mail server (poste.io)
copenhagen-a Copenhagen Ubuntu 22.04 Gaming servers (Minecraft, WoW/MaNGOS)
copenhagen-c Copenhagen Debian 12 (Raspberry Pi) cloudflared tunnel, idle/available

Directory Structure

├── ansible/        # Ansible playbooks, roles, inventory, and all managed files
│   ├── roles/      # Ansible roles (caddy, docker, media_stack, proxmox_ve, etc.)
│   ├── services/   # Docker Compose definitions and service configs
│   ├── dotfiles/   # Shell config (fish, nvim, tmux, git, etc.)
│   ├── playbooks/  # One-off playbooks (updates, reboots, status)
│   └── scripts/    # Utility and maintenance scripts
├── terraform/      # Terraform/OpenTofu for Hetzner, Cloudflare, Grafana Cloud, PagerDuty
└── docs/           # Architecture, networking, services, monitoring, and per-host docs

Getting Started

Prerequisites

  • SSH access to hosts via Tailscale (all hosts SSH as root)
  • ansible for configuration management
  • tofu (OpenTofu) or terraform for infrastructure provisioning
  • sops + age for editing encrypted secrets

Usage

  1. Clone: git clone git@github.com:RWejlgaard/pez-infra.git
  2. Services: Each service has its own directory under ansible/services/ with a docker-compose.yml and config files
  3. Deploy: cd ansible && make deploy runs the unified deploy.yml against the whole fleet (or make deploy-host HOST=<name>)
  4. Infrastructure: Terraform configs in terraform/ manage Hetzner servers, Cloudflare DNS, Grafana Cloud, and PagerDuty

Secrets

Secrets are encrypted in-repo using SOPS + age. Encrypted files use .enc. in their extension (e.g. secrets.enc.yml). See Secrets Management for full setup and usage instructions.

Documentation

Detailed documentation lives in docs/:

  • Architecture — Network topology, traffic flow, design principles
  • Networking — Tailscale mesh, DNS flow, physical networking
  • Services — Complete service map with ports, auth, and deployment info
  • Monitoring — Grafana Cloud, Alloy, synthetic checks, PagerDuty
  • Hosts — Per-host detail (hardware, services, quirks)
  • Getting Started — How to work with this repo