pez-infra/ansible/roles
Rasmus Wejlgaard 3702584856 Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149)
copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient
TLS failure to fleet-management tripped systemd's default start rate-limit,
systemd gave up, and the host sat silently unmonitored for ~2.5 weeks.

Add a 10-resilience.conf systemd drop-in for alloy.service on every host
(StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary
upstream/TLS blip can no longer permanently kill the collector.
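A minimal sketch of how a role could ship such a drop-in. The actual tasks in `common` are not shown here, so the task names and the `alloy_dropin` register variable are hypothetical. `StartLimitIntervalSec` belongs in `[Unit]`, while `Restart` and `RestartSec` belong in `[Service]`. Drop-ins for `alloy.service` are read from `/etc/systemd/system/alloy.service.d/`.

```yaml
# Hypothetical tasks sketch; names and register variable are assumptions.
- name: Ensure the alloy.service drop-in directory exists
  ansible.builtin.file:
    path: /etc/systemd/system/alloy.service.d
    state: directory
    owner: root
    group: root
    mode: "0755"

- name: Install the alloy resilience drop-in
  ansible.builtin.copy:
    dest: /etc/systemd/system/alloy.service.d/10-resilience.conf
    owner: root
    group: root
    mode: "0644"
    content: |
      [Unit]
      # 0 disables start rate-limiting, so systemd never stops retrying
      StartLimitIntervalSec=0

      [Service]
      # Restart on any exit, waiting 30s between attempts
      Restart=always
      RestartSec=30
  register: alloy_dropin

- name: Reload systemd and restart alloy when the drop-in changed
  ansible.builtin.systemd:
    name: alloy
    state: restarted
    daemon_reload: true
  when: alloy_dropin.changed
```

`daemon_reload: true` makes systemd re-read the new drop-in before the restart. Without it, the running unit would keep the old limits.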

Also drop the old self-hosted Grafana package that was left enabled and
failing on copenhagen-c after the move to Grafana Cloud.
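A hedged sketch of the cleanup step. It assumes the leftover install came from a distro package named `grafana`, which is the upstream package name; the real task in `common` may differ.

```yaml
# Hypothetical task; assumes the self-hosted install is the "grafana" package.
# state: absent is a no-op on hosts where the package is not installed.
- name: Remove leftover self-hosted Grafana package
  ansible.builtin.package:
    name: grafana
    state: absent
```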
2026-06-07 14:06:03 +01:00
backup/tasks Add backup role to deploy hdd-backup.sh and cron to london-b (#16) 2026-03-29 15:09:01 +01:00
caddy bug: add retry to restarting caddy (#97) 2026-05-05 20:42:52 +01:00
common Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) 2026-06-07 14:06:03 +01:00
docker fix: cleanup freebsd and alpine stuff (#105) 2026-05-12 22:43:12 +01:00
docker_services/tasks fix: stop masking failed service deploys; trim dead config (#119) 2026-06-04 18:41:24 +01:00
dotfiles/tasks fix remaining yaml lint nitpicks 2026-03-28 13:13:37 +00:00
mariadb fix: bind mariadb to local ip (#62) 2026-04-11 21:24:11 +01:00
media_stack chore: retire readarr service, replaced by bookshelf (#123) 2026-06-06 15:50:37 +01:00
proxmox_ve fix: add smb mount (#107) 2026-05-14 20:49:25 +01:00
status_page capture helsinki-a status page cron in repo (#17) 2026-03-29 15:39:35 +01:00
systemd_services fix: stop masking failed service deploys; trim dead config (#119) 2026-06-04 18:41:24 +01:00
zfs fix: cleanup freebsd and alpine stuff (#105) 2026-05-12 22:43:12 +01:00