pez-infra/ansible/roles/common/handlers
Rasmus "Pez" Wejlgaard 9ac179dbec
Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126)
copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient
TLS failure to fleet-management tripped systemd's default start rate-limit,
systemd gave up, and the host sat silently unmonitored for ~2.5 weeks.

Add a 10-resilience.conf systemd drop-in for alloy.service on every host
(StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary
upstream/TLS blip can no longer permanently kill the collector.
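
As a rough sketch, a drop-in with these three settings could look like the
following. Only the file name and the three values come from this commit; the
install path and the section placement are assumptions based on standard
systemd conventions (StartLimitIntervalSec= belongs in [Unit], the restart
settings in [Service]).

```ini
# Assumed path: /etc/systemd/system/alloy.service.d/10-resilience.conf

[Unit]
# 0 disables start rate-limiting, so systemd never gives up on the unit.
StartLimitIntervalSec=0

[Service]
# Restart the collector whenever it exits, for any reason.
Restart=always
# Wait 30 seconds between attempts to avoid a tight restart loop.
RestartSec=30
```

After installing a drop-in like this, systemd needs a `systemctl daemon-reload`
before the settings take effect; `systemctl cat alloy.service` shows the merged
unit for verification.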

Also drop the old self-hosted Grafana package that was left enabled and
failing on copenhagen-c after the move to Grafana Cloud.
2026-06-07 14:30:08 +01:00
main.yml Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126) 2026-06-07 14:30:08 +01:00