pez-infra/ansible/roles/common/handlers/main.yml
Rasmus Wejlgaard 3702584856 Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149)
copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient
TLS failure to fleet-management tripped systemd's default start rate-limit,
systemd gave up, and the host sat silently unmonitored for ~2.5 weeks.

Add a 10-resilience.conf systemd drop-in for alloy.service on every host
(StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary
upstream/TLS blip can no longer permanently kill the collector.

Also drop the old self-hosted Grafana package that was left enabled and
failing on copenhagen-c after the move to Grafana Cloud.
2026-06-07 14:06:03 +01:00

---
- name: Restart sshd
  ansible.builtin.service:
    name: sshd
    state: restarted

- name: Reload ufw
  community.general.ufw:
    state: reloaded

- name: Reload systemd daemon
  ansible.builtin.systemd:
    daemon_reload: true

- name: Restart alloy
  ansible.builtin.systemd:
    name: alloy
    state: restarted
    daemon_reload: true
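
The commit message describes a 10-resilience.conf drop-in for alloy.service, but the tasks that install it are not part of this file. The following is a minimal sketch of how such tasks might look and how they could notify the `Restart alloy` handler above. The task names, the drop-in directory `/etc/systemd/system/alloy.service.d` (the conventional systemd location), and the file ownership and modes are assumptions; only the file name and the three settings come from the commit message.

```yaml
# Hypothetical tasks (e.g. in roles/common/tasks/); a sketch, not the tasks from the commit.
- name: Ensure alloy.service drop-in directory exists
  ansible.builtin.file:
    path: /etc/systemd/system/alloy.service.d  # assumed standard drop-in path
    state: directory
    owner: root
    group: root
    mode: "0755"

- name: Install alloy resilience drop-in
  ansible.builtin.copy:
    dest: /etc/systemd/system/alloy.service.d/10-resilience.conf
    owner: root
    group: root
    mode: "0644"
    content: |
      [Unit]
      # 0 disables start rate-limiting, so systemd never gives up restarting.
      StartLimitIntervalSec=0

      [Service]
      # Restart on any exit, waiting 30 seconds between attempts.
      Restart=always
      RestartSec=30
  notify: Restart alloy
```

Because the `Restart alloy` handler sets `daemon_reload: true`, systemd re-reads its unit files before the restart, so a changed drop-in would take effect on the same run without a separate reload step.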