Commit graph

4 commits

Author SHA1 Message Date
3702584856 Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149)
copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient
TLS failure to fleet-management tripped systemd's default start rate-limit,
systemd gave up, and the host sat silently unmonitored for ~2.5 weeks.

Add a 10-resilience.conf systemd drop-in for alloy.service on every host
(StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary
upstream/TLS blip can no longer permanently kill the collector.

Also drop the old self-hosted Grafana package that was left enabled and
failing on copenhagen-c after the move to Grafana Cloud.
2026-06-07 14:06:03 +01:00
a031d4218b
fix: Documentation overhaul (#112)
Some checks failed
Deploy (on merge) / Discover hosts (push) Has been cancelled
Deploy (on merge) / deploy (push) Has been cancelled
* fix: Documentation overhaul

* removing joke graph
2026-05-19 18:49:21 +01:00
029c35fba6
Replace ASCII diagrams with mermaid in docs (#47)
Convert remaining ASCII art diagrams to mermaid syntax:
- monitoring.md: stack overview diagram
- networking.md: Tailscale mesh diagram + DNS request flow

architecture.md already used mermaid, no changes needed.

PESO-123
2026-04-03 10:48:41 +01:00
Rasmus Wejlgaard
737d6e0bc1 initial commit 2026-03-28 12:39:41 +00:00