RWejlgaard/pez-infra

mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-07-04 15:46:16 +00:00

Author	SHA1	Message	Date
Rasmus "Pez" Wejlgaard	9ac179dbec	Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126) copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient TLS failure to fleet-management tripped systemd's default start rate-limit, systemd gave up, and the host sat silently unmonitored for ~2.5 weeks. Add a 10-resilience.conf systemd drop-in for alloy.service on every host (StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary upstream/TLS blip can no longer permanently kill the collector. Also drop the old self-hosted Grafana package that was left enabled and failing on copenhagen-c after the move to Grafana Cloud.	2026-06-07 14:30:08 +01:00
Rasmus "Pez" Wejlgaard	81efa1b717	Remove stale cloudflared service from copenhagen-a (PESO-138) (#125) cloudflared was retired in #56 when Caddy + Authelia replaced Cloudflare Tunnels, but copenhagen-a was unreachable at the time so its cloudflared.service was never stopped and is still running. Add a cleanup task to the common role that stops, disables and purges cloudflared wherever the unit lingers. Gated on the unit file existing so it self-targets copenhagen-a and is a no-op everywhere else, and explicitly excludes copenhagen-c, which legitimately runs a hand-configured tunnel.	2026-06-07 11:45:35 +01:00
Rasmus "Pez" Wejlgaard	3871dc8f90	Restrict london-b Samba (445) to LAN + Tailscale, off public internet (#124) Samba on london-b was allowed on 445/tcp from anywhere via UFW, exposing SMB/CIFS to the public internet. Tailscale already reaches it through the tailscale0 allow-all rule, so scope the explicit rule to the local London LAN (192.168.1.0/24) instead of the world. The common UFW task only ever adds allow rules, so it gained support for an optional per-port from_ip, plus a follow-up task that deletes the superseded world-open variant of any source-restricted port; otherwise the old '445 ALLOW Anywhere' rule would linger on the host and defeat the change. PESO-145	2026-06-07 11:37:45 +01:00
Rasmus "Pez" Wejlgaard	d3b516c594	fix: cleanup freebsd and alpine stuff (#105)	2026-05-12 22:43:12 +01:00
Rasmus "Pez" Wejlgaard	f9d0a7ebf4	fix: resolve UFW ansible-lint failures and deploy error (#11) - Fix 'interface_or_direction' → 'direction' (required param for ufw module) - Rename ufw_enabled/ufw_allowed_ports → common_ufw_enabled/common_ufw_allowed_ports (role prefix convention) - Fix yaml[braces] violations in helsinki-a host_vars	2026-03-29 10:53:54 +01:00
Rasmus "Pez" Wejlgaard	4554dec7d2	Remove unused Prometheus alerting config (#10) * Configure UFW firewall rules in common Ansible role Add UFW configuration to the common role for Debian hosts: - Default deny incoming, allow outgoing - Allow all traffic on tailscale0 interface (mesh comms) - Allow SSH port 22 as safety net - Per-host allowed ports via ufw_allowed_ports variable - Enable UFW after rules are applied helsinki-a gets ports 80/443 for reverse proxy traffic. Other Debian hosts only need Tailscale + SSH. Closes PESO-79 * Remove unused alerting and rule_files from prometheus.yml Alerting is handled by Grafana, not Prometheus Alertmanager. The empty alertmanagers and rule_files sections were just noise. Resolves PESO-74	2026-03-29 10:37:25 +01:00
Rasmus Wejlgaard	dc10ceacf5	fix remaining yaml lint nitpicks - add missing document start (---) to contact-points.yml and docker-compose files - fix extra spaces inside braces in dotfiles and common role tasks	2026-03-28 13:13:37 +00:00
Rasmus Wejlgaard	737d6e0bc1	initial commit	2026-03-28 12:39:41 +00:00

8 commits
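
The systemd drop-in named in commit 9ac179dbec could look roughly like the fragment below. This is a sketch, not the file from the repository: only the file name (10-resilience.conf), the unit (alloy.service) and the three settings come from the commit message. The directory shown in the comment is the conventional systemd drop-in location and is an assumption here.

```ini
# Assumed location: /etc/systemd/system/alloy.service.d/10-resilience.conf
# (the commit message gives only the file name, not the directory)

[Unit]
# 0 disables start rate limiting, so repeated failed starts
# never push the unit into a permanent "start-limit-hit" state.
StartLimitIntervalSec=0

[Service]
# Restart the collector whenever it exits, for any reason.
Restart=always
# Wait 30 seconds between restart attempts.
RestartSec=30
```

StartLimitIntervalSec is a [Unit] setting, while Restart and RestartSec belong in [Service], which is why the fragment needs both sections. systemd only picks up a new drop-in after `systemctl daemon-reload`.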