RWejlgaard/pez-infra

mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-07-04 23:56:16 +00:00

Author	SHA1	Message	Date
Rasmus "Pez" Wejlgaard	99c2091b96	Add smartctl-exporter to copenhagen-a and Prometheus scrape (#55) - Add smartctl-exporter to copenhagen-a docker_services - Add copenhagen-a as a Prometheus smartmontools scrape target - Update compose file comment to reflect multi-host usage Closes PESO-128	2026-04-03 21:20:20 +01:00
Rasmus "Pez" Wejlgaard	a31f8b5651	Add systemd_exporter Ansible role and Prometheus scrape config (#49) * Add systemd_exporter Ansible role and Prometheus scrape config - Create systemd_exporter role (download binary, create user, deploy service) - Add scrape job for london-b:9558 and copenhagen-a:9558 - Add systemd_exporter_hosts inventory group - Add stage 3b to deploy.yml - Map role to deploy-on-merge scope Closes PESO-120 * Fix line length lint violations in systemd_exporter tasks * Fix var-naming lint: use systemd_exporter_ prefix for role variables	2026-04-03 12:23:38 +01:00
Rasmus "Pez" Wejlgaard	2d7723d145	Add rule_files to prometheus.yml, remove empty node-exporter.rules (#46) prometheus.yml was missing the rule_files section, so alerting rules deployed to /usr/local/etc/prometheus/rules/ were never loaded. - Add rule_files glob so Prometheus evaluates the ZFS pool rules - Document that alerting notifications go through Grafana, not Alertmanager; no alerting: section needed - Remove node-exporter.rules (all rules were commented out) Resolves PESO-103	2026-04-03 04:49:16 +01:00
Rasmus "Pez" Wejlgaard	f75e2a8d5f	remove alertmanager caddyfile entry and clean up references (#42) alerting is handled by grafana, not alertmanager. removed the stale reverse proxy block from caddyfile template and updated caddy + prometheus docs to reflect grafana-only alerting.	2026-04-03 02:49:37 +01:00
Rasmus "Pez" Wejlgaard	69918c8619	Add ZFS management role: scrub scheduling and pool monitoring (#18) - New zfs role with cron-based scrub scheduling for Linux and FreeBSD - Weekly Sunday scrubs at noon (matching existing manual crons) - Add zfs_hosts inventory group with london-a and london-b - Configure zfs_pools per host: zroot (london-a), hdd (london-b) - Add Prometheus alert rules for degraded/faulted/offline pools - Add zfs.yml playbook for targeted deploys Captures the previously untracked scrub cron on london-a and re-enables the commented-out scrub on london-b. Refs: PESO-93	2026-03-29 19:12:42 +01:00
Rasmus "Pez" Wejlgaard	4554dec7d2	Remove unused Prometheus alerting config (#10) * Configure UFW firewall rules in common Ansible role Add UFW configuration to the common role for Debian hosts: - Default deny incoming, allow outgoing - Allow all traffic on tailscale0 interface (mesh comms) - Allow SSH port 22 as safety net - Per-host allowed ports via ufw_allowed_ports variable - Enable UFW after rules are applied helsinki-a gets ports 80/443 for reverse proxy traffic. Other Debian hosts only need Tailscale + SSH. Closes PESO-79 * Remove unused alerting and rule_files from prometheus.yml Alerting is handled by Grafana, not Prometheus Alertmanager. The empty alertmanagers and rule_files sections were just noise. Resolves PESO-74	2026-03-29 10:37:25 +01:00
Rasmus "Pez" Wejlgaard	03ce524730	Standardise Prometheus targets to Tailscale IPs (#4) Replace local network IPs (192.168.1.x) with Tailscale IPs for london-a and london-b in all scrape configs. This ensures consistent connectivity via Tailscale mesh regardless of network topology changes. Refs: PESO-80	2026-03-28 20:08:09 +00:00
Rasmus Wejlgaard	46063246a2	fix last 3 yaml lint failures - add missing --- to notification-policy.yml - prometheus.yml: replace commented-out template defaults with empty lists	2026-03-28 13:17:42 +00:00
Rasmus Wejlgaard	dc198eea81	fix more yaml document-start and comment indentation - add missing --- to 13 more yml files - fix comment indentation in prometheus.yml	2026-03-28 13:15:46 +00:00
Rasmus Wejlgaard	737d6e0bc1	initial commit	2026-03-28 12:39:41 +00:00

10 commits