Commit graph

10 commits

Author SHA1 Message Date
99c2091b96
Add smartctl-exporter to copenhagen-a and Prometheus scrape (#55)
- Add smartctl-exporter to copenhagen-a docker_services
- Add copenhagen-a as a Prometheus smartmontools scrape target
- Update compose file comment to reflect multi-host usage

Closes PESO-128
2026-04-03 21:20:20 +01:00
a31f8b5651
Add systemd_exporter Ansible role and Prometheus scrape config (#49)
* Add systemd_exporter Ansible role and Prometheus scrape config

- Create systemd_exporter role (download binary, create user, deploy service)
- Add scrape job for london-b:9558 and copenhagen-a:9558
- Add systemd_exporter_hosts inventory group
- Add stage 3b to deploy.yml
- Map role to deploy-on-merge scope

Closes PESO-120

* Fix line length lint violations in systemd_exporter tasks

* Fix var-naming lint: use systemd_exporter_ prefix for role variables
2026-04-03 12:23:38 +01:00
2d7723d145
Add rule_files to prometheus.yml, remove empty node-exporter.rules (#46)
prometheus.yml was missing the rule_files section, so alerting rules
deployed to /usr/local/etc/prometheus/rules/ were never loaded.

- Add rule_files glob so Prometheus evaluates the ZFS pool rules
- Document that alerting notifications go through Grafana, not
  Alertmanager — no alerting: section needed
- Remove node-exporter.rules (all rules were commented out)

Resolves PESO-103
2026-04-03 04:49:16 +01:00
f75e2a8d5f
remove alertmanager caddyfile entry and clean up references (#42)
alerting is handled by grafana, not alertmanager. removed the
stale reverse proxy block from caddyfile template and updated
caddy + prometheus docs to reflect grafana-only alerting.
2026-04-03 02:49:37 +01:00
69918c8619
Add ZFS management role: scrub scheduling and pool monitoring (#18)
- New zfs role with cron-based scrub scheduling for Linux and FreeBSD
- Weekly Sunday scrubs at noon (matching existing manual crons)
- Add zfs_hosts inventory group with london-a and london-b
- Configure zfs_pools per host: zroot (london-a), hdd (london-b)
- Add Prometheus alert rules for degraded/faulted/offline pools
- Add zfs.yml playbook for targeted deploys

Captures the previously untracked scrub cron on london-a and
re-enables the commented-out scrub on london-b.

Refs: PESO-93
2026-03-29 19:12:42 +01:00
4554dec7d2
Remove unused Prometheus alerting config (#10)
* Configure UFW firewall rules in common Ansible role

Add UFW configuration to the common role for Debian hosts:
- Default deny incoming, allow outgoing
- Allow all traffic on tailscale0 interface (mesh comms)
- Allow SSH port 22 as safety net
- Per-host allowed ports via ufw_allowed_ports variable
- Enable UFW after rules are applied

helsinki-a gets ports 80/443 for reverse proxy traffic.
Other Debian hosts only need Tailscale + SSH.

Closes PESO-79

* Remove unused alerting and rule_files from prometheus.yml

Alerting is handled by Grafana, not Prometheus Alertmanager.
The empty alertmanagers and rule_files sections were just noise.

Resolves PESO-74
2026-03-29 10:37:25 +01:00
03ce524730
Standardise Prometheus targets to Tailscale IPs (#4)
Replace local network IPs (192.168.1.x) with Tailscale IPs for
london-a and london-b in all scrape configs. This ensures consistent
connectivity via Tailscale mesh regardless of network topology changes.

Refs: PESO-80
2026-03-28 20:08:09 +00:00
46063246a2 fix last 3 yaml lint failures
- add missing --- to notification-policy.yml
- prometheus.yml: replace commented-out template defaults with empty lists
2026-03-28 13:17:42 +00:00
dc198eea81 fix more yaml document-start and comment indentation
- add missing --- to 13 more yml files
- fix comment indentation in prometheus.yml
2026-03-28 13:15:46 +00:00
Rasmus Wejlgaard
737d6e0bc1 initial commit 2026-03-28 12:39:41 +00:00