node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a,
exposing metrics to the public internet.
Changes:
- Add node_exporter_bind_tailscale flag (default false) to opt in
- Set flag on helsinki-a and london-a host_vars
- Debian: configure ARGS in /etc/default/prometheus-node-exporter
- FreeBSD: use native node_exporter_listen_address rc.conf variable
- Add handlers to restart on config change
Prometheus already scrapes via Tailscale IPs, no scrape config changes needed.
Fixes PESO-98
Audit of london-a rc.conf found several services running but not
captured in host_vars or docs: cloudflared, InfluxDB, Redis,
PostgreSQL, and libvirtd.
- InfluxDB: only _internal db, completely unused
- Redis: empty keyspace, unused
- PostgreSQL: has pez_vps db from a dead project, needs data review
- libvirtd: zero VMs, related to same dead project
- cloudflared: running tunnel 168eccae, config now captured
Also documented the weekly ZFS scrub cron (Sundays at noon) which
is in root's crontab but not ansible-managed.
Ref: PESO-101
- New zfs role with cron-based scrub scheduling for Linux and FreeBSD
- Weekly Sunday scrubs at noon (matching existing manual crons)
- Add zfs_hosts inventory group with london-a and london-b
- Configure zfs_pools per host: zroot (london-a), hdd (london-b)
- Add Prometheus alert rules for degraded/faulted/offline pools
- Add zfs.yml playbook for targeted deploys
Captures the previously untracked scrub cron on london-a and
re-enables the commented-out scrub on london-b.
Refs: PESO-93