
Prometheus

Runs on london-a (FreeBSD; Tailscale IP 100.122.219.41, LAN IP 192.168.1.254).

Service Details

Binary: /usr/local/bin/prometheus
Config: /usr/local/etc/prometheus.yml
Data: /var/db/prometheus
Web UI: http://london-a:9090
Runs as: prometheus user via daemon(8)
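A quick way to confirm the service is up is to probe Prometheus's built-in `/-/healthy` and `/-/ready` management endpoints. The sketch below wraps that in a hypothetical `prom_health` shell function. `PROM_URL` is an assumption that defaults to the Web UI address above.

```shell
# Hypothetical helper: probe Prometheus's management endpoints.
# PROM_URL is an assumption; override it to point at another instance.
PROM_URL="${PROM_URL:-http://london-a:9090}"

prom_health() {
  for endpoint in healthy ready; do
    # -f makes curl return non-zero on HTTP errors (e.g. 503 while still starting up)
    if curl -fsS --max-time 5 -o /dev/null "$PROM_URL/-/$endpoint"; then
      echo "$endpoint: ok"
    else
      echo "$endpoint: FAILED"
      return 1
    fi
  done
}
```

Running `prom_health` from any host that can resolve london-a prints one line per endpoint and returns non-zero on the first failure.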

Scrape Targets

Job	Target	Host	Port	What it scrapes
`prometheus`	localhost:9090	london-a	9090	Prometheus self-metrics
`node_exporter`	192.168.1.254:9100	london-a	9100	OS metrics (FreeBSD)
`node_exporter`	192.168.1.253:9100	london-b	9100	OS metrics (Linux)
`node_exporter`	100.89.206.60:9100	copenhagen-a	9100	OS metrics (Linux)
`node_exporter`	100.115.45.53:9100	copenhagen-c	9100	OS metrics (Linux)
`node_exporter`	100.117.235.28:9100	nuremberg-a	9100	OS metrics (Alpine)
`node_exporter`	100.67.6.27:9100	helsinki-a	9100	OS metrics (Linux)
`smartmontools`	192.168.1.253:9633	london-b	9633	SMART disk health (smartctl_exporter)
`plex`	192.168.1.253:9000	london-b	9000	Plex media server metrics
`caddy`	100.67.6.27:2019	helsinki-a	2019	Caddy admin API / metrics
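To see which of these targets are currently failing, one option is to pull the `up` series from the server and keep the ones whose value is 0. This sketch reads the `/federate` endpoint, which returns the plain-text exposition format, so it can be filtered with `awk` rather than needing a JSON tool such as `jq`. The function name and the `PROM_URL` default are assumptions.

```shell
# Hypothetical helper: print the `up` series of every target whose last scrape failed.
PROM_URL="${PROM_URL:-http://london-a:9090}"

prom_down_targets() {
  # /federate?match[]=up returns lines like:
  #   up{instance="192.168.1.253:9100",job="node_exporter"} 0 1711700000000
  # Capture first so an unreachable server is reported as an error, not an empty list.
  metrics=$(curl -fsS -G "$PROM_URL/federate" --data-urlencode 'match[]=up') || return 1
  # Keep only `up{...}` sample lines (skipping "# TYPE" comments) whose value is 0
  printf '%s\n' "$metrics" | awk 'index($1, "up{") == 1 && $2 == 0 { print $1 }'
}
```

An empty result with a zero exit status means every target's most recent scrape succeeded.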

Network Notes

London hosts (london-a, london-b) are scraped over their LAN IPs (192.168.1.x), since Prometheus itself runs in the London rack
Remote hosts (copenhagen-a, copenhagen-c, nuremberg-a, helsinki-a) are scraped over their Tailscale IPs (100.x.x.x)
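Because the scrape path differs by location, a failing `node_exporter` scrape usually points at either the LAN or the Tailscale link. A hypothetical `check_node_targets` function, meant to be run on london-a so it uses the same routes as Prometheus, can probe each `/metrics` endpoint directly. The target list is copied by hand from the table above and is an assumption that would need updating alongside prometheus.yml.

```shell
# Hypothetical reachability check; run it on london-a so LAN and Tailscale
# routes match what Prometheus uses. Targets are copied from the scrape table.
NODE_TARGETS="192.168.1.254:9100 192.168.1.253:9100 100.89.206.60:9100 100.115.45.53:9100 100.117.235.28:9100 100.67.6.27:9100"

check_node_targets() {
  failed=0
  for target in $NODE_TARGETS; do
    # node_exporter serves its metrics at /metrics; give up after 5 seconds per host
    if curl -fsS --max-time 5 -o /dev/null "http://$target/metrics"; then
      echo "ok   $target"
    else
      echo "FAIL $target"
      failed=1
    fi
  done
  # Non-zero if any target was unreachable
  return "$failed"
}
```

Unlike the health check, this keeps going after a failure so one run reports every unreachable host.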

Alerting Rules

`rules/node-exporter.rules`

Sourced from pez-ansible. Every rule in this file is currently commented out; the one alert it defines, a placeholder named ServerRunningBtrfs, is disabled as well. No alerting rules are active and no Alertmanager is configured.
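Before any of these rules are re-enabled, they can be validated offline with `promtool check rules`, which ships with Prometheus. The loop below is a minimal sketch that assumes `promtool` is on `PATH` and that it is run from this directory, where `rules/` lives; the function name is hypothetical.

```shell
# Hypothetical helper: validate every rule file under rules/ with promtool.
# Assumes promtool is installed and the working directory contains rules/.
check_rule_files() {
  for f in rules/*; do
    [ -f "$f" ] || continue   # skip the unexpanded pattern if rules/ is empty or missing
    promtool check rules "$f" || return 1
  done
}
```

The function stops at the first file promtool rejects, so its exit status can gate a deploy.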

What's Not Configured

Alertmanager: the target is commented out, so no alerting pipeline is active
Rule files: the rule_files lines in prometheus.yml are commented out, so the rules in rules/ exist but are not loaded
Recording rules: none

Deployment

Config is managed manually on london-a. To deploy changes:

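```shell
# Copy config to london-a
scp prometheus.yml root@100.122.219.41:/usr/local/etc/prometheus.yml

# Reload gracefully (SIGHUP, no restart needed). Single quotes make pkill run on
# london-a; with double quotes, $(pgrep prometheus) would expand on the local machine.
ssh root@100.122.219.41 'pkill -HUP -x prometheus'
```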
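The manual steps reload whatever file was copied, even an invalid one (Prometheus then keeps running on its previous config and logs the reload error). A slightly safer variant, sketched below as a hypothetical `deploy_prometheus_config` function, stages the file as `prometheus.yml.new`, validates it with `promtool check config` on london-a, and only then moves it into place and sends SIGHUP. It assumes `promtool` is installed on london-a alongside the server binary.

```shell
# Hypothetical deploy helper: stage, validate, swap, reload.
# PROM_SSH and PROM_CONFIG default to the values used in the manual steps.
PROM_SSH="${PROM_SSH:-root@100.122.219.41}"
PROM_CONFIG="${PROM_CONFIG:-/usr/local/etc/prometheus.yml}"

deploy_prometheus_config() {
  src="${1:-prometheus.yml}"
  # Stage next to the live file so relative paths (e.g. rule_files) resolve the same way
  scp "$src" "$PROM_SSH:$PROM_CONFIG.new" || return 1
  # $PROM_CONFIG expands locally; the rest of the command runs on london-a.
  # mv and the SIGHUP only happen if promtool accepts the staged file.
  ssh "$PROM_SSH" "promtool check config '$PROM_CONFIG.new' && mv '$PROM_CONFIG.new' '$PROM_CONFIG' && pkill -HUP -x prometheus"
}
```

Usage: `deploy_prometheus_config prometheus.yml` from the repo directory. If validation fails, the live config is left untouched and the staged `.new` file remains on london-a for inspection.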