mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-07-04 23:56:16 +00:00

Rasmus Wejlgaard 8822078998 remove alertmanager caddyfile entry and clean up references alerting is handled by grafana, not alertmanager. removed the stale reverse proxy block from caddyfile template and updated caddy + prometheus docs to reflect grafana-only alerting.		2026-04-03 01:48:17 +00:00
rules	Add ZFS management role: scrub scheduling and pool monitoring (#18)	2026-03-29 19:12:42 +01:00
prometheus.yml	Remove unused Prometheus alerting config (#10)	2026-03-29 10:37:25 +01:00
README.md	remove alertmanager caddyfile entry and clean up references	2026-04-03 01:48:17 +00:00

README.md

Prometheus

Runs on london-a (FreeBSD, 100.122.219.41).

Service Details

Binary: /usr/local/bin/prometheus
Config: /usr/local/etc/prometheus.yml
Data: /var/db/prometheus
Web UI: http://london-a:9090
Runs as: prometheus user via daemon(8)

Scrape Targets

Job	Target	Host	Port	What it scrapes
`prometheus`	localhost:9090	london-a	9090	Prometheus self-metrics
`node_exporter`	192.168.1.254:9100	london-a	9100	OS metrics (FreeBSD)
`node_exporter`	192.168.1.253:9100	london-b	9100	OS metrics (Linux)
`node_exporter`	100.89.206.60:9100	copenhagen-a	9100	OS metrics (Linux)
`node_exporter`	100.115.45.53:9100	copenhagen-c	9100	OS metrics (Linux)
`node_exporter`	100.117.235.28:9100	nuremberg-a	9100	OS metrics (Alpine)
`node_exporter`	100.67.6.27:9100	helsinki-a	9100	OS metrics (Linux)
`smartmontools`	192.168.1.253:9633	london-b	9633	SMART disk health (smartctl_exporter)
`plex`	192.168.1.253:9000	london-b	9000	Plex media server metrics
`caddy`	100.67.6.27:2019	helsinki-a	2019	Caddy admin API / metrics
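
As an illustration of how rows in this table could map onto scrape jobs, here is a minimal sketch that prints a `scrape_configs` fragment from a job name and its targets. The helper `render_job` is hypothetical, and the layout (one `static_configs` entry per job, no extra labels) is an assumption; the real prometheus.yml may be structured differently.

```shell
# Hypothetical helper: print one scrape job with a single static_configs entry.
# Usage: render_job <job_name> <host:port>...
render_job() {
    job=$1
    shift
    printf '  - job_name: %s\n' "$job"
    printf '    static_configs:\n'
    printf '      - targets:\n'
    for target in "$@"; do
        printf "          - '%s'\n" "$target"
    done
}

# Targets copied from the table above; only three jobs shown for brevity.
printf 'scrape_configs:\n'
render_job prometheus localhost:9090
render_job node_exporter 192.168.1.254:9100 192.168.1.253:9100 100.89.206.60:9100
render_job caddy 100.67.6.27:2019
```

The output is meant for comparison against the checked-in prometheus.yml, not as a replacement for it.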

Network Notes

London hosts (london-a, london-b) use LAN IPs (192.168.1.x) since Prometheus runs locally in the London rack
Remote hosts (copenhagen, nuremberg, helsinki) use Tailscale IPs (100.x.x.x)
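
The addressing convention above can be checked mechanically. The sketch below classifies a scrape target by its address. The function name `classify_target` is hypothetical, and it assumes Tailscale addresses fall in 100.64.0.0/10 (the shared address space Tailscale allocates from), which is narrower than the 100.x.x.x shorthand used here.

```shell
# Hypothetical helper: classify a host:port target as local, lan, tailscale or other.
classify_target() {
    host=${1%%:*}                      # drop the :port suffix
    case "$host" in
        localhost|127.*) echo local; return 0 ;;
        192.168.1.*)     echo lan;   return 0 ;;   # London LAN range from this README
    esac
    first=${host%%.*}                  # first octet
    rest=${host#*.}
    second=${rest%%.*}                 # second octet
    case "$second" in
        ''|*[!0-9]*) echo other; return 0 ;;       # not a dotted IPv4 address
    esac
    # Assumption: Tailscale IPs are in 100.64.0.0/10, i.e. 100.64.x.x to 100.127.x.x.
    if [ "$first" = 100 ] && [ "$second" -ge 64 ] && [ "$second" -le 127 ]; then
        echo tailscale
    else
        echo other
    fi
}

# A few targets from the scrape table.
for t in localhost:9090 192.168.1.254:9100 100.89.206.60:9100; do
    printf '%s %s\n' "$t" "$(classify_target "$t")"
done
```

A London host that shows up as `tailscale`, or a remote host that shows up as `lan`, would indicate a target that does not follow the convention.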

Alerting Rules

`rules/node-exporter.rules`

Sourced from pez-ansible. All rules in this file are currently commented out; the only content is a disabled placeholder alert, ServerRunningBtrfs. Prometheus loads no active alerting rules. Alerting is handled exclusively by Grafana, not Alertmanager.

What's Not Configured

Rule files: the lines in prometheus.yml that reference them are commented out, so the rules in rules/ exist but are not loaded
Recording rules: none

Deployment

Config is managed manually on london-a. To deploy changes:

# Copy config to london-a
scp prometheus.yml root@100.122.219.41:/usr/local/etc/prometheus.yml

# Reload (graceful, no restart needed).
# Single quotes matter: they make $(pgrep ...) run on london-a, not on the local machine.
ssh root@100.122.219.41 'kill -HUP $(pgrep prometheus)'
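
The manual steps can be wrapped in a small script that validates the config before copying it. This is a sketch under assumptions, not the repository's tooling: it assumes `promtool` (shipped with Prometheus) is installed on the machine you deploy from, the names `deploy_prometheus`, `run`, and `DRY_RUN` are hypothetical, and `pgrep -x` (exact process-name match) is an added precaution. It defaults to a dry run that only prints the commands.

```shell
PROM_HOST=100.122.219.41
PROM_CONFIG_PATH=/usr/local/etc/prometheus.yml
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only; set DRY_RUN=0 to execute them

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

deploy_prometheus() {
    config=${1:-prometheus.yml}
    # Assumption: promtool is available locally; abort before copying a broken config.
    run promtool check config "$config" || return 1
    run scp "$config" "root@${PROM_HOST}:${PROM_CONFIG_PATH}" || return 1
    # Single quotes: the command substitution is evaluated on the remote host.
    run ssh "root@${PROM_HOST}" 'kill -HUP $(pgrep -x prometheus)'
}

deploy_prometheus prometheus.yml
```

Stopping at the first failing step means a config that does not validate never reaches london-a.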