pez-infra/ansible/services/prometheus/README.md
Rasmus "Pez" Wejlgaard f75e2a8d5f
remove alertmanager caddyfile entry and clean up references (#42)
alerting is handled by grafana, not alertmanager. removed the
stale reverse proxy block from caddyfile template and updated
caddy + prometheus docs to reflect grafana-only alerting.
2026-04-03 02:49:37 +01:00

# Prometheus
Runs on **london-a** (FreeBSD, 100.122.219.41).
## Service Details
- **Binary:** `/usr/local/bin/prometheus`
- **Config:** `/usr/local/etc/prometheus.yml`
- **Data:** `/var/db/prometheus`
- **Web UI:** `http://london-a:9090`
- **Runs as:** `prometheus` user via daemon(8)
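
A quick way to confirm the service is up is to probe Prometheus's management endpoints (`/-/healthy` and `/-/ready`). The snippet below is a minimal sketch: `prom_probe` and `PROM_URL` are hypothetical names introduced here, and the default URL is the Web UI address listed above.

```bash
# Sketch only: prom_probe and PROM_URL are illustrative names, not part of Prometheus.
# PROM_URL defaults to the Web UI address from the list above.
PROM_URL="${PROM_URL:-http://london-a:9090}"

# Succeeds (exit 0) if the given management endpoint answers with a 2xx
# status within 5 seconds. $1 is "healthy" or "ready".
prom_probe() {
    curl -fsS --max-time 5 -o /dev/null "${PROM_URL}/-/$1"
}

# Usage:
#   prom_probe healthy && echo "process is up"
#   prom_probe ready   && echo "ready to serve queries"
```
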
## Scrape Targets
| Job | Target | Host | Port | What it scrapes |
|-----|--------|------|------|-----------------|
| `prometheus` | localhost:9090 | london-a | 9090 | Prometheus self-metrics |
| `node_exporter` | 192.168.1.254:9100 | london-a | 9100 | OS metrics (FreeBSD) |
| `node_exporter` | 192.168.1.253:9100 | london-b | 9100 | OS metrics (Linux) |
| `node_exporter` | 100.89.206.60:9100 | copenhagen-a | 9100 | OS metrics (Linux) |
| `node_exporter` | 100.115.45.53:9100 | copenhagen-c | 9100 | OS metrics (Linux) |
| `node_exporter` | 100.117.235.28:9100 | nuremberg-a | 9100 | OS metrics (Alpine) |
| `node_exporter` | 100.67.6.27:9100 | helsinki-a | 9100 | OS metrics (Linux) |
| `smartmontools` | 192.168.1.253:9633 | london-b | 9633 | SMART disk health (smartctl_exporter) |
| `plex` | 192.168.1.253:9000 | london-b | 9000 | Plex media server metrics |
| `caddy` | 100.67.6.27:2019 | helsinki-a | 2019 | Caddy admin API / metrics |
### Network Notes
- London hosts (london-a, london-b) use **LAN IPs** (192.168.1.x) since Prometheus runs locally in the London rack
- Remote hosts (copenhagen, nuremberg, helsinki) use **Tailscale IPs** (100.x.x.x)
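
The addressing convention above can be checked mechanically when adding a new target. The following is a rough sketch, not part of the repository: `classify_target` is a hypothetical helper that labels an address as LAN (192.168.1.x) or Tailscale (the 100.64.0.0/10 range Tailscale assigns from).

```bash
# Sketch only: classify_target is an illustrative helper name.
# Prints "lan" for 192.168.1.x, "tailscale" for 100.64.0.0/10, otherwise "other".
classify_target() {
    local ip="${1%%:*}"            # drop an optional :port suffix
    case "$ip" in
        192.168.1.*)
            echo lan
            ;;
        100.*)
            local second="${ip#100.}"
            second="${second%%.*}"  # second octet of the address
            if [ "$second" -ge 64 ] 2>/dev/null && [ "$second" -le 127 ] 2>/dev/null; then
                echo tailscale
            else
                echo other
            fi
            ;;
        *)
            echo other
            ;;
    esac
}

# Usage:
#   classify_target 192.168.1.253:9100   # london-b
#   classify_target 100.67.6.27:2019     # helsinki-a
```
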
## Alerting Rules
### `rules/node-exporter.rules`
Sourced from pez-ansible. The file contains only a placeholder `ServerRunningBtrfs` alert, which is **commented out**, so Prometheus loads no active alerting rules. Alerting is handled exclusively by **Grafana** (not Alertmanager).
## What's Not Configured
- **Rule files:** the `rule_files` entries in `prometheus.yml` are commented out (rules exist in `rules/` but aren't loaded)
- **Recording rules:** none
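
To confirm that rule loading is still disabled, one option is to look for an uncommented top-level `rule_files:` key in the config. This is a minimal sketch under that assumption; `rule_files_enabled` is a hypothetical helper name, and it only inspects the file, not what the running Prometheus has actually loaded.

```bash
# Sketch only: rule_files_enabled is an illustrative helper name.
# Exits 0 if the given config has an uncommented top-level `rule_files:` key,
# non-zero if the key is missing or commented out.
rule_files_enabled() {
    grep -Eq '^rule_files:' "$1"
}

# Usage (on london-a):
#   rule_files_enabled /usr/local/etc/prometheus.yml || echo "no rule files loaded"
```
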
## Deployment
Config is managed manually on london-a. To deploy changes:
```bash
# Copy config to london-a
scp prometheus.yml root@100.122.219.41:/usr/local/etc/prometheus.yml
# Reload (graceful, no restart needed).
# Single quotes so that $(pgrep ...) runs on london-a, not on the local machine.
ssh root@100.122.219.41 'kill -HUP $(pgrep prometheus)'
```
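
Because a SIGHUP reload with a broken config leaves Prometheus running on the old one, it can help to validate the file before copying it. The sketch below wraps the manual steps in a hypothetical function, `deploy_prometheus_config`, and assumes `promtool` (shipped with Prometheus) is installed on the machine you deploy from. The host and paths are the ones documented above.

```bash
# Sketch only: deploy_prometheus_config is an illustrative wrapper, not an
# existing script in this repository. Assumes promtool is available locally.
deploy_prometheus_config() {
    local config="${1:-prometheus.yml}"
    local host="root@100.122.219.41"

    # Refuse to ship a config that fails validation.
    promtool check config "$config" || return 1

    # Copy config to london-a.
    scp "$config" "$host:/usr/local/etc/prometheus.yml" || return 1

    # Graceful reload. Single quotes: pgrep must run on london-a.
    ssh "$host" 'kill -HUP $(pgrep prometheus)'
}

# Usage:
#   deploy_prometheus_config prometheus.yml
```
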