Commit graph

9 commits

Author SHA1 Message Date
5391c500e1
fix: loki & alloy (#83)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
* fix: loki & alloy

* fix linting
2026-04-28 16:40:45 +01:00
c495b73720
template prometheus config (#67) 2026-04-21 20:44:37 +01:00
d8757d37e1
fix(london-a): correct grafana provisioning dir path (#53)
grafana.ini on london-a sets provisioning = /usr/local/etc/grafana/provisioning
but grafana_provisioning_dir pointed at /usr/local/share/grafana/conf/provisioning.

This meant deploy.yml synced alerting rules, dashboards provisioning, and
datasources to a path Grafana never reads — a from-scratch deploy would have
broken alerting entirely.

Fixes PESO-131
2026-04-03 20:20:15 +01:00
dca6a08ba1
Remove cloudflared from london-a (PESO-134) (#50)
cloudflared has been replaced by Caddy + Authelia. Removed:
- cloudflared service config (services/cloudflared/london-a/)
- tunnel ID from london-a host_vars
- cloudflared_enable from rc.conf

Also synced rc.conf with live server state (disabled services
from PESO-113, added node_exporter_listen_address).

Live server: stopped service, removed from rc.conf, uninstalled pkg.
2026-04-03 18:51:51 +01:00
5a5c60b6b2
london-a: disable unused services (InfluxDB, Redis, PostgreSQL, libvirtd) (#37)
Services stopped and disabled in rc.conf on london-a.
Removed audit variables from host_vars, replaced with cleanup note.

All four were leftovers from a defunct pez_vps project:
- InfluxDB: no user databases, only _internal
- Redis: empty keyspace, no clients
- PostgreSQL: defunct pez_vps database (Pez approved removal)
- libvirtd: zero VMs defined

Resolves PESO-113
2026-04-03 00:17:58 +01:00
f2cebcdf38
Bind node_exporter to Tailscale IP on public-facing hosts (#31)
node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a,
exposing metrics to the public internet.

Changes:
- Add node_exporter_bind_tailscale flag (default false) to opt in
- Set flag on helsinki-a and london-a host_vars
- Debian: configure ARGS in /etc/default/prometheus-node-exporter
- FreeBSD: use native node_exporter_listen_address rc.conf variable
- Add handlers to restart on config change

Prometheus already scrapes via Tailscale IPs, no scrape config changes needed.

Fixes PESO-98
2026-03-30 22:56:59 +01:00
0bcc53b01d
Document undocumented services on london-a (#29)
Audit of london-a rc.conf found several services running but not
captured in host_vars or docs: cloudflared, InfluxDB, Redis,
PostgreSQL, and libvirtd.

- InfluxDB: only _internal db, completely unused
- Redis: empty keyspace, unused
- PostgreSQL: has pez_vps db from a dead project, needs data review
- libvirtd: zero VMs, related to same dead project
- cloudflared: running tunnel 168eccae, config now captured

Also documented the weekly ZFS scrub cron (Sundays at noon) which
is in root's crontab but not ansible-managed.

Ref: PESO-101
2026-03-30 21:39:57 +01:00
69918c8619
Add ZFS management role: scrub scheduling and pool monitoring (#18)
- New zfs role with cron-based scrub scheduling for Linux and FreeBSD
- Weekly Sunday scrubs at noon (matching existing manual crons)
- Add zfs_hosts inventory group with london-a and london-b
- Configure zfs_pools per host: zroot (london-a), hdd (london-b)
- Add Prometheus alert rules for degraded/faulted/offline pools
- Add zfs.yml playbook for targeted deploys

Captures the previously untracked scrub cron on london-a and
re-enables the commented-out scrub on london-b.

Refs: PESO-93
2026-03-29 19:12:42 +01:00
Rasmus Wejlgaard
737d6e0bc1 initial commit 2026-03-28 12:39:41 +00:00