RWejlgaard/pez-infra

mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-05-06 04:14:43 +00:00

Author	SHA1	Message	Date
Rasmus "Pez" Wejlgaard	5391c500e1	fix: loki & alloy (#83) * fix: loki & alloy * fix linting	2026-04-28 16:40:45 +01:00
Rasmus "Pez" Wejlgaard	c495b73720	template prometheus config (#67)	2026-04-21 20:44:37 +01:00
Rasmus "Pez" Wejlgaard	d8757d37e1	fix(london-a): correct grafana provisioning dir path (#53) grafana.ini on london-a sets provisioning = /usr/local/etc/grafana/provisioning but grafana_provisioning_dir pointed at /usr/local/share/grafana/conf/provisioning. This meant deploy.yml synced alerting rules, dashboards provisioning, and datasources to a path Grafana never reads — a from-scratch deploy would have broken alerting entirely. Fixes PESO-131	2026-04-03 20:20:15 +01:00
Rasmus "Pez" Wejlgaard	dca6a08ba1	Remove cloudflared from london-a (PESO-134) (#50) cloudflared has been replaced by Caddy + Authelia. Removed: - cloudflared service config (services/cloudflared/london-a/) - tunnel ID from london-a host_vars - cloudflared_enable from rc.conf Also synced rc.conf with live server state (disabled services from PESO-113, added node_exporter_listen_address). Live server: stopped service, removed from rc.conf, uninstalled pkg.	2026-04-03 18:51:51 +01:00
Rasmus "Pez" Wejlgaard	5a5c60b6b2	london-a: disable unused services (InfluxDB, Redis, PostgreSQL, libvirtd) (#37) Services stopped and disabled in rc.conf on london-a. Removed audit variables from host_vars, replaced with cleanup note. All four were leftovers from a defunct pez_vps project: - InfluxDB: no user databases, only _internal - Redis: empty keyspace, no clients - PostgreSQL: defunct pez_vps database (Pez approved removal) - libvirtd: zero VMs defined Resolves PESO-113	2026-04-03 00:17:58 +01:00
Rasmus "Pez" Wejlgaard	f2cebcdf38	Bind node_exporter to Tailscale IP on public-facing hosts (#31) node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a, exposing metrics to the public internet. Changes: - Add node_exporter_bind_tailscale flag (default false) to opt in - Set flag on helsinki-a and london-a host_vars - Debian: configure ARGS in /etc/default/prometheus-node-exporter - FreeBSD: use native node_exporter_listen_address rc.conf variable - Add handlers to restart on config change Prometheus already scrapes via Tailscale IPs, no scrape config changes needed. Fixes PESO-98	2026-03-30 22:56:59 +01:00
Rasmus "Pez" Wejlgaard	0bcc53b01d	Document undocumented services on london-a (#29) Audit of london-a rc.conf found several services running but not captured in host_vars or docs: cloudflared, InfluxDB, Redis, PostgreSQL, and libvirtd. - InfluxDB: only _internal db, completely unused - Redis: empty keyspace, unused - PostgreSQL: has pez_vps db from a dead project, needs data review - libvirtd: zero VMs, related to same dead project - cloudflared: running tunnel 168eccae, config now captured Also documented the weekly ZFS scrub cron (Sundays at noon) which is in root's crontab but not ansible-managed. Ref: PESO-101	2026-03-30 21:39:57 +01:00
Rasmus "Pez" Wejlgaard	69918c8619	Add ZFS management role: scrub scheduling and pool monitoring (#18) - New zfs role with cron-based scrub scheduling for Linux and FreeBSD - Weekly Sunday scrubs at noon (matching existing manual crons) - Add zfs_hosts inventory group with london-a and london-b - Configure zfs_pools per host: zroot (london-a), hdd (london-b) - Add Prometheus alert rules for degraded/faulted/offline pools - Add zfs.yml playbook for targeted deploys Captures the previously untracked scrub cron on london-a and re-enables the commented-out scrub on london-b. Refs: PESO-93	2026-03-29 19:12:42 +01:00
Rasmus Wejlgaard	737d6e0bc1	initial commit	2026-03-28 12:39:41 +00:00

9 commits