grafana.ini on london-a sets provisioning = /usr/local/etc/grafana/provisioning
but grafana_provisioning_dir pointed at /usr/local/share/grafana/conf/provisioning.
This meant deploy.yml synced alerting rules, dashboard provisioning, and
datasources to a path Grafana never reads; a from-scratch deploy would have
broken alerting entirely.
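A minimal sketch of the fix, assuming the variable lives in london-a's
host_vars (exact file location is illustrative):

    # inventory/host_vars/london-a.yml (illustrative location)
    # Must match the provisioning path grafana.ini actually sets.
    grafana_provisioning_dir: /usr/local/etc/grafana/provisioning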
Fixes PESO-131
- Add copenhagen-a to [docker_hosts] inventory group so the docker role
runs on it in Stage 2
- Add docker_services: [minecraft] to copenhagen-a host_vars
- Add docker_services role to Stage 4d (copenhagen-a) in deploy.yml
- Update deploy-on-merge scope mapping to include copenhagen-a for
docker role changes
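A sketch of the inventory and host_vars changes (the inventory filename is
an assumption):

    # inventory/hosts.ini (filename assumed)
    [docker_hosts]
    copenhagen-a

    # inventory/host_vars/copenhagen-a.yml
    docker_services:
      - minecraft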
Closes PESO-132
cloudflared has been replaced by Caddy + Authelia. Removed:
- cloudflared service config (services/cloudflared/london-a/)
- tunnel ID from london-a host_vars
- cloudflared_enable from rc.conf
Also synced rc.conf with live server state (disabled services
from PESO-113, added node_exporter_listen_address).
On the live server: stopped the service, removed it from rc.conf, and uninstalled the package.
* Add systemd_exporter Ansible role and Prometheus scrape config
- Create systemd_exporter role (download binary, create user, deploy service)
- Add scrape job for london-b:9558 and copenhagen-a:9558
- Add systemd_exporter_hosts inventory group
- Add Stage 3b to deploy.yml
- Map role to deploy-on-merge scope
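A sketch of the scrape job, assuming Prometheus can resolve the hosts under
these names (e.g. via Tailscale DNS):

    # prometheus.yml, under scrape_configs (sketch)
    - job_name: systemd_exporter
      static_configs:
        - targets:
            - london-b:9558
            - copenhagen-a:9558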
Closes PESO-120
* Fix line length lint violations in systemd_exporter tasks
* Fix var-naming lint: use systemd_exporter_ prefix for role variables
Docker is masked on copenhagen-a and Minecraft is no longer managed
via Docker Compose. Removes:
- copenhagen-a from [docker_hosts] inventory group
- docker_services var from copenhagen-a host_vars
- docker_services role from Stage 4d deploy play
MaNGOS systemd services remain unchanged.
Fixes PESO-104
london-b had both a custom node_exporter.service and the
package-managed prometheus-node-exporter.service installed.
Both tried to bind port 9100, causing the package version to fail.
- Add cleanup tasks to remove custom /etc/systemd/system/node_exporter.service
and /usr/local/bin/node_exporter if present
- Add node_exporter_extra_collectors variable for configurable collectors
- Configure london-b with systemd/processes/sysctl/ethtool/zfs collectors
matching its previous custom setup
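A sketch of the london-b configuration; how the role renders the list into
--collector.<name> flags is an assumption:

    # inventory/host_vars/london-b.yml (sketch)
    # Assumed to render as --collector.systemd --collector.processes ... in ARGS.
    node_exporter_extra_collectors:
      - systemd
      - processes
      - sysctl
      - ethtool
      - zfs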
Resolves PESO-109
Adds docker_services list to nuremberg-a host_vars so the docker_services
role deploys and manages the poste-io mail container via docker compose,
replacing the current manual container setup.
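A minimal sketch of the host_vars entry; the role is then expected to sync
services/poste-io/ and run docker compose for it (layout assumed):

    # inventory/host_vars/nuremberg-a.yml (sketch)
    docker_services:
      - poste-io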
Services stopped and disabled in rc.conf on london-a.
Removed the audit variables from host_vars and replaced them with a cleanup note.
All four were leftovers from a defunct pez_vps project:
- InfluxDB: no user databases, only _internal
- Redis: empty keyspace, no clients
- PostgreSQL: defunct pez_vps database (Pez approved removal)
- libvirtd: zero VMs defined
Resolves PESO-113
PostgreSQL 14 and Redis have been stopped, disabled, purged, and
data directories removed from copenhagen-a. These were leftovers
from an old WordPress project with no user data.
Resolves PESO-114
- Move unit file from services/systemd/helsinki-a/ to
services/thiswebsitedoesnotexist/ (matches systemd_services role convention)
- Add systemd_services: [thiswebsitedoesnotexist] to helsinki-a host_vars
- Add systemd_services role to helsinki-a stage in deploy.yml
- Remove redundant caddy.service (apt manages this via the caddy role)
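A sketch of the opt-in, following the systemd_services role convention (the
unit filename under services/ is assumed):

    # inventory/host_vars/helsinki-a.yml (sketch)
    systemd_services:
      - thiswebsitedoesnotexist
    # Unit assumed at services/thiswebsitedoesnotexist/thiswebsitedoesnotexist.service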
Closes PESO-117
node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a,
exposing metrics to the public internet.
Changes:
- Add node_exporter_bind_tailscale flag (default false) to opt in
- Set flag on helsinki-a and london-a host_vars
- Debian: configure ARGS in /etc/default/prometheus-node-exporter
- FreeBSD: use native node_exporter_listen_address rc.conf variable
- Add handlers to restart on config change
Prometheus already scrapes via Tailscale IPs, so no scrape config changes are needed.
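A sketch of the Debian side; the task shape and the tailscale_ip variable
used for the address lookup are assumptions:

    # inventory/host_vars/helsinki-a.yml
    node_exporter_bind_tailscale: true

    # roles/... tasks (condensed sketch; tailscale_ip is an assumed var/fact)
    - name: Bind node_exporter to the Tailscale address (Debian)
      ansible.builtin.lineinfile:
        path: /etc/default/prometheus-node-exporter
        regexp: '^ARGS='
        line: 'ARGS="--web.listen-address={{ tailscale_ip }}:9100"'
      when: node_exporter_bind_tailscale
      notify: Restart prometheus-node-exporter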
Fixes PESO-98
Audit of copenhagen-a found several running services not captured in
host_vars: cloudflared, node_exporter (systemd), and MariaDB. Also
found postgresql and redis running with no active consumers.
Updated host_vars to list all services and added undocumented_services
for the potentially unused ones. Updated docs with the Cloudflare tunnel,
monitoring setup, and notes about stale Docker images to clean up.
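A sketch of the added variable (name as described above; comments
illustrative):

    # inventory/host_vars/copenhagen-a.yml (sketch)
    undocumented_services:
      - postgresql   # running, no active consumers found
      - redis        # running, no active consumers found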
Closes PESO-100
Audit of london-a rc.conf found several services running but not
captured in host_vars or docs: cloudflared, InfluxDB, Redis,
PostgreSQL, and libvirtd.
- InfluxDB: only _internal db, completely unused
- Redis: empty keyspace, unused
- PostgreSQL: has pez_vps db from a dead project, needs data review
- libvirtd: zero VMs, related to same dead project
- cloudflared: running tunnel 168eccae, config now captured
Also documented the weekly ZFS scrub cron (Sundays at noon), which
lives in root's crontab but is not Ansible-managed.
Ref: PESO-101
* Add Docker official apt repo to docker role
The docker role was installing docker-compose-plugin, which is only
available from Docker's official apt repository. helsinki-a had it
configured manually, but london-b and copenhagen-a did not, causing
deploy failures.
Now the role:
- Adds Docker's GPG key and apt repo (handles both Debian and Ubuntu)
- Installs docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin
- Removes conflicting stock packages (docker.io, docker-compose)
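A condensed sketch of the repo setup; module choices and the key path are
illustrative:

    # roles/docker/tasks/main.yml (condensed sketch)
    - name: Add Docker's GPG key
      ansible.builtin.get_url:
        url: "https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg"
        dest: /etc/apt/keyrings/docker.asc   # assumes /etc/apt/keyrings exists or is created first
        mode: "0644"

    - name: Add Docker's apt repository
      ansible.builtin.apt_repository:
        repo: >-
          deb [signed-by=/etc/apt/keyrings/docker.asc]
          https://download.docker.com/linux/{{ ansible_distribution | lower }}
          {{ ansible_distribution_release }} stable

    - name: Install Docker packages
      ansible.builtin.apt:
        name: [docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin]
        state: present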
* fix: resolve yamllint violations in docker role
- Remove standalone comment blocks that caused indentation errors
- Collapse multiline repo string to single line
- Ensure document start marker is present
* fix: keep all lines under 160 chars for yamllint
Use set_fact to build the Docker repo line in parts instead of
one long inline string.
* fix: resolve yamllint errors in london-b host_vars and promtail config
- Remove trailing blank line in inventory/host_vars/london-b.yml
- Add missing document start marker to promtail config
- Fix indentation in promtail scrape_configs (indent list items under key)
* Remove ansible-lint on push, keep PR-only
Lint already runs on pull_request; there's no need to double up on push to main.
- New zfs role with cron-based scrub scheduling for Linux and FreeBSD
- Weekly Sunday scrubs at noon (matching existing manual crons)
- Add zfs_hosts inventory group with london-a and london-b
- Configure zfs_pools per host: zroot (london-a), hdd (london-b)
- Add Prometheus alert rules for degraded/faulted/offline pools
- Add zfs.yml playbook for targeted deploys
Captures the previously untracked scrub cron on london-a and
re-enables the commented-out scrub on london-b.
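A sketch of the scheduling task (parameter layout illustrative); zfs_pools
comes from host_vars:

    # roles/zfs/tasks/main.yml (sketch)
    # ansible.builtin.cron works on both Linux and FreeBSD hosts.
    - name: Schedule weekly ZFS scrub
      ansible.builtin.cron:
        name: "zfs scrub {{ item }}"
        minute: "0"
        hour: "12"
        weekday: "0"   # Sunday at noon, matching the existing manual crons
        user: root
        job: "zpool scrub {{ item }}"
      loop: "{{ zfs_pools }}"

    # inventory/host_vars/london-a.yml
    zfs_pools: [zroot]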
Refs: PESO-93
* Allow Plex port (32400/tcp) through UFW on london-b
Plex needs direct access on port 32400 for remote streaming.
Adds common_ufw_allowed_ports to london-b host_vars.
* Add BitTorrent port (6881) to london-b UFW allowed ports
The port was already manually configured in UFW; this brings it under Ansible management.
* Add Samba port (445/tcp) to london-b UFW allowed ports
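A sketch of the resulting host_vars entry; the exact port/protocol schema the
role expects is an assumption:

    # inventory/host_vars/london-b.yml (sketch; item schema assumed)
    common_ufw_allowed_ports:
      - 32400/tcp   # Plex remote streaming
      - 6881        # BitTorrent
      - 445/tcp     # Samba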
helsinki-a runs Docker containers (authelia, forgejo, bitwarden) but was
missing from docker_hosts. This meant the docker role and docker-status
playbook weren't targeting it during deploys.
Closes PESO-91
Add firewall_alpine role for Alpine hosts with iptables persistence
and fail2ban SSH jails. Wire it into nuremberg-a's deploy stage.
Mail ports are already exposed via Docker port mappings in the
poste-io docker-compose; this captures the surrounding iptables
and fail2ban config that was previously undocumented.
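A sketch of the deploy wiring (stage layout illustrative):

    # deploy.yml, nuremberg-a stage (sketch)
    - name: Configure nuremberg-a firewall
      hosts: nuremberg-a
      roles:
        - firewall_alpine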
Closes PESO-96
* Configure UFW firewall rules in common Ansible role
Add UFW configuration to the common role for Debian hosts:
- Default deny incoming, allow outgoing
- Allow all traffic on tailscale0 interface (mesh comms)
- Allow SSH port 22 as safety net
- Per-host allowed ports via ufw_allowed_ports variable
- Enable UFW after rules are applied
helsinki-a gets ports 80/443 for reverse proxy traffic.
Other Debian hosts only need Tailscale + SSH.
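A condensed sketch of the rules, assuming the community.general.ufw module:

    # roles/common/tasks/ufw.yml (condensed sketch)
    - name: Allow everything on the Tailscale interface
      community.general.ufw:
        rule: allow
        interface: tailscale0
        direction: in

    - name: Allow SSH as a safety net
      community.general.ufw:
        rule: allow
        port: "22"
        proto: tcp

    - name: Allow per-host ports
      community.general.ufw:
        rule: allow
        port: "{{ item }}"   # assumes plain port numbers; a port/proto schema would need proto:
      loop: "{{ ufw_allowed_ports | default([]) }}"

    - name: Default deny incoming (outgoing defaults to allow)
      community.general.ufw:
        policy: deny
        direction: incoming

    - name: Enable UFW after rules are applied
      community.general.ufw:
        state: enabled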
Closes PESO-79
* Remove unused alerting and rule_files from prometheus.yml
Alerting is handled by Grafana, not Prometheus Alertmanager.
The empty alertmanagers and rule_files sections were just noise.
Resolves PESO-74
These services run on helsinki-a, not london-b. Verified via docker ps
on both hosts. deploy.yml would have managed them on the wrong host.
Fixes PESO-73