node_exporter is deployed by the dedicated node_exporter Ansible role
using distro packages (prometheus-node-exporter). Having it in
systemd_services causes the systemd_services role to look for a
non-existent services/node_exporter/node_exporter.service file,
producing errors during deploy.
Resolves PESO-135
Sonarr is running on london-b as an apt-managed systemd service
but was the only *arr service without a services/ directory in the
repo. Add services/sonarr/README.md documenting the install method,
data paths, and how it differs from the other *arr services.
Closes PESO-133
Cloudflared tunnels are no longer used. All traffic now routes through
Cloudflare DNS to Caddy on helsinki-a over Tailscale.
- Remove cloudflared systemd unit files (copenhagen-a, london-b)
- Remove cloudflared from media_stack role and copenhagen-a host_vars
- Remove cloudflared references from services README and host docs
- Remove cloudflared deploy trigger from CI workflow
The live service on london-b has been stopped and disabled. copenhagen-a
was unreachable, but its tunnel is unused regardless.
The lint-docker-compose workflow was swallowing all validation errors with
|| true, meaning broken compose files would never fail the check.
- Remove || true and let validation failures propagate
- Add a pre-step that creates empty stubs for referenced env_file entries
  (e.g. bitwarden/settings.env) so docker compose config can validate
  structure without needing real secrets
- Track per-file pass/fail and exit non-zero if any file fails
Closes PESO-130
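
The stub and pass/fail logic described above can be sketched in Python. This is an illustration only, not the workflow's actual script: the function names, the idea of passing the env_file list in explicitly, and the injectable validator are all assumptions.

```python
import subprocess
from pathlib import Path


def create_env_stubs(compose_dir, env_files):
    """Create an empty file for each referenced env_file that is missing.

    env_files is a list of paths relative to compose_dir. How that list is
    extracted from the compose files is out of scope for this sketch.
    """
    created = []
    for rel in env_files:
        target = Path(compose_dir) / rel
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()  # empty stub: enough for structural validation
            created.append(target)
    return created


def docker_compose_validate(path):
    """Return True if `docker compose config` accepts the file."""
    proc = subprocess.run(
        ["docker", "compose", "-f", str(path), "config", "--quiet"],
        capture_output=True,
    )
    return proc.returncode == 0


def lint_all(compose_files, validate=docker_compose_validate):
    """Validate every file, record pass/fail per file, and derive an exit code."""
    results = {str(path): bool(validate(path)) for path in compose_files}
    exit_code = 0 if all(results.values()) else 1
    return results, exit_code
```

Every file is checked even after an earlier one fails, so a single run reports all broken compose files rather than only the first.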
grafana.ini on london-a sets provisioning = /usr/local/etc/grafana/provisioning
but grafana_provisioning_dir pointed at /usr/local/share/grafana/conf/provisioning.
As a result, deploy.yml synced alerting rules, dashboard provisioning, and
datasources to a path Grafana never reads, so a from-scratch deploy would have
broken alerting entirely.
Fixes PESO-131
- Add copenhagen-a to [docker_hosts] inventory group so the docker role
  runs on it in Stage 2
- Add docker_services: [minecraft] to copenhagen-a host_vars
- Add docker_services role to Stage 4d (copenhagen-a) in deploy.yml
- Update deploy-on-merge scope mapping to include copenhagen-a for
  docker role changes
Closes PESO-132
cloudflared has been replaced by Caddy + Authelia. Removed:
- cloudflared service config (services/cloudflared/london-a/)
- tunnel ID from london-a host_vars
- cloudflared_enable from rc.conf
Also synced rc.conf with live server state (disabled services
from PESO-113, added node_exporter_listen_address).
Live server: stopped service, removed from rc.conf, uninstalled pkg.
* Add systemd_exporter Ansible role and Prometheus scrape config
- Create systemd_exporter role (download binary, create user, deploy service)
- Add scrape job for london-b:9558 and copenhagen-a:9558
- Add systemd_exporter_hosts inventory group
- Add stage 3b to deploy.yml
- Map role to deploy-on-merge scope
Closes PESO-120
* Fix line length lint violations in systemd_exporter tasks
* Fix var-naming lint: use systemd_exporter_ prefix for role variables
prometheus.yml was missing the rule_files section, so alerting rules
deployed to /usr/local/etc/prometheus/rules/ were never loaded.
- Add rule_files glob so Prometheus evaluates the ZFS pool rules
- Document that alerting notifications go through Grafana, not
  Alertmanager, so no alerting: section is needed
- Remove node-exporter.rules (all rules were commented out)
Resolves PESO-103
Docker is masked on copenhagen-a and Minecraft is no longer managed
via Docker Compose. Removes:
- copenhagen-a from [docker_hosts] inventory group
- docker_services var from copenhagen-a host_vars
- docker_services role from Stage 4d deploy play
MaNGOS systemd services remain unchanged.
Fixes PESO-104
Alerting is handled by Grafana, not Alertmanager. Removed the
stale reverse proxy block from the Caddyfile template and updated
the Caddy and Prometheus docs to reflect Grafana-only alerting.
Instead of deploying to the entire fleet on every merge, detect which
files changed and limit ansible-playbook to only affected hosts.
Maps ansible roles, services, and host_vars to their target hosts.
Falls back to full fleet deploy for unmapped paths or changes to
shared infrastructure (common role, deploy.yml, inventory).
Closes PESO-108
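
The path-to-host mapping with full-fleet fallback described above could look roughly like the following Python sketch. The mapping table, the path patterns, and the function name are hypothetical and do not reflect the workflow's real contents. Treating an empty change list as a full deploy is also an assumption.

```python
from fnmatch import fnmatch

# Hypothetical mapping from changed paths to target hosts (illustrative only).
SCOPE_MAP = {
    "ansible/roles/media_stack/*": {"london-b"},
    "ansible/roles/systemd_exporter/*": {"london-b", "copenhagen-a"},
    "services/sonarr/*": {"london-b"},
    "ansible/inventory/host_vars/helsinki-a.yml": {"helsinki-a"},
}

# Shared infrastructure: any change here forces a full-fleet deploy.
SHARED_PATTERNS = [
    "ansible/roles/common/*",
    "ansible/deploy.yml",
    "ansible/inventory/hosts*",
]


def deploy_limit(changed_files):
    """Return a comma-separated host list for --limit, or "all" for the fleet."""
    hosts = set()
    for path in changed_files:
        if any(fnmatch(path, pat) for pat in SHARED_PATTERNS):
            return "all"
        matched = [t for pat, t in SCOPE_MAP.items() if fnmatch(path, pat)]
        if not matched:
            return "all"  # unmapped path: fall back to the whole fleet
        for targets in matched:
            hosts |= targets
    # Assumption: with no changed files, deploy everywhere rather than nowhere.
    return ",".join(sorted(hosts)) if hosts else "all"
```

The result is meant to be passed to ansible-playbook via --limit. Falling back to "all" on any unmapped path keeps the mapping safe to leave incomplete.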
london-b had both a custom node_exporter.service and the
package-managed prometheus-node-exporter.service installed.
Both tried to bind port 9100, causing the package version to fail.
- Add cleanup tasks to remove custom /etc/systemd/system/node_exporter.service
  and /usr/local/bin/node_exporter if present
- Add node_exporter_extra_collectors variable for configurable collectors
- Configure london-b with systemd/processes/sysctl/ethtool/zfs collectors
  matching its previous custom setup
Resolves PESO-109
Both deploy-on-merge.yml and deploy.yml install ansible via pip but
never install the required Galaxy collections (community.docker,
community.general, ansible.posix) from ansible/requirements.yml.
This works by accident because the pip ansible package bundles some
collections, but it's fragile — a pip upgrade or runner image change
could break deploys silently.
Fixes PESO-110
Adds docker_services list to nuremberg-a host_vars so the docker_services
role deploys and manages the poste-io mail container via docker compose,
replacing the current manual container setup.
Stopped and disabled four unused services in rc.conf on london-a.
Removed the audit variables from host_vars and replaced them with a
cleanup note. All four were leftovers from the defunct pez_vps project:
- InfluxDB: no user databases, only _internal
- Redis: empty keyspace, no clients
- PostgreSQL: defunct pez_vps database (Pez approved removal)
- libvirtd: zero VMs defined
Resolves PESO-113
PostgreSQL 14 and Redis have been stopped, disabled, purged, and
data directories removed from copenhagen-a. These were leftovers
from an old WordPress project with no user data.
Resolves PESO-114
Several services were incorrectly listed as Docker when they actually
run as native systemd services:
- helsinki-a: Caddy is apt-installed, not Docker
- london-b: Radarr, Sonarr, Lidarr, Readarr, Prowlarr are systemd
  services managed by media_stack role
- london-b: Jellyfin, Plex, Transmission are apt packages with systemd
  units
Updated Deployment column to reflect actual deployment method.
Fixes PESO-116
- Move unit file from services/systemd/helsinki-a/ to
  services/thiswebsitedoesnotexist/ (matches systemd_services role convention)
- Add systemd_services: [thiswebsitedoesnotexist] to helsinki-a host_vars
- Add systemd_services role to helsinki-a stage in deploy.yml
- Remove redundant caddy.service (apt manages this via the caddy role)
Closes PESO-117
* Bind node_exporter to Tailscale IP on public-facing hosts
node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a,
exposing metrics to the public internet.
Changes:
- Add node_exporter_bind_tailscale flag (default false) to opt in
- Set flag on helsinki-a and london-a host_vars
- Debian: configure ARGS in /etc/default/prometheus-node-exporter
- FreeBSD: use native node_exporter_listen_address rc.conf variable
- Add handlers to restart on config change
Prometheus already scrapes via Tailscale IPs, so no scrape config changes are needed.
Fixes PESO-98
* fix(firewall_alpine): replace empty iptables ruleset with proper INPUT filtering
The rules.v4.j2 template deployed a ruleset with INPUT ACCEPT and zero
custom rules — effectively a no-op. nuremberg-a is a public-facing mail
server and needs actual filtering.
Changes:
- INPUT default policy set to DROP
- Allow loopback, established/related, Tailscale interface, SSH, ICMP
- FORWARD stays ACCEPT for Docker port-forwarding
- Added firewall_alpine_extra_input_rules variable for host-specific rules
Mail ports remain handled by Docker's FORWARD chain, not INPUT.
Closes PESO-119
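
For illustration, the ruleset described above can be expressed as a small Python renderer that emits iptables-restore syntax. This is a sketch, not the contents of rules.v4.j2: the tailscale0 interface name, SSH on port 22, and the OUTPUT ACCEPT policy are assumptions, and the function name is hypothetical.

```python
def render_rules_v4(extra_input_rules=()):
    """Render a filter table in iptables-restore format.

    extra_input_rules stands in for firewall_alpine_extra_input_rules:
    a sequence of complete rule lines appended after the built-in rules.
    """
    lines = [
        "*filter",
        ":INPUT DROP [0:0]",      # default-deny inbound
        ":FORWARD ACCEPT [0:0]",  # left open for Docker port-forwarding
        ":OUTPUT ACCEPT [0:0]",   # assumption: outbound is unrestricted
        "-A INPUT -i lo -j ACCEPT",
        "-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
        "-A INPUT -i tailscale0 -j ACCEPT",      # assumed interface name
        "-A INPUT -p tcp --dport 22 -j ACCEPT",  # assumed SSH port
        "-A INPUT -p icmp -j ACCEPT",
    ]
    lines.extend(extra_input_rules)
    lines.append("COMMIT")
    return "\n".join(lines) + "\n"
```

A host-specific rule such as "-A INPUT -p tcp --dport 8080 -j ACCEPT" (a made-up example) would be passed in the extra_input_rules sequence and lands just before COMMIT.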
Audit of copenhagen-a found several running services not captured in
host_vars: cloudflared, node_exporter (systemd), and MariaDB. Also
found PostgreSQL and Redis running with no active consumers.
Updated host_vars to list all services and added undocumented_services
for the potentially unused ones. Updated docs with the Cloudflare tunnel,
monitoring, and notes about stale Docker images to clean up.
Closes PESO-100
Audit of london-a rc.conf found several services running but not
captured in host_vars or docs: cloudflared, InfluxDB, Redis,
PostgreSQL, and libvirtd.
- InfluxDB: only _internal db, completely unused
- Redis: empty keyspace, unused
- PostgreSQL: has pez_vps db from a dead project, needs data review
- libvirtd: zero VMs, related to same dead project
- cloudflared: running tunnel 168eccae, config now captured
Also documented the weekly ZFS scrub cron (Sundays at noon), which
is in root's crontab but not Ansible-managed.
Ref: PESO-101
- Remove webdav.pez.sh DNS record (WebDAV replaced by Nextcloud AIO on cloud.pez.sh)
- Remove alertmanager.pez.sh DNS record and Caddyfile block (Alertmanager not running on london-a)
- Remove status-https HTTPS record pointing to old statuspage.io (status.pez.sh is self-hosted on helsinki-a)
- Remove commented-out WebDAV block from Caddyfile
- Remove empty section headers for decommissioned hosts (london-c, copenhagen-b, copenhagen-c)
Closes PESO-102
copenhagen-a had Ubuntu's docker-compose-v2 package installed, which
conflicts with Docker's official docker-compose-plugin over
/usr/libexec/docker/cli-plugins/docker-compose.
Moved the removal task before the install task and added docker-compose-v2
to the removal list.
* Add Docker official apt repo to docker role
The docker role was installing docker-compose-plugin which is only
available from Docker's official apt repository. helsinki-a had it
configured manually, but london-b and copenhagen-a did not, causing
deploy failures.
Now the role:
- Adds Docker's GPG key and apt repo (handles both Debian and Ubuntu)
- Installs docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin
- Removes conflicting stock packages (docker.io, docker-compose)
* fix: resolve yamllint violations in docker role
- Remove standalone comment blocks that caused indentation errors
- Collapse multiline repo string to single line
- Ensure document start marker is present
* fix: keep all lines under 160 chars for yamllint
Use set_fact to build the Docker repo line in parts instead of
one long inline string.
* fix: resolve yamllint errors in london-b host_vars and promtail config
- Remove trailing blank line in inventory/host_vars/london-b.yml
- Add missing document start marker to promtail config
- Fix indentation in promtail scrape_configs (indent list items under key)
* Remove ansible-lint on push, keep PR-only
Lint already runs on pull_request — no need to double up on push to main.
Add the full media automation stack (sonarr, radarr, prowlarr, lidarr,
readarr, whisparr), media servers (jellyfin, plex), and supporting
services (transmission, samba, ollama, promtail, cloudflared, vsftpd)
to the repo as a media_stack Ansible role.
Includes:
- Custom systemd unit files for non-package-managed services
- Config files for promtail, samba, transmission, vsftpd
- Cron jobs for movie-rename-fix and sonarr/radarr midnight restarts
- Updated deploy.yml to wire the role into london-b's stage
- Updated london-b docs with full service inventory
Backup script (backup.sh) already covered by the existing backup role.
Node/systemd exporters already covered by existing monitoring roles.
Closes PESO-92