* fix: actually decommission nextcloud and TWDNE
* ignore spaces in lint and remove DNS for the services
* linting on the linting config wasn't linting the lints
node_exporter is deployed by the dedicated node_exporter Ansible role
using distro packages (prometheus-node-exporter). Having it in
systemd_services causes the systemd_services role to look for a
non-existent services/node_exporter/node_exporter.service file,
producing errors during deploy.
Resolves PESO-135
Sonarr is running on london-b as an apt-managed systemd service
but was the only *arr service without a services/ directory in the
repo. Add services/sonarr/README.md documenting the install method,
data paths, and how it differs from the other *arr services.
Closes PESO-133
Cloudflared tunnels are no longer used. All traffic now routes through
Cloudflare DNS to Caddy on helsinki-a over Tailscale.
- Remove cloudflared systemd unit files (copenhagen-a, london-b)
- Remove cloudflared from media_stack role and copenhagen-a host_vars
- Remove cloudflared references from services README and host docs
- Remove cloudflared deploy trigger from CI workflow
Live service on london-b stopped and disabled. copenhagen-a was
unreachable but the tunnel is unused regardless.
The lint-docker-compose workflow was swallowing all validation errors with
|| true, meaning broken compose files would never fail the check.
- Remove || true and let validation failures propagate
- Add a pre-step that creates empty stubs for referenced env_file entries
(e.g. bitwarden/settings.env) so docker compose config can validate
structure without needing real secrets
- Track per-file pass/fail and exit non-zero if any file fails
Closes PESO-130
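A minimal Python sketch of the stub-and-track logic described above, not the
workflow's actual script. The function names (compose_config_ok,
ensure_env_stubs, lint_all) are hypothetical, and the sketch assumes the
env_file paths have already been collected from the compose files.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def compose_config_ok(compose_file: Path) -> bool:
    """Return True if `docker compose config` accepts the file (assumes docker is on PATH)."""
    result = subprocess.run(
        ["docker", "compose", "-f", str(compose_file), "config", "--quiet"],
        capture_output=True,
    )
    return result.returncode == 0


def ensure_env_stubs(env_files: Iterable[Path]) -> list[Path]:
    """Create empty placeholders for missing env files and return the ones created."""
    created = []
    for env_file in env_files:
        if not env_file.exists():
            env_file.parent.mkdir(parents=True, exist_ok=True)
            env_file.touch()  # empty stub: enough for structural validation, no secrets
            created.append(env_file)
    return created


def lint_all(
    compose_files: Iterable[Path],
    validate: Callable[[Path], bool] = compose_config_ok,
) -> int:
    """Validate every file (no early exit) and return a process exit code."""
    failed = [f for f in compose_files if not validate(f)]
    for f in failed:
        print(f"FAIL {f}")
    return 1 if failed else 0
```

Passing the validator as a parameter keeps the pass/fail bookkeeping testable
without a Docker install; a CI step would call sys.exit(lint_all(files)).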
grafana.ini on london-a sets provisioning = /usr/local/etc/grafana/provisioning
but grafana_provisioning_dir pointed at /usr/local/share/grafana/conf/provisioning.
This meant deploy.yml synced alerting rules, dashboard provisioning, and
datasources to a path Grafana never reads, so a from-scratch deploy would
have broken alerting entirely.
Fixes PESO-131
- Add copenhagen-a to [docker_hosts] inventory group so the docker role
runs on it in Stage 2
- Add docker_services: [minecraft] to copenhagen-a host_vars
- Add docker_services role to Stage 4d (copenhagen-a) in deploy.yml
- Update deploy-on-merge scope mapping to include copenhagen-a for
docker role changes
Closes PESO-132
cloudflared has been replaced by Caddy + Authelia. Removed:
- cloudflared service config (services/cloudflared/london-a/)
- tunnel ID from london-a host_vars
- cloudflared_enable from rc.conf
Also synced rc.conf with live server state (disabled services
from PESO-113, added node_exporter_listen_address).
Live server: stopped service, removed from rc.conf, uninstalled pkg.
* Add systemd_exporter Ansible role and Prometheus scrape config
- Create systemd_exporter role (download binary, create user, deploy service)
- Add scrape job for london-b:9558 and copenhagen-a:9558
- Add systemd_exporter_hosts inventory group
- Add stage 3b to deploy.yml
- Map role to deploy-on-merge scope
Closes PESO-120
* Fix line length lint violations in systemd_exporter tasks
* Fix var-naming lint: use systemd_exporter_ prefix for role variables
prometheus.yml was missing the rule_files section, so alerting rules
deployed to /usr/local/etc/prometheus/rules/ were never loaded.
- Add rule_files glob so Prometheus evaluates the ZFS pool rules
- Document that alerting notifications go through Grafana, not
Alertmanager — no alerting: section needed
- Remove node-exporter.rules (all rules were commented out)
Resolves PESO-103
Docker is masked on copenhagen-a and Minecraft is no longer managed
via Docker Compose. Removes:
- copenhagen-a from [docker_hosts] inventory group
- docker_services var from copenhagen-a host_vars
- docker_services role from Stage 4d deploy play
MaNGOS systemd services remain unchanged.
Fixes PESO-104
Alerting is handled by Grafana, not Alertmanager. Removed the
stale reverse proxy block from the Caddyfile template and updated
the Caddy and Prometheus docs to reflect Grafana-only alerting.
Instead of deploying to the entire fleet on every merge, detect which
files changed and limit ansible-playbook to only affected hosts.
Maps ansible roles, services, and host_vars to their target hosts.
Falls back to full fleet deploy for unmapped paths or changes to
shared infrastructure (common role, deploy.yml, inventory).
Closes PESO-108
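The scope detection above could be sketched roughly as follows. This is an
illustration only: the path prefixes, SCOPE_MAP entries, and the deploy_limit
name are assumptions, not the mapping used in the workflow.

```python
from typing import Iterable

FULL_FLEET = "all"

# Hypothetical path-prefix -> hosts mapping (roles, services, host_vars).
SCOPE_MAP = {
    "ansible/roles/systemd_exporter/": {"london-b", "copenhagen-a"},
    "services/sonarr/": {"london-b"},
    "ansible/host_vars/nuremberg-a": {"nuremberg-a"},
}

# Hypothetical shared-infrastructure paths that always trigger a full deploy.
SHARED_PREFIXES = (
    "ansible/roles/common/",
    "ansible/deploy.yml",
    "ansible/inventory",
)


def deploy_limit(changed_paths: Iterable[str]) -> str:
    """Return a value for `ansible-playbook --limit` based on changed files."""
    hosts: set[str] = set()
    for path in changed_paths:
        if path.startswith(SHARED_PREFIXES):
            return FULL_FLEET  # shared infrastructure affects every host
        matched = [h for prefix, h in SCOPE_MAP.items() if path.startswith(prefix)]
        if not matched:
            return FULL_FLEET  # unmapped path: fall back to the whole fleet
        for host_set in matched:
            hosts |= host_set
    # An empty change list also falls back to the full fleet in this sketch.
    return ",".join(sorted(hosts)) if hosts else FULL_FLEET
```

For example, deploy_limit(["services/sonarr/README.md"]) yields "london-b",
while any change to ansible/deploy.yml yields "all".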
london-b had both a custom node_exporter.service and the
package-managed prometheus-node-exporter.service installed.
Both tried to bind port 9100, causing the package version to fail.
- Add cleanup tasks to remove custom /etc/systemd/system/node_exporter.service
and /usr/local/bin/node_exporter if present
- Add node_exporter_extra_collectors variable for configurable collectors
- Configure london-b with systemd/processes/sysctl/ethtool/zfs collectors
matching its previous custom setup
Resolves PESO-109
Both deploy-on-merge.yml and deploy.yml install ansible via pip but
never install the required Galaxy collections (community.docker,
community.general, ansible.posix) from ansible/requirements.yml.
This works by accident because the pip ansible package bundles some
collections, but it's fragile — a pip upgrade or runner image change
could break deploys silently.
Fixes PESO-110
Adds docker_services list to nuremberg-a host_vars so the docker_services
role deploys and manages the poste-io mail container via docker compose,
replacing the current manual container setup.
Services stopped and disabled in rc.conf on london-a.
Removed audit variables from host_vars and replaced them with a cleanup note.
All four services were leftovers from a defunct pez_vps project:
- InfluxDB: no user databases, only _internal
- Redis: empty keyspace, no clients
- PostgreSQL: defunct pez_vps database (Pez approved removal)
- libvirtd: zero VMs defined
Resolves PESO-113