* Add systemd_exporter Ansible role and Prometheus scrape config
- Create systemd_exporter role (download binary, create user, deploy service)
- Add scrape job for london-b:9558 and copenhagen-a:9558
- Add systemd_exporter_hosts inventory group
- Add stage 3b to deploy.yml
- Map role to deploy-on-merge scope
Closes PESO-120
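
For reference, a minimal sketch of the new scrape job in prometheus.yml; the targets are from the commit, while the job name and static_configs layout are assumptions:

    scrape_configs:
      - job_name: systemd_exporter
        static_configs:
          - targets:
              - london-b:9558
              - copenhagen-a:9558
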
* Fix line length lint violations in systemd_exporter tasks
* Fix var-naming lint: use systemd_exporter_ prefix for role variables
prometheus.yml was missing the rule_files section, so alerting rules
deployed to /usr/local/etc/prometheus/rules/ were never loaded.
- Add rule_files glob so Prometheus evaluates the ZFS pool rules
- Document that alerting notifications go through Grafana, not
Alertmanager — no alerting: section needed
- Remove node-exporter.rules (all rules were commented out)
Resolves PESO-103
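
A minimal sketch of the added section; the directory is from the commit, the exact glob pattern is an assumption:

    rule_files:
      - /usr/local/etc/prometheus/rules/*
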
Docker is masked on copenhagen-a and Minecraft is no longer managed
via Docker Compose. Removes:
- copenhagen-a from [docker_hosts] inventory group
- docker_services var from copenhagen-a host_vars
- docker_services role from Stage 4d deploy play
MaNGOS systemd services remain unchanged.
Fixes PESO-104
Alerting is handled by Grafana, not Alertmanager. Removed the
stale reverse proxy block from the Caddyfile template and updated
the Caddy and Prometheus docs to reflect Grafana-only alerting.
london-b had both a custom node_exporter.service and the
package-managed prometheus-node-exporter.service installed.
Both tried to bind port 9100, causing the package version to fail.
- Add cleanup tasks to remove custom /etc/systemd/system/node_exporter.service
and /usr/local/bin/node_exporter if present
- Add node_exporter_extra_collectors variable for configurable collectors
- Configure london-b with systemd/processes/sysctl/ethtool/zfs collectors
matching its previous custom setup
Resolves PESO-109
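
A sketch of the cleanup, assuming ansible.builtin.file with state=absent; the paths are from the commit, the task names are not:

    - name: Remove custom node_exporter unit file
      ansible.builtin.file:
        path: /etc/systemd/system/node_exporter.service
        state: absent

    - name: Remove custom node_exporter binary
      ansible.builtin.file:
        path: /usr/local/bin/node_exporter
        state: absent
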
Adds docker_services list to nuremberg-a host_vars so the docker_services
role deploys and manages the poste-io mail container via docker compose,
replacing the current manual container setup.
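
A minimal sketch of the host_vars addition, assuming the role consumes a plain list of compose project names:

    # inventory/host_vars/nuremberg-a.yml (list shape assumed)
    docker_services:
      - poste-io
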
Services were stopped and disabled in rc.conf on london-a. The audit
variables were removed from host_vars and replaced with a cleanup note.
All four were leftovers from a defunct pez_vps project:
- InfluxDB: no user databases, only _internal
- Redis: empty keyspace, no clients
- PostgreSQL: defunct pez_vps database (Pez approved removal)
- libvirtd: zero VMs defined
Resolves PESO-113
PostgreSQL 14 and Redis have been stopped, disabled, purged, and
data directories removed from copenhagen-a. These were leftovers
from an old WordPress project with no user data.
Resolves: PESO-114
- Move unit file from services/systemd/helsinki-a/ to
services/thiswebsitedoesnotexist/ (matches systemd_services role convention)
- Add systemd_services: [thiswebsitedoesnotexist] to helsinki-a host_vars
- Add systemd_services role to helsinki-a stage in deploy.yml
- Remove redundant caddy.service (apt manages this via the caddy role)
Closes PESO-117
* Bind node_exporter to Tailscale IP on public-facing hosts
node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a,
exposing metrics to the public internet.
Changes:
- Add node_exporter_bind_tailscale flag (default false) to opt in
- Set flag on helsinki-a and london-a host_vars
- Debian: configure ARGS in /etc/default/prometheus-node-exporter
- FreeBSD: use native node_exporter_listen_address rc.conf variable
- Add handlers to restart on config change
Prometheus already scrapes via Tailscale IPs, no scrape config changes needed.
Fixes PESO-98
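
A sketch of the Debian side; the flag name is from the commit, while the ARGS line and the tailscale_ip variable are hypothetical illustrations:

    # inventory/host_vars/helsinki-a.yml -- opt in per host
    node_exporter_bind_tailscale: true

    # /etc/default/prometheus-node-exporter, templated when the flag is set;
    # tailscale_ip stands in for however the role resolves the Tailscale address
    ARGS="--web.listen-address={{ tailscale_ip }}:9100"
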
* fix(firewall_alpine): replace empty iptables ruleset with proper INPUT filtering
The rules.v4.j2 template deployed a ruleset with INPUT ACCEPT and zero
custom rules — effectively a no-op. nuremberg-a is a public-facing mail
server and needs actual filtering.
Changes:
- INPUT default policy set to DROP
- Allow loopback, established/related, Tailscale interface, SSH, ICMP
- FORWARD stays ACCEPT for Docker port-forwarding
- Added firewall_alpine_extra_input_rules variable for host-specific rules
Mail ports remain handled by Docker's FORWARD chain, not INPUT.
Closes PESO-119
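
Roughly what the rendered rules.v4 would contain under those changes; rule order and exact matches are assumptions, and host extras from firewall_alpine_extra_input_rules would be templated in before COMMIT:

    *filter
    :INPUT DROP [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    -A INPUT -i lo -j ACCEPT
    -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    -A INPUT -i tailscale0 -j ACCEPT
    -A INPUT -p tcp --dport 22 -j ACCEPT
    -A INPUT -p icmp -j ACCEPT
    COMMIT
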
Audit of copenhagen-a found several running services not captured in
host_vars: cloudflared, node_exporter (systemd), and MariaDB. Also
found postgresql and redis running with no active consumers.
Updated host_vars to list all services and added undocumented_services
for the potentially unused ones. Updated docs with cloudflare tunnel,
monitoring, and notes about stale Docker images to clean up.
Closes PESO-100
Audit of london-a rc.conf found several services running but not
captured in host_vars or docs: cloudflared, InfluxDB, Redis,
PostgreSQL, and libvirtd.
- InfluxDB: only _internal db, completely unused
- Redis: empty keyspace, unused
- PostgreSQL: has pez_vps db from a dead project, needs data review
- libvirtd: zero VMs, related to same dead project
- cloudflared: running tunnel 168eccae, config now captured
Also documented the weekly ZFS scrub cron (Sundays at noon), which
lives in root's crontab but is not Ansible-managed.
Ref: PESO-101
- Remove webdav.pez.sh DNS record (WebDAV replaced by Nextcloud AIO on cloud.pez.sh)
- Remove alertmanager.pez.sh DNS record and Caddyfile block (Alertmanager not running on london-a)
- Remove status-https HTTPS record pointing to old statuspage.io (status.pez.sh is self-hosted on helsinki-a)
- Remove commented-out WebDAV block from Caddyfile
- Remove empty section headers for decommissioned hosts (london-c, copenhagen-b, copenhagen-c)
Closes PESO-102
copenhagen-a had Ubuntu's docker-compose-v2 package installed, which
conflicts with Docker's official docker-compose-plugin over
/usr/libexec/docker/cli-plugins/docker-compose.
Moved the removal task before the install task and added docker-compose-v2
to the removal list.
* Add Docker official apt repo to docker role
The docker role was installing docker-compose-plugin, which is only
available from Docker's official apt repository. helsinki-a had it
configured manually, but london-b and copenhagen-a did not, causing
deploy failures.
Now the role:
- Adds Docker's GPG key and apt repo (handles both Debian and Ubuntu)
- Installs docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin
- Removes conflicting stock packages (docker.io, docker-compose)
* fix: resolve yamllint violations in docker role
- Remove standalone comment blocks that caused indentation errors
- Collapse multiline repo string to single line
- Ensure document start marker is present
* fix: keep all lines under 160 chars for yamllint
Use set_fact to build the Docker repo line in parts instead of
one long inline string.
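
A sketch of the final shape of the repo tasks after the lint fixes; task and fact names are assumptions, and keyring directory creation is omitted:

    - name: Add Docker GPG key
      ansible.builtin.get_url:
        url: "https://download.docker.com/linux/{{ ansible_distribution | lower }}/gpg"
        dest: /etc/apt/keyrings/docker.asc
        mode: "0644"

    - name: Build the repo line in parts to stay under the line-length limit
      ansible.builtin.set_fact:
        docker_repo_base: "deb [signed-by=/etc/apt/keyrings/docker.asc]"
        docker_repo_url: "https://download.docker.com/linux/{{ ansible_distribution | lower }}"

    - name: Add Docker apt repository
      ansible.builtin.apt_repository:
        repo: "{{ docker_repo_base }} {{ docker_repo_url }} {{ ansible_distribution_release }} stable"
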
* fix: resolve yamllint errors in london-b host_vars and promtail config
- Remove trailing blank line in inventory/host_vars/london-b.yml
- Add missing document start marker to promtail config
- Fix indentation in promtail scrape_configs (indent list items under key)
* Remove ansible-lint on push, keep PR-only
Lint already runs on pull_request — no need to double up on push to main.
Add the full media automation stack (sonarr, radarr, prowlarr, lidarr,
readarr, whisparr), media servers (jellyfin, plex), and supporting
services (transmission, samba, ollama, promtail, cloudflared, vsftpd)
to the repo as a media_stack Ansible role.
Includes:
- Custom systemd unit files for non-package-managed services
- Config files for promtail, samba, transmission, vsftpd
- Cron jobs for movie-rename-fix, sonarr/radarr midnight restarts
- Updated deploy.yml to wire the role into london-b's stage
- Updated london-b docs with full service inventory
Backup script (backup.sh) already covered by the existing backup role.
Node/systemd exporters already covered by existing monitoring roles.
Closes PESO-92
- New zfs role with cron-based scrub scheduling for Linux and FreeBSD
- Weekly Sunday scrubs at noon (matching existing manual crons)
- Add zfs_hosts inventory group with london-a and london-b
- Configure zfs_pools per host: zroot (london-a), hdd (london-b)
- Add Prometheus alert rules for degraded/faulted/offline pools
- Add zfs.yml playbook for targeted deploys
Captures the previously untracked scrub cron on london-a and
re-enables the commented-out scrub on london-b.
Refs: PESO-93
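
A sketch of the scheduling, assuming ansible.builtin.cron with one zpool scrub per configured pool; the task name is hypothetical:

    - name: Schedule weekly ZFS scrub (Sundays at noon)
      ansible.builtin.cron:
        name: "zfs scrub {{ item }}"
        weekday: "0"
        hour: "12"
        minute: "0"
        job: "zpool scrub {{ item }}"
      loop: "{{ zfs_pools }}"
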
* Allow Plex port (32400/tcp) through UFW on london-b
Plex needs direct access on port 32400 for remote streaming.
Adds common_ufw_allowed_ports to london-b host_vars.
* Add BitTorrent port (6881) to london-b UFW allowed ports
The port was already manually configured in UFW; this brings it under Ansible management.
* Add Samba port (445/tcp) to london-b UFW allowed ports
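
Taken together, a sketch of the resulting london-b host_vars entry; the variable name is from the first sub-commit, while the list shape and the BitTorrent protocol are assumptions:

    common_ufw_allowed_ports:
      - { port: 32400, proto: tcp }  # Plex remote streaming
      - { port: 6881, proto: tcp }   # BitTorrent (proto assumed)
      - { port: 445, proto: tcp }    # Samba
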
- Docker role: replace docker-compose with docker-compose-plugin (v2).
The old docker-compose package conflicts with docker-compose-plugin
already installed on helsinki-a. Also removes the conflicting package
if present.
- firewall_alpine handler: use ansible.builtin.shell instead of
ansible.builtin.command for iptables-restore, since the redirect
operator (<) requires a shell.
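
The corrected handler would look roughly like this; the handler name and rules path are assumptions:

    - name: Reload iptables rules
      # '<' is shell syntax; ansible.builtin.command would pass it as a literal argument
      ansible.builtin.shell: iptables-restore < /etc/iptables/rules.v4
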
helsinki-a runs Docker containers (authelia, forgejo, bitwarden) but was
missing from docker_hosts. This means the docker role and docker-status
playbook weren't targeting it during deploys.
Closes PESO-91
Add status_page role that deploys update-status.sh and its cron job.
The script queries Prometheus for Caddy upstream health and writes
status.json + history to /srv/status/ every minute.
Refs: PESO-94
Add firewall_alpine role for Alpine hosts with iptables persistence
and fail2ban SSH jails. Wire it into nuremberg-a's deploy stage.
Mail ports are already exposed via Docker port mappings in the
poste-io docker-compose — this captures the surrounding iptables
and fail2ban config that was previously undocumented.
Closes PESO-96
Alertmanager reverse_proxy was pointing to :3000 (Grafana) instead of
:9093 (Alertmanager). Copy-paste artifact. Fixed in both the Caddyfile
and the template.
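
For illustration, the corrected Caddyfile block would read roughly as follows; the upstream is assumed to be localhost:

    alertmanager.pez.sh {
        reverse_proxy localhost:9093
    }
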
* Configure UFW firewall rules in common Ansible role
Add UFW configuration to the common role for Debian hosts:
- Default deny incoming, allow outgoing
- Allow all traffic on tailscale0 interface (mesh comms)
- Allow SSH port 22 as safety net
- Per-host allowed ports via ufw_allowed_ports variable
- Enable UFW after rules are applied
helsinki-a gets ports 80/443 for reverse proxy traffic.
Other Debian hosts only need Tailscale + SSH.
Closes PESO-79
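
A sketch of the rules using community.general.ufw; task names are assumptions:

    - name: Set default policies
      community.general.ufw:
        default: "{{ item.policy }}"
        direction: "{{ item.direction }}"
      loop:
        - { policy: deny, direction: incoming }
        - { policy: allow, direction: outgoing }

    - name: Allow all traffic on tailscale0
      community.general.ufw:
        rule: allow
        interface: tailscale0
        direction: in

    - name: Allow SSH as a safety net
      community.general.ufw:
        rule: allow
        port: "22"
        proto: tcp

    - name: Enable UFW after rules are applied
      community.general.ufw:
        state: enabled
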
* Remove unused alerting and rule_files from prometheus.yml
Alerting is handled by Grafana, not Prometheus Alertmanager.
The empty alertmanagers and rule_files sections were just noise.
Resolves PESO-74
These services run on helsinki-a, not london-b. Verified via docker ps
on both hosts. deploy.yml would have managed them on the wrong host.
Fixes PESO-73
Replace local network IPs (192.168.1.x) with Tailscale IPs for
london-a and london-b in all scrape configs. This ensures consistent
connectivity via Tailscale mesh regardless of network topology changes.
Refs: PESO-80
- Add configuration.yml from running helsinki-a deployment
- Replace example secrets with real SOPS-encrypted config.enc.yml
- Add LDAP and SMTP password file env vars to docker-compose
(all secrets now via file mounts, zero inline passwords)
- Update README with secret mapping and deployment steps
Closes PESO-89
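
A sketch of the docker-compose additions; Authelia's *_FILE secret environment variables are documented upstream, while the mount paths here are assumptions:

    services:
      authelia:
        environment:
          AUTHELIA_AUTHENTICATION_BACKEND_LDAP_PASSWORD_FILE: /secrets/ldap_password
          AUTHELIA_NOTIFIER_SMTP_PASSWORD_FILE: /secrets/smtp_password
        volumes:
          - ./secrets:/secrets:ro
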
- Add mangosd.conf, realmd.conf, ahbot.conf, aiplayerbot.conf from copenhagen-a
- Replace DB password with {{ mangos_db_password }} placeholder
- Fix mangos-world.service: it was an identical copy of the realmd service and now points to mangosd
- Add a README for the mangos-zero service