Commit graph

54 commits

Author SHA1 Message Date
8822078998 remove alertmanager caddyfile entry and clean up references
alerting is handled by grafana, not alertmanager. removed the
stale reverse proxy block from caddyfile template and updated
caddy + prometheus docs to reflect grafana-only alerting.
2026-04-03 01:48:17 +00:00
b6c8c18106
deploy-on-merge: add path-based host limiting (#41)
Instead of deploying to the entire fleet on every merge, detect which
files changed and limit ansible-playbook to only affected hosts.

Maps ansible roles, services, and host_vars to their target hosts.
Falls back to full fleet deploy for unmapped paths or changes to
shared infrastructure (common role, deploy.yml, inventory).

Closes PESO-108
2026-04-03 02:19:55 +01:00
853386ce2f
fix: remove custom node_exporter, standardise on package version (#40)
london-b had both a custom node_exporter.service and the
package-managed prometheus-node-exporter.service installed.
Both tried to bind port 9100, causing the package version to fail.

- Add cleanup tasks to remove custom /etc/systemd/system/node_exporter.service
  and /usr/local/bin/node_exporter if present
- Add node_exporter_extra_collectors variable for configurable collectors
- Configure london-b with systemd/processes/sysctl/ethtool/zfs collectors
  matching its previous custom setup

Resolves PESO-109
2026-04-03 01:50:13 +01:00
20274d49d4
ci: add ansible-galaxy collection install to deploy workflows (#39)
Both deploy-on-merge.yml and deploy.yml install ansible via pip but
never install the required Galaxy collections (community.docker,
community.general, ansible.posix) from ansible/requirements.yml.

This works by accident because the pip ansible package bundles some
collections, but it's fragile — a pip upgrade or runner image change
could break deploys silently.

Fixes PESO-110
2026-04-03 01:18:30 +01:00
d3bce0d5c2
nuremberg-a: add poste-io to docker_services (#38)
Adds docker_services list to nuremberg-a host_vars so the docker_services
role deploys and manages the poste-io mail container via docker compose,
replacing the current manual container setup.
2026-04-03 00:49:50 +01:00
5a5c60b6b2
london-a: disable unused services (InfluxDB, Redis, PostgreSQL, libvirtd) (#37)
Services stopped and disabled in rc.conf on london-a.
Removed audit variables from host_vars, replaced with cleanup note.

All four were leftovers from a defunct pez_vps project:
- InfluxDB: no user databases, only _internal
- Redis: empty keyspace, no clients
- PostgreSQL: defunct pez_vps database (Pez approved removal)
- libvirtd: zero VMs defined

Resolves PESO-113
2026-04-03 00:17:58 +01:00
6503bef2c6
Merge pull request #36 from RWejlgaard/fix/ansible-lint-yaml-violations
Fix ansible-lint yaml violations
2026-04-03 00:15:10 +01:00
00b967d930 fix trailing blank line in copenhagen-a host_vars and missing document start in cloudflared config 2026-04-02 23:13:18 +00:00
ca3d9c4261
Remove undocumented_services from copenhagen-a host_vars (#35)
PostgreSQL 14 and Redis have been stopped, disabled, purged, and
data directories removed from copenhagen-a. These were leftovers
from an old WordPress project with no user data.

Resolves: PESO-114
2026-04-02 23:53:15 +01:00
9317a712ec
Fix deployment methods in docs/services.md (#34)
Several services were incorrectly listed as Docker when they actually
run as native systemd services:

- helsinki-a: Caddy is apt-installed, not Docker
- london-b: Radarr, Sonarr, Lidarr, Readarr, Prowlarr are systemd
  services managed by media_stack role
- london-b: Jellyfin, Plex, Transmission are apt packages with systemd
  units

Updated Deployment column to reflect actual deployment method.

Fixes PESO-116
2026-04-02 22:48:14 +01:00
3ce559d7b9
Wire thiswebsitedoesnotexist.service into deployment pipeline
- Move unit file from services/systemd/helsinki-a/ to
  services/thiswebsitedoesnotexist/ (matches systemd_services role convention)
- Add systemd_services: [thiswebsitedoesnotexist] to helsinki-a host_vars
- Add systemd_services role to helsinki-a stage in deploy.yml
- Remove redundant caddy.service (apt manages this via the caddy role)

Closes PESO-117
2026-04-02 22:19:26 +01:00
3c751af3ce
fix(firewall_alpine): replace empty iptables ruleset with proper INPUT filtering (#32)
* Bind node_exporter to Tailscale IP on public-facing hosts

node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a,
exposing metrics to the public internet.

Changes:
- Add node_exporter_bind_tailscale flag (default false) to opt in
- Set flag on helsinki-a and london-a host_vars
- Debian: configure ARGS in /etc/default/prometheus-node-exporter
- FreeBSD: use native node_exporter_listen_address rc.conf variable
- Add handlers to restart on config change

Prometheus already scrapes via Tailscale IPs, no scrape config changes needed.

Fixes PESO-98

* fix(firewall_alpine): replace empty iptables ruleset with proper INPUT filtering

The rules.v4.j2 template deployed a ruleset with INPUT ACCEPT and zero
custom rules — effectively a no-op. nuremberg-a is a public-facing mail
server and needs actual filtering.

Changes:
- INPUT default policy set to DROP
- Allow loopback, established/related, Tailscale interface, SSH, ICMP
- FORWARD stays ACCEPT for Docker port-forwarding
- Added firewall_alpine_extra_input_rules variable for host-specific rules

Mail ports remain handled by Docker's FORWARD chain, not INPUT.

Closes PESO-119
2026-04-02 21:18:11 +01:00
f2cebcdf38
Bind node_exporter to Tailscale IP on public-facing hosts (#31)
node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a,
exposing metrics to the public internet.

Changes:
- Add node_exporter_bind_tailscale flag (default false) to opt in
- Set flag on helsinki-a and london-a host_vars
- Debian: configure ARGS in /etc/default/prometheus-node-exporter
- FreeBSD: use native node_exporter_listen_address rc.conf variable
- Add handlers to restart on config change

Prometheus already scrapes via Tailscale IPs, no scrape config changes needed.

Fixes PESO-98
2026-03-30 22:56:59 +01:00
a74213b4cb
copenhagen-a: document all live services in host_vars and docs (#30)
Audit of copenhagen-a found several running services not captured in
host_vars: cloudflared, node_exporter (systemd), and MariaDB. Also
found postgresql and redis running with no active consumers.

Updated host_vars to list all services and added undocumented_services
for the potentially unused ones. Updated docs with cloudflare tunnel,
monitoring, and notes about stale Docker images to clean up.

Closes PESO-100
2026-03-30 22:10:27 +01:00
0bcc53b01d
Document undocumented services on london-a (#29)
Audit of london-a rc.conf found several services running but not
captured in host_vars or docs: cloudflared, InfluxDB, Redis,
PostgreSQL, and libvirtd.

- InfluxDB: only _internal db, completely unused
- Redis: empty keyspace, unused
- PostgreSQL: has pez_vps db from a dead project, needs data review
- libvirtd: zero VMs, related to same dead project
- cloudflared: running tunnel 168eccae, config now captured

Also documented the weekly ZFS scrub cron (Sundays at noon) which
is in root's crontab but not ansible-managed.

Ref: PESO-101
2026-03-30 21:39:57 +01:00
eb9f026abd
Clean up stale DNS records and Caddyfile entries (#28)
Remove webdav.pez.sh DNS record (WebDAV replaced by Nextcloud AIO on cloud.pez.sh)
Remove alertmanager.pez.sh DNS record and Caddyfile block (Alertmanager not running on london-a)
Remove status-https HTTPS record pointing to old statuspage.io (status.pez.sh is self-hosted on helsinki-a)
Remove commented-out WebDAV block from Caddyfile
Remove empty section headers for decommissioned hosts (london-c, copenhagen-b, copenhagen-c)

Closes PESO-102
2026-03-30 21:12:52 +01:00
551d4c985b
Merge pull request #27 from RWejlgaard/readme-update
update readme
2026-03-30 19:49:52 +01:00
5b98ea4e6a update readme 2026-03-30 19:42:47 +01:00
94d7f20c9b
Merge pull request #26 from RWejlgaard/fix/docker-compose-v2-conflict
fix: remove docker-compose-v2 before installing docker-compose-plugin
2026-03-30 19:11:35 +01:00
cfb2e83070 fix: remove docker-compose-v2 before installing docker-compose-plugin
copenhagen-a had Ubuntu's docker-compose-v2 package installed, which
conflicts with Docker's official docker-compose-plugin over
/usr/libexec/docker/cli-plugins/docker-compose.

Moved the removal task before the install task and added docker-compose-v2
to the removal list.
2026-03-30 18:08:50 +00:00
b16f89357b
replace hard set ip with vars (#25)
* replace hard set ip with vars

* run all PR checks every time
2026-03-29 21:33:50 +01:00
431c65065a
Add Docker official apt repo to docker role (#24)
* Add Docker official apt repo to docker role

The docker role was installing docker-compose-plugin which is only
available from Docker's official apt repository. helsinki-a had it
configured manually, but london-b and copenhagen-a did not, causing
deploy failures.

Now the role:
- Adds Docker's GPG key and apt repo (handles both Debian and Ubuntu)
- Installs docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin
- Removes conflicting stock packages (docker.io, docker-compose)

* fix: resolve yamllint violations in docker role

- Remove standalone comment blocks that caused indentation errors
- Collapse multiline repo string to single line
- Ensure document start marker is present

* fix: keep all lines under 160 chars for yamllint

Use set_fact to build the Docker repo line in parts instead of
one long inline string.

* fix: resolve yamllint errors in london-b host_vars and promtail config

- Remove trailing blank line in inventory/host_vars/london-b.yml
- Add missing document start marker to promtail config
- Fix indentation in promtail scrape_configs (indent list items under key)

* Remove ansible-lint on push, keep PR-only

Lint already runs on pull_request — no need to double up on push to main.
2026-03-29 21:11:33 +01:00
4be8f73ffe
add hetzner servers terraform (#23)
Co-authored-by: Rasmus Wejlgaard <pez@Mac.localdomain>
2026-03-29 20:58:50 +01:00
353c2ad790
Capture london-b media stack and systemd services (#19)
Add the full media automation stack (sonarr, radarr, prowlarr, lidarr,
readarr, whisparr), media servers (jellyfin, plex), and supporting
services (transmission, samba, ollama, promtail, cloudflared, vsftpd)
to the repo as a media_stack Ansible role.

Includes:
- Custom systemd unit files for non-package-managed services
- Config files for promtail, samba, transmission, vsftpd
- Cron jobs for movie-rename-fix, sonarr/radarr midnight restarts
- Updated deploy.yml to wire the role into london-b's stage
- Updated london-b docs with full service inventory

Backup script (backup.sh) already covered by the existing backup role.
Node/systemd exporters already covered by existing monitoring roles.

Closes PESO-92
2026-03-29 19:13:48 +01:00
69918c8619
Add ZFS management role: scrub scheduling and pool monitoring (#18)
- New zfs role with cron-based scrub scheduling for Linux and FreeBSD
- Weekly Sunday scrubs at noon (matching existing manual crons)
- Add zfs_hosts inventory group with london-a and london-b
- Configure zfs_pools per host: zroot (london-a), hdd (london-b)
- Add Prometheus alert rules for degraded/faulted/offline pools
- Add zfs.yml playbook for targeted deploys

Captures the previously untracked scrub cron on london-a and
re-enables the commented-out scrub on london-b.

Refs: PESO-93
2026-03-29 19:12:42 +01:00
3d8fb84d1f
Feat/london b plex ufw (#21)
* Allow Plex port (32400/tcp) through UFW on london-b

Plex needs direct access on port 32400 for remote streaming.
Adds common_ufw_allowed_ports to london-b host_vars.

* Add BitTorrent port (6881) to london-b UFW allowed ports

Port was already manually configured in UFW, bringing it under Ansible management.

* Add Samba port (445/tcp) to london-b UFW allowed ports
2026-03-29 19:12:10 +01:00
0247f6aa6b
Fix docker-compose package conflict and alpine firewall handler (#22)
- Docker role: replace docker-compose with docker-compose-plugin (v2).
  The old docker-compose package conflicts with docker-compose-plugin
  already installed on helsinki-a. Also removes the conflicting package
  if present.

- firewall_alpine handler: use ansible.builtin.shell instead of
  ansible.builtin.command for iptables-restore, since the redirect
  operator (<) requires a shell.
2026-03-29 19:11:52 +01:00
106c45fc81
Add helsinki-a to docker_hosts inventory group (#20)
helsinki-a runs Docker containers (authelia, forgejo, bitwarden) but was
missing from docker_hosts. This means the docker role and docker-status
playbook weren't targeting it during deploys.

Closes PESO-91
2026-03-29 17:08:34 +01:00
b0acdb72e3
capture helsinki-a status page cron in repo (#17)
add status_page role that deploys update-status.sh and its cron job.
script queries prometheus for caddy upstream health and writes
status.json + history to /srv/status/ every minute.

refs: PESO-94
2026-03-29 15:39:35 +01:00
42eba42522
Add backup role to deploy hdd-backup.sh and cron to london-b (#16)
Captures the existing /root/scripts/backup.sh and its 22:00 daily cron
job as an Ansible role so it's managed via pez-infra deploys.

Refs: PESO-95
2026-03-29 15:09:01 +01:00
a7a71e4f87
capture nuremberg-a firewall rules in pez-infra (#15)
Add firewall_alpine role for Alpine hosts with iptables persistence
and fail2ban SSH jails. Wire it into nuremberg-a's deploy stage.

Mail ports are already exposed via Docker port mappings in the
poste-io docker-compose — this captures the surrounding iptables
and fail2ban config that was previously undocumented.

Closes PESO-96
2026-03-29 14:40:10 +01:00
258a38aeb5
Remove stale DNS records: chimera, gopher, ecp-dev, and old verification TXT (#14)
Stale A records removed:
- chimera.pez.sh → 13.43.223.167 (AWS IP reassigned, now serving unrelated site)
- gopher.pez.sh → 83.94.248.182 (unreachable on all ports)
- 0o9lix.ecp-dev.pez.sh → 0.0.0.0 (placeholder, never valid)

Stale TXT verification records removed:
- protonmail-verification (mail is self-hosted now, not ProtonMail)
- keybase-site-verification (Keybase is effectively dead)
- MS=ms99554544 (Microsoft domain verification, no active MS services)
- google-site-verification (no active Google services using this domain)
- apple-domain (no longer using Apple services after GrapheneOS switch)

PESO-97
2026-03-29 14:08:45 +01:00
8dffd3732b
Allow Plex port (32400/tcp) through UFW on london-b (#12)
* Allow Plex port (32400/tcp) through UFW on london-b

Plex needs direct access on port 32400 for remote streaming.
Adds common_ufw_allowed_ports to london-b host_vars.

* Add BitTorrent port (6881) to london-b UFW allowed ports

Port was already manually configured in UFW, bringing it under Ansible management.
2026-03-29 11:29:06 +01:00
99cc0d6967
Fix Alertmanager Caddyfile route pointing to Grafana port (#13)
Alertmanager reverse_proxy was pointing to :3000 (Grafana) instead of
:9093 (Alertmanager). Copy-paste artifact. Fixed in both the Caddyfile
and the template.
2026-03-29 11:07:41 +01:00
f9d0a7ebf4
fix: resolve UFW ansible-lint failures and deploy error (#11)
- Fix 'interface_or_direction' → 'direction' (required param for ufw module)
- Rename ufw_enabled/ufw_allowed_ports → common_ufw_enabled/common_ufw_allowed_ports (role prefix convention)
- Fix yaml[braces] violations in helsinki-a host_vars
2026-03-29 10:53:54 +01:00
4554dec7d2
Remove unused Prometheus alerting config (#10)
* Configure UFW firewall rules in common Ansible role

Add UFW configuration to the common role for Debian hosts:
- Default deny incoming, allow outgoing
- Allow all traffic on tailscale0 interface (mesh comms)
- Allow SSH port 22 as safety net
- Per-host allowed ports via ufw_allowed_ports variable
- Enable UFW after rules are applied

helsinki-a gets ports 80/443 for reverse proxy traffic.
Other Debian hosts only need Tailscale + SSH.

Closes PESO-79

* Remove unused alerting and rule_files from prometheus.yml

Alerting is handled by Grafana, not Prometheus Alertmanager.
The empty alertmanagers and rule_files sections were just noise.

Resolves PESO-74
2026-03-29 10:37:25 +01:00
da80c58ca4
fix: move authelia, forgejo, bitwarden to helsinki-a host_vars (#8)
These services run on helsinki-a, not london-b. Verified via docker ps
on both hosts. deploy.yml would have managed them on the wrong host.

Fixes PESO-73
2026-03-28 22:08:16 +00:00
8548050772
Remove dead DNS record: satisfactory.pez.sh (#7)
nuremberg-b (162.55.55.2) has been decommissioned, this record is stale.

Closes PESO-75
2026-03-28 21:37:26 +00:00
69f895c5cd
Remove bogus PTR records from Cloudflare forward zone (#6)
PTR record for 83.94.248.182 (copenhagen-a) incorrectly claimed to be
mail.pez.sh. PTR records in a forward DNS zone don't control actual
reverse DNS (that's managed by the ISP), and this record was misleading.

Also removed the mail-ptr record which had a similarly misplaced
in-addr.arpa reference in the forward zone.

Fixes PESO-76
2026-03-28 21:08:31 +00:00
Rasmus Wejlgaard
80ddb31f8a update readme 2026-03-28 21:06:14 +00:00
b00791f1b1
Update SPF and tighten DMARC for poste.io (#5)
* update SPF record: replace protonmail with poste.io mail server

PESO-77

- replace include:_spf.protonmail.ch with ip4:167.235.134.154 and ip6:2a01:4f8:1c1e:9c53::1 (nuremberg-a / mail.pez.sh)
- tighten from ~all (softfail) to -all (hardfail)

* tighten DMARC policy from p=none to p=quarantine

PESO-78

- enforce DMARC with p=quarantine (failed messages get quarantined)
- add adkim=r and aspf=r for relaxed DKIM/SPF alignment
2026-03-28 20:46:50 +00:00
03ce524730
Standardise Prometheus targets to Tailscale IPs (#4)
Replace local network IPs (192.168.1.x) with Tailscale IPs for
london-a and london-b in all scrape configs. This ensures consistent
connectivity via Tailscale mesh regardless of network topology changes.

Refs: PESO-80
2026-03-28 20:08:09 +00:00
61502861e3
Merge pull request #3 from RWejlgaard/feat/authelia-config 2026-03-28 18:52:40 +00:00
92fb6f9d11 ignore all SOPS-encrypted files in yamllint 2026-03-28 18:50:08 +00:00
8bb91032f3 Add Authelia config and SOPS-encrypted secrets
- Add configuration.yml from running helsinki-a deployment
- Replace example secrets with real SOPS-encrypted config.enc.yml
- Add LDAP and SMTP password file env vars to docker-compose
  (all secrets now via file mounts, zero inline passwords)
- Update README with secret mapping and deployment steps

Closes PESO-89
2026-03-28 17:42:07 +00:00
8163b226b3
Merge pull request #2 from RWejlgaard/fix-lint-nitpicks
Fix ansible-lint yaml nitpicks
2026-03-28 13:19:37 +00:00
46063246a2 fix last 3 yaml lint failures
- add missing --- to notification-policy.yml
- prometheus.yml: replace commented-out template defaults with empty lists
2026-03-28 13:17:42 +00:00
dc198eea81 fix more yaml document-start and comment indentation
- add missing --- to 13 more yml files
- fix comment indentation in prometheus.yml
2026-03-28 13:15:46 +00:00
dc10ceacf5 fix remaining yaml lint nitpicks
- add missing document start (---) to contact-points.yml and docker-compose files
- fix extra spaces inside braces in dotfiles and common role tasks
2026-03-28 13:13:37 +00:00
6f5cb82ab9 remove pr-test.yml 2026-03-28 13:11:34 +00:00