RWejlgaard/pez-infra

mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-07-04 15:46:16 +00:00

Author	SHA1	Message	Date
Rasmus "Pez" Wejlgaard	56bec98afc	fix: Add octopus_exporter job configuration (#68)	2026-04-22 21:28:14 +01:00
Rasmus "Pez" Wejlgaard	c495b73720	template prometheus config (#67)	2026-04-21 20:44:37 +01:00
Rasmus "Pez" Wejlgaard	34820ee663	adding london-c (#66)	2026-04-20 20:52:19 +01:00
Rasmus "Pez" Wejlgaard	177fbb4014	Change provider for plex metrics (#65) * change provider for plex metrics * update plex token * update plex token loading	2026-04-13 19:04:54 +01:00
Rasmus "Pez" Wejlgaard	2a98a89eb4	Change provider for plex metrics (#64) * change provider for plex metrics * update plex token	2026-04-12 21:21:24 +01:00
Rasmus "Pez" Wejlgaard	a0ec92dfdd	change provider for plex metrics (#63)	2026-04-12 18:45:30 +01:00
Rasmus "Pez" Wejlgaard	49cee191b5	fix: bind mariadb to local ip (#62)	2026-04-11 21:24:11 +01:00
Rasmus "Pez" Wejlgaard	1ef59ccc4a	fix: add mangos ports to firewall (#61)	2026-04-11 20:42:17 +01:00
Rasmus "Pez" Wejlgaard	1ab278e47a	only send email if something went wrong with backups (#60)	2026-04-06 18:33:07 +01:00
Rasmus "Pez" Wejlgaard	4c7ea76d81	fix: remove node_exporter from copenhagen-a systemd_services (#59) node_exporter is deployed by the dedicated node_exporter Ansible role using distro packages (prometheus-node-exporter). Having it in systemd_services causes the systemd_services role to look for a non-existent services/node_exporter/node_exporter.service file, producing errors during deploy. Resolves PESO-135	2026-04-04 12:51:52 +01:00
Rasmus "Pez" Wejlgaard	41d7876260	change provider for mc server for more configurability (#58)	2026-04-04 12:01:28 +01:00
Rasmus "Pez" Wejlgaard	849ea208f0	fix grafana alert rules missing relativeTimeRange (#57)	2026-04-04 09:58:13 +01:00
Rasmus "Pez" Wejlgaard	267b392996	Add sonarr service directory with README (#51) Sonarr is running on london-b as an apt-managed systemd service but was the only arr service without a services/ directory in the repo. Add services/sonarr/README.md documenting the install method, data paths, and how it differs from the other arr services. Closes PESO-133	2026-04-04 09:31:39 +01:00
Rasmus "Pez" Wejlgaard	ed6eb22f60	Remove cloudflared — replaced by Caddy reverse proxy (#56) Cloudflared tunnels are no longer used. All traffic now routes through Cloudflare DNS to Caddy on helsinki-a over Tailscale. - Remove cloudflared systemd unit files (copenhagen-a, london-b) - Remove cloudflared from media_stack role and copenhagen-a host_vars - Remove cloudflared references from services README and host docs - Remove cloudflared deploy trigger from CI workflow Live service on london-b stopped and disabled. copenhagen-a was unreachable but the tunnel is unused regardless.	2026-04-03 22:51:12 +01:00
Rasmus "Pez" Wejlgaard	99c2091b96	Add smartctl-exporter to copenhagen-a and Prometheus scrape (#55) - Add smartctl-exporter to copenhagen-a docker_services - Add copenhagen-a as a Prometheus smartmontools scrape target - Update compose file comment to reflect multi-host usage Closes PESO-128	2026-04-03 21:20:20 +01:00
Rasmus "Pez" Wejlgaard	88377f3e93	fix: remove || true from compose lint so validation errors fail CI (#54) The lint-docker-compose workflow was swallowing all validation errors with || true, meaning broken compose files would never fail the check. - Remove || true and let validation failures propagate - Add a pre-step that creates empty stubs for referenced env_file entries (e.g. bitwarden/settings.env) so docker compose config can validate structure without needing real secrets - Track per-file pass/fail and exit non-zero if any file fails Closes PESO-130	2026-04-03 20:50:47 +01:00
Rasmus "Pez" Wejlgaard	d8757d37e1	fix(london-a): correct grafana provisioning dir path (#53) grafana.ini on london-a sets provisioning = /usr/local/etc/grafana/provisioning but grafana_provisioning_dir pointed at /usr/local/share/grafana/conf/provisioning. This meant deploy.yml synced alerting rules, dashboards provisioning, and datasources to a path Grafana never reads — a from-scratch deploy would have broken alerting entirely. Fixes PESO-131	2026-04-03 20:20:15 +01:00
Rasmus "Pez" Wejlgaard	25d201f930	Add copenhagen-a to docker_hosts and wire up minecraft docker service (#52) - Add copenhagen-a to [docker_hosts] inventory group so the docker role runs on it in Stage 2 - Add docker_services: [minecraft] to copenhagen-a host_vars - Add docker_services role to Stage 4d (copenhagen-a) in deploy.yml - Update deploy-on-merge scope mapping to include copenhagen-a for docker role changes Closes PESO-132	2026-04-03 19:50:51 +01:00
Rasmus "Pez" Wejlgaard	dca6a08ba1	Remove cloudflared from london-a (PESO-134) (#50) cloudflared has been replaced by Caddy + Authelia. Removed: - cloudflared service config (services/cloudflared/london-a/) - tunnel ID from london-a host_vars - cloudflared_enable from rc.conf Also synced rc.conf with live server state (disabled services from PESO-113, added node_exporter_listen_address). Live server: stopped service, removed from rc.conf, uninstalled pkg.	2026-04-03 18:51:51 +01:00
Rasmus "Pez" Wejlgaard	a31f8b5651	Add systemd_exporter Ansible role and Prometheus scrape config (#49) * Add systemd_exporter Ansible role and Prometheus scrape config - Create systemd_exporter role (download binary, create user, deploy service) - Add scrape job for london-b:9558 and copenhagen-a:9558 - Add systemd_exporter_hosts inventory group - Add stage 3b to deploy.yml - Map role to deploy-on-merge scope Closes PESO-120 * Fix line length lint violations in systemd_exporter tasks * Fix var-naming lint: use systemd_exporter_ prefix for role variables	2026-04-03 12:23:38 +01:00
Rasmus "Pez" Wejlgaard	8f5eb385cc	Remove copenhagen-a from docker role mapping in deploy-on-merge (#48) copenhagen-a is not in [docker_hosts] inventory group. Running the docker role play against it just gets skipped, wasting CI time. Fixes PESO-121	2026-04-03 11:49:41 +01:00
Rasmus "Pez" Wejlgaard	029c35fba6	Replace ASCII diagrams with mermaid in docs (#47) Convert remaining ASCII art diagrams to mermaid syntax: - monitoring.md: stack overview diagram - networking.md: Tailscale mesh diagram + DNS request flow architecture.md already used mermaid, no changes needed. PESO-123	2026-04-03 10:48:41 +01:00
Rasmus "Pez" Wejlgaard	8a4a95b596	Add ZFS role to deploy.yml for scrub scheduling (#44)	2026-04-03 09:53:10 +01:00
Rasmus "Pez" Wejlgaard	49cea826e0	capture overseerr, syncthing, and fix slskd on london-b (#43)	2026-04-03 09:52:10 +01:00
Rasmus "Pez" Wejlgaard	2d7723d145	Add rule_files to prometheus.yml, remove empty node-exporter.rules (#46) prometheus.yml was missing the rule_files section, so alerting rules deployed to /usr/local/etc/prometheus/rules/ were never loaded. - Add rule_files glob so Prometheus evaluates the ZFS pool rules - Document that alerting notifications go through Grafana, not Alertmanager — no alerting: section needed - Remove node-exporter.rules (all rules were commented out) Resolves PESO-103	2026-04-03 04:49:16 +01:00
Rasmus "Pez" Wejlgaard	ff8d7a53e7	Remove copenhagen-a from docker_hosts and docker_services (#45) Docker is masked on copenhagen-a and Minecraft is no longer managed via Docker Compose. Removes: - copenhagen-a from [docker_hosts] inventory group - docker_services var from copenhagen-a host_vars - docker_services role from Stage 4d deploy play MaNGOS systemd services remain unchanged. Fixes PESO-104	2026-04-03 04:18:46 +01:00
Rasmus "Pez" Wejlgaard	f75e2a8d5f	remove alertmanager caddyfile entry and clean up references (#42) alerting is handled by grafana, not alertmanager. removed the stale reverse proxy block from caddyfile template and updated caddy + prometheus docs to reflect grafana-only alerting.	2026-04-03 02:49:37 +01:00
Rasmus "Pez" Wejlgaard	b6c8c18106	deploy-on-merge: add path-based host limiting (#41) Instead of deploying to the entire fleet on every merge, detect which files changed and limit ansible-playbook to only affected hosts. Maps ansible roles, services, and host_vars to their target hosts. Falls back to full fleet deploy for unmapped paths or changes to shared infrastructure (common role, deploy.yml, inventory). Closes PESO-108	2026-04-03 02:19:55 +01:00
Rasmus "Pez" Wejlgaard	853386ce2f	fix: remove custom node_exporter, standardise on package version (#40) london-b had both a custom node_exporter.service and the package-managed prometheus-node-exporter.service installed. Both tried to bind port 9100, causing the package version to fail. - Add cleanup tasks to remove custom /etc/systemd/system/node_exporter.service and /usr/local/bin/node_exporter if present - Add node_exporter_extra_collectors variable for configurable collectors - Configure london-b with systemd/processes/sysctl/ethtool/zfs collectors matching its previous custom setup Resolves PESO-109	2026-04-03 01:50:13 +01:00
Rasmus "Pez" Wejlgaard	20274d49d4	ci: add ansible-galaxy collection install to deploy workflows (#39) Both deploy-on-merge.yml and deploy.yml install ansible via pip but never install the required Galaxy collections (community.docker, community.general, ansible.posix) from ansible/requirements.yml. This works by accident because the pip ansible package bundles some collections, but it's fragile — a pip upgrade or runner image change could break deploys silently. Fixes PESO-110	2026-04-03 01:18:30 +01:00
Rasmus "Pez" Wejlgaard	d3bce0d5c2	nuremberg-a: add poste-io to docker_services (#38) Adds docker_services list to nuremberg-a host_vars so the docker_services role deploys and manages the poste-io mail container via docker compose, replacing the current manual container setup.	2026-04-03 00:49:50 +01:00
Rasmus "Pez" Wejlgaard	5a5c60b6b2	london-a: disable unused services (InfluxDB, Redis, PostgreSQL, libvirtd) (#37) Services stopped and disabled in rc.conf on london-a. Removed audit variables from host_vars, replaced with cleanup note. All four were leftovers from a defunct pez_vps project: - InfluxDB: no user databases, only _internal - Redis: empty keyspace, no clients - PostgreSQL: defunct pez_vps database (Pez approved removal) - libvirtd: zero VMs defined Resolves PESO-113	2026-04-03 00:17:58 +01:00
Rasmus "Pez" Wejlgaard	6503bef2c6	Merge pull request #36 from RWejlgaard/fix/ansible-lint-yaml-violations Fix ansible-lint yaml violations	2026-04-03 00:15:10 +01:00
Rasmus Wejlgaard	00b967d930	fix trailing blank line in copenhagen-a host_vars and missing document start in cloudflared config	2026-04-02 23:13:18 +00:00
Rasmus "Pez" Wejlgaard	ca3d9c4261	Remove undocumented_services from copenhagen-a host_vars (#35) PostgreSQL 14 and Redis have been stopped, disabled, purged, and data directories removed from copenhagen-a. These were leftovers from an old WordPress project with no user data. Resolves: PESO-114	2026-04-02 23:53:15 +01:00
Rasmus "Pez" Wejlgaard	9317a712ec	Fix deployment methods in docs/services.md (#34) Several services were incorrectly listed as Docker when they actually run as native systemd services: - helsinki-a: Caddy is apt-installed, not Docker - london-b: Radarr, Sonarr, Lidarr, Readarr, Prowlarr are systemd services managed by media_stack role - london-b: Jellyfin, Plex, Transmission are apt packages with systemd units Updated Deployment column to reflect actual deployment method. Fixes PESO-116	2026-04-02 22:48:14 +01:00
Rasmus "Pez" Wejlgaard	3ce559d7b9	Wire thiswebsitedoesnotexist.service into deployment pipeline - Move unit file from services/systemd/helsinki-a/ to services/thiswebsitedoesnotexist/ (matches systemd_services role convention) - Add systemd_services: [thiswebsitedoesnotexist] to helsinki-a host_vars - Add systemd_services role to helsinki-a stage in deploy.yml - Remove redundant caddy.service (apt manages this via the caddy role) Closes PESO-117	2026-04-02 22:19:26 +01:00
Rasmus "Pez" Wejlgaard	3c751af3ce	fix(firewall_alpine): replace empty iptables ruleset with proper INPUT filtering (#32) * Bind node_exporter to Tailscale IP on public-facing hosts node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a, exposing metrics to the public internet. Changes: - Add node_exporter_bind_tailscale flag (default false) to opt in - Set flag on helsinki-a and london-a host_vars - Debian: configure ARGS in /etc/default/prometheus-node-exporter - FreeBSD: use native node_exporter_listen_address rc.conf variable - Add handlers to restart on config change Prometheus already scrapes via Tailscale IPs, no scrape config changes needed. Fixes PESO-98 * fix(firewall_alpine): replace empty iptables ruleset with proper INPUT filtering The rules.v4.j2 template deployed a ruleset with INPUT ACCEPT and zero custom rules — effectively a no-op. nuremberg-a is a public-facing mail server and needs actual filtering. Changes: - INPUT default policy set to DROP - Allow loopback, established/related, Tailscale interface, SSH, ICMP - FORWARD stays ACCEPT for Docker port-forwarding - Added firewall_alpine_extra_input_rules variable for host-specific rules Mail ports remain handled by Docker's FORWARD chain, not INPUT. Closes PESO-119	2026-04-02 21:18:11 +01:00
Rasmus "Pez" Wejlgaard	f2cebcdf38	Bind node_exporter to Tailscale IP on public-facing hosts (#31) node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a, exposing metrics to the public internet. Changes: - Add node_exporter_bind_tailscale flag (default false) to opt in - Set flag on helsinki-a and london-a host_vars - Debian: configure ARGS in /etc/default/prometheus-node-exporter - FreeBSD: use native node_exporter_listen_address rc.conf variable - Add handlers to restart on config change Prometheus already scrapes via Tailscale IPs, no scrape config changes needed. Fixes PESO-98	2026-03-30 22:56:59 +01:00
Rasmus "Pez" Wejlgaard	a74213b4cb	copenhagen-a: document all live services in host_vars and docs (#30) Audit of copenhagen-a found several running services not captured in host_vars: cloudflared, node_exporter (systemd), and MariaDB. Also found postgresql and redis running with no active consumers. Updated host_vars to list all services and added undocumented_services for the potentially unused ones. Updated docs with cloudflare tunnel, monitoring, and notes about stale Docker images to clean up. Closes PESO-100	2026-03-30 22:10:27 +01:00
Rasmus "Pez" Wejlgaard	0bcc53b01d	Document undocumented services on london-a (#29) Audit of london-a rc.conf found several services running but not captured in host_vars or docs: cloudflared, InfluxDB, Redis, PostgreSQL, and libvirtd. - InfluxDB: only _internal db, completely unused - Redis: empty keyspace, unused - PostgreSQL: has pez_vps db from a dead project, needs data review - libvirtd: zero VMs, related to same dead project - cloudflared: running tunnel 168eccae, config now captured Also documented the weekly ZFS scrub cron (Sundays at noon) which is in root's crontab but not ansible-managed. Ref: PESO-101	2026-03-30 21:39:57 +01:00
Rasmus "Pez" Wejlgaard	eb9f026abd	Clean up stale DNS records and Caddyfile entries (#28) Remove webdav.pez.sh DNS record (WebDAV replaced by Nextcloud AIO on cloud.pez.sh) Remove alertmanager.pez.sh DNS record and Caddyfile block (Alertmanager not running on london-a) Remove status-https HTTPS record pointing to old statuspage.io (status.pez.sh is self-hosted on helsinki-a) Remove commented-out WebDAV block from Caddyfile Remove empty section headers for decommissioned hosts (london-c, copenhagen-b, copenhagen-c) Closes PESO-102	2026-03-30 21:12:52 +01:00
Rasmus "Pez" Wejlgaard	551d4c985b	Merge pull request #27 from RWejlgaard/readme-update update readme	2026-03-30 19:49:52 +01:00
Rasmus Wejlgaard	5b98ea4e6a	update readme	2026-03-30 19:42:47 +01:00
Rasmus "Pez" Wejlgaard	94d7f20c9b	Merge pull request #26 from RWejlgaard/fix/docker-compose-v2-conflict fix: remove docker-compose-v2 before installing docker-compose-plugin	2026-03-30 19:11:35 +01:00
Rasmus Wejlgaard	cfb2e83070	fix: remove docker-compose-v2 before installing docker-compose-plugin copenhagen-a had Ubuntu's docker-compose-v2 package installed, which conflicts with Docker's official docker-compose-plugin over /usr/libexec/docker/cli-plugins/docker-compose. Moved the removal task before the install task and added docker-compose-v2 to the removal list.	2026-03-30 18:08:50 +00:00
Rasmus "Pez" Wejlgaard	b16f89357b	replace hard set ip with vars (#25) * replace hard set ip with vars * run all PR checks every time	2026-03-29 21:33:50 +01:00
Rasmus "Pez" Wejlgaard	431c65065a	Add Docker official apt repo to docker role (#24) * Add Docker official apt repo to docker role The docker role was installing docker-compose-plugin which is only available from Docker's official apt repository. helsinki-a had it configured manually, but london-b and copenhagen-a did not, causing deploy failures. Now the role: - Adds Docker's GPG key and apt repo (handles both Debian and Ubuntu) - Installs docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin - Removes conflicting stock packages (docker.io, docker-compose) * fix: resolve yamllint violations in docker role - Remove standalone comment blocks that caused indentation errors - Collapse multiline repo string to single line - Ensure document start marker is present * fix: keep all lines under 160 chars for yamllint Use set_fact to build the Docker repo line in parts instead of one long inline string. * fix: resolve yamllint errors in london-b host_vars and promtail config - Remove trailing blank line in inventory/host_vars/london-b.yml - Add missing document start marker to promtail config - Fix indentation in promtail scrape_configs (indent list items under key) * Remove ansible-lint on push, keep PR-only Lint already runs on pull_request — no need to double up on push to main.	2026-03-29 21:11:33 +01:00
Rasmus "Pez" Wejlgaard	4be8f73ffe	add hetzner servers terraform (#23) Co-authored-by: Rasmus Wejlgaard <pez@Mac.localdomain>	2026-03-29 20:58:50 +01:00
Rasmus "Pez" Wejlgaard	353c2ad790	Capture london-b media stack and systemd services (#19) Add the full media automation stack (sonarr, radarr, prowlarr, lidarr, readarr, whisparr), media servers (jellyfin, plex), and supporting services (transmission, samba, ollama, promtail, cloudflared, vsftpd) to the repo as a media_stack Ansible role. Includes: - Custom systemd unit files for non-package-managed services - Config files for promtail, samba, transmission, vsftpd - Cron jobs for movie-rename-fix, sonarr/radarr midnight restarts - Updated deploy.yml to wire the role into london-b's stage - Updated london-b docs with full service inventory Backup script (backup.sh) already covered by the existing backup role. Node/systemd exporters already covered by existing monitoring roles. Closes PESO-92	2026-03-29 19:13:48 +01:00

80 commits
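
Note on commit b6c8c18106 (path-based host limiting): the commit message describes mapping changed paths to target hosts and falling back to a full fleet deploy for unmapped or shared paths. The sketch below is one way that logic could look, not the repository's actual workflow code. Every path prefix, mapping entry, and function name here is a hypothetical placeholder chosen for illustration; only the host names and the `--limit` flag of `ansible-playbook` are taken from real sources.

```python
from __future__ import annotations

from typing import Iterable, Optional

# Hypothetical role -> hosts and service -> hosts mappings (illustration only).
ROLE_HOSTS = {
    "media_stack": {"london-b"},
    "systemd_exporter": {"london-b", "copenhagen-a"},
}
SERVICE_HOSTS = {
    "minecraft": {"copenhagen-a"},
}
# Assumed locations of shared infrastructure that should force a full deploy.
SHARED_PREFIXES = ("ansible/roles/common/", "ansible/deploy.yml", "ansible/inventory/hosts")
HOST_VARS_PREFIX = "ansible/inventory/host_vars/"  # assumed layout


def hosts_for_changes(paths: Iterable[str]) -> Optional[set[str]]:
    """Return the affected hosts, or None to signal a full fleet deploy."""
    hosts: set[str] = set()
    for path in paths:
        parts = path.split("/")
        if path.startswith(SHARED_PREFIXES):
            return None  # shared infrastructure changed
        if path.startswith(HOST_VARS_PREFIX) and path.endswith(".yml"):
            hosts.add(parts[-1][: -len(".yml")])
        elif path.startswith("ansible/roles/") and len(parts) > 2 and parts[2] in ROLE_HOSTS:
            hosts |= ROLE_HOSTS[parts[2]]
        elif path.startswith("services/") and len(parts) > 1 and parts[1] in SERVICE_HOSTS:
            hosts |= SERVICE_HOSTS[parts[1]]
        else:
            return None  # unmapped path: fall back to the whole fleet
    return hosts


def limit_args(paths: Iterable[str]) -> list[str]:
    """Build the extra ansible-playbook arguments for a set of changed paths."""
    hosts = hosts_for_changes(paths)
    if hosts is None:
        return []  # no --limit means every host in the inventory
    return ["--limit", ",".join(sorted(hosts))]
```

In a CI job, the changed paths would typically come from something like `git diff --name-only` between the merge base and the merged commit, and the result of `limit_args` would be appended to the `ansible-playbook` command line. Returning `None` rather than an empty set for the fallback keeps "deploy everywhere" distinct from "nothing to deploy".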