RWejlgaard/pez-infra

mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-07-04 15:46:16 +00:00

Author	SHA1	Message	Date
Rasmus "Pez" Wejlgaard	5391c500e1	fix: loki & alloy (#83 ) * fix: loki & alloy * fix linting	2026-04-28 16:40:45 +01:00
Rasmus "Pez" Wejlgaard	b82013c2f0	fix: actually decomission nextcloud and TWDNE (#72 ) * fix: actually decomission nextcloud and TWDNE * ignore spaces in lint and remove dns for the services * linting on the linting config wasn't linting the lints	2026-04-25 18:19:16 +01:00
Rasmus "Pez" Wejlgaard	7df62e8848	fix: adding octopus_exporter compose (#69 ) * fix: adding octopus_exporter compose * add the secret for octopus	2026-04-25 12:38:12 +01:00
Rasmus "Pez" Wejlgaard	c495b73720	template prometheus config (#67 )	2026-04-21 20:44:37 +01:00
Rasmus "Pez" Wejlgaard	34820ee663	adding london-c (#66 )	2026-04-20 20:52:19 +01:00
Rasmus "Pez" Wejlgaard	49cee191b5	fix: bind mariadb to local ip (#62 )	2026-04-11 21:24:11 +01:00
Rasmus "Pez" Wejlgaard	1ef59ccc4a	fix: add mangos ports to firewall (#61 )	2026-04-11 20:42:17 +01:00
Rasmus "Pez" Wejlgaard	4c7ea76d81	fix: remove node_exporter from copenhagen-a systemd_services (#59 ) node_exporter is deployed by the dedicated node_exporter Ansible role using distro packages (prometheus-node-exporter). Having it in systemd_services causes the systemd_services role to look for a non-existent services/node_exporter/node_exporter.service file, producing errors during deploy. Resolves PESO-135	2026-04-04 12:51:52 +01:00
Rasmus "Pez" Wejlgaard	ed6eb22f60	Remove cloudflared — replaced by Caddy reverse proxy (#56 ) Cloudflared tunnels are no longer used. All traffic now routes through Cloudflare DNS to Caddy on helsinki-a over Tailscale. - Remove cloudflared systemd unit files (copenhagen-a, london-b) - Remove cloudflared from media_stack role and copenhagen-a host_vars - Remove cloudflared references from services README and host docs - Remove cloudflared deploy trigger from CI workflow Live service on london-b stopped and disabled. copenhagen-a was unreachable but the tunnel is unused regardless.	2026-04-03 22:51:12 +01:00
Rasmus "Pez" Wejlgaard	99c2091b96	Add smartctl-exporter to copenhagen-a and Prometheus scrape (#55 ) - Add smartctl-exporter to copenhagen-a docker_services - Add copenhagen-a as a Prometheus smartmontools scrape target - Update compose file comment to reflect multi-host usage Closes PESO-128	2026-04-03 21:20:20 +01:00
Rasmus "Pez" Wejlgaard	d8757d37e1	fix(london-a): correct grafana provisioning dir path (#53 ) grafana.ini on london-a sets provisioning = /usr/local/etc/grafana/provisioning but grafana_provisioning_dir pointed at /usr/local/share/grafana/conf/provisioning. This meant deploy.yml synced alerting rules, dashboards provisioning, and datasources to a path Grafana never reads — a from-scratch deploy would have broken alerting entirely. Fixes PESO-131	2026-04-03 20:20:15 +01:00
Rasmus "Pez" Wejlgaard	25d201f930	Add copenhagen-a to docker_hosts and wire up minecraft docker service (#52 ) - Add copenhagen-a to [docker_hosts] inventory group so the docker role runs on it in Stage 2 - Add docker_services: [minecraft] to copenhagen-a host_vars - Add docker_services role to Stage 4d (copenhagen-a) in deploy.yml - Update deploy-on-merge scope mapping to include copenhagen-a for docker role changes Closes PESO-132	2026-04-03 19:50:51 +01:00
Rasmus "Pez" Wejlgaard	dca6a08ba1	Remove cloudflared from london-a (PESO-134) (#50 ) cloudflared has been replaced by Caddy + Authelia. Removed: - cloudflared service config (services/cloudflared/london-a/) - tunnel ID from london-a host_vars - cloudflared_enable from rc.conf Also synced rc.conf with live server state (disabled services from PESO-113, added node_exporter_listen_address). Live server: stopped service, removed from rc.conf, uninstalled pkg.	2026-04-03 18:51:51 +01:00
Rasmus "Pez" Wejlgaard	a31f8b5651	Add systemd_exporter Ansible role and Prometheus scrape config (#49 ) * Add systemd_exporter Ansible role and Prometheus scrape config - Create systemd_exporter role (download binary, create user, deploy service) - Add scrape job for london-b:9558 and copenhagen-a:9558 - Add systemd_exporter_hosts inventory group - Add stage 3b to deploy.yml - Map role to deploy-on-merge scope Closes PESO-120 * Fix line length lint violations in systemd_exporter tasks * Fix var-naming lint: use systemd_exporter_ prefix for role variables	2026-04-03 12:23:38 +01:00
Rasmus "Pez" Wejlgaard	49cea826e0	capture overseerr, syncthing, and fix slskd on london-b (#43 )	2026-04-03 09:52:10 +01:00
Rasmus "Pez" Wejlgaard	ff8d7a53e7	Remove copenhagen-a from docker_hosts and docker_services (#45 ) Docker is masked on copenhagen-a and Minecraft is no longer managed via Docker Compose. Removes: - copenhagen-a from [docker_hosts] inventory group - docker_services var from copenhagen-a host_vars - docker_services role from Stage 4d deploy play MaNGOS systemd services remain unchanged. Fixes PESO-104	2026-04-03 04:18:46 +01:00
Rasmus "Pez" Wejlgaard	853386ce2f	fix: remove custom node_exporter, standardise on package version (#40 ) london-b had both a custom node_exporter.service and the package-managed prometheus-node-exporter.service installed. Both tried to bind port 9100, causing the package version to fail. - Add cleanup tasks to remove custom /etc/systemd/system/node_exporter.service and /usr/local/bin/node_exporter if present - Add node_exporter_extra_collectors variable for configurable collectors - Configure london-b with systemd/processes/sysctl/ethtool/zfs collectors matching its previous custom setup Resolves PESO-109	2026-04-03 01:50:13 +01:00
Rasmus "Pez" Wejlgaard	d3bce0d5c2	nuremberg-a: add poste-io to docker_services (#38 ) Adds docker_services list to nuremberg-a host_vars so the docker_services role deploys and manages the poste-io mail container via docker compose, replacing the current manual container setup.	2026-04-03 00:49:50 +01:00
Rasmus "Pez" Wejlgaard	5a5c60b6b2	london-a: disable unused services (InfluxDB, Redis, PostgreSQL, libvirtd) (#37 ) Services stopped and disabled in rc.conf on london-a. Removed audit variables from host_vars, replaced with cleanup note. All four were leftovers from a defunct pez_vps project: - InfluxDB: no user databases, only _internal - Redis: empty keyspace, no clients - PostgreSQL: defunct pez_vps database (Pez approved removal) - libvirtd: zero VMs defined Resolves PESO-113	2026-04-03 00:17:58 +01:00
Rasmus Wejlgaard	00b967d930	fix trailing blank line in copenhagen-a host_vars and missing document start in cloudflared config	2026-04-02 23:13:18 +00:00
Rasmus "Pez" Wejlgaard	ca3d9c4261	Remove undocumented_services from copenhagen-a host_vars (#35 ) PostgreSQL 14 and Redis have been stopped, disabled, purged, and data directories removed from copenhagen-a. These were leftovers from an old WordPress project with no user data. Resolves: PESO-114	2026-04-02 23:53:15 +01:00
Rasmus "Pez" Wejlgaard	3ce559d7b9	Wire thiswebsitedoesnotexist.service into deployment pipeline - Move unit file from services/systemd/helsinki-a/ to services/thiswebsitedoesnotexist/ (matches systemd_services role convention) - Add systemd_services: [thiswebsitedoesnotexist] to helsinki-a host_vars - Add systemd_services role to helsinki-a stage in deploy.yml - Remove redundant caddy.service (apt manages this via the caddy role) Closes PESO-117	2026-04-02 22:19:26 +01:00
Rasmus "Pez" Wejlgaard	f2cebcdf38	Bind node_exporter to Tailscale IP on public-facing hosts (#31 ) node_exporter was listening on 0.0.0.0:9100 on helsinki-a and london-a, exposing metrics to the public internet. Changes: - Add node_exporter_bind_tailscale flag (default false) to opt in - Set flag on helsinki-a and london-a host_vars - Debian: configure ARGS in /etc/default/prometheus-node-exporter - FreeBSD: use native node_exporter_listen_address rc.conf variable - Add handlers to restart on config change Prometheus already scrapes via Tailscale IPs, no scrape config changes needed. Fixes PESO-98	2026-03-30 22:56:59 +01:00
Rasmus "Pez" Wejlgaard	a74213b4cb	copenhagen-a: document all live services in host_vars and docs (#30 ) Audit of copenhagen-a found several running services not captured in host_vars: cloudflared, node_exporter (systemd), and MariaDB. Also found postgresql and redis running with no active consumers. Updated host_vars to list all services and added undocumented_services for the potentially unused ones. Updated docs with cloudflare tunnel, monitoring, and notes about stale Docker images to clean up. Closes PESO-100	2026-03-30 22:10:27 +01:00
Rasmus "Pez" Wejlgaard	0bcc53b01d	Document undocumented services on london-a (#29 ) Audit of london-a rc.conf found several services running but not captured in host_vars or docs: cloudflared, InfluxDB, Redis, PostgreSQL, and libvirtd. - InfluxDB: only _internal db, completely unused - Redis: empty keyspace, unused - PostgreSQL: has pez_vps db from a dead project, needs data review - libvirtd: zero VMs, related to same dead project - cloudflared: running tunnel 168eccae, config now captured Also documented the weekly ZFS scrub cron (Sundays at noon) which is in root's crontab but not ansible-managed. Ref: PESO-101	2026-03-30 21:39:57 +01:00
Rasmus "Pez" Wejlgaard	431c65065a	Add Docker official apt repo to docker role (#24 ) * Add Docker official apt repo to docker role The docker role was installing docker-compose-plugin which is only available from Docker's official apt repository. helsinki-a had it configured manually, but london-b and copenhagen-a did not, causing deploy failures. Now the role: - Adds Docker's GPG key and apt repo (handles both Debian and Ubuntu) - Installs docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin - Removes conflicting stock packages (docker.io, docker-compose) * fix: resolve yamllint violations in docker role - Remove standalone comment blocks that caused indentation errors - Collapse multiline repo string to single line - Ensure document start marker is present * fix: keep all lines under 160 chars for yamllint Use set_fact to build the Docker repo line in parts instead of one long inline string. * fix: resolve yamllint errors in london-b host_vars and promtail config - Remove trailing blank line in inventory/host_vars/london-b.yml - Add missing document start marker to promtail config - Fix indentation in promtail scrape_configs (indent list items under key) * Remove ansible-lint on push, keep PR-only Lint already runs on pull_request — no need to double up on push to main.	2026-03-29 21:11:33 +01:00
Rasmus "Pez" Wejlgaard	69918c8619	Add ZFS management role: scrub scheduling and pool monitoring (#18 ) - New zfs role with cron-based scrub scheduling for Linux and FreeBSD - Weekly Sunday scrubs at noon (matching existing manual crons) - Add zfs_hosts inventory group with london-a and london-b - Configure zfs_pools per host: zroot (london-a), hdd (london-b) - Add Prometheus alert rules for degraded/faulted/offline pools - Add zfs.yml playbook for targeted deploys Captures the previously untracked scrub cron on london-a and re-enables the commented-out scrub on london-b. Refs: PESO-93	2026-03-29 19:12:42 +01:00
Rasmus "Pez" Wejlgaard	3d8fb84d1f	Feat/london b plex ufw (#21 ) * Allow Plex port (32400/tcp) through UFW on london-b Plex needs direct access on port 32400 for remote streaming. Adds common_ufw_allowed_ports to london-b host_vars. * Add BitTorrent port (6881) to london-b UFW allowed ports Port was already manually configured in UFW, bringing it under Ansible management. * Add Samba port (445/tcp) to london-b UFW allowed ports	2026-03-29 19:12:10 +01:00
Rasmus "Pez" Wejlgaard	106c45fc81	Add helsinki-a to docker_hosts inventory group (#20 ) helsinki-a runs Docker containers (authelia, forgejo, bitwarden) but was missing from docker_hosts. This means the docker role and docker-status playbook weren't targeting it during deploys. Closes PESO-91	2026-03-29 17:08:34 +01:00
Rasmus "Pez" Wejlgaard	a7a71e4f87	capture nuremberg-a firewall rules in pez-infra (#15 ) Add firewall_alpine role for Alpine hosts with iptables persistence and fail2ban SSH jails. Wire it into nuremberg-a's deploy stage. Mail ports are already exposed via Docker port mappings in the poste-io docker-compose — this captures the surrounding iptables and fail2ban config that was previously undocumented. Closes PESO-96	2026-03-29 14:40:10 +01:00
Rasmus "Pez" Wejlgaard	8dffd3732b	Allow Plex port (32400/tcp) through UFW on london-b (#12 ) * Allow Plex port (32400/tcp) through UFW on london-b Plex needs direct access on port 32400 for remote streaming. Adds common_ufw_allowed_ports to london-b host_vars. * Add BitTorrent port (6881) to london-b UFW allowed ports Port was already manually configured in UFW, bringing it under Ansible management.	2026-03-29 11:29:06 +01:00
Rasmus "Pez" Wejlgaard	f9d0a7ebf4	fix: resolve UFW ansible-lint failures and deploy error (#11 ) - Fix 'interface_or_direction' → 'direction' (required param for ufw module) - Rename ufw_enabled/ufw_allowed_ports → common_ufw_enabled/common_ufw_allowed_ports (role prefix convention) - Fix yaml[braces] violations in helsinki-a host_vars	2026-03-29 10:53:54 +01:00
Rasmus "Pez" Wejlgaard	4554dec7d2	Remove unused Prometheus alerting config (#10 ) * Configure UFW firewall rules in common Ansible role Add UFW configuration to the common role for Debian hosts: - Default deny incoming, allow outgoing - Allow all traffic on tailscale0 interface (mesh comms) - Allow SSH port 22 as safety net - Per-host allowed ports via ufw_allowed_ports variable - Enable UFW after rules are applied helsinki-a gets ports 80/443 for reverse proxy traffic. Other Debian hosts only need Tailscale + SSH. Closes PESO-79 * Remove unused alerting and rule_files from prometheus.yml Alerting is handled by Grafana, not Prometheus Alertmanager. The empty alertmanagers and rule_files sections were just noise. Resolves PESO-74	2026-03-29 10:37:25 +01:00
Rasmus "Pez" Wejlgaard	da80c58ca4	fix: move authelia, forgejo, bitwarden to helsinki-a host_vars (#8 ) These services run on helsinki-a, not london-b. Verified via docker ps on both hosts. deploy.yml would have managed them on the wrong host. Fixes PESO-73	2026-03-28 22:08:16 +00:00
Rasmus Wejlgaard	737d6e0bc1	initial commit	2026-03-28 12:39:41 +00:00

35 commits