RWejlgaard/pez-infra

mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-07-04 15:46:16 +00:00

Author	SHA1	Message	Date
Rasmus Wejlgaard	0a1de0c85c	fix: add *.k8s endpoint	2026-06-28 15:24:29 +01:00
Rasmus "Pez" Wejlgaard	4cdb2d3fe4	fix: add n8n deployment to nuremberg-a (#139)	2026-06-26 19:54:30 +01:00
Rasmus "Pez" Wejlgaard	ac8dabe9a4	media_stack: capture london-b sonarr.service unit in repo (PESO-140) (#133) sonarr was the only *arr service without its systemd unit in the repo — it was treated as package-managed and never captured, so a london-b rebuild would lose the unit. Capture the running unit (APT/mono Sonarr v3) into ansible/services/sonarr/sonarr.service and have the media_stack role deploy it to /etc/systemd/system like radarr/lidarr/prowlarr, overriding the package-owned copy. Move sonarr out of the package-managed enable loop into the custom-unit deploy + enable loops.	2026-06-14 21:10:43 +01:00
Rasmus "Pez" Wejlgaard	8665a5fe99	remove stale promtail/rc.d leftovers, rss DNS record, fix london-c host description (#131)	2026-06-12 19:24:39 +01:00
Rasmus "Pez" Wejlgaard	0c00a3cb4d	docs: remove decommissioned Miniflux refs; fix status-page + minor drift (#129)	2026-06-09 19:49:16 +01:00
Rasmus "Pez" Wejlgaard	9d56a22c30	Ansible-manage docker-log-cleanup script and cron (PESO-142) (#128) docker-log-cleanup.sh lived in the repo but nothing deployed it — the script and monthly cron on nuremberg-a were set up by hand and got wiped when the host was reinstalled. Fold both into the docker role so every docker_hosts member gets the script in /usr/local/bin and a monthly cron, and it survives a rebuild.	2026-06-08 18:38:19 +01:00
Rasmus "Pez" Wejlgaard	3945b8cafc	remove miniflux — decommissioned (#127) Stopped and removed containers on london-b. Removed compose definition, Caddy reverse proxy route for rss.pez.sh, and london-b host_vars entry.	2026-06-07 18:07:11 +01:00
Rasmus "Pez" Wejlgaard	9ac179dbec	Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126) copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient TLS failure to fleet-management tripped systemd's default start rate-limit, systemd gave up, and the host sat silently unmonitored for ~2.5 weeks. Add a 10-resilience.conf systemd drop-in for alloy.service on every host (StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary upstream/TLS blip can no longer permanently kill the collector. Also drop the old self-hosted Grafana package that was left enabled and failing on copenhagen-c after the move to Grafana Cloud.	2026-06-07 14:30:08 +01:00
Rasmus "Pez" Wejlgaard	81efa1b717	Remove stale cloudflared service from copenhagen-a (PESO-138) (#125) cloudflared was retired in #56 when Caddy + Authelia replaced Cloudflare Tunnels, but copenhagen-a was unreachable at the time so its cloudflared.service was never stopped and is still running. Add a cleanup task to the common role that stops, disables and purges cloudflared wherever the unit lingers. Gated on the unit file existing so it self-targets copenhagen-a and is a no-op everywhere else, and explicitly excludes copenhagen-c, which legitimately runs a hand-configured tunnel.	2026-06-07 11:45:35 +01:00
Rasmus "Pez" Wejlgaard	3871dc8f90	Restrict london-b Samba (445) to LAN + Tailscale, off public internet (#124) Samba on london-b was allowed on 445/tcp from anywhere via UFW, exposing SMB/CIFS to the public internet. Tailscale already reaches it through the tailscale0 allow-all rule, so scope the explicit rule to the local London LAN (192.168.1.0/24) instead of the world. The common UFW task only ever adds allow rules, so it gained support for an optional per-port from_ip, plus a follow-up task that deletes the superseded world-open variant of any source-restricted port — otherwise the old '445 ALLOW Anywhere' rule would linger on the host and defeat the change. PESO-145	2026-06-07 11:37:45 +01:00
Rasmus "Pez" Wejlgaard	644b608831	chore: retire readarr service, replaced by bookshelf (#123) Bookshelf (PR #122) is a Readarr revival and now owns port 8787 on london-b, so the old custom Readarr systemd unit is removed: - drop readarr from the media_stack role's unit-deploy and enable loops, and add an idempotent decommission task (stop, disable, remove unit) so the host tears it down via Ansible rather than ad-hoc SSH - delete services/readarr/readarr.service - update docs (services, london-b host, service inventory) to describe bookshelf as a Docker service instead of a custom systemd unit The public readarr.pez.sh hostname is kept and now reverse-proxies to bookshelf on :8787 — DNS, Caddy and Authelia (pez_readarr_users group) are unchanged.	2026-06-06 15:50:37 +01:00
Rasmus "Pez" Wejlgaard	98ac065056	feat: add bookshelf service on london-b (#122) Bookshelf (a Readarr revival) for managing the ebook/audiobook library. Runs on london-b with config at /root/bookshelf and the library at /hdd/books mounted into the container at the same path.	2026-06-06 15:34:57 +01:00
Rasmus "Pez" Wejlgaard	a40cd60d60	backup: keep deleted/overwritten versions instead of mirroring them away (#120) The nightly job runs 'rclone sync', which permanently deletes or overwrites objects at the B2 destination. That means an accidental deletion or a ransomware encryption on /hdd propagates straight to the backup on the next run, leaving no clean copy. Add --backup-dir so every superseded version is moved into a dated folder under _versions/ rather than thrown away, and prune that folder after 30 days so it doesn't grow unbounded.	2026-06-05 21:23:04 +01:00
Rasmus "Pez" Wejlgaard	9815f44b84	fix: stop masking failed service deploys; trim dead config (#119) The docker_services and systemd_services roles ran their "start the service" tasks with `failed_when: false`, so a container or unit that failed to come up still reported the deploy as green. Drop it from both start tasks so a broken deploy actually fails CI. The compose/unit copy tasks keep `failed_when: false` — that's load-bearing for the `item is not failed` filter that skips services without a compose/unit file. Also: - Remove a duplicate "Template service .env files" task in docker_services (second copy used a hardcoded path and didn't register; first one is the one the start task reads). - Don't trigger a full fleet deploy on docs/markdown/workflow-only pushes to main — add docs/, /.md and .github/* to paths-ignore. - Drop the dangling `update-freebsd` Make target (playbook doesn't exist; fleet has no FreeBSD hosts).	2026-06-04 18:41:24 +01:00
Rasmus "Pez" Wejlgaard	45dff99e7c	fix: update octopus exporter (#113)	2026-05-26 20:56:07 +01:00
Rasmus "Pez" Wejlgaard	a031d4218b	fix: Documentation overhaul (#112) * fix: Documentation overhaul * removing joke graph	2026-05-19 18:49:21 +01:00
Rasmus "Pez" Wejlgaard	9f84652102	fix: cleanup deploy.yml and share workflow (#108) * fix: cleanup deploy.yml and share workflow * lint issue	2026-05-15 20:17:28 +01:00
Rasmus "Pez" Wejlgaard	69145b3089	fix: add smb mount (#107) * fix: add smb mount * update secrets * address linting issues	2026-05-14 20:49:25 +01:00
Rasmus "Pez" Wejlgaard	5481292b7f	fix: remove subscription nag and lock down proxmox (#106)	2026-05-13 21:09:54 +01:00
Rasmus "Pez" Wejlgaard	d3b516c594	fix: cleanup freebsd and alpine stuff (#105)	2026-05-12 22:43:12 +01:00
Rasmus "Pez" Wejlgaard	e502a92451	fix: tracing on caddy services (#104)	2026-05-10 10:18:53 +01:00
Rasmus "Pez" Wejlgaard	b5d5537c1f	Proxmox ve on london a (#102) * fix: update config for london-a for new proxmox install * fix: update proxmox endpoint	2026-05-09 19:29:44 +01:00
Rasmus "Pez" Wejlgaard	928d1d0b99	fix: update config for london-a for new proxmox install (#101)	2026-05-09 19:22:34 +01:00
Rasmus "Pez" Wejlgaard	7d22ad1ce1	bug: add retry to restarting caddy (#97) * bug: add retry to restarting caddy * skip terraform pipeline when no terraform changes has been done	2026-05-05 20:42:52 +01:00
Rasmus "Pez" Wejlgaard	abb283c1d7	terraform plan on pr and caddy metrics on localhost since we have all… (#96) * terraform plan on pr and caddy metrics on localhost since we have alloy now * remove refreshing state	2026-05-05 13:35:37 +01:00
Rasmus "Pez" Wejlgaard	043c783361	Grafana Cloud Migration (#94) * Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics * modulize stuff now that we have multiple substantial things in here * provider updates and new secrets * remove grafana and prometheus from ansible	2026-05-04 13:40:30 +01:00
Rasmus "Pez" Wejlgaard	83f023aedd	Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinsta… (#93) * Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinstalled * dns config for cockpit	2026-05-03 14:00:22 +01:00
Rasmus "Pez" Wejlgaard	e5306a5409	Fixing loki alloy (#87) * add alloy to docker group * fix: use docker driver instead of hacky alloy setup * fixing linting issue	2026-04-29 20:07:40 +01:00
Rasmus "Pez" Wejlgaard	a51a0879d3	add alloy to docker group (#86)	2026-04-28 20:53:19 +01:00
Rasmus "Pez" Wejlgaard	6a3618aa4a	fix: Fixing loki alloy (#85) * fix: alloy * fix: alpine doesn't need a hacky install	2026-04-28 20:30:30 +01:00
Rasmus "Pez" Wejlgaard	b474e28528	fix: alloy (#84)	2026-04-28 20:10:20 +01:00
Rasmus "Pez" Wejlgaard	5391c500e1	fix: loki & alloy (#83) * fix: loki & alloy * fix linting	2026-04-28 16:40:45 +01:00
Rasmus "Pez" Wejlgaard	a7f51ec10c	fix: update octo exporter (#82)	2026-04-27 20:10:11 +01:00
Rasmus "Pez" Wejlgaard	5c404dca87	fix: update octopus_exporter to v1.1.1 (#81)	2026-04-26 21:01:24 +01:00
Rasmus "Pez" Wejlgaard	10bb940f87	fix: update living room dashboard (#74)	2026-04-26 14:09:09 +01:00
Rasmus "Pez" Wejlgaard	af2f462c1c	fix: prometheus retention and authelia fix (#73) * fix: prometheus retention time * also fix bug with authelia * linting issues * more linting	2026-04-25 21:35:39 +01:00
Rasmus "Pez" Wejlgaard	b82013c2f0	fix: actually decomission nextcloud and TWDNE (#72) * fix: actually decomission nextcloud and TWDNE * ignore spaces in lint and remove dns for the services * linting on the linting config wasn't linting the lints	2026-04-25 18:19:16 +01:00
Rasmus "Pez" Wejlgaard	35c5079d8f	fix: remove cloud and TWDNE and add energy dashboard for grafana (#71)	2026-04-25 17:46:17 +01:00
Rasmus "Pez" Wejlgaard	b3cc47f3d6	fix: optimize deploy playbook and get rid of deprecated stuff (#70)	2026-04-25 15:04:16 +01:00
Rasmus "Pez" Wejlgaard	7df62e8848	fix: adding octopus_exporter compose (#69) * fix: adding octopus_exporter compose * add the secret for octopus	2026-04-25 12:38:12 +01:00
Rasmus "Pez" Wejlgaard	56bec98afc	fix: Add octopus_exporter job configuration (#68)	2026-04-22 21:28:14 +01:00
Rasmus "Pez" Wejlgaard	c495b73720	template prometheus config (#67)	2026-04-21 20:44:37 +01:00
Rasmus "Pez" Wejlgaard	34820ee663	adding london-c (#66)	2026-04-20 20:52:19 +01:00
Rasmus "Pez" Wejlgaard	177fbb4014	Change provider for plex metrics (#65) * change provider for plex metrics * update plex token * update plex token loading	2026-04-13 19:04:54 +01:00
Rasmus "Pez" Wejlgaard	2a98a89eb4	Change provider for plex metrics (#64) * change provider for plex metrics * update plex token	2026-04-12 21:21:24 +01:00
Rasmus "Pez" Wejlgaard	a0ec92dfdd	change provider for plex metrics (#63)	2026-04-12 18:45:30 +01:00
Rasmus "Pez" Wejlgaard	49cee191b5	fix: bind mariadb to local ip (#62)	2026-04-11 21:24:11 +01:00
Rasmus "Pez" Wejlgaard	1ef59ccc4a	fix: add mangos ports to firewall (#61)	2026-04-11 20:42:17 +01:00
Rasmus "Pez" Wejlgaard	1ab278e47a	only send email if something went wrong with backups (#60)	2026-04-06 18:33:07 +01:00
Rasmus "Pez" Wejlgaard	4c7ea76d81	fix: remove node_exporter from copenhagen-a systemd_services (#59) node_exporter is deployed by the dedicated node_exporter Ansible role using distro packages (prometheus-node-exporter). Having it in systemd_services causes the systemd_services role to look for a non-existent services/node_exporter/node_exporter.service file, producing errors during deploy. Resolves PESO-135	2026-04-04 12:51:52 +01:00

101 commits