RWejlgaard/pez-infra

mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-07-04 15:46:16 +00:00

Author	SHA1	Message	Date
Rasmus "Pez" Wejlgaard	8665a5fe99	remove stale promtail/rc.d leftovers, rss DNS record, fix london-c host description (#131)	2026-06-12 19:24:39 +01:00
Rasmus "Pez" Wejlgaard	0a357fc69a	docs: catch up with the Cloudflare to Hetzner DNS move, fix secrets/terraform drift (#130) The docs still described Cloudflare as DNS + CDN in front of helsinki-a, but that was dropped in #90 - pez.sh lives on Hetzner DNS via Terraform now and records point straight at the origin. Updated README, architecture, networking, getting-started and the nuremberg-a host doc to match, and noted that pez.solutions still resolves via Cloudflare outside Terraform. Also fixed while I was in there: - terraform/README: PagerDuty provider is ~> 3.32 (table said ~> 2.2), and the B2 secret keys are backblaze_keyID/backblaze_applicationKey - secrets docs: group_vars secrets file is .enc.yaml, dropped the FreeBSD install steps, the long-gone .sops.yaml placeholder note and the ANSIBLE_VAULT_PASS migration note, swapped the cloudflare_record example for hcloud - getting-started referenced ansible/scripts/sops-setup.sh which doesn't exist - added naveen.pez.sh to the subdomain tables and a note about the DNS-only records (mail, minecraft, wow, public)	2026-06-10 20:59:23 +01:00
Rasmus "Pez" Wejlgaard	0c00a3cb4d	docs: remove decommissioned Miniflux refs; fix status-page + minor drift (#129)	2026-06-09 19:49:16 +01:00
Rasmus "Pez" Wejlgaard	9d56a22c30	Ansible-manage docker-log-cleanup script and cron (PESO-142) (#128) docker-log-cleanup.sh lived in the repo but nothing deployed it — the script and monthly cron on nuremberg-a were set up by hand and got wiped when the host was reinstalled. Fold both into the docker role so every docker_hosts member gets the script in /usr/local/bin and a monthly cron, and it survives a rebuild.	2026-06-08 18:38:19 +01:00
Rasmus "Pez" Wejlgaard	3945b8cafc	remove miniflux — decommissioned (#127) Stopped and removed containers on london-b. Removed compose definition, Caddy reverse proxy route for rss.pez.sh, and london-b host_vars entry.	2026-06-07 18:07:11 +01:00
Rasmus "Pez" Wejlgaard	9ac179dbec	Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126) copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient TLS failure to fleet-management tripped systemd's default start rate-limit, systemd gave up, and the host sat silently unmonitored for ~2.5 weeks. Add a 10-resilience.conf systemd drop-in for alloy.service on every host (StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary upstream/TLS blip can no longer permanently kill the collector. Also drop the old self-hosted Grafana package that was left enabled and failing on copenhagen-c after the move to Grafana Cloud.	2026-06-07 14:30:08 +01:00
Rasmus "Pez" Wejlgaard	81efa1b717	Remove stale cloudflared service from copenhagen-a (PESO-138) (#125) cloudflared was retired in #56 when Caddy + Authelia replaced Cloudflare Tunnels, but copenhagen-a was unreachable at the time so its cloudflared.service was never stopped and is still running. Add a cleanup task to the common role that stops, disables and purges cloudflared wherever the unit lingers. Gated on the unit file existing so it self-targets copenhagen-a and is a no-op everywhere else, and explicitly excludes copenhagen-c, which legitimately runs a hand-configured tunnel.	2026-06-07 11:45:35 +01:00
Rasmus "Pez" Wejlgaard	3871dc8f90	Restrict london-b Samba (445) to LAN + Tailscale, off public internet (#124) Samba on london-b was allowed on 445/tcp from anywhere via UFW, exposing SMB/CIFS to the public internet. Tailscale already reaches it through the tailscale0 allow-all rule, so scope the explicit rule to the local London LAN (192.168.1.0/24) instead of the world. The common UFW task only ever adds allow rules, so it gained support for an optional per-port from_ip, plus a follow-up task that deletes the superseded world-open variant of any source-restricted port — otherwise the old '445 ALLOW Anywhere' rule would linger on the host and defeat the change. PESO-145	2026-06-07 11:37:45 +01:00
Rasmus "Pez" Wejlgaard	644b608831	chore: retire readarr service, replaced by bookshelf (#123) Bookshelf (PR #122) is a Readarr revival and now owns port 8787 on london-b, so the old custom Readarr systemd unit is removed: - drop readarr from the media_stack role's unit-deploy and enable loops, and add an idempotent decommission task (stop, disable, remove unit) so the host tears it down via Ansible rather than ad-hoc SSH - delete services/readarr/readarr.service - update docs (services, london-b host, service inventory) to describe bookshelf as a Docker service instead of a custom systemd unit The public readarr.pez.sh hostname is kept and now reverse-proxies to bookshelf on :8787 — DNS, Caddy and Authelia (pez_readarr_users group) are unchanged.	2026-06-06 15:50:37 +01:00
Rasmus "Pez" Wejlgaard	98ac065056	feat: add bookshelf service on london-b (#122) Bookshelf (a Readarr revival) for managing the ebook/audiobook library. Runs on london-b with config at /root/bookshelf and the library at /hdd/books mounted into the container at the same path.	2026-06-06 15:34:57 +01:00
Rasmus "Pez" Wejlgaard	85d1cb945e	chore: commit terraform lock file for reproducible provider versions (#121) The .terraform.lock.hcl was gitignored while providers use floating ~> constraints, so every CI 'tofu init' resolved provider versions fresh and could drift from what was tested locally, with no checksum verification on the providers. Track the lock file instead, with hashes for linux_amd64 (CI) plus darwin_arm64/amd64 (local). Dependabot's terraform updates now surface exact provider version bumps as reviewable, hash-pinned changes.	2026-06-06 13:19:08 +01:00
Rasmus "Pez" Wejlgaard	a40cd60d60	backup: keep deleted/overwritten versions instead of mirroring them away (#120) The nightly job runs 'rclone sync', which permanently deletes or overwrites objects at the B2 destination. That means an accidental deletion or a ransomware encryption on /hdd propagates straight to the backup on the next run, leaving no clean copy. Add --backup-dir so every superseded version is moved into a dated folder under _versions/ rather than thrown away, and prune that folder after 30 days so it doesn't grow unbounded.	2026-06-05 21:23:04 +01:00
dependabot[bot]	7f2cbd4af1	chore(deps): bump the github-actions group across 1 directory with 2 updates (#117 ) Bumps the github-actions group with 2 updates in the / directory: [ansible/ansible-lint](https://github.com/ansible/ansible-lint) and [actions/github-script](https://github.com/actions/github-script). Updates `ansible/ansible-lint` from 25 to 26 - [Release notes](https://github.com/ansible/ansible-lint/releases) - [Commits](https://github.com/ansible/ansible-lint/compare/v25...v26) Updates `actions/github-script` from 7 to 9 - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: ansible/ansible-lint dependency-version: '26' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-05 21:13:03 +01:00
dependabot[bot]	24431466c5	chore(deps): bump the terraform group across 2 directories with 1 update (#116 ) Updates the requirements on and [pagerduty/pagerduty](https://github.com/PagerDuty/terraform-provider-pagerduty) to permit the latest version. Updates `pagerduty/pagerduty` to 3.32.4 - [Release notes](https://github.com/PagerDuty/terraform-provider-pagerduty/releases) - [Changelog](https://github.com/PagerDuty/terraform-provider-pagerduty/blob/master/CHANGELOG.md) - [Commits](https://github.com/PagerDuty/terraform-provider-pagerduty/compare/v2.2.0...v3.32.4) Updates `pagerduty/pagerduty` to 3.32.4 - [Release notes](https://github.com/PagerDuty/terraform-provider-pagerduty/releases) - [Changelog](https://github.com/PagerDuty/terraform-provider-pagerduty/blob/master/CHANGELOG.md) - [Commits](https://github.com/PagerDuty/terraform-provider-pagerduty/compare/v2.2.0...v3.32.4) --- updated-dependencies: - dependency-name: pagerduty/pagerduty dependency-version: 3.32.4 dependency-type: direct:production dependency-group: terraform - dependency-name: pagerduty/pagerduty dependency-version: 3.32.4 dependency-type: direct:production dependency-group: terraform ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-05 21:12:59 +01:00
Rasmus "Pez" Wejlgaard	9815f44b84	fix: stop masking failed service deploys; trim dead config (#119) The docker_services and systemd_services roles ran their "start the service" tasks with `failed_when: false`, so a container or unit that failed to come up still reported the deploy as green. Drop it from both start tasks so a broken deploy actually fails CI. The compose/unit copy tasks keep `failed_when: false` — that's load-bearing for the `item is not failed` filter that skips services without a compose/unit file. Also: - Remove a duplicate "Template service .env files" task in docker_services (second copy used a hardcoded path and didn't register; first one is the one the start task reads). - Don't trigger a full fleet deploy on docs/markdown/workflow-only pushes to main — add docs/, /.md and .github/* to paths-ignore. - Drop the dangling `update-freebsd` Make target (playbook doesn't exist; fleet has no FreeBSD hosts).	2026-06-04 18:41:24 +01:00
Rasmus "Pez" Wejlgaard	7b2552fea5	chore: fix dependabot PRs (#118) * chore: add dependabot config Add Dependabot for the three supported ecosystems in this repo: GitHub Actions, Terraform (root + grafana/hetzner/pagerduty modules), and Docker (service compose files + dotfile Dockerfiles). Weekly schedule with per-ecosystem grouping to keep PR noise down. * ci: make terraform validation work on dependabot PRs Dependabot PRs run with no access to repository secrets and a read-only token, so the SOPS decrypt step (and the PR-comment step) fail. Give Dependabot a secret-free path: stub the secrets.yaml keys it reads and run init -backend=false + validate, skipping decrypt/plan/comment. Human PRs are unchanged and still get a full plan.	2026-06-03 19:29:23 +01:00
Rasmus "Pez" Wejlgaard	7e74232d64	chore: add dependabot config (#115) Add Dependabot for the three supported ecosystems in this repo: GitHub Actions, Terraform (root + grafana/hetzner/pagerduty modules), and Docker (service compose files + dotfile Dockerfiles). Weekly schedule with per-ecosystem grouping to keep PR noise down.	2026-06-03 19:15:12 +01:00
Rasmus "Pez" Wejlgaard	65090ca9d6	ci: serialize terraform and deploy runs with concurrency guards (#114) * ci: serialize infra runs and enable terraform state locking Add concurrency guards to the terraform and deploy-on-merge workflows so two merges in quick succession can't run against the same state or the same hosts at once (queue, never cancel an in-flight run). Enable native S3 state locking (use_lockfile) on the Backblaze B2 backend, which needs OpenTofu 1.10+, so bump the CI tofu version 1.9.0 -> 1.10.10 and the required_version constraint to >= 1.10.0. * ci: bump tofu to 1.10.10 in the validate workflow too Missed this one in the last commit — the PR-time validate still pinned 1.9.0, which trips the new required_version >= 1.10.0 constraint. * ci: drop use_lockfile — Backblaze B2 can't do native state locking B2's S3 API returns 501 NotImplemented for the conditional PutObject that use_lockfile relies on, so tofu plan/apply fails to acquire the lock. Revert the lockfile and the 1.10 version bump it required; rely on the concurrency guard to serialize applies instead. Left a note in the backend block so this isn't re-attempted.	2026-06-02 19:39:13 +01:00
Rasmus "Pez" Wejlgaard	45dff99e7c	fix: update octopus exporter (#113)	2026-05-26 20:56:07 +01:00
Rasmus "Pez" Wejlgaard	a031d4218b	fix: Documentation overhaul (#112) * fix: Documentation overhaul * removing joke graph	2026-05-19 18:49:21 +01:00
Rasmus "Pez" Wejlgaard	1ec4e10eb1	Update cache action (#111) * fix: update cache version * fix: update cache	2026-05-16 11:13:38 +01:00
Rasmus "Pez" Wejlgaard	a6aa561147	fix: update cache version (#110)	2026-05-16 11:03:12 +01:00
Rasmus "Pez" Wejlgaard	7ad2766f94	hotfix: broken pipeline (#109) * fix: cleanup deploy.yml and share workflow * lint issue * hotfix: broken pipeline	2026-05-15 20:19:56 +01:00
Rasmus "Pez" Wejlgaard	9f84652102	fix: cleanup deploy.yml and share workflow (#108) * fix: cleanup deploy.yml and share workflow * lint issue	2026-05-15 20:17:28 +01:00
Rasmus "Pez" Wejlgaard	69145b3089	fix: add smb mount (#107) * fix: add smb mount * update secrets * address linting issues	2026-05-14 20:49:25 +01:00
Rasmus "Pez" Wejlgaard	5481292b7f	fix: remove subscription nag and lock down proxmox (#106)	2026-05-13 21:09:54 +01:00
Rasmus "Pez" Wejlgaard	d3b516c594	fix: cleanup freebsd and alpine stuff (#105)	2026-05-12 22:43:12 +01:00
Rasmus "Pez" Wejlgaard	e502a92451	fix: tracing on caddy services (#104)	2026-05-10 10:18:53 +01:00
Rasmus "Pez" Wejlgaard	06552c5b75	fix: slight tweaks (#103) * fix: slight tweaks * remove vendor	2026-05-09 20:49:46 +01:00
Rasmus "Pez" Wejlgaard	b5d5537c1f	Proxmox ve on london a (#102) * fix: update config for london-a for new proxmox install * fix: update proxmox endpoint	2026-05-09 19:29:44 +01:00
Rasmus "Pez" Wejlgaard	928d1d0b99	fix: update config for london-a for new proxmox install (#101)	2026-05-09 19:22:34 +01:00
Rasmus "Pez" Wejlgaard	51efda6053	Update fleet_pipelines.tf	2026-05-08 19:32:58 +01:00
Rasmus "Pez" Wejlgaard	d88d2e5d12	Add git synthetic check (#99)	2026-05-06 06:01:59 +01:00
Rasmus "Pez" Wejlgaard	7d22ad1ce1	bug: add retry to restarting caddy (#97) * bug: add retry to restarting caddy * skip terraform pipeline when no terraform changes has been done	2026-05-05 20:42:52 +01:00
Rasmus "Pez" Wejlgaard	abb283c1d7	terraform plan on pr and caddy metrics on localhost since we have all… (#96) * terraform plan on pr and caddy metrics on localhost since we have alloy now * remove refreshing state	2026-05-05 13:35:37 +01:00
Rasmus "Pez" Wejlgaard	9bde71fbf9	adding pagerduty stack (#95) * adding pagerduty stack * rename files to not be overly descriptive	2026-05-04 20:50:31 +01:00
Rasmus "Pez" Wejlgaard	043c783361	Grafana Cloud Migration (#94) * Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics * modulize stuff now that we have multiple substantial things in here * provider updates and new secrets * remove grafana and prometheus from ansible	2026-05-04 13:40:30 +01:00
Rasmus "Pez" Wejlgaard	83f023aedd	Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinsta… (#93) * Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinstalled * dns config for cockpit	2026-05-03 14:00:22 +01:00
Rasmus "Pez" Wejlgaard	d22f7a52a0	fix: clean up of terraform (#92)	2026-05-02 14:46:03 +01:00
Rasmus "Pez" Wejlgaard	03ad9b476d	make dns more neat (#91)	2026-05-01 21:05:53 +01:00
Rasmus "Pez" Wejlgaard	b5cef4b985	fix: remove cloudflare resources (#90) * phase 1 - add all the records to both providers to A/B test * dkim fix * remove cloudflare resources	2026-04-30 15:55:14 +01:00
Rasmus "Pez" Wejlgaard	ba04d49c4e	Clou dflaring out mayday mayday mayday (#89) * phase 1 - add all the records to both providers to A/B test * dkim fix	2026-04-29 21:23:15 +01:00
Rasmus "Pez" Wejlgaard	dd112fd505	phase 1 - add all the records to both providers to A/B test (#88)	2026-04-29 20:47:34 +01:00
Rasmus "Pez" Wejlgaard	e5306a5409	Fixing loki alloy (#87) * add alloy to docker group * fix: use docker driver instead of hacky alloy setup * fixing linting issue	2026-04-29 20:07:40 +01:00
Rasmus "Pez" Wejlgaard	a51a0879d3	add alloy to docker group (#86)	2026-04-28 20:53:19 +01:00
Rasmus "Pez" Wejlgaard	6a3618aa4a	fix: Fixing loki alloy (#85) * fix: alloy * fix: alpine doesn't need a hacky install	2026-04-28 20:30:30 +01:00
Rasmus "Pez" Wejlgaard	b474e28528	fix: alloy (#84)	2026-04-28 20:10:20 +01:00
Rasmus "Pez" Wejlgaard	5391c500e1	fix: loki & alloy (#83) * fix: loki & alloy * fix linting	2026-04-28 16:40:45 +01:00
Rasmus "Pez" Wejlgaard	a7f51ec10c	fix: update octo exporter (#82)	2026-04-27 20:10:11 +01:00
Rasmus "Pez" Wejlgaard	5c404dca87	fix: update octopus_exporter to v1.1.1 (#81)	2026-04-26 21:01:24 +01:00

142 commits
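
Commit 9ac179dbec (#126) describes a `10-resilience.conf` systemd drop-in for `alloy.service`. The file itself is not shown in this log, so the following is only a minimal sketch of what such a drop-in might look like. It assumes the three settings named in the commit message make up the whole file and that it lives in the conventional drop-in directory (`/etc/systemd/system/alloy.service.d/`, an assumed path).

```ini
# Assumed location: /etc/systemd/system/alloy.service.d/10-resilience.conf
# Settings are taken from the commit message; section placement follows systemd's rules.

[Unit]
# Disable start rate-limiting so repeated failures never leave the unit permanently failed.
StartLimitIntervalSec=0

[Service]
# Restart the collector whenever it exits, after a 30-second pause.
Restart=always
RestartSec=30
```

`StartLimitIntervalSec=` belongs in `[Unit]` while `Restart=` and `RestartSec=` belong in `[Service]`. A drop-in like this takes effect after `systemctl daemon-reload` and a restart of the unit.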