RWejlgaard/pez-infra

mirror of https://github.com/RWejlgaard/pez-infra.git synced 2026-07-04 15:46:16 +00:00

Author	SHA1	Message	Date
Rasmus "Pez" Wejlgaard	87439d47b8	ci: extract shared SOPS/tofu steps into composite actions (#135) The SOPS install + version, the decrypt loop, the OpenTofu version, and the Backblaze backend-credential extraction were copy-pasted across terraform.yml (twice), validate-terraform.yml, and _deploy-core.yml. A version bump meant editing the same string in up to four places and was easy to do partially. Pull them into three local composite actions so each is defined once: - setup-tofu (pins OpenTofu version) - sops-decrypt (installs SOPS, decrypts .enc. in place) - tofu-backend-creds (exports Backblaze S3 creds to GITHUB_ENV) Behaviour is unchanged; sops-decrypt also matches *.enc.env everywhere (previously only _deploy-core did), which is a no-op in terraform/.	2026-06-18 20:27:54 +01:00
Rasmus "Pez" Wejlgaard	e9d5f9bc76	ci: make Caddyfile validation download robust (#134) The validate-caddyfile workflow fetched the Caddy binary by first hitting api.github.com/releases/latest to resolve the version tag, then building a release-asset URL from it. That API call is unauthenticated, so it shares the 60-requests/hour-per-IP limit across all GitHub-hosted runners and returns 403 under load. On failure jq emits "null", the URL becomes caddy_null_linux_amd64.tar.gz, and `curl -sL` silently pipes a 404 page into tar — a confusing, flaky failure on every PR that touches the Caddyfile. Switch to Caddy's official download API, which serves the latest linux/amd64 binary directly: one request, no GitHub API, no jq/tar parsing. Add `-f` so curl fails loudly on an HTTP error instead of writing an error page to disk.	2026-06-15 20:38:21 +01:00
Rasmus "Pez" Wejlgaard	26f8224941	make Dependabot tofu validate stubs satisfy provider validators (#132)	2026-06-12 19:25:24 +01:00
dependabot[bot]	7f2cbd4af1	chore(deps): bump the github-actions group across 1 directory with 2 updates (#117) Bumps the github-actions group with 2 updates in the / directory: [ansible/ansible-lint](https://github.com/ansible/ansible-lint) and [actions/github-script](https://github.com/actions/github-script). Updates `ansible/ansible-lint` from 25 to 26 - [Release notes](https://github.com/ansible/ansible-lint/releases) - [Commits](https://github.com/ansible/ansible-lint/compare/v25...v26) Updates `actions/github-script` from 7 to 9 - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v7...v9) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '9' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions - dependency-name: ansible/ansible-lint dependency-version: '26' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-06-05 21:13:03 +01:00
Rasmus "Pez" Wejlgaard	9815f44b84	fix: stop masking failed service deploys; trim dead config (#119) The docker_services and systemd_services roles ran their "start the service" tasks with `failed_when: false`, so a container or unit that failed to come up still reported the deploy as green. Drop it from both start tasks so a broken deploy actually fails CI. The compose/unit copy tasks keep `failed_when: false` — that's load-bearing for the `item is not failed` filter that skips services without a compose/unit file. Also: - Remove a duplicate "Template service .env files" task in docker_services (second copy used a hardcoded path and didn't register; first one is the one the start task reads). - Don't trigger a full fleet deploy on docs/markdown/workflow-only pushes to main — add docs/, /.md and .github/* to paths-ignore. - Drop the dangling `update-freebsd` Make target (playbook doesn't exist; fleet has no FreeBSD hosts).	2026-06-04 18:41:24 +01:00
Rasmus "Pez" Wejlgaard	7b2552fea5	chore: fix dependabot PRs (#118) * chore: add dependabot config Add Dependabot for the three supported ecosystems in this repo: GitHub Actions, Terraform (root + grafana/hetzner/pagerduty modules), and Docker (service compose files + dotfile Dockerfiles). Weekly schedule with per-ecosystem grouping to keep PR noise down. * ci: make terraform validation work on dependabot PRs Dependabot PRs run with no access to repository secrets and a read-only token, so the SOPS decrypt step (and the PR-comment step) fail. Give Dependabot a secret-free path: stub the secrets.yaml keys it reads and run init -backend=false + validate, skipping decrypt/plan/comment. Human PRs are unchanged and still get a full plan.	2026-06-03 19:29:23 +01:00
Rasmus "Pez" Wejlgaard	65090ca9d6	ci: serialize terraform and deploy runs with concurrency guards (#114) * ci: serialize infra runs and enable terraform state locking Add concurrency guards to the terraform and deploy-on-merge workflows so two merges in quick succession can't run against the same state or the same hosts at once (queue, never cancel an in-flight run). Enable native S3 state locking (use_lockfile) on the Backblaze B2 backend, which needs OpenTofu 1.10+, so bump the CI tofu version 1.9.0 -> 1.10.10 and the required_version constraint to >= 1.10.0. * ci: bump tofu to 1.10.10 in the validate workflow too Missed this one in the last commit — the PR-time validate still pinned 1.9.0, which trips the new required_version >= 1.10.0 constraint. * ci: drop use_lockfile — Backblaze B2 can't do native state locking B2's S3 API returns 501 NotImplemented for the conditional PutObject that use_lockfile relies on, so tofu plan/apply fails to acquire the lock. Revert the lockfile and the 1.10 version bump it required; rely on the concurrency guard to serialize applies instead. Left a note in the backend block so this isn't re-attempted.	2026-06-02 19:39:13 +01:00
Rasmus "Pez" Wejlgaard	1ec4e10eb1	Update cache action (#111) * fix: update cache version * fix: update cache	2026-05-16 11:13:38 +01:00
Rasmus "Pez" Wejlgaard	a6aa561147	fix: update cache version (#110)	2026-05-16 11:03:12 +01:00
Rasmus "Pez" Wejlgaard	7ad2766f94	hotfix: broken pipeline (#109) * fix: cleanup deploy.yml and share workflow * lint issue * hotfix: broken pipeline	2026-05-15 20:19:56 +01:00
Rasmus "Pez" Wejlgaard	9f84652102	fix: cleanup deploy.yml and share workflow (#108) * fix: cleanup deploy.yml and share workflow * lint issue	2026-05-15 20:17:28 +01:00
Rasmus "Pez" Wejlgaard	7d22ad1ce1	bug: add retry to restarting caddy (#97) * bug: add retry to restarting caddy * skip terraform pipeline when no terraform changes has been done	2026-05-05 20:42:52 +01:00
Rasmus "Pez" Wejlgaard	abb283c1d7	terraform plan on pr and caddy metrics on localhost since we have all… (#96) * terraform plan on pr and caddy metrics on localhost since we have alloy now * remove refreshing state	2026-05-05 13:35:37 +01:00
Rasmus "Pez" Wejlgaard	19928358c5	fix: Update node version for gha (#79) * fix: update checkout version to dodge deprecation * fix: more deprecations * forgot one	2026-04-26 18:35:15 +01:00
Rasmus "Pez" Wejlgaard	7c3fec983b	fix: Update node version for gha (#78) * fix: update checkout version to dodge deprecation * fix: more deprecations	2026-04-26 18:23:22 +01:00
Rasmus "Pez" Wejlgaard	98be03c273	fix: update checkout version to dodge deprecation (#77)	2026-04-26 18:13:38 +01:00
Rasmus "Pez" Wejlgaard	1c6784eade	fix: replace tailscale authkey use with oauth (#76)	2026-04-26 17:30:15 +01:00
Rasmus "Pez" Wejlgaard	e9fbd41cb4	fix: deploy using a matrix (#75)	2026-04-26 14:35:12 +01:00
Rasmus "Pez" Wejlgaard	ed6eb22f60	Remove cloudflared — replaced by Caddy reverse proxy (#56) Cloudflared tunnels are no longer used. All traffic now routes through Cloudflare DNS to Caddy on helsinki-a over Tailscale. - Remove cloudflared systemd unit files (copenhagen-a, london-b) - Remove cloudflared from media_stack role and copenhagen-a host_vars - Remove cloudflared references from services README and host docs - Remove cloudflared deploy trigger from CI workflow Live service on london-b stopped and disabled. copenhagen-a was unreachable but the tunnel is unused regardless.	2026-04-03 22:51:12 +01:00
Rasmus "Pez" Wejlgaard	88377f3e93	fix: remove || true from compose lint so validation errors fail CI (#54) The lint-docker-compose workflow was swallowing all validation errors with || true, meaning broken compose files would never fail the check. - Remove || true and let validation failures propagate - Add a pre-step that creates empty stubs for referenced env_file entries (e.g. bitwarden/settings.env) so docker compose config can validate structure without needing real secrets - Track per-file pass/fail and exit non-zero if any file fails Closes PESO-130	2026-04-03 20:50:47 +01:00
Rasmus "Pez" Wejlgaard	25d201f930	Add copenhagen-a to docker_hosts and wire up minecraft docker service (#52) - Add copenhagen-a to [docker_hosts] inventory group so the docker role runs on it in Stage 2 - Add docker_services: [minecraft] to copenhagen-a host_vars - Add docker_services role to Stage 4d (copenhagen-a) in deploy.yml - Update deploy-on-merge scope mapping to include copenhagen-a for docker role changes Closes PESO-132	2026-04-03 19:50:51 +01:00
Rasmus "Pez" Wejlgaard	a31f8b5651	Add systemd_exporter Ansible role and Prometheus scrape config (#49) * Add systemd_exporter Ansible role and Prometheus scrape config - Create systemd_exporter role (download binary, create user, deploy service) - Add scrape job for london-b:9558 and copenhagen-a:9558 - Add systemd_exporter_hosts inventory group - Add stage 3b to deploy.yml - Map role to deploy-on-merge scope Closes PESO-120 * Fix line length lint violations in systemd_exporter tasks * Fix var-naming lint: use systemd_exporter_ prefix for role variables	2026-04-03 12:23:38 +01:00
Rasmus "Pez" Wejlgaard	8f5eb385cc	Remove copenhagen-a from docker role mapping in deploy-on-merge (#48) copenhagen-a is not in [docker_hosts] inventory group. Running the docker role play against it just gets skipped, wasting CI time. Fixes PESO-121	2026-04-03 11:49:41 +01:00
Rasmus "Pez" Wejlgaard	b6c8c18106	deploy-on-merge: add path-based host limiting (#41) Instead of deploying to the entire fleet on every merge, detect which files changed and limit ansible-playbook to only affected hosts. Maps ansible roles, services, and host_vars to their target hosts. Falls back to full fleet deploy for unmapped paths or changes to shared infrastructure (common role, deploy.yml, inventory). Closes PESO-108	2026-04-03 02:19:55 +01:00
Rasmus "Pez" Wejlgaard	20274d49d4	ci: add ansible-galaxy collection install to deploy workflows (#39) Both deploy-on-merge.yml and deploy.yml install ansible via pip but never install the required Galaxy collections (community.docker, community.general, ansible.posix) from ansible/requirements.yml. This works by accident because the pip ansible package bundles some collections, but it's fragile — a pip upgrade or runner image change could break deploys silently. Fixes PESO-110	2026-04-03 01:18:30 +01:00
Rasmus "Pez" Wejlgaard	b16f89357b	replace hard set ip with vars (#25) * replace hard set ip with vars * run all PR checks every time	2026-03-29 21:33:50 +01:00
Rasmus "Pez" Wejlgaard	431c65065a	Add Docker official apt repo to docker role (#24) * Add Docker official apt repo to docker role The docker role was installing docker-compose-plugin which is only available from Docker's official apt repository. helsinki-a had it configured manually, but london-b and copenhagen-a did not, causing deploy failures. Now the role: - Adds Docker's GPG key and apt repo (handles both Debian and Ubuntu) - Installs docker-ce, docker-ce-cli, containerd.io, docker-compose-plugin - Removes conflicting stock packages (docker.io, docker-compose) * fix: resolve yamllint violations in docker role - Remove standalone comment blocks that caused indentation errors - Collapse multiline repo string to single line - Ensure document start marker is present * fix: keep all lines under 160 chars for yamllint Use set_fact to build the Docker repo line in parts instead of one long inline string. * fix: resolve yamllint errors in london-b host_vars and promtail config - Remove trailing blank line in inventory/host_vars/london-b.yml - Add missing document start marker to promtail config - Fix indentation in promtail scrape_configs (indent list items under key) * Remove ansible-lint on push, keep PR-only Lint already runs on pull_request — no need to double up on push to main.	2026-03-29 21:11:33 +01:00
Rasmus Wejlgaard	737d6e0bc1	initial commit	2026-03-28 12:39:41 +00:00

28 commits
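
The path-based host limiting described in commit b6c8c18106 can be illustrated with a minimal Python sketch. This is not the workflow's actual implementation, which the log does not show. The path prefixes, the mapping table, and the function names `hosts_for_changes` and `limit_arg` are all hypothetical; only the host names and the fallback rule (full fleet for unmapped or shared paths) come from the commit messages above.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical mapping from path prefixes to affected hosts.
# The real repository layout and mapping are assumptions here.
PATH_TO_HOSTS = {
    "services/minecraft/": {"copenhagen-a"},
    "ansible/inventory/host_vars/london-b.yml": {"london-b"},
    "ansible/roles/docker/": {"helsinki-a", "london-b", "copenhagen-a"},
}

# Hypothetical shared-infrastructure prefixes that force a full fleet deploy.
FULL_FLEET_PREFIXES = (
    "ansible/roles/common/",
    "ansible/deploy.yml",
    "ansible/inventory/hosts",
)


def hosts_for_changes(changed: Iterable[str]) -> Optional[Set[str]]:
    """Return the hosts to deploy to, or None to mean 'full fleet'."""
    hosts: Set[str] = set()
    for path in changed:
        # Shared infrastructure changed: deploy everywhere.
        if path.startswith(FULL_FLEET_PREFIXES):
            return None
        matched = [h for prefix, h in PATH_TO_HOSTS.items()
                   if path.startswith(prefix)]
        # Unmapped path: fall back to the full fleet to stay safe.
        if not matched:
            return None
        for host_set in matched:
            hosts |= host_set
    return hosts


def limit_arg(hosts: Optional[Set[str]]) -> List[str]:
    """Build the ansible-playbook --limit arguments (empty for full fleet)."""
    if hosts is None:
        return []
    return ["--limit", ",".join(sorted(hosts))]
```

Returning `None` rather than an empty set for the fallback keeps "deploy everywhere" distinct from "nothing to deploy", so a caller cannot confuse an unmapped path with an empty change list.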