The docker_services and systemd_services roles ran their "start the
service" tasks with `failed_when: false`, so a container or unit that
failed to come up still reported the deploy as green. Drop it from both
start tasks so a broken deploy actually fails CI. The compose/unit *copy*
tasks keep `failed_when: false` — that's load-bearing for the
`item is not failed` filter that skips services without a compose/unit file.
Also:
- Remove a duplicate "Template service .env files" task in docker_services
(second copy used a hardcoded path and didn't register; first one is the
one the start task reads).
- Don't trigger a full fleet deploy on docs/markdown/workflow-only pushes
to main — add docs/**, **/*.md and .github/** to paths-ignore.
- Drop the dangling `update-freebsd` Make target (playbook doesn't exist;
fleet has no FreeBSD hosts).
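
For illustration, the paths-ignore change might look roughly like the
fragment below. Only the three glob patterns come from this commit; the
surrounding trigger layout is an assumption about the deploy workflow.

```yaml
# Sketch of the deploy-on-merge trigger (layout assumed, patterns from this commit).
on:
  push:
    branches: [main]
    paths-ignore:
      - "docs/**"     # documentation tree
      - "**/*.md"     # markdown anywhere in the repo
      - ".github/**"  # workflow-only changes
```

A push is skipped only when every changed file matches one of these
patterns; a push that also touches a role or playbook still deploys.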
* chore: add dependabot config
Add Dependabot for the three supported ecosystems in this repo:
GitHub Actions, Terraform (root + grafana/hetzner/pagerduty modules),
and Docker (service compose files + dotfile Dockerfiles). Weekly
schedule with per-ecosystem grouping to keep PR noise down.
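
A minimal sketch of one such entry, assuming the standard
.github/dependabot.yml layout. The group name is hypothetical, and the
Terraform and Docker entries would follow the same shape with their own
directories.

```yaml
# Sketch only: one ecosystem shown; group name "actions" is hypothetical.
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      actions:
        patterns: ["*"]   # bundle all action bumps into one PR
```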
* ci: make terraform validation work on dependabot PRs
Dependabot PRs run with no access to repository secrets and a read-only
token, so the SOPS decrypt step (and the PR-comment step) fail. Give
Dependabot a secret-free path: stub the secrets.yaml keys it reads and
run init -backend=false + validate, skipping decrypt/plan/comment. Human
PRs are unchanged and still get a full plan.
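
The secret-free path could be sketched as the steps below. This is an
assumption-laden outline, not the actual workflow: the step names and the
stubbed key `example_key` are hypothetical, and the real steps may gate on
a different condition than `github.actor`.

```yaml
# Sketch only: step names and the stubbed key are hypothetical.
steps:
  - name: Stub secrets for Dependabot
    if: github.actor == 'dependabot[bot]'
    run: |
      # Write placeholder values for the keys the config reads.
      printf 'example_key: "stub"\n' > secrets.yaml

  - name: Validate without a backend
    if: github.actor == 'dependabot[bot]'
    run: |
      tofu init -backend=false
      tofu validate
```

The decrypt, plan and comment steps would carry the inverse condition
(`github.actor != 'dependabot[bot]'`) so human PRs keep the full path.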
* ci: serialize infra runs and enable terraform state locking
Add concurrency guards to the terraform and deploy-on-merge workflows so
two merges in quick succession can't run against the same state or the
same hosts at once (queue, never cancel an in-flight run).
Enable native S3 state locking (use_lockfile) on the Backblaze B2 backend,
which needs OpenTofu 1.10+, so bump the CI tofu version 1.9.0 -> 1.10.10
and the required_version constraint to >= 1.10.0.
* ci: bump tofu to 1.10.10 in the validate workflow too
Missed this one in the last commit — the PR-time validate still pinned
1.9.0, which trips the new required_version >= 1.10.0 constraint.
* ci: drop use_lockfile — Backblaze B2 can't do native state locking
B2's S3 API returns 501 NotImplemented for the conditional PutObject that
use_lockfile relies on, so tofu plan/apply fails to acquire the lock.
Revert the lockfile and the 1.10 version bump it required; rely on the
concurrency guard to serialize applies instead. Left a note in the
backend block so this isn't re-attempted.
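
For reference, a concurrency guard of the kind relied on here can be
sketched as follows. The group name is hypothetical; the real workflows
may key the group differently.

```yaml
# Sketch only: group name "terraform-state" is hypothetical.
concurrency:
  group: terraform-state
  cancel-in-progress: false   # never cancel an in-flight apply
```

With `cancel-in-progress: false`, GitHub lets the running job finish and
holds one pending run per group; a newer pending run replaces an older
pending one, so the guard serializes applies but is not a full queue.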
* Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics
* modularize stuff now that we have multiple substantial things in here
* provider updates and new secrets
* remove grafana and prometheus from ansible
* fix: actually decommission nextcloud and TWDNE
* ignore spaces in lint and remove dns for the services
* linting on the linting config wasn't linting the lints