Commit graph

131 commits

Author SHA1 Message Date
a40cd60d60
backup: keep deleted/overwritten versions instead of mirroring them away (#120)
The nightly job runs 'rclone sync', which permanently deletes or overwrites
objects at the B2 destination. That means an accidental deletion or a
ransomware encryption on /hdd propagates straight to the backup on the next
run, leaving no clean copy.

Add --backup-dir so every superseded version is moved into a dated folder
under _versions/ rather than thrown away, and prune that folder after 30
days so it doesn't grow unbounded.
2026-06-05 21:23:04 +01:00
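A rough sketch of the versioned-backup idea described in a40cd60d60, not the script the repository actually uses. It builds an `rclone sync` command line with `--backup-dir` pointing at a dated folder, and picks which dated folders are older than the 30-day retention. The remote name `b2:bucket`, the `current` and `_versions` layout, the ISO date folder naming, and both function names are assumptions made for illustration.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # from the commit message: prune after 30 days


def sync_command(source: str, dest: str, versions_root: str, today: date) -> list[str]:
    """Build an rclone sync argv that moves superseded files into a dated folder.

    Assumption: folders under versions_root are named YYYY-MM-DD.
    """
    backup_dir = f"{versions_root}/{today.isoformat()}"
    return ["rclone", "sync", source, dest, "--backup-dir", backup_dir]


def expired_folders(folder_names: list[str], today: date,
                    keep_days: int = RETENTION_DAYS) -> list[str]:
    """Return the dated folder names that are strictly older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in folder_names:
        try:
            stamp = date.fromisoformat(name)
        except ValueError:
            continue  # ignore anything that is not a dated folder
        if stamp < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    today = date(2026, 6, 5)
    # Hypothetical paths; the real source is /hdd per the commit message.
    print(sync_command("/hdd", "b2:bucket/current", "b2:bucket/_versions", today))
    print(expired_folders(["2026-06-04", "2026-05-05", "notes"], today))
```

Each expired folder could then be removed with something like `rclone purge`, though how the real job prunes is not stated in the commit message.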
dependabot[bot]
7f2cbd4af1
chore(deps): bump the github-actions group across 1 directory with 2 updates (#117)
Bumps the github-actions group with 2 updates in the / directory: [ansible/ansible-lint](https://github.com/ansible/ansible-lint) and [actions/github-script](https://github.com/actions/github-script).


Updates `ansible/ansible-lint` from 25 to 26
- [Release notes](https://github.com/ansible/ansible-lint/releases)
- [Commits](https://github.com/ansible/ansible-lint/compare/v25...v26)

Updates `actions/github-script` from 7 to 9
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](https://github.com/actions/github-script/compare/v7...v9)

---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: '9'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: ansible/ansible-lint
  dependency-version: '26'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-05 21:13:03 +01:00
dependabot[bot]
24431466c5
chore(deps): bump the terraform group across 2 directories with 1 update (#116)
Updates the requirements on  and [pagerduty/pagerduty](https://github.com/PagerDuty/terraform-provider-pagerduty) to permit the latest version.

Updates `pagerduty/pagerduty` to 3.32.4
- [Release notes](https://github.com/PagerDuty/terraform-provider-pagerduty/releases)
- [Changelog](https://github.com/PagerDuty/terraform-provider-pagerduty/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PagerDuty/terraform-provider-pagerduty/compare/v2.2.0...v3.32.4)

Updates `pagerduty/pagerduty` to 3.32.4
- [Release notes](https://github.com/PagerDuty/terraform-provider-pagerduty/releases)
- [Changelog](https://github.com/PagerDuty/terraform-provider-pagerduty/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PagerDuty/terraform-provider-pagerduty/compare/v2.2.0...v3.32.4)

---
updated-dependencies:
- dependency-name: pagerduty/pagerduty
  dependency-version: 3.32.4
  dependency-type: direct:production
  dependency-group: terraform
- dependency-name: pagerduty/pagerduty
  dependency-version: 3.32.4
  dependency-type: direct:production
  dependency-group: terraform
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-05 21:12:59 +01:00
9815f44b84
fix: stop masking failed service deploys; trim dead config (#119)
The docker_services and systemd_services roles ran their "start the
service" tasks with `failed_when: false`, so a container or unit that
failed to come up still reported the deploy as green. Drop it from both
start tasks so a broken deploy actually fails CI. The compose/unit *copy*
tasks keep `failed_when: false` — that's load-bearing for the
`item is not failed` filter that skips services without a compose/unit file.

Also:
- Remove a duplicate "Template service .env files" task in docker_services
  (second copy used a hardcoded path and didn't register; first one is the
  one the start task reads).
- Don't trigger a full fleet deploy on docs/markdown/workflow-only pushes
  to main — add docs/**, **/*.md and .github/** to paths-ignore.
- Drop the dangling `update-freebsd` Make target (playbook doesn't exist;
  fleet has no FreeBSD hosts).
2026-06-04 18:41:24 +01:00
7b2552fea5
chore: fix dependabot PRs (#118)
* chore: add dependabot config

Add Dependabot for the three supported ecosystems in this repo:
GitHub Actions, Terraform (root + grafana/hetzner/pagerduty modules),
and Docker (service compose files + dotfile Dockerfiles). Weekly
schedule with per-ecosystem grouping to keep PR noise down.

* ci: make terraform validation work on dependabot PRs

Dependabot PRs run with no access to repository secrets and a read-only
token, so the SOPS decrypt step (and the PR-comment step) fail. Give
Dependabot a secret-free path: stub the secrets.yaml keys it reads and
run init -backend=false + validate, skipping decrypt/plan/comment. Human
PRs are unchanged and still get a full plan.
2026-06-03 19:29:23 +01:00
7e74232d64
chore: add dependabot config (#115)
Add Dependabot for the three supported ecosystems in this repo:
GitHub Actions, Terraform (root + grafana/hetzner/pagerduty modules),
and Docker (service compose files + dotfile Dockerfiles). Weekly
schedule with per-ecosystem grouping to keep PR noise down.
2026-06-03 19:15:12 +01:00
65090ca9d6
ci: serialize terraform and deploy runs with concurrency guards (#114)
* ci: serialize infra runs and enable terraform state locking

Add concurrency guards to the terraform and deploy-on-merge workflows so
two merges in quick succession can't run against the same state or the
same hosts at once (queue, never cancel an in-flight run).

Enable native S3 state locking (use_lockfile) on the Backblaze B2 backend,
which needs OpenTofu 1.10+, so bump the CI tofu version 1.9.0 -> 1.10.10
and the required_version constraint to >= 1.10.0.

* ci: bump tofu to 1.10.10 in the validate workflow too

Missed this one in the last commit — the PR-time validate still pinned
1.9.0, which trips the new required_version >= 1.10.0 constraint.

* ci: drop use_lockfile — Backblaze B2 can't do native state locking

B2's S3 API returns 501 NotImplemented for the conditional PutObject that
use_lockfile relies on, so tofu plan/apply fails to acquire the lock.
Revert the lockfile and the 1.10 version bump it required; rely on the
concurrency guard to serialize applies instead. Left a note in the
backend block so this isn't re-attempted.
2026-06-02 19:39:13 +01:00
45dff99e7c
fix: update octopus exporter (#113)
2026-05-26 20:56:07 +01:00
a031d4218b
fix: Documentation overhaul (#112)
* fix: Documentation overhaul

* removing joke graph
2026-05-19 18:49:21 +01:00
1ec4e10eb1
Update cache action (#111)
* fix: update cache version

* fix: update cache
2026-05-16 11:13:38 +01:00
a6aa561147
fix: update cache version (#110)
2026-05-16 11:03:12 +01:00
7ad2766f94
hotfix: broken pipeline (#109)
* fix: cleanup deploy.yml and share workflow

* lint issue

* hotfix: broken pipeline
2026-05-15 20:19:56 +01:00
9f84652102
fix: cleanup deploy.yml and share workflow (#108)
* fix: cleanup deploy.yml and share workflow

* lint issue
2026-05-15 20:17:28 +01:00
69145b3089
fix: add smb mount (#107)
* fix: add smb mount

* update secrets

* address linting issues
2026-05-14 20:49:25 +01:00
5481292b7f
fix: remove subscription nag and lock down proxmox (#106)
2026-05-13 21:09:54 +01:00
d3b516c594
fix: cleanup freebsd and alpine stuff (#105)
2026-05-12 22:43:12 +01:00
e502a92451
fix: tracing on caddy services (#104)
2026-05-10 10:18:53 +01:00
06552c5b75
fix: slight tweaks (#103)
* fix: slight tweaks

* remove vendor
2026-05-09 20:49:46 +01:00
b5d5537c1f
Proxmox ve on london a (#102)
* fix: update config for london-a for new proxmox install

* fix: update proxmox endpoint
2026-05-09 19:29:44 +01:00
928d1d0b99
fix: update config for london-a for new proxmox install (#101)
2026-05-09 19:22:34 +01:00
51efda6053
Update fleet_pipelines.tf
2026-05-08 19:32:58 +01:00
d88d2e5d12
Add git synthetic check (#99)
2026-05-06 06:01:59 +01:00
7d22ad1ce1
bug: add retry to restarting caddy (#97)
* bug: add retry to restarting caddy

* skip terraform pipeline when no terraform changes has been done
2026-05-05 20:42:52 +01:00
abb283c1d7
terraform plan on pr and caddy metrics on localhost since we have all… (#96)
* terraform plan on pr and caddy metrics on localhost since we have alloy now

* remove refreshing state
2026-05-05 13:35:37 +01:00
9bde71fbf9
adding pagerduty stack (#95)
* adding pagerduty stack

* rename files to not be overly descriptive
2026-05-04 20:50:31 +01:00
043c783361
Grafana Cloud Migration (#94)
* Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics

* modulize stuff now that we have multiple substantial things in here

* provider updates and new secrets

* remove grafana and prometheus from ansible
2026-05-04 13:40:30 +01:00
83f023aedd
Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinsta… (#93)
* Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinstalled

* dns config for cockpit
2026-05-03 14:00:22 +01:00
d22f7a52a0
fix: clean up of terraform (#92)
2026-05-02 14:46:03 +01:00
03ad9b476d
make dns more neat (#91)
2026-05-01 21:05:53 +01:00
b5cef4b985
fix: remove cloudflare resources (#90)
* phase 1 - add all the records to both providers to A/B test

* dkim fix

* remove cloudflare resources
2026-04-30 15:55:14 +01:00
ba04d49c4e
Clou dflaring out mayday mayday mayday (#89)
* phase 1 - add all the records to both providers to A/B test

* dkim fix
2026-04-29 21:23:15 +01:00
dd112fd505
phase 1 - add all the records to both providers to A/B test (#88)
2026-04-29 20:47:34 +01:00
e5306a5409
Fixing loki alloy (#87)
* add alloy to docker group

* fix: use docker driver instead of hacky alloy setup

* fixing linting issue
2026-04-29 20:07:40 +01:00
a51a0879d3
add alloy to docker group (#86)
2026-04-28 20:53:19 +01:00
6a3618aa4a
fix: Fixing loki alloy (#85)
* fix: alloy

* fix: alpine doesn't need a hacky install
2026-04-28 20:30:30 +01:00
b474e28528
fix: alloy (#84)
2026-04-28 20:10:20 +01:00
5391c500e1
fix: loki & alloy (#83)
* fix: loki & alloy

* fix linting
2026-04-28 16:40:45 +01:00
a7f51ec10c
fix: update octo exporter (#82)
2026-04-27 20:10:11 +01:00
5c404dca87
fix: update octopus_exporter to v1.1.1 (#81)
2026-04-26 21:01:24 +01:00
d76be4828c
fix: add ssh key resource (#80)
2026-04-26 20:08:45 +01:00
19928358c5
fix: Update node version for gha (#79)
* fix: update checkout version to dodge deprecation

* fix: more deprecations

* forgot one
2026-04-26 18:35:15 +01:00
7c3fec983b
fix: Update node version for gha (#78)
* fix: update checkout version to dodge deprecation

* fix: more deprecations
2026-04-26 18:23:22 +01:00
98be03c273
fix: update checkout version to dodge deprecation (#77)
2026-04-26 18:13:38 +01:00
1c6784eade
fix: replace tailscale authkey use with oauth (#76)
2026-04-26 17:30:15 +01:00
e9fbd41cb4
fix: deploy using a matrix (#75)
2026-04-26 14:35:12 +01:00
10bb940f87
fix: update living room dashboard (#74)
2026-04-26 14:09:09 +01:00
af2f462c1c
fix: prometheus retention and authelia fix (#73)
* fix: prometheus retention time

* also fix bug with authelia

* linting issues

* more linting
2026-04-25 21:35:39 +01:00
b82013c2f0
fix: actually decommission nextcloud and TWDNE (#72)
* fix: actually decommission nextcloud and TWDNE

* ignore spaces in lint and remove dns for the services

* linting on the linting config wasn't linting the lints
2026-04-25 18:19:16 +01:00
35c5079d8f
fix: remove cloud and TWDNE and add energy dashboard for grafana (#71)
2026-04-25 17:46:17 +01:00
b3cc47f3d6
fix: optimize deploy playbook and get rid of deprecated stuff (#70)
2026-04-25 15:04:16 +01:00