Commit graph

136 commits

Author SHA1 Message Date
81efa1b717
Remove stale cloudflared service from copenhagen-a (PESO-138) (#125)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
cloudflared was retired in #56 when Caddy + Authelia replaced Cloudflare
Tunnels, but copenhagen-a was unreachable at the time so its
cloudflared.service was never stopped and is still running.

Add a cleanup task to the common role that stops, disables and purges
cloudflared wherever the unit lingers. Gated on the unit file existing so
it self-targets copenhagen-a and is a no-op everywhere else, and explicitly
excludes copenhagen-c, which legitimately runs a hand-configured tunnel.
2026-06-07 11:45:35 +01:00
3871dc8f90
Restrict london-b Samba (445) to LAN + Tailscale, off public internet (#124)
Samba on london-b was allowed on 445/tcp from anywhere via UFW, exposing
SMB/CIFS to the public internet. Tailscale already reaches it through the
tailscale0 allow-all rule, so scope the explicit rule to the local London
LAN (192.168.1.0/24) instead of the world.

The common UFW task only ever adds allow rules, so it gained support for an
optional per-port from_ip, plus a follow-up task that deletes the superseded
world-open variant of any source-restricted port — otherwise the old
'445 ALLOW Anywhere' rule would linger on the host and defeat the change.

PESO-145
2026-06-07 11:37:45 +01:00
644b608831
chore: retire readarr service, replaced by bookshelf (#123)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
Bookshelf (PR #122) is a Readarr revival and now owns port 8787 on
london-b, so the old custom Readarr systemd unit is removed:

- drop readarr from the media_stack role's unit-deploy and enable loops,
  and add an idempotent decommission task (stop, disable, remove unit)
  so the host tears it down via Ansible rather than ad-hoc SSH
- delete services/readarr/readarr.service
- update docs (services, london-b host, service inventory) to describe
  bookshelf as a Docker service instead of a custom systemd unit

The public readarr.pez.sh hostname is kept and now reverse-proxies to
bookshelf on :8787 — DNS, Caddy and Authelia (pez_readarr_users group)
are unchanged.
2026-06-06 15:50:37 +01:00
98ac065056
feat: add bookshelf service on london-b (#122)
Bookshelf (a Readarr revival) for managing the ebook/audiobook library.
Runs on london-b with config at /root/bookshelf and the library at
/hdd/books mounted into the container at the same path.
2026-06-06 15:34:57 +01:00
85d1cb945e
chore: commit terraform lock file for reproducible provider versions (#121)
Some checks failed
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
The .terraform.lock.hcl was gitignored while providers use floating
~> constraints, so every CI 'tofu init' resolved provider versions
fresh and could drift from what was tested locally, with no checksum
verification on the providers.

Track the lock file instead, with hashes for linux_amd64 (CI) plus
darwin_arm64/amd64 (local). Dependabot's terraform updates now surface
exact provider version bumps as reviewable, hash-pinned changes.
2026-06-06 13:19:08 +01:00
a40cd60d60
backup: keep deleted/overwritten versions instead of mirroring them away (#120)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
The nightly job runs 'rclone sync', which permanently deletes or overwrites
objects at the B2 destination. That means an accidental deletion or a
ransomware encryption on /hdd propagates straight to the backup on the next
run, leaving no clean copy.

Add --backup-dir so every superseded version is moved into a dated folder
under _versions/ rather than thrown away, and prune that folder after 30
days so it doesn't grow unbounded.
2026-06-05 21:23:04 +01:00
dependabot[bot]
7f2cbd4af1
chore(deps): bump the github-actions group across 1 directory with 2 updates (#117)
Bumps the github-actions group with 2 updates in the / directory: [ansible/ansible-lint](https://github.com/ansible/ansible-lint) and [actions/github-script](https://github.com/actions/github-script).


Updates `ansible/ansible-lint` from 25 to 26
- [Release notes](https://github.com/ansible/ansible-lint/releases)
- [Commits](https://github.com/ansible/ansible-lint/compare/v25...v26)

Updates `actions/github-script` from 7 to 9
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](https://github.com/actions/github-script/compare/v7...v9)

---
updated-dependencies:
- dependency-name: actions/github-script
  dependency-version: '9'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: ansible/ansible-lint
  dependency-version: '26'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-05 21:13:03 +01:00
dependabot[bot]
24431466c5
chore(deps): bump the terraform group across 2 directories with 1 update (#116)
Updates the requirements on  and [pagerduty/pagerduty](https://github.com/PagerDuty/terraform-provider-pagerduty) to permit the latest version.

Updates `pagerduty/pagerduty` to 3.32.4
- [Release notes](https://github.com/PagerDuty/terraform-provider-pagerduty/releases)
- [Changelog](https://github.com/PagerDuty/terraform-provider-pagerduty/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PagerDuty/terraform-provider-pagerduty/compare/v2.2.0...v3.32.4)

Updates `pagerduty/pagerduty` to 3.32.4
- [Release notes](https://github.com/PagerDuty/terraform-provider-pagerduty/releases)
- [Changelog](https://github.com/PagerDuty/terraform-provider-pagerduty/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PagerDuty/terraform-provider-pagerduty/compare/v2.2.0...v3.32.4)

---
updated-dependencies:
- dependency-name: pagerduty/pagerduty
  dependency-version: 3.32.4
  dependency-type: direct:production
  dependency-group: terraform
- dependency-name: pagerduty/pagerduty
  dependency-version: 3.32.4
  dependency-type: direct:production
  dependency-group: terraform
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-05 21:12:59 +01:00
9815f44b84
fix: stop masking failed service deploys; trim dead config (#119)
Some checks failed
Deploy (on merge) / Discover hosts (push) Has been cancelled
Deploy (on merge) / deploy (push) Has been cancelled
The docker_services and systemd_services roles ran their "start the
service" tasks with `failed_when: false`, so a container or unit that
failed to come up still reported the deploy as green. Drop it from both
start tasks so a broken deploy actually fails CI. The compose/unit *copy*
tasks keep `failed_when: false` — that's load-bearing for the
`item is not failed` filter that skips services without a compose/unit file.

Also:
- Remove a duplicate "Template service .env files" task in docker_services
  (second copy used a hardcoded path and didn't register; first one is the
  one the start task reads).
- Don't trigger a full fleet deploy on docs/markdown/workflow-only pushes
  to main — add docs/**, **/*.md and .github/** to paths-ignore.
- Drop the dangling `update-freebsd` Make target (playbook doesn't exist;
  fleet has no FreeBSD hosts).
2026-06-04 18:41:24 +01:00
7b2552fea5
chore: fix dependabot PRs (#118)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
* chore: add dependabot config

Add Dependabot for the three supported ecosystems in this repo:
GitHub Actions, Terraform (root + grafana/hetzner/pagerduty modules),
and Docker (service compose files + dotfile Dockerfiles). Weekly
schedule with per-ecosystem grouping to keep PR noise down.

* ci: make terraform validation work on dependabot PRs

Dependabot PRs run with no access to repository secrets and a read-only
token, so the SOPS decrypt step (and the PR-comment step) fail. Give
Dependabot a secret-free path: stub the secrets.yaml keys it reads and
run init -backend=false + validate, skipping decrypt/plan/comment. Human
PRs are unchanged and still get a full plan.
2026-06-03 19:29:23 +01:00
7e74232d64
chore: add dependabot config (#115)
Add Dependabot for the three supported ecosystems in this repo:
GitHub Actions, Terraform (root + grafana/hetzner/pagerduty modules),
and Docker (service compose files + dotfile Dockerfiles). Weekly
schedule with per-ecosystem grouping to keep PR noise down.
2026-06-03 19:15:12 +01:00
65090ca9d6
ci: serialize terraform and deploy runs with concurrency guards (#114)
Some checks failed
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
* ci: serialize infra runs and enable terraform state locking

Add concurrency guards to the terraform and deploy-on-merge workflows so
two merges in quick succession can't run against the same state or the
same hosts at once (queue, never cancel an in-flight run).

Enable native S3 state locking (use_lockfile) on the Backblaze B2 backend,
which needs OpenTofu 1.10+, so bump the CI tofu version 1.9.0 -> 1.10.10
and the required_version constraint to >= 1.10.0.

* ci: bump tofu to 1.10.10 in the validate workflow too

Missed this one in the last commit — the PR-time validate still pinned
1.9.0, which trips the new required_version >= 1.10.0 constraint.

* ci: drop use_lockfile — Backblaze B2 can't do native state locking

B2's S3 API returns 501 NotImplemented for the conditional PutObject that
use_lockfile relies on, so tofu plan/apply fails to acquire the lock.
Revert the lockfile and the 1.10 version bump it required; rely on the
concurrency guard to serialize applies instead. Left a note in the
backend block so this isn't re-attempted.
2026-06-02 19:39:13 +01:00
45dff99e7c
fix: update octopus exporter (#113)
Some checks failed
Deploy (on merge) / Discover hosts (push) Has been cancelled
Deploy (on merge) / deploy (push) Has been cancelled
2026-05-26 20:56:07 +01:00
a031d4218b
fix: Documentation overhaul (#112)
Some checks failed
Deploy (on merge) / Discover hosts (push) Has been cancelled
Deploy (on merge) / deploy (push) Has been cancelled
* fix: Documentation overhaul

* removing joke graph
2026-05-19 18:49:21 +01:00
1ec4e10eb1
Update cache action (#111)
Some checks failed
Deploy (on merge) / Discover hosts (push) Has been cancelled
Deploy (on merge) / deploy (push) Has been cancelled
* fix: update cache version

* fix: update cache
2026-05-16 11:13:38 +01:00
a6aa561147
fix: update cache version (#110) 2026-05-16 11:03:12 +01:00
7ad2766f94
hotfix: broken pipeline (#109)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
* fix: cleanup deploy.yml and share workflow

* lint issue

* hotfix: broken pipeline
2026-05-15 20:19:56 +01:00
9f84652102
fix: cleanup deploy.yml and share workflow (#108)
* fix: cleanup deploy.yml and share workflow

* lint issue
2026-05-15 20:17:28 +01:00
69145b3089
fix: add smb mount (#107)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
* fix: add smb mount

* update secrets

* address linting issues
2026-05-14 20:49:25 +01:00
5481292b7f
fix: remove subscription nag and lock down proxmox (#106)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
2026-05-13 21:09:54 +01:00
d3b516c594
fix: cleanup freebsd and alpine stuff (#105)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
2026-05-12 22:43:12 +01:00
e502a92451
fix: tracing on caddy services (#104)
Some checks failed
Deploy (on merge) / Discover hosts (push) Has been cancelled
Terraform / Plan (push) Has been cancelled
Deploy (on merge) / Deploy → (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
2026-05-10 10:18:53 +01:00
06552c5b75
fix: slight tweaks (#103)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
* fix: slight tweaks

* remove vendor
2026-05-09 20:49:46 +01:00
b5d5537c1f
Proxmox ve on london a (#102)
* fix: update config for london-a for new proxmox install

* fix: update proxmox endpoint
2026-05-09 19:29:44 +01:00
928d1d0b99
fix: update config for london-a for new proxmox install (#101) 2026-05-09 19:22:34 +01:00
51efda6053
Update fleet_pipelines.tf
Some checks are pending
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
2026-05-08 19:32:58 +01:00
d88d2e5d12
Add git synthetic check (#99)
Some checks failed
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
2026-05-06 06:01:59 +01:00
7d22ad1ce1
bug: add retry to restarting caddy (#97)
Some checks failed
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
Deploy (on merge) / Discover hosts (push) Has been cancelled
Deploy (on merge) / Deploy → (push) Has been cancelled
* bug: add retry to restarting caddy

* skip terraform pipeline when no terraform changes has been done
2026-05-05 20:42:52 +01:00
abb283c1d7
terraform plan on pr and caddy metrics on localhost since we have all… (#96)
* terraform plan on pr and caddy metrics on localhost since we have alloy now

* remove refreshing state
2026-05-05 13:35:37 +01:00
9bde71fbf9
adding pagerduty stack (#95)
Some checks are pending
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
* adding pagerduty stack

* rename files to not be overly descriptive
2026-05-04 20:50:31 +01:00
043c783361
Grafana Cloud Migration (#94)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
* Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics

* modulize stuff now that we have multiple substantial things in here

* provider updates and new secrets

* remove grafana and prometheus from ansible
2026-05-04 13:40:30 +01:00
83f023aedd
Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinsta… (#93)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
* Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinstalled

* dns config for cockpit
2026-05-03 14:00:22 +01:00
d22f7a52a0
fix: clean up of terraform (#92)
Some checks failed
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
2026-05-02 14:46:03 +01:00
03ad9b476d
make dns more neat (#91)
Some checks are pending
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
2026-05-01 21:05:53 +01:00
b5cef4b985
fix: remove cloudflare resources (#90)
Some checks failed
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
* phase 1 - add all the records to both providers to A/B test

* dkim fix

* remove cloudflare resources
2026-04-30 15:55:14 +01:00
ba04d49c4e
Clou dflaring out mayday mayday mayday (#89)
Some checks failed
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
Deploy (on merge) / Discover hosts (push) Has been cancelled
Deploy (on merge) / Deploy → (push) Has been cancelled
* phase 1 - add all the records to both providers to A/B test

* dkim fix
2026-04-29 21:23:15 +01:00
dd112fd505
phase 1 - add all the records to both providers to A/B test (#88) 2026-04-29 20:47:34 +01:00
e5306a5409
Fixing loki alloy (#87)
* add alloy to docker group

* fix: use docker driver instead of hacky alloy setup

* fixing linting issue
2026-04-29 20:07:40 +01:00
a51a0879d3
add alloy to docker group (#86)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
2026-04-28 20:53:19 +01:00
6a3618aa4a
fix: Fixing loki alloy (#85)
* fix: alloy

* fix: alpine doesn't need a hacky install
2026-04-28 20:30:30 +01:00
b474e28528
fix: alloy (#84) 2026-04-28 20:10:20 +01:00
5391c500e1
fix: loki & alloy (#83)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
* fix: loki & alloy

* fix linting
2026-04-28 16:40:45 +01:00
a7f51ec10c
fix: update octo exporter (#82)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
2026-04-27 20:10:11 +01:00
5c404dca87
fix: update octopus_exporter to v1.1.1 (#81)
Some checks failed
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
2026-04-26 21:01:24 +01:00
d76be4828c
fix: add ssh key resource (#80) 2026-04-26 20:08:45 +01:00
19928358c5
fix: Update node version for gha (#79)
* fix: update checkout version to dodge deprecation

* fix: more deprecations

* forgot one
2026-04-26 18:35:15 +01:00
7c3fec983b
fix: Update node version for gha (#78)
* fix: update checkout version to dodge deprecation

* fix: more deprecations
2026-04-26 18:23:22 +01:00
98be03c273
fix: update checkout version to dodge deprecation (#77) 2026-04-26 18:13:38 +01:00
1c6784eade
fix: replace tailscale authkey use with oauth (#76)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
2026-04-26 17:30:15 +01:00
e9fbd41cb4
fix: deploy using a matrix (#75) 2026-04-26 14:35:12 +01:00