Commit graph

27 commits

Author SHA1 Message Date
cd113ca03c remove stale promtail/rc.d leftovers, rss DNS record, fix london-c host description
- ansible/services/promtail/ - replaced by Grafana Alloy ages ago, nothing
  references it
- ansible/services/rc.d/london-a/ - FreeBSD rc.conf from before london-a
  was repaved as Proxmox
- rss A record in dns.tf - Miniflux was decommissioned in #127, the Caddy
  block is long gone but the record stayed behind
- london-c host_vars still said idle/available - it runs the Octopus
  exporter
2026-06-11 18:49:29 +01:00
0a357fc69a
docs: catch up with the Cloudflare to Hetzner DNS move, fix secrets/terraform drift (#130)
Some checks failed
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
The docs still described Cloudflare as DNS + CDN in front of helsinki-a,
but that was dropped in #90 - pez.sh lives on Hetzner DNS via Terraform
now and records point straight at the origin. Updated README,
architecture, networking, getting-started and the nuremberg-a host doc
to match, and noted that pez.solutions still resolves via Cloudflare
outside Terraform.

Also fixed while I was in there:
- terraform/README: PagerDuty provider is ~> 3.32 (table said ~> 2.2),
  and the B2 secret keys are backblaze_keyID/backblaze_applicationKey
- secrets docs: group_vars secrets file is .enc.yaml, dropped the
  FreeBSD install steps, the long-gone .sops.yaml placeholder note and
  the ANSIBLE_VAULT_PASS migration note, swapped the cloudflare_record
  example for hcloud
- getting-started referenced ansible/scripts/sops-setup.sh which
  doesn't exist
- added naveen.pez.sh to the subdomain tables and a note about the
  DNS-only records (mail, minecraft, wow, public)
2026-06-10 20:59:23 +01:00
85d1cb945e
chore: commit terraform lock file for reproducible provider versions (#121)
Some checks failed
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
The .terraform.lock.hcl was gitignored while providers use floating
~> constraints, so every CI 'tofu init' resolved provider versions
fresh and could drift from what was tested locally, with no checksum
verification on the providers.

Track the lock file instead, with hashes for linux_amd64 (CI) plus
darwin_arm64/amd64 (local). Dependabot's terraform updates now surface
exact provider version bumps as reviewable, hash-pinned changes.
2026-06-06 13:19:08 +01:00
dependabot[bot]
24431466c5
chore(deps): bump the terraform group across 2 directories with 1 update (#116)
Updates the requirements on  and [pagerduty/pagerduty](https://github.com/PagerDuty/terraform-provider-pagerduty) to permit the latest version.

Updates `pagerduty/pagerduty` to 3.32.4
- [Release notes](https://github.com/PagerDuty/terraform-provider-pagerduty/releases)
- [Changelog](https://github.com/PagerDuty/terraform-provider-pagerduty/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PagerDuty/terraform-provider-pagerduty/compare/v2.2.0...v3.32.4)

Updates `pagerduty/pagerduty` to 3.32.4
- [Release notes](https://github.com/PagerDuty/terraform-provider-pagerduty/releases)
- [Changelog](https://github.com/PagerDuty/terraform-provider-pagerduty/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PagerDuty/terraform-provider-pagerduty/compare/v2.2.0...v3.32.4)

---
updated-dependencies:
- dependency-name: pagerduty/pagerduty
  dependency-version: 3.32.4
  dependency-type: direct:production
  dependency-group: terraform
- dependency-name: pagerduty/pagerduty
  dependency-version: 3.32.4
  dependency-type: direct:production
  dependency-group: terraform
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-05 21:12:59 +01:00
65090ca9d6
ci: serialize terraform and deploy runs with concurrency guards (#114)
Some checks failed
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / deploy (push) Blocked by required conditions
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
* ci: serialize infra runs and enable terraform state locking

Add concurrency guards to the terraform and deploy-on-merge workflows so
two merges in quick succession can't run against the same state or the
same hosts at once (queue, never cancel an in-flight run).

Enable native S3 state locking (use_lockfile) on the Backblaze B2 backend,
which needs OpenTofu 1.10+, so bump the CI tofu version 1.9.0 -> 1.10.10
and the required_version constraint to >= 1.10.0.

* ci: bump tofu to 1.10.10 in the validate workflow too

Missed this one in the last commit — the PR-time validate still pinned
1.9.0, which trips the new required_version >= 1.10.0 constraint.

* ci: drop use_lockfile — Backblaze B2 can't do native state locking

B2's S3 API returns 501 NotImplemented for the conditional PutObject that
use_lockfile relies on, so tofu plan/apply fails to acquire the lock.
Revert the lockfile and the 1.10 version bump it required; rely on the
concurrency guard to serialize applies instead. Left a note in the
backend block so this isn't re-attempted.
2026-06-02 19:39:13 +01:00
e502a92451
fix: tracing on caddy services (#104)
Some checks failed
Deploy (on merge) / Discover hosts (push) Has been cancelled
Terraform / Plan (push) Has been cancelled
Deploy (on merge) / Deploy → (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
2026-05-10 10:18:53 +01:00
06552c5b75
fix: slight tweaks (#103)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
* fix: slight tweaks

* remove vendor
2026-05-09 20:49:46 +01:00
51efda6053
Update fleet_pipelines.tf
Some checks are pending
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
2026-05-08 19:32:58 +01:00
d88d2e5d12
Add git synthetic check (#99)
Some checks failed
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
2026-05-06 06:01:59 +01:00
9bde71fbf9
adding pagerduty stack (#95)
Some checks are pending
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
* adding pagerduty stack

* rename files to not be overly descriptive
2026-05-04 20:50:31 +01:00
043c783361
Grafana Cloud Migration (#94)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
* Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics

* modulize stuff now that we have multiple substantial things in here

* provider updates and new secrets

* remove grafana and prometheus from ansible
2026-05-04 13:40:30 +01:00
83f023aedd
Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinsta… (#93)
Some checks are pending
Deploy (on merge) / Discover hosts (push) Waiting to run
Deploy (on merge) / Deploy → (push) Blocked by required conditions
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
* Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinstalled

* dns config for cockpit
2026-05-03 14:00:22 +01:00
d22f7a52a0
fix: clean up of terraform (#92)
Some checks failed
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
2026-05-02 14:46:03 +01:00
03ad9b476d
make dns more neat (#91)
Some checks are pending
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
2026-05-01 21:05:53 +01:00
b5cef4b985
fix: remove cloudflare resources (#90)
Some checks failed
Terraform / Plan (push) Has been cancelled
Terraform / Apply (push) Has been cancelled
* phase 1 - add all the records to both providers to A/B test

* dkim fix

* remove cloudflare resources
2026-04-30 15:55:14 +01:00
ba04d49c4e
Clou dflaring out mayday mayday mayday (#89)
Some checks failed
Terraform / Plan (push) Waiting to run
Terraform / Apply (push) Blocked by required conditions
Deploy (on merge) / Discover hosts (push) Has been cancelled
Deploy (on merge) / Deploy → (push) Has been cancelled
* phase 1 - add all the records to both providers to A/B test

* dkim fix
2026-04-29 21:23:15 +01:00
dd112fd505
phase 1 - add all the records to both providers to A/B test (#88) 2026-04-29 20:47:34 +01:00
d76be4828c
fix: add ssh key resource (#80) 2026-04-26 20:08:45 +01:00
b82013c2f0
fix: actually decomission nextcloud and TWDNE (#72)
* fix: actually decomission nextcloud and TWDNE

* ignore spaces in lint and remove dns for the services

* linting on the linting config wasn't linting the lints
2026-04-25 18:19:16 +01:00
eb9f026abd
Clean up stale DNS records and Caddyfile entries (#28)
Remove webdav.pez.sh DNS record (WebDAV replaced by Nextcloud AIO on cloud.pez.sh)
Remove alertmanager.pez.sh DNS record and Caddyfile block (Alertmanager not running on london-a)
Remove status-https HTTPS record pointing to old statuspage.io (status.pez.sh is self-hosted on helsinki-a)
Remove commented-out WebDAV block from Caddyfile
Remove empty section headers for decommissioned hosts (london-c, copenhagen-b, copenhagen-c)

Closes PESO-102
2026-03-30 21:12:52 +01:00
b16f89357b
replace hard set ip with vars (#25)
* replace hard set ip with vars

* run all PR checks every time
2026-03-29 21:33:50 +01:00
4be8f73ffe
add hetzner servers terraform (#23)
Co-authored-by: Rasmus Wejlgaard <pez@Mac.localdomain>
2026-03-29 20:58:50 +01:00
258a38aeb5
Remove stale DNS records: chimera, gopher, ecp-dev, and old verification TXT (#14)
Stale A records removed:
- chimera.pez.sh → 13.43.223.167 (AWS IP reassigned, now serving unrelated site)
- gopher.pez.sh → 83.94.248.182 (unreachable on all ports)
- 0o9lix.ecp-dev.pez.sh → 0.0.0.0 (placeholder, never valid)

Stale TXT verification records removed:
- protonmail-verification (mail is self-hosted now, not ProtonMail)
- keybase-site-verification (Keybase is effectively dead)
- MS=ms99554544 (Microsoft domain verification, no active MS services)
- google-site-verification (no active Google services using this domain)
- apple-domain (no longer using Apple services after GrapheneOS switch)

PESO-97
2026-03-29 14:08:45 +01:00
8548050772
Remove dead DNS record: satisfactory.pez.sh (#7)
nuremberg-b (162.55.55.2) has been decommissioned, this record is stale.

Closes PESO-75
2026-03-28 21:37:26 +00:00
69f895c5cd
Remove bogus PTR records from Cloudflare forward zone (#6)
PTR record for 83.94.248.182 (copenhagen-a) incorrectly claimed to be
mail.pez.sh. PTR records in a forward DNS zone don't control actual
reverse DNS (that's managed by the ISP), and this record was misleading.

Also removed the mail-ptr record which had a similarly misplaced
in-addr.arpa reference in the forward zone.

Fixes PESO-76
2026-03-28 21:08:31 +00:00
b00791f1b1
Update SPF and tighten DMARC for poste.io (#5)
* update SPF record: replace protonmail with poste.io mail server

PESO-77

- replace include:_spf.protonmail.ch with ip4:167.235.134.154 and ip6:2a01:4f8:1c1e:9c53::1 (nuremberg-a / mail.pez.sh)
- tighten from ~all (softfail) to -all (hardfail)

* tighten DMARC policy from p=none to p=quarantine

PESO-78

- enforce DMARC with p=quarantine (failed messages get quarantined)
- add adkim=r and aspf=r for relaxed DKIM/SPF alignment
2026-03-28 20:46:50 +00:00
Rasmus Wejlgaard
737d6e0bc1 initial commit 2026-03-28 12:39:41 +00:00