The docs still described Cloudflare as DNS + CDN in front of helsinki-a,
but that was dropped in #90 - pez.sh lives on Hetzner DNS via Terraform
now and records point straight at the origin. Updated README,
architecture, networking, getting-started and the nuremberg-a host doc
to match, and noted that pez.solutions still resolves via Cloudflare
outside Terraform.
Also fixed while I was in there:
- terraform/README: PagerDuty provider is ~> 3.32 (table said ~> 2.2),
and the B2 secret keys are backblaze_keyID/backblaze_applicationKey
- secrets docs: group_vars secrets file is .enc.yaml, dropped the
FreeBSD install steps, the long-gone .sops.yaml placeholder note and
the ANSIBLE_VAULT_PASS migration note, swapped the cloudflare_record
example for hcloud
- getting-started referenced ansible/scripts/sops-setup.sh which
doesn't exist
- added naveen.pez.sh to the subdomain tables and a note about the
DNS-only records (mail, minecraft, wow, public)
The .terraform.lock.hcl was gitignored while providers use floating
~> constraints, so every CI 'tofu init' resolved provider versions
fresh and could drift from what was tested locally, with no checksum
verification on the providers.
Track the lock file instead, with hashes for linux_amd64 (CI) plus
darwin_arm64/amd64 (local). Dependabot's terraform updates now surface
exact provider version bumps as reviewable, hash-pinned changes.
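For reference, a Dependabot configuration that produces these updates might look like the following. This is a minimal sketch: the `/terraform` directory and the weekly schedule are assumptions, not values taken from this repository.

```yaml
# .github/dependabot.yml (sketch; directory and schedule are assumed)
version: 2
updates:
  - package-ecosystem: "terraform"
    directory: "/terraform"   # assumed location of the root module and lock file
    schedule:
      interval: "weekly"      # assumed cadence
```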
* ci: serialize infra runs and enable terraform state locking
Add concurrency guards to the terraform and deploy-on-merge workflows so
two merges in quick succession can't run against the same state or the
same hosts at once (queue, never cancel an in-flight run).
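A guard of this kind can be sketched as a workflow-level `concurrency` block. The group name below is hypothetical; the real workflows may key the group differently.

```yaml
# Workflow-level guard (sketch; the group name is hypothetical)
concurrency:
  group: terraform-state      # one shared group for every run touching this state
  cancel-in-progress: false   # let the in-flight run finish; a newer run waits
```

With `cancel-in-progress: false` the running job is never interrupted. GitHub keeps at most one pending run per group, so a third merge arriving while one run is waiting replaces that waiting run instead of queueing behind it.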
Enable native S3 state locking (use_lockfile) on the Backblaze B2 backend,
which needs OpenTofu 1.10+, so bump the CI tofu version 1.9.0 -> 1.10.10
and the required_version constraint to >= 1.10.0.
* ci: bump tofu to 1.10.10 in the validate workflow too
Missed this one in the last commit — the PR-time validate still pinned
1.9.0, which trips the new required_version >= 1.10.0 constraint.
* ci: drop use_lockfile — Backblaze B2 can't do native state locking
B2's S3 API returns 501 NotImplemented for the conditional PutObject that
use_lockfile relies on, so tofu plan/apply fails to acquire the lock.
Revert the lockfile and the 1.10 version bump it required; rely on the
concurrency guard to serialize applies instead. Left a note in the
backend block so this isn't re-attempted.
* Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics
* modularize stuff now that we have multiple substantial things in here
* provider updates and new secrets
* remove grafana and prometheus from ansible
* fix: actually decommission nextcloud and TWDNE
* ignore spaces in lint and remove dns for the services
* linting on the linting config wasn't linting the lints
Remove webdav.pez.sh DNS record (WebDAV replaced by Nextcloud AIO on cloud.pez.sh)
Remove alertmanager.pez.sh DNS record and Caddyfile block (Alertmanager not running on london-a)
Remove status-https HTTPS record pointing to old statuspage.io (status.pez.sh is self-hosted on helsinki-a)
Remove commented-out WebDAV block from Caddyfile
Remove empty section headers for decommissioned hosts (london-c, copenhagen-b, copenhagen-c)
Closes PESO-102
Stale A records removed:
- chimera.pez.sh → 13.43.223.167 (AWS IP reassigned, now serving unrelated site)
- gopher.pez.sh → 83.94.248.182 (unreachable on all ports)
- 0o9lix.ecp-dev.pez.sh → 0.0.0.0 (placeholder, never valid)
Stale TXT verification records removed:
- protonmail-verification (mail is self-hosted now, not ProtonMail)
- keybase-site-verification (Keybase is effectively dead)
- MS=ms99554544 (Microsoft domain verification, no active MS services)
- google-site-verification (no active Google services using this domain)
- apple-domain (no longer using Apple services after GrapheneOS switch)
PESO-97
Removed the PTR record for 83.94.248.182 (copenhagen-a), which incorrectly
claimed to be mail.pez.sh. PTR records in a forward DNS zone don't control
actual reverse DNS (that's managed by the ISP), so this record was misleading.
Also removed the mail-ptr record which had a similarly misplaced
in-addr.arpa reference in the forward zone.
Fixes PESO-76
* update SPF record: replace protonmail with poste.io mail server
PESO-77
- replace include:_spf.protonmail.ch with ip4:167.235.134.154 and ip6:2a01:4f8:1c1e:9c53::1 (nuremberg-a / mail.pez.sh)
- tighten from ~all (softfail) to -all (hardfail)
* tighten DMARC policy from p=none to p=quarantine
PESO-78
- enforce DMARC with p=quarantine (failed messages get quarantined)
- add adkim=r and aspf=r for relaxed DKIM/SPF alignment