Commit graph

100 commits

Author SHA1 Message Date
4cdb2d3fe4
fix: add n8n deployment to nuremberg-a (#139)
2026-06-26 19:54:30 +01:00
ac8dabe9a4
media_stack: capture london-b sonarr.service unit in repo (PESO-140) (#133)
sonarr was the only *arr service without its systemd unit in the repo —
it was treated as package-managed and never captured, so a london-b
rebuild would lose the unit. Capture the running unit (APT/mono Sonarr
v3) into ansible/services/sonarr/sonarr.service and have the media_stack
role deploy it to /etc/systemd/system like radarr/lidarr/prowlarr,
overriding the package-owned copy. Move sonarr out of the
package-managed enable loop into the custom-unit deploy + enable loops.
2026-06-14 21:10:43 +01:00
8665a5fe99
remove stale promtail/rc.d leftovers, rss DNS record, fix london-c host description (#131)
2026-06-12 19:24:39 +01:00
0c00a3cb4d
docs: remove decommissioned Miniflux refs; fix status-page + minor drift (#129)
2026-06-09 19:49:16 +01:00
9d56a22c30
Ansible-manage docker-log-cleanup script and cron (PESO-142) (#128)
docker-log-cleanup.sh lived in the repo but nothing deployed it — the
script and monthly cron on nuremberg-a were set up by hand and got wiped
when the host was reinstalled. Fold both into the docker role so every
docker_hosts member gets the script in /usr/local/bin and a monthly cron,
and it survives a rebuild.
2026-06-08 18:38:19 +01:00
3945b8cafc
remove miniflux — decommissioned (#127)
Stopped and removed containers on london-b. Removed compose definition,
Caddy reverse proxy route for rss.pez.sh, and london-b host_vars entry.
2026-06-07 18:07:11 +01:00
9ac179dbec
Make Alloy resilient to transient failures; remove leftover Grafana (PESO-149) (#126)
copenhagen-c stopped reporting to Grafana Cloud on 2026-05-20: a transient
TLS failure to fleet-management tripped systemd's default start rate-limit,
systemd gave up, and the host sat silently unmonitored for ~2.5 weeks.

Add a 10-resilience.conf systemd drop-in for alloy.service on every host
(StartLimitIntervalSec=0, Restart=always, RestartSec=30) so a momentary
upstream/TLS blip can no longer permanently kill the collector.

Also drop the old self-hosted Grafana package that was left enabled and
failing on copenhagen-c after the move to Grafana Cloud.
2026-06-07 14:30:08 +01:00
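The drop-in described in 9ac179dbec can be sketched as follows. This is a minimal illustration, not the repository's actual Ansible template: the helper names `render_dropin` and `dropin_path` are hypothetical, and the `/etc/systemd/system/<unit>.d/` location is the standard systemd drop-in directory, assumed here rather than taken from the commit. The three settings come from the commit message; `StartLimitIntervalSec` is placed under `[Unit]` and the restart options under `[Service]`, which is where systemd expects them.

```python
from pathlib import PurePosixPath

# Settings named in the commit message, grouped by the systemd section
# each option belongs to.
RESILIENCE = {
    "Unit": {"StartLimitIntervalSec": "0"},
    "Service": {"Restart": "always", "RestartSec": "30"},
}


def render_dropin(settings: dict[str, dict[str, str]]) -> str:
    """Render a systemd drop-in file body from a section -> options mapping."""
    lines: list[str] = []
    for section, options in settings.items():
        lines.append(f"[{section}]")
        lines.extend(f"{key}={value}" for key, value in options.items())
        lines.append("")  # blank line after each section
    return "\n".join(lines)


def dropin_path(unit: str, filename: str) -> str:
    """Return the conventional drop-in path for a unit (assumed location)."""
    return str(PurePosixPath("/etc/systemd/system") / f"{unit}.d" / filename)


if __name__ == "__main__":
    print(dropin_path("alloy.service", "10-resilience.conf"))
    print(render_dropin(RESILIENCE))
```

With `StartLimitIntervalSec=0` the start rate-limit is disabled, so `Restart=always` keeps retrying every 30 seconds instead of systemd giving up after a burst of failures.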
81efa1b717
Remove stale cloudflared service from copenhagen-a (PESO-138) (#125)
cloudflared was retired in #56 when Caddy + Authelia replaced Cloudflare
Tunnels, but copenhagen-a was unreachable at the time so its
cloudflared.service was never stopped and is still running.

Add a cleanup task to the common role that stops, disables and purges
cloudflared wherever the unit lingers. Gated on the unit file existing so
it self-targets copenhagen-a and is a no-op everywhere else, and explicitly
excludes copenhagen-c, which legitimately runs a hand-configured tunnel.
2026-06-07 11:45:35 +01:00
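The gating logic from 81efa1b717 (run the cleanup only where the unit file exists, and never on copenhagen-c) might look like this as a standalone check. It is a hedged sketch of the decision only, not the Ansible task itself; the function name and the idea of passing the unit file path as an argument are assumptions made for illustration.

```python
from pathlib import Path

# Hosts that legitimately run cloudflared and must be skipped
# (copenhagen-c is named in the commit message).
EXCLUDED_HOSTS = frozenset({"copenhagen-c"})


def should_purge_cloudflared(host: str, unit_file: Path) -> bool:
    """Return True when the stale cloudflared unit should be removed.

    The check self-targets: it is a no-op on any host where the unit
    file is absent, and it never fires on an explicitly excluded host.
    """
    if host in EXCLUDED_HOSTS:
        return False
    return unit_file.exists()
```

Gating on the file rather than on a host list means the task needs no updating if another host turns out to have the same leftover.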
3871dc8f90
Restrict london-b Samba (445) to LAN + Tailscale, off public internet (#124)
Samba on london-b was allowed on 445/tcp from anywhere via UFW, exposing
SMB/CIFS to the public internet. Tailscale already reaches it through the
tailscale0 allow-all rule, so scope the explicit rule to the local London
LAN (192.168.1.0/24) instead of the world.

The common UFW task only ever adds allow rules, so it gained support for an
optional per-port from_ip, plus a follow-up task that deletes the superseded
world-open variant of any source-restricted port — otherwise the old
'445 ALLOW Anywhere' rule would linger on the host and defeat the change.

PESO-145
2026-06-07 11:37:45 +01:00
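The follow-up cleanup described in 3871dc8f90 amounts to a small set computation: any port that now has a source-restricted rule, and no longer wants a world-open one, should have its old "allow from anywhere" rule deleted. The sketch below illustrates that idea under assumptions: the `Rule` type and both function names are hypothetical, and the real change lives in an Ansible task whose variable layout is not shown in the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    port: int
    proto: str = "tcp"
    from_ip: Optional[str] = None  # None means "allow from anywhere"


def superseded_world_open(desired: list[Rule]) -> list[Rule]:
    """List world-open rules made obsolete by a source-restricted rule."""
    restricted = {(r.port, r.proto) for r in desired if r.from_ip is not None}
    still_open = {(r.port, r.proto) for r in desired if r.from_ip is None}
    return [Rule(port, proto) for port, proto in sorted(restricted - still_open)]


def delete_command(rule: Rule) -> list[str]:
    """Build the ufw invocation that removes a world-open allow rule."""
    return ["ufw", "delete", "allow", f"{rule.port}/{rule.proto}"]


if __name__ == "__main__":
    desired = [Rule(445, from_ip="192.168.1.0/24"), Rule(22)]
    for stale in superseded_world_open(desired):
        print(" ".join(delete_command(stale)))
```

Without this step, adding the LAN-scoped rule alone would leave the earlier world-open rule in place, and UFW would still accept SMB traffic from any source.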
644b608831
chore: retire readarr service, replaced by bookshelf (#123)
Bookshelf (PR #122) is a Readarr revival and now owns port 8787 on
london-b, so the old custom Readarr systemd unit is removed:

- drop readarr from the media_stack role's unit-deploy and enable loops,
  and add an idempotent decommission task (stop, disable, remove unit)
  so the host tears it down via Ansible rather than ad-hoc SSH
- delete services/readarr/readarr.service
- update docs (services, london-b host, service inventory) to describe
  bookshelf as a Docker service instead of a custom systemd unit

The public readarr.pez.sh hostname is kept and now reverse-proxies to
bookshelf on :8787 — DNS, Caddy and Authelia (pez_readarr_users group)
are unchanged.
2026-06-06 15:50:37 +01:00
98ac065056
feat: add bookshelf service on london-b (#122)
Bookshelf (a Readarr revival) for managing the ebook/audiobook library.
Runs on london-b with config at /root/bookshelf and the library at
/hdd/books mounted into the container at the same path.
2026-06-06 15:34:57 +01:00
a40cd60d60
backup: keep deleted/overwritten versions instead of mirroring them away (#120)
The nightly job runs 'rclone sync', which permanently deletes or overwrites
objects at the B2 destination. That means an accidental deletion or a
ransomware encryption on /hdd propagates straight to the backup on the next
run, leaving no clean copy.

Add --backup-dir so every superseded version is moved into a dated folder
under _versions/ rather than thrown away, and prune that folder after 30
days so it doesn't grow unbounded.
2026-06-05 21:23:04 +01:00
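A rough sketch of the two rclone invocations implied by a40cd60d60 is shown below. Only `/hdd`, the `_versions/` folder, the dated subfolder, and the 30-day window come from the commit message; the remote name `b2:example-bucket`, the `current` destination prefix, and the function names are placeholders. The code only builds the argument lists and does not run rclone.

```python
from datetime import date


def sync_command(source: str, dest: str, versions_root: str, today: date) -> list[str]:
    """Build an rclone sync that parks superseded files in a dated folder."""
    backup_dir = f"{versions_root}/{today.isoformat()}"
    return ["rclone", "sync", source, dest, "--backup-dir", backup_dir]


def prune_command(versions_root: str, max_age_days: int = 30) -> list[str]:
    """Build an rclone delete that drops versions older than the window."""
    return ["rclone", "delete", versions_root, "--min-age", f"{max_age_days}d"]


if __name__ == "__main__":
    # Hypothetical remote layout: live copy and versions in sibling prefixes.
    dest = "b2:example-bucket/current"
    versions = "b2:example-bucket/_versions"
    print(" ".join(sync_command("/hdd", dest, versions, date.today())))
    print(" ".join(prune_command(versions)))
```

The sibling layout is a deliberate choice in this sketch: rclone requires that the `--backup-dir` location not overlap the sync destination unless it is excluded by a filter rule.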
9815f44b84
fix: stop masking failed service deploys; trim dead config (#119)
The docker_services and systemd_services roles ran their "start the
service" tasks with `failed_when: false`, so a container or unit that
failed to come up still reported the deploy as green. Drop it from both
start tasks so a broken deploy actually fails CI. The compose/unit *copy*
tasks keep `failed_when: false` — that's load-bearing for the
`item is not failed` filter that skips services without a compose/unit file.

Also:
- Remove a duplicate "Template service .env files" task in docker_services
  (second copy used a hardcoded path and didn't register; first one is the
  one the start task reads).
- Don't trigger a full fleet deploy on docs/markdown/workflow-only pushes
  to main — add docs/**, **/*.md and .github/** to paths-ignore.
- Drop the dangling `update-freebsd` Make target (playbook doesn't exist;
  fleet has no FreeBSD hosts).
2026-06-04 18:41:24 +01:00
45dff99e7c
fix: update octopus exporter (#113)
2026-05-26 20:56:07 +01:00
a031d4218b
fix: Documentation overhaul (#112)
* fix: Documentation overhaul

* removing joke graph
2026-05-19 18:49:21 +01:00
9f84652102
fix: cleanup deploy.yml and share workflow (#108)
* fix: cleanup deploy.yml and share workflow

* lint issue
2026-05-15 20:17:28 +01:00
69145b3089
fix: add smb mount (#107)
* fix: add smb mount

* update secrets

* address linting issues
2026-05-14 20:49:25 +01:00
5481292b7f
fix: remove subscription nag and lock down proxmox (#106)
2026-05-13 21:09:54 +01:00
d3b516c594
fix: cleanup freebsd and alpine stuff (#105)
2026-05-12 22:43:12 +01:00
e502a92451
fix: tracing on caddy services (#104)
2026-05-10 10:18:53 +01:00
b5d5537c1f
Proxmox ve on london a (#102)
* fix: update config for london-a for new proxmox install

* fix: update proxmox endpoint
2026-05-09 19:29:44 +01:00
928d1d0b99
fix: update config for london-a for new proxmox install (#101)
2026-05-09 19:22:34 +01:00
7d22ad1ce1
bug: add retry to restarting caddy (#97)
* bug: add retry to restarting caddy

* skip terraform pipeline when no terraform changes has been done
2026-05-05 20:42:52 +01:00
abb283c1d7
terraform plan on pr and caddy metrics on localhost since we have all… (#96)
* terraform plan on pr and caddy metrics on localhost since we have alloy now

* remove refreshing state
2026-05-05 13:35:37 +01:00
043c783361
Grafana Cloud Migration (#94)
* Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics

* modulize stuff now that we have multiple substantial things in here

* provider updates and new secrets

* remove grafana and prometheus from ansible
2026-05-04 13:40:30 +01:00
83f023aedd
Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinsta… (#93)
* Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinstalled

* dns config for cockpit
2026-05-03 14:00:22 +01:00
e5306a5409
Fixing loki alloy (#87)
* add alloy to docker group

* fix: use docker driver instead of hacky alloy setup

* fixing linting issue
2026-04-29 20:07:40 +01:00
a51a0879d3
add alloy to docker group (#86)
2026-04-28 20:53:19 +01:00
6a3618aa4a
fix: Fixing loki alloy (#85)
* fix: alloy

* fix: alpine doesn't need a hacky install
2026-04-28 20:30:30 +01:00
b474e28528
fix: alloy (#84)
2026-04-28 20:10:20 +01:00
5391c500e1
fix: loki & alloy (#83)
* fix: loki & alloy

* fix linting
2026-04-28 16:40:45 +01:00
a7f51ec10c
fix: update octo exporter (#82)
2026-04-27 20:10:11 +01:00
5c404dca87
fix: update octopus_exporter to v1.1.1 (#81)
2026-04-26 21:01:24 +01:00
10bb940f87
fix: update living room dashboard (#74)
2026-04-26 14:09:09 +01:00
af2f462c1c
fix: prometheus retention and authelia fix (#73)
* fix: prometheus retention time

* also fix bug with authelia

* linting issues

* more linting
2026-04-25 21:35:39 +01:00
b82013c2f0
fix: actually decomission nextcloud and TWDNE (#72)
* fix: actually decomission nextcloud and TWDNE

* ignore spaces in lint and remove dns for the services

* linting on the linting config wasn't linting the lints
2026-04-25 18:19:16 +01:00
35c5079d8f
fix: remove cloud and TWDNE and add energy dashboard for grafana (#71)
2026-04-25 17:46:17 +01:00
b3cc47f3d6
fix: optimize deploy playbook and get rid of deprecated stuff (#70)
2026-04-25 15:04:16 +01:00
7df62e8848
fix: adding octopus_exporter compose (#69)
* fix: adding octopus_exporter compose

* add the secret for octopus
2026-04-25 12:38:12 +01:00
56bec98afc
fix: Add octopus_exporter job configuration (#68)
2026-04-22 21:28:14 +01:00
c495b73720
template prometheus config (#67)
2026-04-21 20:44:37 +01:00
34820ee663
adding london-c (#66)
2026-04-20 20:52:19 +01:00
177fbb4014
Change provider for plex metrics (#65)
* change provider for plex metrics

* update plex token

* update plex token loading
2026-04-13 19:04:54 +01:00
2a98a89eb4
Change provider for plex metrics (#64)
* change provider for plex metrics

* update plex token
2026-04-12 21:21:24 +01:00
a0ec92dfdd
change provider for plex metrics (#63)
2026-04-12 18:45:30 +01:00
49cee191b5
fix: bind mariadb to local ip (#62)
2026-04-11 21:24:11 +01:00
1ef59ccc4a
fix: add mangos ports to firewall (#61)
2026-04-11 20:42:17 +01:00
1ab278e47a
only send email if something went wrong with backups (#60)
2026-04-06 18:33:07 +01:00
4c7ea76d81
fix: remove node_exporter from copenhagen-a systemd_services (#59)
node_exporter is deployed by the dedicated node_exporter Ansible role
using distro packages (prometheus-node-exporter). Having it in
systemd_services causes the systemd_services role to look for a
non-existent services/node_exporter/node_exporter.service file,
producing errors during deploy.

Resolves PESO-135
2026-04-04 12:51:52 +01:00
41d7876260
change provider for mc server for more configurability (#58)
2026-04-04 12:01:28 +01:00