Commit graph

108 commits

Author SHA1 Message Date
fd435854e4 provider updates and new secrets 2026-05-04 13:30:18 +01:00
e89f062d3c modularize stuff now that we have multiple substantial things in here 2026-05-04 13:29:59 +01:00
c9aa3a07bb Grafana Cloud migration, adding dashboards, fleet, alloy and synthetics 2026-05-04 13:28:51 +01:00
83f023aedd
Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinsta… (#93)
* Migration to Grafana Cloud, nuremberg-a reinstalled, london-a reinstalled

* dns config for cockpit
2026-05-03 14:00:22 +01:00
d22f7a52a0
fix: clean up of terraform (#92)
2026-05-02 14:46:03 +01:00
03ad9b476d
make dns more neat (#91)
2026-05-01 21:05:53 +01:00
b5cef4b985
fix: remove cloudflare resources (#90)
* phase 1 - add all the records to both providers to A/B test

* dkim fix

* remove cloudflare resources
2026-04-30 15:55:14 +01:00
ba04d49c4e
Cloudflaring out mayday mayday mayday (#89)
* phase 1 - add all the records to both providers to A/B test

* dkim fix
2026-04-29 21:23:15 +01:00
dd112fd505
phase 1 - add all the records to both providers to A/B test (#88) 2026-04-29 20:47:34 +01:00
e5306a5409
Fixing loki alloy (#87)
* add alloy to docker group

* fix: use docker driver instead of hacky alloy setup

* fixing linting issue
2026-04-29 20:07:40 +01:00
a51a0879d3
add alloy to docker group (#86)
2026-04-28 20:53:19 +01:00
6a3618aa4a
fix: Fixing loki alloy (#85)
* fix: alloy

* fix: alpine doesn't need a hacky install
2026-04-28 20:30:30 +01:00
b474e28528
fix: alloy (#84) 2026-04-28 20:10:20 +01:00
5391c500e1
fix: loki & alloy (#83)
* fix: loki & alloy

* fix linting
2026-04-28 16:40:45 +01:00
a7f51ec10c
fix: update octo exporter (#82)
2026-04-27 20:10:11 +01:00
5c404dca87
fix: update octopus_exporter to v1.1.1 (#81)
2026-04-26 21:01:24 +01:00
d76be4828c
fix: add ssh key resource (#80) 2026-04-26 20:08:45 +01:00
19928358c5
fix: Update node version for gha (#79)
* fix: update checkout version to dodge deprecation

* fix: more deprecations

* forgot one
2026-04-26 18:35:15 +01:00
7c3fec983b
fix: Update node version for gha (#78)
* fix: update checkout version to dodge deprecation

* fix: more deprecations
2026-04-26 18:23:22 +01:00
98be03c273
fix: update checkout version to dodge deprecation (#77) 2026-04-26 18:13:38 +01:00
1c6784eade
fix: replace tailscale authkey use with oauth (#76)
2026-04-26 17:30:15 +01:00
e9fbd41cb4
fix: deploy using a matrix (#75) 2026-04-26 14:35:12 +01:00
10bb940f87
fix: update living room dashboard (#74) 2026-04-26 14:09:09 +01:00
af2f462c1c
fix: prometheus retention and authelia fix (#73)
* fix: prometheus retention time

* also fix bug with authelia

* linting issues

* more linting
2026-04-25 21:35:39 +01:00
b82013c2f0
fix: actually decommission nextcloud and TWDNE (#72)
* fix: actually decommission nextcloud and TWDNE

* ignore spaces in lint and remove dns for the services

* linting on the linting config wasn't linting the lints
2026-04-25 18:19:16 +01:00
35c5079d8f
fix: remove cloud and TWDNE and add energy dashboard for grafana (#71) 2026-04-25 17:46:17 +01:00
b3cc47f3d6
fix: optimize deploy playbook and get rid of deprecated stuff (#70) 2026-04-25 15:04:16 +01:00
7df62e8848
fix: adding octopus_exporter compose (#69)
* fix: adding octopus_exporter compose

* add the secret for octopus
2026-04-25 12:38:12 +01:00
56bec98afc
fix: Add octopus_exporter job configuration (#68) 2026-04-22 21:28:14 +01:00
c495b73720
template prometheus config (#67) 2026-04-21 20:44:37 +01:00
34820ee663
adding london-c (#66) 2026-04-20 20:52:19 +01:00
177fbb4014
Change provider for plex metrics (#65)
* change provider for plex metrics

* update plex token

* update plex token loading
2026-04-13 19:04:54 +01:00
2a98a89eb4
Change provider for plex metrics (#64)
* change provider for plex metrics

* update plex token
2026-04-12 21:21:24 +01:00
a0ec92dfdd
change provider for plex metrics (#63) 2026-04-12 18:45:30 +01:00
49cee191b5
fix: bind mariadb to local ip (#62) 2026-04-11 21:24:11 +01:00
1ef59ccc4a
fix: add mangos ports to firewall (#61) 2026-04-11 20:42:17 +01:00
1ab278e47a
only send email if something went wrong with backups (#60) 2026-04-06 18:33:07 +01:00
4c7ea76d81
fix: remove node_exporter from copenhagen-a systemd_services (#59)
node_exporter is deployed by the dedicated node_exporter Ansible role
using distro packages (prometheus-node-exporter). Having it in
systemd_services causes the systemd_services role to look for a
non-existent services/node_exporter/node_exporter.service file,
producing errors during deploy.

Resolves PESO-135
2026-04-04 12:51:52 +01:00
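The failure described in #59 (a systemd_services entry with no matching unit file in the repo) could be caught before a deploy. The following is a minimal sketch, assuming the layout services/<name>/<name>.service implied by the commit message; the function name missing_unit_files is hypothetical, and the real role may resolve paths differently.

```python
from pathlib import Path


def missing_unit_files(repo_root: Path, systemd_services: list[str]) -> list[str]:
    """Return the entries of systemd_services that have no unit file in the repo.

    Assumption: unit files live at services/<name>/<name>.service, which is
    the path the commit message says the systemd_services role looks for.
    """
    missing = []
    for name in systemd_services:
        unit = repo_root / "services" / name / f"{name}.service"
        if not unit.is_file():
            missing.append(name)
    return missing
```

Called with a host's systemd_services list, a non-empty result would point at entries such as node_exporter that are deployed by another role and should not be listed.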
41d7876260
change provider for mc server for more configurability (#58) 2026-04-04 12:01:28 +01:00
849ea208f0
fix grafana alert rules missing relativeTimeRange (#57) 2026-04-04 09:58:13 +01:00
267b392996
Add sonarr service directory with README (#51)
Sonarr is running on london-b as an apt-managed systemd service
but was the only *arr service without a services/ directory in the
repo. Add services/sonarr/README.md documenting the install method,
data paths, and how it differs from the other *arr services.

Closes PESO-133
2026-04-04 09:31:39 +01:00
ed6eb22f60
Remove cloudflared — replaced by Caddy reverse proxy (#56)
Cloudflared tunnels are no longer used. All traffic now routes through
Cloudflare DNS to Caddy on helsinki-a over Tailscale.

- Remove cloudflared systemd unit files (copenhagen-a, london-b)
- Remove cloudflared from media_stack role and copenhagen-a host_vars
- Remove cloudflared references from services README and host docs
- Remove cloudflared deploy trigger from CI workflow

Live service on london-b stopped and disabled. copenhagen-a was
unreachable but the tunnel is unused regardless.
2026-04-03 22:51:12 +01:00
99c2091b96
Add smartctl-exporter to copenhagen-a and Prometheus scrape (#55)
- Add smartctl-exporter to copenhagen-a docker_services
- Add copenhagen-a as a Prometheus smartmontools scrape target
- Update compose file comment to reflect multi-host usage

Closes PESO-128
2026-04-03 21:20:20 +01:00
88377f3e93
fix: remove || true from compose lint so validation errors fail CI (#54)
The lint-docker-compose workflow was swallowing all validation errors with
|| true, meaning broken compose files would never fail the check.

- Remove || true and let validation failures propagate
- Add a pre-step that creates empty stubs for referenced env_file entries
  (e.g. bitwarden/settings.env) so docker compose config can validate
  structure without needing real secrets
- Track per-file pass/fail and exit non-zero if any file fails

Closes PESO-130
2026-04-03 20:50:47 +01:00
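The three steps in #54 (stub missing env_file entries, validate each file, track failures and exit non-zero) can be sketched as below. This is an illustration under assumptions, not the workflow's actual script: the function names are hypothetical, the env_file scan is a naive line-based reader that ignores inline lists such as `env_file: [a, b]`, and validation shells out to `docker compose config --quiet`.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable


def referenced_env_files(compose_text: str) -> list[str]:
    """Naively collect env_file paths (string form and block-list form only)."""
    paths = []
    in_list = False
    for line in compose_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("env_file:"):
            value = stripped[len("env_file:"):].strip()
            if value:
                paths.append(value.strip("'\""))
                in_list = False
            else:
                in_list = True  # entries follow as "- path" lines
        elif in_list and stripped.startswith("- "):
            paths.append(stripped[2:].strip().strip("'\""))
        elif stripped:
            in_list = False  # any other non-empty line ends the list
    return paths


def docker_validate(compose_file: Path) -> bool:
    """Validate one file; True when docker compose accepts it."""
    result = subprocess.run(
        ["docker", "compose", "-f", str(compose_file), "config", "--quiet"],
        capture_output=True,
    )
    return result.returncode == 0


def lint_compose_files(
    files: Iterable[Path],
    validate: Callable[[Path], bool] = docker_validate,
) -> list[Path]:
    """Stub missing env files, validate every file, return the failures."""
    failed = []
    for compose_file in files:
        for rel in referenced_env_files(compose_file.read_text()):
            stub = compose_file.parent / rel
            if not stub.exists():
                stub.parent.mkdir(parents=True, exist_ok=True)
                stub.touch()  # empty stub: structure validates without secrets
        if not validate(compose_file):
            failed.append(compose_file)
    return failed
```

A CI step would call lint_compose_files on the repo's compose files and exit non-zero when the returned list is non-empty, which is the behaviour that `|| true` previously masked.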
d8757d37e1
fix(london-a): correct grafana provisioning dir path (#53)
grafana.ini on london-a sets provisioning = /usr/local/etc/grafana/provisioning
but grafana_provisioning_dir pointed at /usr/local/share/grafana/conf/provisioning.

This meant deploy.yml synced alerting rules, dashboards provisioning, and
datasources to a path Grafana never reads — a from-scratch deploy would have
broken alerting entirely.

Fixes PESO-131
2026-04-03 20:20:15 +01:00
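The mismatch fixed in #53 is the kind of drift a small consistency check can flag. A minimal sketch, assuming the provisioning key sits in the [paths] section of grafana.ini and that the value of grafana_provisioning_dir is passed in as a plain string; both function names are hypothetical.

```python
import configparser


def provisioning_path(grafana_ini_text: str) -> str:
    """Read the provisioning directory from grafana.ini contents."""
    parser = configparser.ConfigParser(interpolation=None)
    # grafana.ini may have keys before the first section header; park them
    # under DEFAULT so configparser accepts the file.
    parser.read_string("[DEFAULT]\n" + grafana_ini_text)
    return parser.get("paths", "provisioning")


def provisioning_dir_matches(grafana_ini_text: str, grafana_provisioning_dir: str) -> bool:
    """True when the Ansible variable points at the directory Grafana reads."""
    configured = provisioning_path(grafana_ini_text).rstrip("/")
    return configured == grafana_provisioning_dir.rstrip("/")
```

With the values quoted in the commit message, the check returns False for /usr/local/share/grafana/conf/provisioning and True for /usr/local/etc/grafana/provisioning.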
25d201f930
Add copenhagen-a to docker_hosts and wire up minecraft docker service (#52)
- Add copenhagen-a to [docker_hosts] inventory group so the docker role
  runs on it in Stage 2
- Add docker_services: [minecraft] to copenhagen-a host_vars
- Add docker_services role to Stage 4d (copenhagen-a) in deploy.yml
- Update deploy-on-merge scope mapping to include copenhagen-a for
  docker role changes

Closes PESO-132
2026-04-03 19:50:51 +01:00
dca6a08ba1
Remove cloudflared from london-a (PESO-134) (#50)
cloudflared has been replaced by Caddy + Authelia. Removed:
- cloudflared service config (services/cloudflared/london-a/)
- tunnel ID from london-a host_vars
- cloudflared_enable from rc.conf

Also synced rc.conf with live server state (disabled services
from PESO-113, added node_exporter_listen_address).

Live server: stopped service, removed from rc.conf, uninstalled pkg.
2026-04-03 18:51:51 +01:00
a31f8b5651
Add systemd_exporter Ansible role and Prometheus scrape config (#49)
* Add systemd_exporter Ansible role and Prometheus scrape config

- Create systemd_exporter role (download binary, create user, deploy service)
- Add scrape job for london-b:9558 and copenhagen-a:9558
- Add systemd_exporter_hosts inventory group
- Add stage 3b to deploy.yml
- Map role to deploy-on-merge scope

Closes PESO-120

* Fix line length lint violations in systemd_exporter tasks

* Fix var-naming lint: use systemd_exporter_ prefix for role variables
2026-04-03 12:23:38 +01:00
8f5eb385cc
Remove copenhagen-a from docker role mapping in deploy-on-merge (#48)
copenhagen-a is not in [docker_hosts] inventory group. Running the
docker role play against it just gets skipped, wasting CI time.

Fixes PESO-121
2026-04-03 11:49:41 +01:00
029c35fba6
Replace ASCII diagrams with mermaid in docs (#47)
Convert remaining ASCII art diagrams to mermaid syntax:
- monitoring.md: stack overview diagram
- networking.md: Tailscale mesh diagram + DNS request flow

architecture.md already used mermaid, no changes needed.

PESO-123
2026-04-03 10:48:41 +01:00