# Deployment

```{eval-rst}
.. verified:: 2026-02-23
   :reviewer: Christof Buchbender
```

## Overview

Deployments are driven by a **causal build-state orchestrator** in `ci/check_builds.py`, triggered via GitHub Actions (`orchestrate-deploy.yml`) and executed by Jenkins.

Repos are divided into **deploy groups**. Each group has its own gate check and its own Jenkins job; a docs rebuild never blocks, and is never blocked by, a main-stack rebuild.

```{eval-rst}
.. list-table:: Deploy groups
   :header-rows: 1
   :widths: 15 45 40

   * - Group
     - Repos
     - Jenkins jobs
   * - ``default``
     - ops-db, ops-db-api, data-transfer, ops-db-ui
     - ``deploy-staging`` / ``deploy-production``
   * - ``docs``
     - data-center-documentation
     - ``deploy-data-center-documentation-develop`` / ``deploy-data-center-documentation-production``
```

## Main-stack deploy (`default` group)

1. A developer pushes to one of the main-stack repos.
2. GitHub Actions builds and pushes the Docker image, then calls `notify-build-complete.yml`, which sends a `repository_dispatch` event (type `build-complete`) to `system-integration`.
3. `orchestrate-deploy.yml` runs (serialised per branch via a concurrency group):
   1. **Update state**: records `sha`, `image_digest`, and `built_with` in the per-branch repository variable (`BUILD_STATE_JSON_DEVELOP` / `BUILD_STATE_JSON_MAIN`).
   2. **Cascade**: dispatches `workflow_dispatch` events to the immediate downstream repos so they rebuild against the new upstream image.
   3. **Cross-group dispatch**: dispatches `update-submodules.yml` in `data-center-documentation` for each `cross_group_trigger` entry in `dependency-graph.yml`.
   4. **Gate check**: resolves the triggering repo's deploy group, then verifies that every repo in that group is causally consistent (`built_with` matches the current upstream digest). If all are green, triggers the Jenkins job via its REST API.
4. Jenkins pipeline (`deploy_staging` / `deploy_production`):
   - SSHs into `input-b.{staging.}data.ccat.uni-koeln.de`
   - Runs Alembic migrations (ops-db pipeline only)
   - Runs `docker compose pull` + `up -d` (main-stack compose file)
   - Smoke test

## Docs deploy (`docs` group)

The docs stack is **independent**: it has its own compose files (`docker-compose.docs.input-b.yml` / `docker-compose.docs.staging.input-b.yml`) and its own Jenkins jobs. The trigger chain:

1. Any of the four tracked upstream repos publishes a new image.
2. The orchestrator fires a cross-group `workflow_dispatch` to `data-center-documentation`, targeting `update-submodules.yml`.
3. `update-submodules.yml` bumps all tracked submodule pointers to their latest HEAD on the dispatched branch, then commits and pushes (if anything changed).
4. The push triggers `docker-docs.yml`, which builds and pushes the docs image and calls `notify-build-complete` with the submodule SHAs as `built_with_json`.
5. The orchestrator receives the `build-complete` event, updates state, and runs the gate check for the `docs` group. The gate uses `source_commit` comparison: `built_with[dep]` is compared against `state[dep].sha` (the git commit SHA of each submodule pointer) rather than an image digest.
6. If all four submodule SHAs match, the Jenkins docs deploy job is triggered.

## Config-only updates

When only configuration changes are pushed (e.g. Docker Compose files, environment variables, monitoring configs) without rebuilding container images, a lightweight update pipeline automatically syncs those changes to all data-center machines **without waiting for a full deploy cycle**.

Trigger:

1. A developer pushes to `develop` or `main`.
2. The GitHub Actions workflow `update-system-integration` fires (`workflows/update-system-integration.yml`).
3. The workflow runs on the `ccat-internal` runner (VPN access to Jenkins).
4.
Triggers the matching Jenkins job:

- `update-system-integration-staging` (for the `develop` branch)
- `update-system-integration-production` (for the `main` branch)

Pipeline:

1. SSH into all machines:
   - **Staging**: input-a, input-b, input-c in the staging environment
   - **Production**: input-a, input-b, input-c in the production environment
2. On each machine, run:

   ```
   cd /opt/data-center/system-integration
   CI=true ccat update -y --no-image-pull
   ```

This:

- Refreshes GitHub authentication tokens (avoiding failures from short-lived HTTPS tokens)
- Pulls the latest code from the repository
- Runs `docker compose up -d` to apply config changes immediately
- Does **not** pull new container images (unlike a full `ccat update`)

Benefits:

- Config changes are applied immediately, without waiting for Docker builds
- Avoids unnecessary rebuilds when only compose files or configs change
- Maintains causal consistency: triggered directly by the git push, not via the orchestration state machine
- Lightweight and fast: minimal overhead compared to a full deploy cycle

**Manual trigger** (`workflow_dispatch`): you can also trigger the workflow manually from the GitHub UI, selecting either the `staging` or `production` environment.

To verify that the update was applied:

```
ssh <user>@input-{a,b,c}.{staging.}data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
git log -1
docker compose ps   # or: ccat ps
```

## Causal consistency model

Build state is stored in two GitHub repository variables: `BUILD_STATE_JSON_DEVELOP` and `BUILD_STATE_JSON_MAIN`. Each per-repo entry has the form:

```json
{
  "sha": "<git commit sha>",
  "image_digest": "sha256:...",
  "ts": "2026-01-01T00:00:00+00:00",
  "built_with": {
    "ops-db": "sha256:..."
  }
}
```

The `built_with_type` field in `ci/dependency-graph.yml` controls what the gate check compares:

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Type
     - Gate compares ``built_with[dep]`` against
   * - ``runtime``
     - ``state[dep].image_digest``
   * - ``base_image``
     - ``state[dep].image_digest``
   * - ``source_commit``
     - ``state[dep].sha`` (used by docs: submodule pointer vs upstream git SHA)
```

## Redis TLS Certificate Management

Redis connections between machines use mutual TLS. Four **cert variants** exist, one per Redis instance:

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 20 25 55

   * - Variant
     - Redis server
     - Deployed to
   * - ``main``
     - input-b (production)
     - input-a, input-b, input-c, reuna
   * - ``ccat``
     - reuna (Chile)
     - input-a, input-b, input-c, reuna
   * - ``develop``
     - input-b.staging
     - input-a.staging, input-b.staging, input-c.staging
   * - ``develop-ccat``
     - *(future staging Chile)*
     - input-a.staging, input-b.staging, input-c.staging
```

All production machines receive **both** the `main` and `ccat` certs, so that services can connect to either Redis instance without future cert changes.

**File permissions on remote hosts:**

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 20 15 20 45

   * - File
     - Mode
     - Owner
     - Notes
   * - ``ca.crt``
     - 644
     - root:root
     - Public; establishes trust in the server cert
   * - ``client.crt``
     - 644
     - root:root
     - Presented by all clients during the handshake
   * - ``client.key``
     - 644
     - root:root
     - World-readable; required by multi-UID containers
   * - ``redis.crt``
     - 644
     - root:root
     - Server cert; only on server machines
   * - ``redis.key``
     - 600
     - 999:root
     - UID 999 = Redis container user; only on server machines
   * - ``ca.key``
     - —
     - local only
     - **Never deployed**; only needed for cert signing
```

Certs are deployed to `/opt/redis-certs/{variant}/` on each machine.

### Workflow

The `ccat redis-certs` sub-command handles the full lifecycle:

```bash
# 1. Generate certs locally (runs 8 openssl steps, sets permissions)
ccat redis-certs generate main

# 2. Check sync status before distributing
ccat redis-certs status --variant main

# 3.
# Dry-run first
ccat redis-certs distribute --variant main --dry-run

# 4. Deploy to all target hosts via Ansible
ccat redis-certs distribute --variant main

# 5. Verify all machines are in sync
ccat redis-certs status --variant main

# Rotate (regenerate + distribute in one step)
ccat redis-certs rotate --variant main
```

The `status` command shows a Rich table comparing the local `ca.crt` SHA256 fingerprint against the fingerprint read from each remote host via an Ansible ad-hoc command. Use `--verbose` to see the full fingerprint instead of the last 16 characters.

After distributing new certs, restart Redis on each affected machine:

```
ssh <user>@input-b.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
docker compose restart redis
```

### Ansible role

The `redis_certs` role (`ansible/roles/redis_certs/`) is included in the `ccat`, `input_staging`, and `input_ccat` plays in `playbook_setup_vms.yml`, and can be run in isolation via:

```
ccat provision --group input_ccat --tag redis_certs
```

Host-specific behaviour (client vs. server) is controlled by two variables:

- `redis_cert_variants`: which variants to deploy on this host (set in `group_vars/<group>/vars_redis_certs.yml`)
- `redis_cert_server_variants`: variants for which *this* host is the Redis server (set in `host_vars/<host>/redis_certs.yml` for input-b and input-b.staging; overrides the group-level empty default)

## Local Development

```bash
git clone <repo-url>
cd system-integration
git submodule update --init --recursive
cp .env.example .env
# Edit .env: set POSTGRES_PASSWORD, REDIS_PASSWORD, etc.
ccat update   # or: make start_main
```

## Staging Environment

Staging runs on `input-b.staging.data.ccat.uni-koeln.de`. The Jenkins jobs `deploy-staging` / `deploy-data-center-documentation-develop` handle staging deploys automatically when the gate check passes on the `develop` branch.
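The gate check that precedes these automatic deploys can be pictured in Python (the orchestrator's language). This is an illustrative sketch only: the function name, signature, and helper structure are assumptions, not the actual code in `ci/check_builds.py`; only the state-entry shape and the `built_with_type` semantics follow the causal-consistency section above.

```python
def gate_check(group_repos, state, built_with_type):
    """Return True when every repo in the deploy group is causally
    consistent, i.e. each recorded `built_with` value matches the
    upstream's current state.

    `built_with_type` maps a dependency name to "runtime",
    "base_image", or "source_commit" (per ci/dependency-graph.yml).
    "source_commit" compares git SHAs; the other two compare image
    digests.
    """
    for repo in group_repos:
        entry = state.get(repo)
        if entry is None:  # repo has never reported a build
            return False
        for dep, built_against in entry.get("built_with", {}).items():
            dep_state = state.get(dep)
            if dep_state is None:
                return False
            # Pick the field to compare against, per built_with_type.
            key = "sha" if built_with_type.get(dep) == "source_commit" else "image_digest"
            if built_against != dep_state[key]:  # stale: upstream moved on
                return False
    return True
```

Only when a check like this passes for the whole group does the orchestrator call the Jenkins REST API.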
To deploy manually:

```
ssh <user>@input-b.staging.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
docker compose -f docker-compose.staging.input-b.yml pull
docker compose -f docker-compose.staging.input-b.yml up -d
```

## Production Deployment

Production runs on `input-b.data.ccat.uni-koeln.de`. The Jenkins jobs `deploy-production` / `deploy-data-center-documentation-production` handle production deploys automatically when the gate check passes on the `main` branch.

To deploy manually:

```
ssh <user>@input-b.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration

# Main stack:
docker compose -f docker-compose.production.input-b.yml pull
docker compose -f docker-compose.production.input-b.yml up -d

# Docs (independent):
docker compose -f docker-compose.docs.input-b.yml pull
docker compose -f docker-compose.docs.input-b.yml up -d
```
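As a closing aside, the orchestrator's **Update state** step (step 3.1 of the main-stack deploy) can be pictured as a merge into the per-branch JSON variable. Again a hypothetical sketch: the helper name and signature are inventions for illustration; only the entry shape follows the `BUILD_STATE_JSON_DEVELOP` / `BUILD_STATE_JSON_MAIN` format documented in the causal-consistency section.

```python
import json
from datetime import datetime, timezone

def update_state(state_json, repo, sha, image_digest, built_with):
    """Record a finished build for `repo` in the per-branch state blob
    (the value of BUILD_STATE_JSON_DEVELOP or BUILD_STATE_JSON_MAIN)
    and return the updated JSON string."""
    state = json.loads(state_json) if state_json else {}
    state[repo] = {
        "sha": sha,                  # git commit that was built
        "image_digest": image_digest,  # digest of the pushed image
        "ts": datetime.now(timezone.utc).isoformat(),
        "built_with": built_with,    # upstream digests/SHAs at build time
    }
    return json.dumps(state)
```

Subsequent cascade and gate-check steps then read this blob back to decide what to rebuild and whether to deploy.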