Deployment#

Documentation verified. Last checked: 2026-02-23. Reviewer: Christof Buchbender.

Overview#

Deployments are driven by a causal build-state orchestrator in ci/check_builds.py, triggered via GitHub Actions (orchestrate-deploy.yml) and executed by Jenkins.

Repos are divided into deploy groups. Each group has its own gate check and its own Jenkins job, so a docs rebuild never blocks, and is never blocked by, a main-stack rebuild.

Deploy groups#

| Group   | Repos                                        | Jenkins jobs                                                                          |
|---------|----------------------------------------------|---------------------------------------------------------------------------------------|
| default | ops-db, ops-db-api, data-transfer, ops-db-ui | deploy-staging / deploy-production                                                     |
| docs    | data-center-documentation                    | deploy-data-center-documentation-develop / deploy-data-center-documentation-production |

Main-stack deploy (default group)#

  1. A developer pushes to one of the main-stack repos.

  2. GitHub Actions builds and pushes the Docker image, then calls notify-build-complete.yml which sends a repository_dispatch (type build-complete) to system-integration.

  3. orchestrate-deploy.yml runs (serialised per branch via a concurrency group):

    1. Update state — records sha, image_digest, and built_with in BUILD_STATE_JSON_<BRANCH>.

    2. Cascade — dispatches workflow_dispatch to immediate downstream repos so they rebuild against the new upstream image.

    3. Cross-group dispatch — dispatches update-submodules.yml in data-center-documentation for each cross_group_trigger entry in dependency-graph.yml.

    4. Gate check — resolves the triggering repo’s deploy group, then verifies every repo in that group is causally consistent (built_with matches current upstream digest). If all green, triggers the Jenkins job via its REST API.

  4. Jenkins pipeline (deploy_staging / deploy_production):

    • SSHs into input-b.{staging.}data.ccat.uni-koeln.de

    • Runs Alembic migrations (ops-db pipeline only)

    • docker compose pull + up -d (main stack compose file)

    • Smoke test
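The gate check in step 3 can be sketched in Python as follows. This is a minimal illustration under assumed names (`gate_check`, `state`, `deps` are inventions for this sketch); the real logic lives in ci/check_builds.py, which reads BUILD_STATE_JSON_<BRANCH> and dependency-graph.yml and may differ in detail.

```python
# Illustrative sketch of the causal gate check for the default group
# (image-digest comparison). Not the actual ci/check_builds.py code.

def gate_check(group_repos, state, deps):
    """Return True if every repo in the deploy group is causally consistent.

    state: {repo: {"sha": ..., "image_digest": ..., "built_with": {...}}}
    deps:  {repo: [upstream repos it is built against]}
    """
    for repo in group_repos:
        entry = state.get(repo)
        if entry is None:
            return False  # repo has never reported a build
        for dep in deps.get(repo, []):
            built_against = entry["built_with"].get(dep)
            current = state.get(dep, {}).get("image_digest")
            if built_against != current:
                return False  # built against a stale upstream image
    return True
```

Only when this returns True for every repo in the triggering repo's group does the orchestrator call the Jenkins REST API.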

Docs deploy (docs group)#

The docs stack is independent — it has its own compose files (docker-compose.docs.input-b.yml / docker-compose.docs.staging.input-b.yml) and its own Jenkins jobs.

The trigger chain:

  1. Any of the four tracked upstream repos publishes a new image.

  2. The orchestrator fires a cross-group workflow_dispatch to data-center-documentation, targeting update-submodules.yml.

  3. update-submodules.yml bumps all tracked submodule pointers to their latest HEAD on the dispatched branch, then commits and pushes (if anything changed).

  4. The push triggers docker-docs.yml, which builds and pushes the docs image and calls notify-build-complete with the submodule SHAs as built_with_json.

  5. The orchestrator receives the build-complete event, updates state, and runs the gate check for the docs group. The gate uses source_commit comparison: built_with[dep] is compared against state[dep].sha (git commit SHA of each submodule pointer) rather than an image digest.

  6. If all four submodule SHAs match, the Jenkins docs deploy job is triggered.
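Assuming the four tracked upstream repos are the main-stack repos listed above (an assumption for illustration), a docs-group state entry would carry git SHAs rather than image digests in built_with, along these lines:

{
  "sha":          "<data-center-documentation commit SHA>",
  "image_digest": "sha256:...",
  "built_with": {
    "ops-db":        "<submodule pointer SHA>",
    "ops-db-api":    "<submodule pointer SHA>",
    "data-transfer": "<submodule pointer SHA>",
    "ops-db-ui":     "<submodule pointer SHA>"
  }
}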

Config-only updates#

When only config changes are pushed (e.g., Docker Compose files, environment variables, monitoring configs) without rebuilding container images, a lightweight update pipeline automatically syncs those changes to all data-center machines without waiting for a full deploy cycle.

Trigger:

  1. Developer pushes to develop or main

  2. The update-system-integration GitHub Actions workflow (workflows/update-system-integration.yml) fires

  3. Workflow runs on ccat-internal runner (VPN access to Jenkins)

  4. Triggers Jenkins job:

    • update-system-integration-staging (for develop branch)

    • update-system-integration-production (for main branch)

Pipeline:

  1. SSH into all machines:

    • Staging: input-a, input-b, input-c on staging environment

    • Production: input-a, input-b, input-c on production environment

  2. On each machine, run:

    cd /opt/data-center/system-integration
    CI=true ccat update -y --no-image-pull
    

    This:

    • Refreshes GitHub authentication tokens (avoiding failures on short-lived HTTPS tokens)

    • Pulls latest code from the repository

    • Runs docker compose up -d to apply config changes immediately

    • Does not pull new container images (unlike full ccat update)

Benefits:

  • Config changes are applied immediately without waiting for Docker builds

  • Avoids unnecessary rebuilds when only compose files or configs change

  • Maintains causal consistency: triggered directly by git push, not via orchestration state machine

  • Lightweight and fast: minimal overhead vs. full deploy cycle

Manual trigger (workflow_dispatch):

You can also manually trigger the workflow from the GitHub UI, selecting either staging or production environment.

To verify the update was applied:

ssh <user>@input-{a,b,c}.{staging.}data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
git log -1
docker compose ps  # or: ccat ps

Causal consistency model#

Build state is stored in two GitHub repository variables: BUILD_STATE_JSON_DEVELOP and BUILD_STATE_JSON_MAIN. Each per-repo entry has the form:

{
  "sha":          "<git commit SHA>",
  "image_digest": "sha256:...",
  "ts":           "2026-01-01T00:00:00+00:00",
  "built_with":   { "ops-db": "sha256:..." }
}

The built_with_type field in ci/dependency-graph.yml controls what the gate check compares:

| Type          | Gate compares built_with[dep] against                              |
|---------------|--------------------------------------------------------------------|
| runtime       | state[dep].image_digest                                            |
| base_image    | state[dep].image_digest                                            |
| source_commit | state[dep].sha (used by docs: submodule pointer vs upstream git SHA) |
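In code terms, the gate's comparison can be sketched as a field lookup keyed by dependency type. This is an illustration only; the names below are assumptions, not the actual ci/check_builds.py API.

```python
# Sketch: built_with_type selects which state field the gate compares.
COMPARE_FIELD = {
    "runtime": "image_digest",
    "base_image": "image_digest",
    "source_commit": "sha",  # docs group: git SHA of the submodule pointer
}

def is_consistent(dep_type, built_with_value, dep_state):
    """Compare built_with[dep] against the field selected by the dep type."""
    return built_with_value == dep_state[COMPARE_FIELD[dep_type]]
```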

Redis TLS Certificate Management#

Redis connections between machines use mutual TLS. Four cert variants exist, one per Redis instance:

| Variant      | Redis server           | Deployed to                                       |
|--------------|------------------------|---------------------------------------------------|
| main         | input-b (production)   | input-a, input-b, input-c, reuna                  |
| ccat         | reuna (Chile)          | input-a, input-b, input-c, reuna                  |
| develop      | input-b.staging        | input-a.staging, input-b.staging, input-c.staging |
| develop-ccat | (future staging Chile) | input-a.staging, input-b.staging, input-c.staging |

All production machines receive both main and ccat certs so that services can connect to either Redis instance without future cert changes.

File permissions on remote hosts:

| File       | Mode       | Owner     | Notes                                            |
|------------|------------|-----------|--------------------------------------------------|
| ca.crt     | 644        | root:root | Public; establishes trust in the server cert     |
| client.crt | 644        | root:root | Presented by all clients during handshake        |
| client.key | 644        | root:root | World-readable; required by multi-UID containers |
| redis.crt  | 644        | root:root | Server cert; only on server machines             |
| redis.key  | 600        | 999:root  | UID 999 = Redis container user; only on server machines |
| ca.key     | local only |           | Never deployed; only needed for cert signing     |

Certs are deployed to /opt/redis-certs/{variant}/ on each machine.
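For reference, a Redis server consuming these files would typically be configured with mutual-TLS directives along these lines. This is a sketch using standard Redis 6+ options and the paths above; the actual configuration in this stack may differ.

# redis.conf fragment: mutual TLS (illustrative, main variant)
tls-port 6379
port 0                                             # disable the plaintext port
tls-cert-file    /opt/redis-certs/main/redis.crt   # server cert (server machines only)
tls-key-file     /opt/redis-certs/main/redis.key   # mode 600, readable by UID 999
tls-ca-cert-file /opt/redis-certs/main/ca.crt      # trust anchor for client certs
tls-auth-clients yes                               # require client.crt on every connection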

Workflow#

The ccat redis-certs sub-command handles the full lifecycle:

# 1. Generate certs locally (runs 8 openssl steps, sets permissions)
ccat redis-certs generate main

# 2. Check sync status before distributing
ccat redis-certs status --variant main

# 3. Dry-run first
ccat redis-certs distribute --variant main --dry-run

# 4. Deploy to all target hosts via Ansible
ccat redis-certs distribute --variant main

# 5. Verify all machines are in sync
ccat redis-certs status --variant main

# Rotate (regenerate + distribute in one step)
ccat redis-certs rotate --variant main

The status command shows a Rich table comparing the local ca.crt SHA256 fingerprint against the fingerprint read from each remote host via an Ansible ad-hoc command. Use --verbose to see the full fingerprint instead of just the last 16 characters.
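The comparison behind status can be sketched like this. It is a minimal illustration with invented names (`cert_fingerprint`, `in_sync`); whether the real CLI hashes the raw file bytes or something else is an assumption here.

```python
import hashlib

def cert_fingerprint(pem_bytes: bytes) -> str:
    """SHA256 hex fingerprint of a certificate file's raw bytes."""
    return hashlib.sha256(pem_bytes).hexdigest()

def in_sync(local_pem: bytes, remote_pem: bytes, verbose: bool = False):
    """Compare local and remote ca.crt; show last 16 hex chars unless verbose."""
    local_fp = cert_fingerprint(local_pem)
    remote_fp = cert_fingerprint(remote_pem)
    shown = remote_fp if verbose else remote_fp[-16:]
    return local_fp == remote_fp, shown
```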

After distributing new certs, restart Redis on each affected machine:

ssh <user>@input-b.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
docker compose restart redis

Ansible role#

The redis_certs role (ansible/roles/redis_certs/) is included in the ccat, input_staging, and input_ccat plays in playbook_setup_vms.yml and can be run in isolation via:

ccat provision --group input_ccat --tag redis_certs

Host-specific behaviour (client vs. server) is controlled by two variables:

  • redis_cert_variants — which variants to deploy on this host (set in group_vars/<group>/vars_redis_certs.yml)

  • redis_cert_server_variants — variants where this host is the Redis server (set in host_vars/<host>/redis_certs.yml for input-b and input-b.staging; overrides the group-level empty default)
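As an illustration only (hypothetical file contents, not the actual repo), the server-side override for input-b consistent with the variant table above might look like:

# host_vars/input-b/redis_certs.yml (hypothetical example)
redis_cert_server_variants:
  - main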

Local Development#

git clone <repository_url>
cd system-integration
git submodule update --init --recursive
cp .env.example .env
# Edit .env — set POSTGRES_PASSWORD, REDIS_PASSWORD, etc.
ccat update   # or: make start_main

Staging Environment#

Staging runs on input-b.staging.data.ccat.uni-koeln.de.

Jenkins deploy-staging / deploy-data-center-documentation-develop handle staging deploys automatically when the gate check passes on the develop branch.

To deploy manually:

ssh <user>@input-b.staging.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
docker compose -f docker-compose.staging.input-b.yml pull
docker compose -f docker-compose.staging.input-b.yml up -d

Production Deployment#

Production runs on input-b.data.ccat.uni-koeln.de.

Jenkins deploy-production / deploy-data-center-documentation-production handle production deploys automatically when the gate check passes on the main branch.

To deploy manually:

ssh <user>@input-b.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
# Main stack:
docker compose -f docker-compose.production.input-b.yml pull
docker compose -f docker-compose.production.input-b.yml up -d
# Docs (independent):
docker compose -f docker-compose.docs.input-b.yml pull
docker compose -f docker-compose.docs.input-b.yml up -d