# Deployment

```{eval-rst}
.. verified:: 2026-02-23
   :reviewer: Christof Buchbender
```

## Overview

Deployments are driven by a **causal build-state orchestrator** in `ci/check_builds.py`, triggered via GitHub Actions (`orchestrate-deploy.yml`) and executed by Jenkins.

Repos are divided into **deploy groups**. Each group has its own gate check and its own Jenkins job; a docs rebuild never blocks, and is never blocked by, a main-stack rebuild.

```{eval-rst}
.. list-table:: Deploy groups
   :header-rows: 1
   :widths: 15 45 40

   * - Group
     - Repos
     - Jenkins jobs
   * - ``default``
     - ops-db, ops-db-api, data-transfer, ops-db-ui
     - ``deploy-staging`` / ``deploy-production``
   * - ``docs``
     - data-center-documentation
     - ``deploy-data-center-documentation-develop`` / ``deploy-data-center-documentation-production``
```

## Main-stack deploy (`default` group)

1. A developer pushes to one of the main-stack repos.
2. GitHub Actions builds and pushes the Docker image, then calls `notify-build-complete.yml`, which sends a `repository_dispatch` event (type `build-complete`) to `system-integration`.
3. `orchestrate-deploy.yml` runs (serialised per branch via a concurrency group):
   1. **Update state**: records `sha`, `image_digest`, and `built_with` in the per-branch repository variable (`BUILD_STATE_JSON_DEVELOP` / `BUILD_STATE_JSON_MAIN`).
   2. **Cascade**: dispatches `workflow_dispatch` events to the immediate downstream repos so they rebuild against the new upstream image.
   3. **Cross-group dispatch**: dispatches `update-submodules.yml` in `data-center-documentation` for each `cross_group_trigger` entry in `dependency-graph.yml`.
   4. **Gate check**: resolves the triggering repo's deploy group, then verifies that every repo in that group is causally consistent (`built_with` matches the current upstream digest). If all are green, triggers the Jenkins job via its REST API.
4. Jenkins pipeline (`deploy_staging` / `deploy_production`):
   - SSHs into `input-b.{staging.}data.ccat.uni-koeln.de`
   - Runs Alembic migrations (ops-db pipeline only)
   - Runs `docker compose pull` + `up -d` (main-stack compose file)
   - Smoke test

## Docs deploy (`docs` group)

The docs stack is **independent**: it has its own compose files (`docker-compose.docs.input-b.yml` / `docker-compose.docs.staging.input-b.yml`) and its own Jenkins jobs. The trigger chain:

1. Any of the four tracked upstream repos publishes a new image.
2. The orchestrator fires a cross-group `workflow_dispatch` to `data-center-documentation`, targeting `update-submodules.yml`.
3. `update-submodules.yml` bumps all tracked submodule pointers to their latest HEAD on the dispatched branch, then commits and pushes (if anything changed).
4. The push triggers `docker-docs.yml`, which builds and pushes the docs image and calls `notify-build-complete` with the submodule SHAs as `built_with_json`.
5. The orchestrator receives the `build-complete` event, updates state, and runs the gate check for the `docs` group. The gate uses `source_commit` comparison: `built_with[dep]` is compared against `state[dep].sha` (the git commit SHA of each submodule pointer) rather than an image digest.
6. If all four submodule SHAs match, the Jenkins docs deploy job is triggered.

## Config-only updates

When only configuration changes are pushed (e.g. Docker Compose files, environment variables, monitoring configs) without rebuilding container images, a lightweight update pipeline automatically syncs those changes to all data-center machines **without waiting for a full deploy cycle**.

Trigger:

1. A developer pushes to `develop` or `main`.
2. The GitHub Actions workflow `update-system-integration` fires (`workflows/update-system-integration.yml`).
3. The workflow runs on the `ccat-internal` runner (VPN access to Jenkins).
4.
Triggers the matching Jenkins job:

- `update-system-integration-staging` (for the `develop` branch)
- `update-system-integration-production` (for the `main` branch)

Pipeline:

1. SSH into all machines:
   - **Staging**: input-a, input-b, input-c in the staging environment
   - **Production**: input-a, input-b, input-c in the production environment
2. On each machine, run:

   ```
   cd /opt/data-center/system-integration
   CI=true ccat update -y --no-image-pull
   ```

This:

- Refreshes GitHub authentication tokens (avoiding failures from short-lived HTTPS tokens)
- Pulls the latest code from the repository
- Runs `docker compose up -d` to apply config changes immediately
- Does **not** pull new container images (unlike a full `ccat update`)

Benefits:

- Config changes are applied immediately, without waiting for Docker builds
- Avoids unnecessary rebuilds when only compose files or configs change
- Maintains causal consistency: triggered directly by the git push, not via the orchestration state machine
- Lightweight and fast: minimal overhead compared to a full deploy cycle

**Manual trigger** (`workflow_dispatch`): you can also trigger the workflow manually from the GitHub UI, selecting either the `staging` or `production` environment.

To verify that the update was applied:

```
ssh <user>@input-{a,b,c}.{staging.}data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
git log -1
docker compose ps   # or: ccat ps
```

## Causal consistency model

Build state is stored in two GitHub repository variables: `BUILD_STATE_JSON_DEVELOP` and `BUILD_STATE_JSON_MAIN`. Each per-repo entry has the form:

```json
{
  "sha": "<git commit sha>",
  "image_digest": "sha256:...",
  "ts": "2026-01-01T00:00:00+00:00",
  "built_with": {
    "ops-db": "sha256:..."
  }
}
```

The `built_with_type` field in `ci/dependency-graph.yml` controls what the gate check compares:

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Type
     - Gate compares ``built_with[dep]`` against
   * - ``runtime``
     - ``state[dep].image_digest``
   * - ``base_image``
     - ``state[dep].image_digest``
   * - ``source_commit``
     - ``state[dep].sha`` (used by docs: submodule pointer vs upstream git SHA)
```

## Redis TLS Certificate Management

Redis connections between machines use mutual TLS. Four **cert variants** exist, one per Redis instance:

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 20 25 55

   * - Variant
     - Redis server
     - Deployed to
   * - ``main``
     - input-b (production)
     - input-a, input-b, input-c, reuna
   * - ``ccat``
     - reuna (Chile)
     - input-a, input-b, input-c, reuna
   * - ``develop``
     - input-b.staging
     - input-a.staging, input-b.staging, input-c.staging
   * - ``develop-ccat``
     - *(future staging Chile)*
     - input-a.staging, input-b.staging, input-c.staging
```

All production machines receive **both** the `main` and `ccat` certs, so that services can connect to either Redis instance without future cert changes.

**File permissions on remote hosts:**

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 20 15 20 45

   * - File
     - Mode
     - Owner
     - Notes
   * - ``ca.crt``
     - 644
     - root:root
     - Public; establishes trust in the server cert
   * - ``client.crt``
     - 644
     - root:root
     - Presented by all clients during the handshake
   * - ``client.key``
     - 644
     - root:root
     - World-readable; required by multi-UID containers
   * - ``redis.crt``
     - 644
     - root:root
     - Server cert; only on server machines
   * - ``redis.key``
     - 600
     - 999:root
     - UID 999 = Redis container user; only on server machines
   * - ``ca.key``
     - —
     - local only
     - **Never deployed**; only needed for cert signing
```

Certs are deployed to `/opt/redis-certs/{variant}/` on each machine.

### Workflow

The `ccat redis-certs` sub-command handles the full lifecycle:

```bash
# 1. Generate certs locally (runs 8 openssl steps, sets permissions)
ccat redis-certs generate main

# 2. Check sync status before distributing
ccat redis-certs status --variant main

# 3.
# Dry-run first
ccat redis-certs distribute --variant main --dry-run

# 4. Deploy to all target hosts via Ansible
ccat redis-certs distribute --variant main

# 5. Verify all machines are in sync
ccat redis-certs status --variant main

# Rotate (regenerate + distribute in one step)
ccat redis-certs rotate --variant main
```

The `status` command shows a Rich table comparing the local `ca.crt` SHA256 fingerprint against the fingerprint read from each remote host via an Ansible ad-hoc command. Use `--verbose` to see the full fingerprint instead of the last 16 characters.

After distributing new certs, restart Redis on each affected machine:

```
ssh <user>@input-b.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
docker compose restart redis
```

### Ansible role

The `redis_certs` role (`ansible/roles/redis_certs/`) is included in the `ccat`, `input_staging`, and `input_ccat` plays in `playbook_setup_vms.yml`, and can be run in isolation via:

```
ccat provision --group input_ccat --tag redis_certs
```

Host-specific behaviour (client vs. server) is controlled by two variables:

- `redis_cert_variants`: which variants to deploy on this host (set in `group_vars/<group>/vars_redis_certs.yml`)
- `redis_cert_server_variants`: variants for which *this* host is the Redis server (set in `host_vars/<host>/redis_certs.yml` for input-b and input-b.staging; overrides the group-level empty default)

## Local Development

```bash
git clone <repo-url>
cd system-integration
git submodule update --init --recursive
cp .env.example .env
# Edit .env: set POSTGRES_PASSWORD, REDIS_PASSWORD, etc.
ccat update   # or: make start_main
```

## Staging Environment

Staging runs on `input-b.staging.data.ccat.uni-koeln.de`. The Jenkins jobs `deploy-staging` / `deploy-data-center-documentation-develop` handle staging deploys automatically when the gate check passes on the `develop` branch.
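The gate check that precedes these automatic deploys can be pictured in Python (the orchestrator's language). This is an illustrative sketch only: the function name, signature, and helper structure are assumptions, not the actual code in `ci/check_builds.py`; only the state-entry shape and the `built_with_type` semantics follow the causal-consistency section above.

```python
def gate_check(group_repos, state, built_with_type):
    """Return True when every repo in the deploy group is causally
    consistent, i.e. each recorded `built_with` value matches the
    upstream's current state.

    `built_with_type` maps a dependency name to "runtime",
    "base_image", or "source_commit" (per ci/dependency-graph.yml).
    "source_commit" compares git SHAs; the other two compare image
    digests.
    """
    for repo in group_repos:
        entry = state.get(repo)
        if entry is None:  # repo has never reported a build
            return False
        for dep, built_against in entry.get("built_with", {}).items():
            dep_state = state.get(dep)
            if dep_state is None:
                return False
            # Pick the field to compare against, per built_with_type.
            key = "sha" if built_with_type.get(dep) == "source_commit" else "image_digest"
            if built_against != dep_state[key]:  # stale: upstream moved on
                return False
    return True
```

Only when a check like this passes for the whole group does the orchestrator call the Jenkins REST API.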
To deploy manually:

```
ssh <user>@input-b.staging.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration
docker compose -f docker-compose.staging.input-b.yml pull
docker compose -f docker-compose.staging.input-b.yml up -d
```

## Production Deployment

Production runs on `input-b.data.ccat.uni-koeln.de`. The Jenkins jobs `deploy-production` / `deploy-data-center-documentation-production` handle production deploys automatically when the gate check passes on the `main` branch.

To deploy manually:

```
ssh <user>@input-b.data.ccat.uni-koeln.de
cd /opt/data-center/system-integration

# Main stack:
docker compose -f docker-compose.production.input-b.yml pull
docker compose -f docker-compose.production.input-b.yml up -d

# Docs (independent):
docker compose -f docker-compose.docs.input-b.yml pull
docker compose -f docker-compose.docs.input-b.yml up -d
```
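As a closing aside, the orchestrator's **Update state** step (step 3.1 of the main-stack deploy) can be pictured as a merge into the per-branch JSON variable. Again a hypothetical sketch: the helper name and signature are inventions for illustration; only the entry shape follows the `BUILD_STATE_JSON_DEVELOP` / `BUILD_STATE_JSON_MAIN` format documented in the causal-consistency section.

```python
import json
from datetime import datetime, timezone

def update_state(state_json, repo, sha, image_digest, built_with):
    """Record a finished build for `repo` in the per-branch state blob
    (the value of BUILD_STATE_JSON_DEVELOP or BUILD_STATE_JSON_MAIN)
    and return the updated JSON string."""
    state = json.loads(state_json) if state_json else {}
    state[repo] = {
        "sha": sha,                  # git commit that was built
        "image_digest": image_digest,  # digest of the pushed image
        "ts": datetime.now(timezone.utc).isoformat(),
        "built_with": built_with,    # upstream digests/SHAs at build time
    }
    return json.dumps(state)
```

Subsequent cascade and gate-check steps then read this blob back to decide what to rebuild and whether to deploy.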