Deploying Database Migrations#

This page is the operational runbook for database migrations across all CCAT environments. It covers how migrations are triggered, what safety mechanisms are in place, and what to do when something goes wrong. For the developer workflow and how migration scripts are written and tested, see Database Migrations.

Environment Overview#

The ops-db runs in three environments. Each has a different deployment mechanism and a different relationship to database migrations:

Environment

Host

How migrations run

Local development

Developer workstation

Manual, via alembic upgrade head or opsdb_migrate. Most developers use SQLite (no migrations needed — opsdb_init creates the schema directly) or a local PostgreSQL container.

Staging

staging-input-a (input-a.staging.data.ccat.uni-koeln.de)

Automatically by the deploy_staging Jenkins pipeline before services are restarted. Triggered by the orchestrate-deploy GitHub Actions workflow on every push to develop.

Production

prod-input-b (input-b.data.ccat.uni-koeln.de)

Automatically by the deploy_production Jenkins pipeline before services are restarted. Triggered by the orchestrate-deploy GitHub Actions workflow on every push to main.

The Deployment Sequence#

Both the staging and production Jenkins pipelines follow the same sequence. The migration stage runs before ccat update restarts any service, so the new schema is always in place before any code that depends on it starts:

┌─────────────────────────────────────┐
│  1. Pre-migration pg_dump backup    │  ← safety net written first
└──────────────────┬──────────────────┘
                   │ success
┌──────────────────▼──────────────────┐
│  2. Pull latest ops-db image        │  ← ensures opsdb_migrate is up to date
└──────────────────┬──────────────────┘
                   │ success
┌──────────────────▼──────────────────┐
│  3. Run opsdb_migrate               │  ← alembic upgrade head
└──────────────────┬──────────────────┘
                   │
        ┌──────────┴──────────┐
        │ success             │ failure
        ▼                     ▼
┌──────────────────┐  ┌──────────────────────────────┐
│  4. ccat update  │  │  4. Restore from pg_dump     │
│  (restart all    │  │  5. Pipeline aborts          │
│   services)      │  │  ccat update never runs      │
└──────────────────┘  └──────────────────────────────┘

The critical property: if opsdb_migrate fails, ccat update never runs. Services continue running against the pre-migration schema (which is still valid because the migration was rolled back). The pg_dump backup is the last resort if the automatic restore also fails.

The Safety Net in Detail#

Before running any migration, the pipeline creates a PostgreSQL custom-format dump (pg_dump -Fc) to /srv/ops-db-backups/ on the database host:

# Filename pattern
/srv/ops-db-backups/pre_migration_20260115_143022.pgdump

If opsdb_migrate exits non-zero, the pipeline immediately runs pg_restore --clean --if-exists from that file to return the database to its pre-migration state, then calls error() to abort the pipeline.

Note

Backups accumulate in /srv/ops-db-backups/ and are not automatically pruned. Periodically remove old dumps to avoid filling the disk. Keeping the last few is sufficient for rollback purposes.

The opsdb_migrate Command#

opsdb_migrate is a console-script entry point installed by the ops-db package. It runs alembic upgrade head programmatically using the alembic.ini bundled with the package. Database connection settings are resolved from dynaconf in exactly the same way as the rest of the package:

# In a Docker container or on the host, with ENV_FOR_DYNACONF set:
opsdb_migrate

Exit codes:

  • 0 — all pending migrations applied successfully (or database was already at head)

  • non-zero — migration failed; the database state is unchanged (Alembic wraps each migration in a transaction and rolls back on error)

The Jenkins pipeline treats a non-zero exit as a hard failure and triggers the restore path described above.

Running Migrations Manually#

In rare cases you may need to run a migration outside of a normal deployment — for example, when testing a migration script against the staging database before a production deploy, or when recovering from a partial failure.

SSH into the database host and run:

ssh <db-host>
cd /opt/data-center/system-integration

# Staging
docker compose -f docker-compose.yml -f docker-compose.staging.yml \
    run --rm --no-deps \
    -e ENV_FOR_DYNACONF="staging-mekleth" \
    --entrypoint opsdb_migrate \
    init-ops-db

# Production
docker compose \
    -f docker-compose.production.input-b.yml \
    -f docker-compose.production.db.init_db.yml \
    run --rm --no-deps \
    -e ENV_FOR_DYNACONF="production" \
    --entrypoint opsdb_migrate \
    init-ops-db

Always take a manual pg_dump backup before running migrations outside the pipeline — the automatic backup only happens when the pipeline runs:

docker compose -f docker-compose.production.input-b.yml \
    exec -T postgres \
    pg_dump -U ccat -Fc ccat_ops_db > /srv/ops-db-backups/manual_$(date +%Y%m%d_%H%M%S).pgdump

The Standalone migrate_db Pipeline#

A parameterised Jenkins pipeline migrate_db is available for running migrations independently of a full deployment — useful for testing a migration against staging before promoting to production, or for running a hotfix migration without triggering a full service restart cycle.

Select ENVIRONMENT: staging or ENVIRONMENT: production when triggering it. It follows the same pg_dump → migrate → restore-on-failure sequence as the deploy pipelines.

See the system-integration repository for the full pipeline source (jenkins_pipelines/migrate_db).

Checking Migration State#

To inspect what revision the database is currently at, or what scripts are pending, use the Alembic CLI with the appropriate environment settings:

# What revision is the live database at?
ENV_FOR_DYNACONF=production alembic current

# What migrations are pending (between current and head)?
ENV_FOR_DYNACONF=production alembic history -r current:head

# Full migration history
alembic history --verbose

Or inside a Docker container on the host:

docker compose -f docker-compose.production.input-b.yml \
    -f docker-compose.production.db.init_db.yml \
    run --rm --no-deps \
    -e ENV_FOR_DYNACONF="production" \
    --entrypoint "alembic current" \
    init-ops-db

Manual Rollback#

Alembic supports rolling back one migration at a time with alembic downgrade -1. Use this only as a last resort when:

  • opsdb_migrate failed and the automatic restore also failed, OR

  • a migration was applied manually and needs to be reversed before a fix is ready

Warning

Rolling back does not undo data changes, only schema changes. If the migration inserted rows or transformed data, those changes remain after a downgrade. Always verify the database state after a rollback.

# Roll back one migration
alembic downgrade -1

# Check where you are
alembic current

After rolling back, the services running against the database may be running against a schema they do not expect. Restart them from the previous image tag if necessary.

Jenkins Pipeline Source#

The full Jenkins pipeline source is in the system-integration repository:

  • jenkins_pipelines/deploy_staging — migration stage runs before ccat update on all staging hosts

  • jenkins_pipelines/deploy_production — migration stage runs before ccat update on all production hosts

  • jenkins_pipelines/migrate_db — standalone parameterised migration pipeline

See the CCAT System Integration Documentation for deployment architecture details.