# Deploying Database Migrations

This page is the operational runbook for database migrations across all CCAT environments. It covers how migrations are triggered, what safety mechanisms are in place, and what to do when something goes wrong. For the developer workflow and how migration scripts are written and tested, see {doc}`migrations`.

## Environment Overview

The ops-db runs in three environments. Each has a different deployment mechanism and a different relationship to database migrations:

```{eval-rst}
.. list-table::
   :widths: 15 25 60
   :header-rows: 1

   * - Environment
     - Host
     - How migrations run
   * - **Local development**
     - Developer workstation
     - Manual, via ``alembic upgrade head`` or ``opsdb_migrate``. Most developers use SQLite (no migrations needed — ``opsdb_init`` creates the schema directly) or a local PostgreSQL container.
   * - **Staging**
     - ``staging-input-a`` (``input-a.staging.data.ccat.uni-koeln.de``)
     - Automatically by the ``deploy_staging`` Jenkins pipeline before services are restarted. Triggered by the ``orchestrate-deploy`` GitHub Actions workflow on every push to ``develop``.
   * - **Production**
     - ``prod-input-b`` (``input-b.data.ccat.uni-koeln.de``)
     - Automatically by the ``deploy_production`` Jenkins pipeline before services are restarted. Triggered by the ``orchestrate-deploy`` GitHub Actions workflow on every push to ``main``.
```

## The Deployment Sequence

Both the staging and production Jenkins pipelines follow the same sequence. The migration stage runs **before** `ccat update` restarts any service, so the new schema is always in place before any code that depends on it starts:

```text
┌─────────────────────────────────────┐
│ 1. Pre-migration pg_dump backup     │ ← safety net written first
└──────────────────┬──────────────────┘
                   │ success
┌──────────────────▼──────────────────┐
│ 2. Pull latest ops-db image         │ ← ensures opsdb_migrate is up to date
└──────────────────┬──────────────────┘
                   │ success
┌──────────────────▼──────────────────┐
│ 3. Run opsdb_migrate                │ ← alembic upgrade head
└──────────────────┬──────────────────┘
                   │
        ┌──────────┴──────────┐
        │ success             │ failure
        ▼                     ▼
┌──────────────────┐  ┌──────────────────────────────┐
│ 4. ccat update   │  │ 4. Restore from pg_dump      │
│    (restart all  │  │ 5. Pipeline aborts           │
│    services)     │  │    ccat update never runs    │
└──────────────────┘  └──────────────────────────────┘
```

The critical property: if `opsdb_migrate` fails, `ccat update` never runs. Services continue running against the pre-migration schema (which is still valid because the migration was rolled back). The pg_dump backup is the last resort if the automatic restore also fails.

## The Safety Net in Detail

Before running any migration, the pipeline creates a PostgreSQL custom-format dump (`pg_dump -Fc`) to `/srv/ops-db-backups/` on the database host:

```bash
# Filename pattern
/srv/ops-db-backups/pre_migration_20260115_143022.pgdump
```

If `opsdb_migrate` exits non-zero, the pipeline immediately runs `pg_restore --clean --if-exists` from that file to return the database to its pre-migration state, then calls `error()` to abort the pipeline.

:::{note}
Backups accumulate in `/srv/ops-db-backups/` and are not automatically pruned. Periodically remove old dumps to avoid filling the disk. Keeping the last few is sufficient for rollback purposes.
:::

## The `opsdb_migrate` Command

`opsdb_migrate` is a console-script entry point installed by the `ops-db` package. It runs `alembic upgrade head` programmatically using the `alembic.ini` bundled with the package.
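The pipeline's restore-on-failure branch keys entirely off this command's exit status. A minimal sketch of that branch logic in shell — the function name and messages are invented for illustration, and the real Jenkinsfile stage is more involved:

```shell
# Run a migration command; on a non-zero exit, run the restore command
# and propagate failure so the caller can abort before `ccat update`.
# Commands are passed as simple strings (sufficient for a sketch).
migrate_or_restore() {
  local migrate_cmd="$1" restore_cmd="$2"
  if $migrate_cmd; then
    echo "migration ok"
  else
    echo "migration failed; restoring pre-migration dump" >&2
    $restore_cmd
    return 1
  fi
}
```

In the real pipeline, the migration command is the `docker compose run … opsdb_migrate` invocation and the restore command is the `pg_restore --clean --if-exists` call against the pre-migration dump.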
Database connection settings are resolved from dynaconf in exactly the same way as the rest of the package:

```bash
# In a Docker container or on the host, with ENV_FOR_DYNACONF set:
opsdb_migrate
```

Exit codes:

- `0` — all pending migrations applied successfully (or the database was already at head)
- non-zero — migration failed; the database state is unchanged (Alembic wraps each migration in a transaction and rolls back on error)

The Jenkins pipeline treats a non-zero exit as a hard failure and triggers the restore path described above.

## Running Migrations Manually

In rare cases you may need to run a migration outside of a normal deployment — for example, when testing a migration script against the staging database before a production deploy, or when recovering from a partial failure. SSH into the database host and run:

```bash
ssh <database host>   # staging-input-a or prod-input-b, see Environment Overview
cd /opt/data-center/system-integration

# Staging
docker compose -f docker-compose.yml -f docker-compose.staging.yml \
  run --rm --no-deps \
  -e ENV_FOR_DYNACONF="staging-mekleth" \
  --entrypoint opsdb_migrate \
  init-ops-db

# Production
docker compose \
  -f docker-compose.production.input-b.yml \
  -f docker-compose.production.db.init_db.yml \
  run --rm --no-deps \
  -e ENV_FOR_DYNACONF="production" \
  --entrypoint opsdb_migrate \
  init-ops-db
```

Always take a manual pg_dump backup before running migrations outside the pipeline — the automatic backup only happens when the pipeline runs:

```bash
docker compose -f docker-compose.production.input-b.yml \
  exec -T postgres \
  pg_dump -U ccat -Fc ccat_ops_db > /srv/ops-db-backups/manual_$(date +%Y%m%d_%H%M%S).pgdump
```

## The Standalone `migrate_db` Pipeline

A parameterised Jenkins pipeline `migrate_db` is available for running migrations independently of a full deployment — useful for testing a migration against staging before promoting to production, or for running a hotfix migration without triggering a full service restart cycle.
Select `ENVIRONMENT: staging` or `ENVIRONMENT: production` when triggering it. It follows the same pg_dump → migrate → restore-on-failure sequence as the deploy pipelines. See the [system-integration repository](https://github.com/ccatobs/system-integration) for the full pipeline source (`jenkins_pipelines/migrate_db`).

## Checking Migration State

To inspect what revision the database is currently at, or what scripts are pending, use the Alembic CLI with the appropriate environment settings:

```bash
# What revision is the live database at?
ENV_FOR_DYNACONF=production alembic current

# What migrations are pending (between current and head)?
ENV_FOR_DYNACONF=production alembic history -r current:head

# Full migration history
alembic history --verbose
```

Or inside a Docker container on the host:

```bash
docker compose -f docker-compose.production.input-b.yml \
  -f docker-compose.production.db.init_db.yml \
  run --rm --no-deps \
  -e ENV_FOR_DYNACONF="production" \
  --entrypoint "alembic current" \
  init-ops-db
```

## Manual Rollback

Alembic supports rolling back one migration at a time with `alembic downgrade -1`. Use this only as a last resort when:

- `opsdb_migrate` failed and the automatic restore also failed, OR
- a migration was applied manually and needs to be reversed before a fix is ready

:::{warning}
Rolling back does not undo data changes, only schema changes. If the migration inserted rows or transformed data, those changes remain after a downgrade. Always verify the database state after a rollback.
:::

```bash
# Roll back one migration
alembic downgrade -1

# Check where you are
alembic current
```

After rolling back, the services running against the database may be running against a schema they do not expect. Restart them from the previous image tag if necessary.
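To make the post-rollback check scriptable rather than eyeballed, the revision id can be pulled out of the `alembic current` output. A small helper, assuming Alembic's usual plain-text output of the form `<revision id> (head)`:

```shell
# Read `alembic current` output on stdin and print just the revision id,
# e.g. "ae1027a6acf (head)" -> "ae1027a6acf".
# Assumes the revision id is the first field of the first non-empty line.
current_revision() {
  awk 'NF { print $1; exit }'
}

# Example (on the database host):
#   alembic current | current_revision
# Compare the result against the revision you expect after the downgrade.
```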
## Jenkins Pipeline Source

The full Jenkins pipeline source is in the system-integration repository:

- `jenkins_pipelines/deploy_staging` — migration stage runs before `ccat update` on all staging hosts
- `jenkins_pipelines/deploy_production` — migration stage runs before `ccat update` on all production hosts
- `jenkins_pipelines/migrate_db` — standalone parameterised migration pipeline

See the {doc}`/system-integration/docs/index` for deployment architecture details.

## Related Documentation

- {doc}`migrations` — how migration scripts are written, reviewed, and tested
- {doc}`../config/index` — dynaconf environment selection and connection settings
- {doc}`../api_reference/utilities` — `opsdb_init` reference