Secrets Management & .env Setup

Overview

CCAT Data Center automates secrets provisioning through Ansible Vault. This replaces manual .env file management with a declarative, auditable, encrypted-in-git approach.

Key principles:

  • Production/staging compose files fail immediately if secrets are missing (no silent fallback)

  • Dev compose retains fallback values for developer convenience

  • All secrets are stored in Ansible vault files (encrypted at rest)

  • Deployment is idempotent and can be run repeatedly without side effects

Issue

Prior to this implementation (Issue #49), hardcoded secrets fell into three categories:

  1. Category A — fully hardcoded literals in compose files (MinIO, Infisical DB, pgadmin, replication passwords)

  2. Category B — silent ${VAR:-weak_default} fallbacks in production (users never noticed missing .env)

  3. Category C — plaintext user passwords in data-center/users/users.toml (tracked separately)

Without automation, operators left compose fallbacks active and never created .env files.

Solution

Three components work together:

  1. Ansible role (ansible/roles/application_env/)

       • Templates .env file with vault variables

       • Deploys to /opt/data-center/system-integration/.env on each host

       • Idempotent and tagged for selective execution

  2. Vault storage (ansible/group_vars/input_*/vault_application.yml)

       • Encrypted YAML files with production and staging secrets

       • Distributed securely via Ansible vault key

       • Never committed to git in plaintext

  3. CLI command (ccat secrets provision)

       • Operator-friendly wrapper around Ansible

       • Supports --dry-run for preview and --host for targeting

       • Part of the ctl management tool

Architecture

Deployment Flow

Developer / Operator
      |
      | runs: ccat secrets provision [--host HOST]
      v
ctl script (system-integration/ctl)
      |
      | locates ansible at: REPO_ROOT / "ansible"
      | reads vault key: ansible/.ansible_vault_key
      v
ansible-playbook playbook_setup_vms.yml
      |
      | loads: vars_application_schema.yml (defines all variables)
      | loads: group_vars/input_*/vault_application.yml (vault decryption)
      | loads: group_vars/input_*/vars_application.yml (non-secret config)
      v
application_env role
      |
      | loops through schema variables
      | pulls values from vault + defaults
      | generates .env dynamically from schema
      v
Host: /opt/data-center/system-integration/.env ✓ provisioned

Variable Sources

The .env file combines three sources:

  1. Vault variables (secrets, encrypted)

       • Location: group_vars/{input_ccat,input_staging}/vault_application.yml

       • Schema: vars_application_schema.yml (defines all vault variables)

       • Examples: vault_postgres_password, vault_redis_password, vault_minio_password

       • ENCRYPTED in git (must be decrypted with .ansible_vault_key)

       • Auto-managed via ccat secrets add/set/remove commands

  2. Config variables (non-secret)

       • Location: group_vars/{input_ccat,input_staging}/vars_application.yml

       • Examples: gf_server_root_url, influxdb_org, gf_github_allowed_orgs

       • Plain YAML, readable in git

  3. Template defaults (fallback)

       • In roles/application_env/templates/env.j2

       • Examples: default('admin'), default('ccat_metrics')

       • Used only if config var is not provided

Single Source of Truth

The vars_application_schema.yml file is the schema that:

  • Defines all possible vault variables with their env var names

  • Auto-syncs with CLI tab completion

  • Drives dynamic Ansible template generation

When you add a variable to the schema, it automatically appears in .env generation.
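How the three sources combine can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual env.j2 template: the function name render_env is hypothetical, the schema, vault, and config are passed as already-decrypted dictionaries, and non-secret values are keyed directly by their env var name for brevity.

```python
def render_env(schema, vault, config=None, defaults=None):
    """Build .env text from the schema, vault values, and non-secret config.

    Hypothetical helper: the real role renders a Jinja2 template instead.
    """
    lines = ["# Vault-managed variables"]
    # The schema decides which vault variables exist and how they are named.
    for vault_name, entry in schema["variables"].items():
        if vault_name not in vault:
            raise KeyError(f"{vault_name} is in the schema but missing from the vault")
        lines.append(f"{entry['env_name']}={vault[vault_name]}")

    lines.append("# Non-secret configuration")
    # A config value wins; a template default applies only when it is absent.
    merged = {**(defaults or {}), **(config or {})}
    for env_name in sorted(merged):
        lines.append(f"{env_name}={merged[env_name]}")
    return "\n".join(lines) + "\n"
```

Raising on a schema entry without a vault value mirrors the fail-fast behavior of the production compose files: a gap surfaces at provisioning time rather than as an empty variable on the host.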

Compose File Changes

Dev Compose (docker-compose.yml)

Dev retains fallback values for developer convenience:

minio:
  environment:
    MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minio_access_key}
    MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-minio_secret_key}

Developers can override with .env or environment variables. If neither exists, the fallback is used. This is intentional—dev should work without .env setup.

Production & Staging Compose

Production and staging fail immediately if required secrets are missing:

redis:
  command: >
    redis-server
    --requirepass ${REDIS_PASSWORD:?REDIS_PASSWORD must be set in .env}
    ...

grafana:
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:?GF_SECURITY_ADMIN_PASSWORD must be set in .env}

The ${VAR:?message} form is part of Docker Compose variable interpolation, which is modeled on POSIX shell parameter expansion. If the variable is unset or empty, Docker Compose exits with the given error message before starting any container.
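The difference between the dev fallback form (:-) and the fail-fast form (:?) can be illustrated with a small Python emulation. This is a sketch of the semantics only, not Docker Compose's implementation; the function name interpolate is hypothetical and only the two operators used in this document are handled.

```python
import re

# Matches ${NAME}, ${NAME:-default} and ${NAME:?error message}.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::(?P<op>[-?])(?P<arg>[^}]*))?\}"
)


def interpolate(text, env):
    """Substitute variables in text using the mapping env."""

    def substitute(match):
        name, op, arg = match.group("name", "op", "arg")
        value = env.get(name)
        if value:  # set and non-empty: always use the real value
            return value
        if op == "-":  # dev style: fall back to the default
            return arg
        if op == "?":  # production style: refuse to continue
            raise ValueError(f"required variable {name} is missing a value: {arg}")
        return ""  # plain ${NAME} with no value becomes an empty string

    return _PATTERN.sub(substitute, text)
```

With an empty environment, the MinIO line from the dev compose file resolves to its fallback, while the Redis line from the production compose file raises an error carrying the configured message.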

Supported fail-fast secrets:

  • REDIS_PASSWORD

  • POSTGRES_PASSWORD

  • POSTGRES_REPLICATION_PASSWORD

  • INFISICAL_DB_PASSWORD

  • PGADMIN_PASSWORD

  • GF_SECURITY_ADMIN_PASSWORD

  • DOCKER_INFLUXDB_INIT_PASSWORD

  • DOCKER_INFLUXDB_INIT_ADMIN_TOKEN

Dynaconf Bridge Variables

Application containers (ops-db-api, data-transfer services) use Dynaconf for settings. Dynaconf looks for prefixed environment variables like CCAT_OPS_DB_API_MAIN_DB_PASSWORD.

To allow .env secrets to be picked up by Dynaconf, we bridge them in the compose file:

ops-db-api:
  environment:
    - ENV_FOR_DYNACONF=production
    - CCAT_OPS_DB_API_MAIN_DB_PASSWORD=${POSTGRES_PASSWORD}
    - CCAT_OPS_DB_API_LOCAL_DB_PASSWORD=${POSTGRES_PASSWORD}
    - CCAT_OPS_DB_API_REDIS_PASSWORD=${REDIS_PASSWORD}
  env_file:
    - .env
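Why the bridge is needed can be seen from a simplified, standard-library-only model of prefix-based settings loading. It is not Dynaconf's implementation; the function name collect_prefixed is hypothetical, and the prefix CCAT_OPS_DB_API is an assumption inferred from the variable names above.

```python
def collect_prefixed(environ, prefix="CCAT_OPS_DB_API"):
    """Return settings found under the given prefix, with the prefix removed.

    Variables without the prefix (such as a bare POSTGRES_PASSWORD) are
    ignored, which is why the compose file re-exports them under new names.
    """
    marker = prefix + "_"
    return {
        key[len(marker):]: value
        for key, value in environ.items()
        if key.startswith(marker)
    }
```

In this model a bare POSTGRES_PASSWORD from .env never reaches the application settings; only the bridged CCAT_OPS_DB_API_MAIN_DB_PASSWORD does, appearing as MAIN_DB_PASSWORD.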

CLI Usage

Schema Management (Register Variables)

Register a new vault variable:

ccat secrets add vault_my_secret --env production

Prompts for:

  • Environment variable name (e.g., MY_SECRET)

  • Description

  • Initial value

Adds to schema and vault in one step.

Update an existing secret:

ccat secrets set vault_postgres_password --env production

If the variable is new (not in schema), offers to add it.

Rotate a secret with a secure random value:

ccat secrets rotate vault_redis_password --env production

Generates a cryptographically secure token and updates the vault.
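One common way to produce such a token in Python is the standard-library secrets module. Whether ccat secrets rotate uses this exact call, or this token length, is an assumption; the function name generate_secret is hypothetical.

```python
import secrets


def generate_secret(nbytes=32):
    """Return a URL-safe random token built from nbytes of OS-level randomness."""
    return secrets.token_urlsafe(nbytes)
```

A URL-safe alphabet (letters, digits, "-" and "_") avoids characters that would need quoting in a .env file or a compose command line.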

Remove a secret:

ccat secrets remove vault_old_secret

Removes from both schema and vault (all environments).

View all secrets:

ccat secrets show --env production           # masked by default
ccat secrets show vault_redis_password --env production --reveal  # show actual value

Provision to Hosts

Provision all hosts:

Deploy .env on all inventory hosts:

ccat secrets provision

This runs:

ansible-playbook \
  -i ansible/inventory.ini \
  --vault-password-file ansible/.ansible_vault_key \
  --tags env \
  ansible/playbook_setup_vms.yml

Provision a Specific Host

Deploy .env to only input-staging:

ccat secrets provision --host input-staging

Translates to:

ansible-playbook \
  ... \
  --limit input-staging \
  ...

Dry-Run (Preview)

Show what would be deployed without making changes:

ccat secrets provision --dry-run

Or for a specific host:

ccat secrets provision --host input-ccat --dry-run

This runs Ansible in --check mode, which:

  • Reads the vault and template

  • Shows what would be written

  • Does NOT actually write to disk
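The mapping from CLI options to the ansible-playbook invocation can be sketched as follows. This is an illustrative reconstruction from the commands shown in this section, not the ctl source; the function name build_provision_command and its parameters are hypothetical.

```python
from pathlib import Path


def build_provision_command(repo_root, host=None, dry_run=False):
    """Assemble the ansible-playbook argument list for provisioning .env."""
    ansible_dir = Path(repo_root) / "ansible"
    command = [
        "ansible-playbook",
        "-i", str(ansible_dir / "inventory.ini"),
        "--vault-password-file", str(ansible_dir / ".ansible_vault_key"),
        "--tags", "env",
    ]
    if host:
        # --host HOST narrows the run to one inventory host.
        command += ["--limit", host]
    if dry_run:
        # --dry-run maps to Ansible's check mode.
        command.append("--check")
    command.append(str(ansible_dir / "playbook_setup_vms.yml"))
    return command
```

Returning a list rather than a shell string means the result can be passed directly to subprocess.run without shell quoting concerns.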

Setup & Deployment

Initial Setup (One-Time)

  1. Encrypt vault files:

    cd ansible
    ansible-vault encrypt group_vars/input_ccat/vault_application.yml
    ansible-vault encrypt group_vars/input_staging/vault_application.yml
    

    You’ll be prompted for a vault password. This password is stored in .ansible_vault_key (git-ignored, only shared securely with operators).

  2. Add secrets using the CLI (recommended):

    The easiest way is to use ccat secrets add for each variable:

    # Add database password
    ccat secrets add vault_postgres_password --env production
    # → Prompts for env var name (POSTGRES_PASSWORD), description, and value
    
    # Add redis password
    ccat secrets add vault_redis_password --env production
    
    # Add other secrets...
    ccat secrets add vault_minio_password --env production
    ccat secrets add vault_pgadmin_password --env production
    ccat secrets add vault_gf_admin_password --env production
    # etc.
    

    Each command adds the variable to both the schema and vault automatically.

    Alternatively, edit the vault file manually:

    ansible-vault edit group_vars/input_ccat/vault_application.yml
    

    Add real passwords/tokens:

    vault_postgres_password: "your-strong-postgres-pw"
    vault_redis_password: "your-strong-redis-pw"
    vault_minio_password: "your-minio-secret"
    vault_pgadmin_password: "your-pgadmin-pw"
    vault_gf_admin_password: "your-grafana-admin-pw"
    vault_infisical_db_password: "your-infisical-db-pw"
    vault_postgres_replication_password: "your-replication-pw"
    vault_influxdb_password: "your-influxdb-pw"
    vault_influxdb_token: "your-influxdb-token"
    vault_gf_github_client_id: ""
    vault_gf_github_client_secret: ""
    

    Then manually add each variable to vars_application_schema.yml:

    vault_postgres_password:
      env_name: POSTGRES_PASSWORD
      description: "PostgreSQL admin password"
      added: "2026-02-25"
    

    Recommendation: Use ccat secrets add — it keeps schema and vault in sync automatically.

  3. Test on staging first:

    ccat secrets provision --host input-staging --dry-run
    # Review output, verify paths and variables
    
    ccat secrets provision --host input-staging
    # Actually deploy to staging
    
  4. Verify deployment:

    SSH to the staging host and check:

    cat /opt/data-center/system-integration/.env
    # Should show all vars populated from vault
    
  5. Deploy to production:

    Once confident on staging:

    ccat secrets provision --host input-ccat --dry-run
    ccat secrets provision --host input-ccat
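The verification in step 4 can also be done programmatically. The sketch below parses .env text and reports which of the fail-fast secrets listed earlier are absent or empty. The function names parse_env and missing_secrets are hypothetical, and the parser handles only simple KEY=value lines.

```python
REQUIRED_SECRETS = (
    "REDIS_PASSWORD",
    "POSTGRES_PASSWORD",
    "POSTGRES_REPLICATION_PASSWORD",
    "INFISICAL_DB_PASSWORD",
    "PGADMIN_PASSWORD",
    "GF_SECURITY_ADMIN_PASSWORD",
    "DOCKER_INFLUXDB_INIT_PASSWORD",
    "DOCKER_INFLUXDB_INIT_ADMIN_TOKEN",
)


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def missing_secrets(text, required=REQUIRED_SECRETS):
    """Return the required names that are absent or empty in the .env text."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]
```

An empty result for the contents of /opt/data-center/system-integration/.env indicates that the production compose files should pass their :? checks.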
    

Maintenance

Rotating a secret (recommended method):

Use the CLI to rotate with a cryptographically secure random value:

ccat secrets rotate vault_redis_password --env production --dry-run
ccat secrets rotate vault_redis_password --env production
# Shows what changed, prompts for confirmation

Then re-provision:

ccat secrets provision --host input-ccat --dry-run
ccat secrets provision --host input-ccat

And restart affected services:

ccat restart redis
# Docker Compose will use the new .env

Manually updating a secret:

If you need to set a specific value (not a random token):

ccat secrets set vault_gf_github_client_secret --env production
# → Prompts for new value, shows diff, confirms before writing

Then provision and restart.

Viewing secrets (without editing):

# Show all (masked by default)
ccat secrets show --env production

# Show one specific secret
ccat secrets show vault_postgres_password --env production

# Reveal actual value (for copy/paste scenarios)
ccat secrets show vault_postgres_password --env production --reveal

Adding a new secret to production:

ccat secrets add vault_my_new_secret --env production
# → Adds to schema + vault automatically
# → CLI tab-completion auto-updates
# → Next provision will include it in .env

ccat secrets provision --host input-ccat

Removing a deprecated secret:

ccat secrets remove vault_old_unused_secret
# → Removes from schema + all vault environments
# → Next provision will no longer generate this env var

Manual vault file editing (if needed):

For bulk changes or direct manipulation:

ansible-vault edit ansible/group_vars/input_ccat/vault_application.yml
# Make changes, save, exit

# Then update the schema if you added new variables:
# Edit: ansible/vars_application_schema.yml
# Add the new variable definition

ccat secrets provision --host input-ccat

Re-encrypting after a lost vault key:

If the vault key is lost, the existing encrypted files cannot be decrypted. Re-create them with a new key and re-enter every secret:

cd ansible
rm group_vars/input_ccat/vault_application.yml
git checkout group_vars/input_ccat/vault_application.yml
# (Restores placeholder)

ansible-vault encrypt group_vars/input_ccat/vault_application.yml
# Create new password

# Then add secrets using the new key:
ccat secrets add vault_postgres_password --env production
# Repeat for each secret

Example .env Output

After ccat secrets provision runs, the generated .env at /opt/data-center/system-integration/.env looks like:

# Managed by Ansible — do not edit manually
# Generated: 2026-02-25T18:30:45.123456+00:00
# Schema source: .../vars_application_schema.yml

# Vault-managed variables (from vars_application_schema.yml)
POSTGRES_PASSWORD=your-strong-postgres-pw
POSTGRES_REPLICATION_PASSWORD=your-replication-pw
INFISICAL_DB_PASSWORD=your-infisical-db-pw
REDIS_PASSWORD=your-strong-redis-pw
PGADMIN_PASSWORD=your-pgadmin-pw
GF_SECURITY_ADMIN_PASSWORD=your-grafana-admin-pw
GF_AUTH_GITHUB_CLIENT_ID=your-oauth-client-id
GF_AUTH_GITHUB_CLIENT_SECRET=your-oauth-secret
GF_AUTH_GITHUB_TEAM_IDS=10458935,3503389
DOCKER_INFLUXDB_INIT_PASSWORD=your-influxdb-pw
DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=your-influxdb-token
MINIO_ROOT_PASSWORD=your-minio-secret

# Defaults (non-vaulted configuration)
POSTGRES_USER=ccat
POSTGRES_DB=ccat_ops_db
POSTGRES_REPLICATION_USER=replicator
PGADMIN_EMAIL=pgadmin@data.ccat.uni-koeln.de
GF_SECURITY_ADMIN_USER=admin
GF_AUTH_GITHUB_ALLOWED_ORGANIZATIONS=ccatobs
GF_AUTH_GITHUB_ROLE_ATTRIBUTE_PATH=contains(groups[*], '@ccatobs/datacenter') && 'Admin' || 'Viewer'
GF_SERVER_ROOT_URL=https://grafana.data.ccat.uni-koeln.de
GF_SERVER_DOMAIN=grafana.data.ccat.uni-koeln.de
DOCKER_INFLUXDB_INIT_USERNAME=admin
DOCKER_INFLUXDB_INIT_ORG=ccat
DOCKER_INFLUXDB_INIT_BUCKET=ccat_metrics
MINIO_ROOT_USER=minio

Note: The actual variables generated depend on what’s defined in vars_application_schema.yml. Adding a new variable to the schema automatically includes it in the next .env generation.

Schema Management (vars_application_schema.yml)

The vars_application_schema.yml file is the single source of truth for all vault variables.

Location: ansible/vars_application_schema.yml

Purpose:

  • Define all vault variables that should be in .env

  • Map vault variable names (e.g., vault_redis_password) to env var names (e.g., REDIS_PASSWORD)

  • Document what each variable is for

  • Auto-sync with CLI and Ansible template

Format:

variables:
  vault_redis_password:
    env_name: REDIS_PASSWORD
    description: "Redis admin password"
    added: "2026-02-25"

  vault_gf_auth_github_team_ids:
    env_name: GF_AUTH_GITHUB_TEAM_IDS
    description: "GitHub team IDs for Grafana access control"
    added: "2026-02-25"

How It Works:

  1. Add a variable: ccat secrets add vault_my_secret --env production

       • Prompts for env name, description, and initial value

       • Adds to schema automatically

       • Sets value in vault

  2. CLI reads from schema: Tab completion auto-updates

       • ccat secrets set vault_<TAB> shows all variables from schema

  3. Ansible reads from schema: Template generation is dynamic

       • Loops through all variables in schema

       • Pulls values from vault

       • Generates .env with all defined variables

  4. Remove a variable: ccat secrets remove vault_old_var

       • Removes from schema

       • Removes from vault (all environments)

       • Next provision won’t generate that env var

Key Principle: Add a variable to the schema → it appears in CLI, Ansible, and .env automatically.
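The add and remove behavior described above amounts to updating two structures together. A minimal in-memory sketch, assuming the schema and the per-environment vaults are plain dictionaries (the real CLI works on the YAML and encrypted vault files; the function names add_secret and remove_secret are hypothetical):

```python
import datetime


def add_secret(schema, vaults, vault_name, env_name, description, value, env):
    """Register a variable in the schema and set its value for one environment."""
    if vault_name in schema["variables"]:
        raise ValueError(f"{vault_name} is already defined in the schema")
    schema["variables"][vault_name] = {
        "env_name": env_name,
        "description": description,
        "added": datetime.date.today().isoformat(),
    }
    vaults[env][vault_name] = value


def remove_secret(schema, vaults, vault_name):
    """Drop a variable from the schema and from every environment's vault."""
    schema["variables"].pop(vault_name, None)
    for values in vaults.values():
        values.pop(vault_name, None)
```

Removal touches every environment, matching the documented behavior of ccat secrets remove, while adding sets a value only for the environment named with --env.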

Troubleshooting

Error: “Ansible directory not found at …”

The ansible/ directory is missing (should always be present in this repo):

ls ansible/  # Should show roles/, group_vars/, etc.

Error: “Vault key not found at …”

The .ansible_vault_key file is missing from the ansible/ directory:

# This file is git-ignored and must be securely distributed
# Ask your team for the current vault password
# Place it at: ansible/.ansible_vault_key

Error: “Decryption failed”

The vault password is wrong or the vault key file doesn’t match the encrypted files:

# Verify that the key file decrypts the vault
ansible-vault view \
  --vault-password-file ansible/.ansible_vault_key \
  ansible/group_vars/input_ccat/vault_application.yml
# Without --vault-password-file, enter the password when prompted

# If still failing, check the files are valid YAML
# and re-encrypt if necessary

Error: “.env not created on the host”

After running ccat secrets provision, check:

  1. SSH to the target host and check if the file exists:

    ls -la /opt/data-center/system-integration/.env
    
  2. Check Ansible logs for errors:

    # Re-run in verbose mode
    cd ansible
    ansible-playbook \
      -i inventory.ini \
      --vault-password-file .ansible_vault_key \
      --tags env \
      -vvv \
      playbook_setup_vms.yml
    
  3. Verify permissions on the destination directory:

    ls -ld /opt/data-center/system-integration/
    # Should be owned by root or appropriate system_user
    

Redis TLS Certificate Lifecycle

Redis connections in production and staging use mutual TLS (mTLS). Certificates are managed by the ccat redis-certs CLI and distributed via Ansible.

Cert Variants

Each variant corresponds to a Redis instance. The commands in this section use the main and develop variants.

All variants distribute client certs (ca.crt, client.crt, client.key) to their respective hosts. Server hosts additionally receive redis.crt and redis.key.

Generating Certs

Generate a fresh set of certificates for a variant:

ccat redis-certs generate main
ccat redis-certs generate develop --force   # overwrite existing

This runs 8 openssl steps (CA key → CA cert → server key/CSR/cert → client key/CSR/cert) and sets local permissions (*.crt/client.key = 644, ca.key/redis.key = 600).
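The eight steps can be written out as openssl invocations. The sketch below only builds the argument lists and the permission table; it does not run anything. Key sizes, validity period, subject names, and the .csr file names are assumptions, as is the function name openssl_steps. The certificate and key file names and the permission modes are the ones given above.

```python
def openssl_steps(common_name="redis", days=365):
    """Return the eight openssl commands: CA, then server, then client."""

    def key(path, bits):
        return ["openssl", "genrsa", "-out", path, str(bits)]

    def csr(key_path, csr_path, subject):
        return ["openssl", "req", "-new", "-key", key_path,
                "-subj", subject, "-out", csr_path]

    def sign(csr_path, cert_path):
        return ["openssl", "x509", "-req", "-in", csr_path,
                "-CA", "ca.crt", "-CAkey", "ca.key", "-CAcreateserial",
                "-out", cert_path, "-days", str(days), "-sha256"]

    return [
        key("ca.key", 4096),
        ["openssl", "req", "-x509", "-new", "-key", "ca.key", "-sha256",
         "-days", str(days), "-subj", f"/CN={common_name}-ca", "-out", "ca.crt"],
        key("redis.key", 2048),
        csr("redis.key", "redis.csr", f"/CN={common_name}"),
        sign("redis.csr", "redis.crt"),
        key("client.key", 2048),
        csr("client.key", "client.csr", f"/CN={common_name}-client"),
        sign("client.csr", "client.crt"),
    ]


# Local file modes as described in this section.
PERMISSIONS = {
    "ca.crt": 0o644, "redis.crt": 0o644, "client.crt": 0o644,
    "client.key": 0o644, "ca.key": 0o600, "redis.key": 0o600,
}
```

Each list can be passed to subprocess.run(cmd, check=True, cwd=cert_dir) in order, since every later step depends on files produced by an earlier one.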

For variants with a Grafana mapping (main, develop), the corresponding Grafana datasource provisioning YAML is automatically updated with the new certs.

Distributing Certs

Push generated certs to remote hosts via Ansible:

ccat redis-certs distribute --variant main
ccat redis-certs distribute                  # all variants (with confirmation)
ccat redis-certs distribute --variant develop --dry-run

Rotating Certs

Rotation regenerates certs locally and immediately distributes them:

ccat redis-certs rotate --variant main
ccat redis-certs rotate                      # all variants

Warning

Rotating certs breaks Redis connections until distribution completes on all machines and containers are restarted. Plan a brief maintenance window.

After rotation:

  1. Grafana datasource YAMLs are auto-updated for mapped variants (main → production, develop → staging)

  2. Commit the updated YAMLs and deploy

  3. Restart Redis and Grafana containers on affected hosts

Full rotation workflow:

# 1. Rotate (generates + distributes)
ccat redis-certs rotate --variant main

# 2. Commit the auto-updated Grafana YAML
git add grafana/provisioning/production/datasources/redis.yaml
git commit -m "Rotate production Redis TLS certs"
git push

# 3. Restart containers on the target host
ssh input-b 'cd /opt/data-center/system-integration && \
  git pull && \
  docker compose restart redis grafana'

Checking Status

Compare local vs remote cert fingerprints and Grafana sync status:

ccat redis-certs status                       # all variants
ccat redis-certs status --variant main --verbose

Output includes:

  • Per-host table: local vs remote CA fingerprint with IN SYNC / MISMATCH status

  • Grafana sync: whether the embedded CA cert in the provisioning YAML matches the local cert (IN SYNC / MISMATCH / YAML MISSING)
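A fingerprint comparison of this kind can be sketched with the Python standard library. The choice of SHA-256 over the DER encoding is an assumption about how the CLI computes fingerprints, and the function names fingerprint and sync_status are hypothetical.

```python
import hashlib
import ssl


def fingerprint(pem_text):
    """Return the SHA-256 hex digest of a PEM certificate's DER bytes."""
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    return hashlib.sha256(der).hexdigest()


def sync_status(local_pem, remote_pem):
    """Compare two PEM certificates by fingerprint."""
    if fingerprint(local_pem) == fingerprint(remote_pem):
        return "IN SYNC"
    return "MISMATCH"
```

Hashing the decoded DER bytes rather than the PEM text means differences in line wrapping or trailing newlines between the local file and the remote copy do not cause a false MISMATCH.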

Cleaning Cruft

Audit remote cert folders for unexpected files (e.g. ca.key that should never be deployed, or redis.key on a client-only host):

ccat redis-certs clean --dry-run              # inspect only
ccat redis-certs clean --variant main         # clean specific variant
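The audit rule follows from the file sets described under Cert Variants: client hosts hold three files, server hosts hold two more, and anything else is unexpected. A minimal sketch of that classification (the function name find_cruft is hypothetical, and how the CLI lists remote files is not shown):

```python
CLIENT_FILES = {"ca.crt", "client.crt", "client.key"}
SERVER_FILES = CLIENT_FILES | {"redis.crt", "redis.key"}


def find_cruft(filenames, is_server):
    """Return the files that should not be present on a host, sorted by name."""
    allowed = SERVER_FILES if is_server else CLIENT_FILES
    return sorted(set(filenames) - allowed)
```

Under this rule ca.key is flagged on every host, and redis.key is flagged only on client-only hosts.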

Grafana Redis Datasource Sync

Grafana connects to Redis via TLS using the redis-datasource plugin. TLS certificates (CA, client cert, client key) are embedded inline in the datasource provisioning YAML files under secureJsonData. These files are git-tracked and mounted read-only into the Grafana container.

Mapped files:

  Variant    Grafana Provisioning YAML
  -------    ------------------------------------------------------
  main       grafana/provisioning/production/datasources/redis.yaml
  develop    grafana/provisioning/staging/datasources/redis.yaml

How it works:

  • ccat redis-certs generate and ccat redis-certs rotate auto-update the Grafana YAML for mapped variants after cert generation

  • ccat redis-certs status checks whether the embedded cert matches the local cert

  • ccat redis-certs update-grafana regenerates YAMLs on demand without regenerating certs

Standalone update (without rotation):

# Update all mapped variants
ccat redis-certs update-grafana

# Update only production
ccat redis-certs update-grafana --variant main

Password injection:

The provisioning YAML uses ${REDIS_PASSWORD} which Grafana resolves from its container environment. The production and staging docker-compose.*.input-b.yml files pass this variable to Grafana:

grafana:
  environment:
    - REDIS_PASSWORD=${REDIS_PASSWORD:?REDIS_PASSWORD must be set in .env}

This means Grafana’s Redis datasource password stays in sync with the .env file managed by ccat secrets provision — no hardcoded passwords in the YAML.