Secrets Management & .env Setup#
Overview#
CCAT Data Center automates secrets provisioning through Ansible Vault. This replaces
manual .env file management with a declarative, auditable, encrypted-in-git approach.
Key principles:
Production/staging compose files fail immediately if secrets are missing (no silent fallback)
Dev compose retains fallback values for developer convenience
All secrets are stored in Ansible vault files (encrypted at rest)
Deployment is idempotent and can be run repeatedly without side effects
Issue#
Prior to this implementation (Issue #49), hardcoded secrets were scattered across:
- Category A: fully hardcoded literals in compose files (MinIO, Infisical DB, pgadmin, replication passwords)
- Category B: silent ${VAR:-weak_default} fallbacks in production (users never noticed missing .env)
- Category C: plaintext user passwords in data-center/users/users.toml (tracked separately)
Without automation, operators left compose fallbacks active and never created .env files.
Solution#
Three components work together:
- Ansible role (ansible/roles/application_env/)
  - Templates .env file with vault variables
  - Deploys to /opt/data-center/system-integration/.env on each host
  - Idempotent and tagged for selective execution
- Vault storage (ansible/group_vars/input_*/vault_application.yml)
  - Encrypted YAML files with production and staging secrets
  - Distributed securely via Ansible vault key
  - Never committed to git in plaintext
- CLI command (ccat secrets provision)
  - Operator-friendly wrapper around Ansible
  - Supports --dry-run for preview and --host for targeting
  - Part of the ctl management tool
Architecture#
Deployment Flow#
Developer / Operator
|
| runs: ccat secrets provision [--host HOST]
v
ctl script (system-integration/ctl)
|
| locates ansible at: REPO_ROOT / "ansible"
| reads vault key: ansible/.ansible_vault_key
v
ansible-playbook playbook_setup_vms.yml
|
| loads: vars_application_schema.yml (defines all variables)
| loads: group_vars/input_*/vault_application.yml (vault decryption)
| loads: group_vars/input_*/vars_application.yml (non-secret config)
v
application_env role
|
| loops through schema variables
| pulls values from vault + defaults
| generates .env dynamically from schema
v
Host: /opt/data-center/system-integration/.env ✓ provisioned
Variable Sources#
The .env file combines three sources:
- Vault variables (secrets, encrypted)
  - Location: group_vars/{input_ccat,input_staging}/vault_application.yml
  - Schema: vars_application_schema.yml (defines all vault variables)
  - Examples: vault_postgres_password, vault_redis_password, vault_minio_password
  - Encrypted in git (must be decrypted with .ansible_vault_key)
  - Auto-managed via ccat secrets add/set/remove commands
- Config variables (non-secret)
  - Location: group_vars/{input_ccat,input_staging}/vars_application.yml
  - Examples: gf_server_root_url, influxdb_org, gf_github_allowed_orgs
  - Plain YAML, readable in git
- Template defaults (fallback)
  - In roles/application_env/templates/env.j2
  - Examples: default('admin'), default('ccat_metrics')
  - Used only if config var is not provided
Single Source of Truth
The vars_application_schema.yml file is the schema that:
- Defines all possible vault variables with their env var names
- Auto-syncs with CLI tab completion
- Drives dynamic Ansible template generation
- When you add a variable to the schema, it automatically appears in .env generation
Compose File Changes#
Dev Compose (docker-compose.yml)#
Dev retains fallback values for developer convenience:
minio:
  environment:
    MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minio_access_key}
    MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-minio_secret_key}
Developers can override with .env or environment variables. If neither exists,
the fallback is used. This is intentional—dev should work without .env setup.
Production & Staging Compose#
Production and staging fail immediately if required secrets are missing:
redis:
  command: >
    redis-server
    --requirepass ${REDIS_PASSWORD:?REDIS_PASSWORD must be set in .env}
    ...
grafana:
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:?GF_SECURITY_ADMIN_PASSWORD must be set in .env}
The ${VAR:?message} form is Docker Compose interpolation syntax modeled on shell parameter expansion.
It requires the variable to be set and non-empty; otherwise Docker Compose exits with the given
error message before starting any container.
Supported fail-fast secrets:
- REDIS_PASSWORD
- POSTGRES_PASSWORD
- POSTGRES_REPLICATION_PASSWORD
- INFISICAL_DB_PASSWORD
- PGADMIN_PASSWORD
- GF_SECURITY_ADMIN_PASSWORD
- DOCKER_INFLUXDB_INIT_PASSWORD
- DOCKER_INFLUXDB_INIT_ADMIN_TOKEN
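To make the difference between the dev fallback form (:-) and the fail-fast form (:?) concrete, here is a minimal Python sketch that imitates the two behaviours. It is an illustration only, not Docker Compose's implementation; the names interpolate and MissingSecretError are hypothetical.

```python
import re

# Matches ${NAME}, ${NAME:-default} and ${NAME:?message}.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:(?P<op>:-|:\?)(?P<arg>[^}]*))?\}"
)


class MissingSecretError(Exception):
    """Raised when a required variable is unset or empty (hypothetical name)."""


def interpolate(template: str, env: dict) -> str:
    """Substitute variables in template, imitating the :- and :? forms."""

    def replace(match):
        name, op, arg = match["name"], match["op"], match["arg"]
        value = env.get(name, "")
        if value:
            return value            # variable is set and non-empty
        if op == ":-":
            return arg              # dev style: fall back to the default
        if op == ":?":
            raise MissingSecretError(f"{name}: {arg}")  # prod style: fail fast
        return ""                   # plain ${NAME}: empty string

    return _PATTERN.sub(replace, template)
```

With an empty environment, the dev template resolves to its fallback while the production template raises before anything is started, which is the behaviour the production and staging compose files rely on.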
Dynaconf Bridge Variables#
Application containers (ops-db-api, data-transfer services) use Dynaconf
for settings. Dynaconf looks for prefixed environment variables like CCAT_OPS_DB_API_MAIN_DB_PASSWORD.
To allow .env secrets to be picked up by Dynaconf, we bridge them in the compose file:
ops-db-api:
  environment:
    - ENV_FOR_DYNACONF="production"
    - CCAT_OPS_DB_API_MAIN_DB_PASSWORD=${POSTGRES_PASSWORD}
    - CCAT_OPS_DB_API_LOCAL_DB_PASSWORD=${POSTGRES_PASSWORD}
    - CCAT_OPS_DB_API_REDIS_PASSWORD=${REDIS_PASSWORD}
  env_file:
    - .env
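The bridging idea can be illustrated with a small Python sketch that strips a prefix from environment variable names, which is roughly what a prefixed settings loader does. This is a simplified stand-in, not Dynaconf's actual loader (which also handles type casting and nesting). The function name collect_prefixed is hypothetical, and treating CCAT_OPS_DB_API as the prefix is an assumption based on the variable names above.

```python
def collect_prefixed(environ: dict, prefix: str) -> dict:
    """Return settings whose env var name starts with prefix, with the prefix removed."""
    marker = prefix.rstrip("_") + "_"
    return {
        name[len(marker):]: value
        for name, value in environ.items()
        if name.startswith(marker)
    }


# Environment as the container would see it after Compose interpolation
# (values are placeholders).
container_env = {
    "CCAT_OPS_DB_API_MAIN_DB_PASSWORD": "your-strong-postgres-pw",
    "CCAT_OPS_DB_API_REDIS_PASSWORD": "your-strong-redis-pw",
    "POSTGRES_PASSWORD": "your-strong-postgres-pw",
}

settings = collect_prefixed(container_env, "CCAT_OPS_DB_API")
```

The unprefixed POSTGRES_PASSWORD is ignored by the prefixed lookup, which is why the compose file has to re-export it under the prefixed name.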
CLI Usage#
Schema Management (Register Variables)#
Register a new vault variable:
ccat secrets add vault_my_secret --env production
Prompts for:
- Environment variable name (e.g., MY_SECRET)
- Description
- Initial value
Adds to schema and vault in one step.
Update an existing secret:
ccat secrets set vault_postgres_password --env production
If the variable is new (not in schema), offers to add it.
Rotate a secret with a secure random value:
ccat secrets rotate vault_redis_password --env production
Generates a cryptographically secure token and updates the vault.
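For reference, a cryptographically secure token can be produced with Python's standard secrets module. This is a minimal sketch of the general technique; the generator, length, and alphabet that ccat secrets rotate actually uses are not specified here, and generate_secret is a hypothetical name.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random token built from num_bytes of OS-level randomness."""
    return secrets.token_urlsafe(num_bytes)


new_value = generate_secret()
```

URL-safe tokens avoid characters such as quotes, spaces, and `$`, which tend to need escaping in .env files and compose interpolation.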
Remove a secret:
ccat secrets remove vault_old_secret
Removes from both schema and vault (all environments).
View all secrets:
ccat secrets show --env production # masked by default
ccat secrets show vault_redis_password --env production --reveal # show actual value
Provision to Hosts#
Provision all hosts (deploys .env to every host in the inventory):
ccat secrets provision
This runs:
ansible-playbook \
-i ansible/inventory.ini \
--vault-password-file ansible/.ansible_vault_key \
--tags env \
ansible/playbook_setup_vms.yml
Provision a Specific Host#
Deploy .env to only input-staging:
ccat secrets provision --host input-staging
Translates to:
ansible-playbook \
... \
--limit input-staging \
...
Dry-Run (Preview)#
Show what would be deployed without making changes:
ccat secrets provision --dry-run
Or for a specific host:
ccat secrets provision --host input-ccat --dry-run
This adds --check mode to Ansible, which:
- Reads the vault and template
- Shows what would be written
- Does NOT actually write to disk
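The mapping from CLI options to the ansible-playbook invocation shown above can be sketched in Python as follows. This is an assumed reconstruction from the commands in this section, not the ctl script's actual code; build_provision_command is a hypothetical name.

```python
from pathlib import Path
from typing import List, Optional


def build_provision_command(
    repo_root: Path, host: Optional[str] = None, dry_run: bool = False
) -> List[str]:
    """Assemble the ansible-playbook argument list for provisioning .env files."""
    ansible_dir = repo_root / "ansible"
    cmd = [
        "ansible-playbook",
        "-i", str(ansible_dir / "inventory.ini"),
        "--vault-password-file", str(ansible_dir / ".ansible_vault_key"),
        "--tags", "env",
    ]
    if host:
        cmd += ["--limit", host]   # --host HOST restricts the run to one host
    if dry_run:
        cmd.append("--check")      # --dry-run maps to Ansible check mode
    cmd.append(str(ansible_dir / "playbook_setup_vms.yml"))
    return cmd
```

Returning a list rather than a shell string lets the caller pass it to subprocess.run without shell quoting concerns.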
Setup & Deployment#
Initial Setup (One-Time)#
Encrypt vault files:
cd ansible
ansible-vault encrypt group_vars/input_ccat/vault_application.yml
ansible-vault encrypt group_vars/input_staging/vault_application.yml
You’ll be prompted for a vault password. This password is stored in
.ansible_vault_key (git-ignored, only shared securely with operators).
Add secrets using the CLI (recommended):
The easiest way is to use ccat secrets add for each variable:
# Add database password
ccat secrets add vault_postgres_password --env production
# → Prompts for env var name (POSTGRES_PASSWORD), description, and value
# Add redis password
ccat secrets add vault_redis_password --env production
# Add other secrets...
ccat secrets add vault_minio_password --env production
ccat secrets add vault_pgadmin_password --env production
ccat secrets add vault_gf_admin_password --env production
# etc.
Each command adds the variable to both the schema and vault automatically.
Or, manually edit the vault file:
If you prefer manual editing:
ansible-vault edit group_vars/input_ccat/vault_application.yml
Add real passwords/tokens:
vault_postgres_password: "your-strong-postgres-pw"
vault_redis_password: "your-strong-redis-pw"
vault_minio_password: "your-minio-secret"
vault_pgadmin_password: "your-pgadmin-pw"
vault_gf_admin_password: "your-grafana-admin-pw"
vault_infisical_db_password: "your-infisical-db-pw"
vault_postgres_replication_password: "your-replication-pw"
vault_influxdb_password: "your-influxdb-pw"
vault_influxdb_token: "your-influxdb-token"
vault_gf_github_client_id: ""
vault_gf_github_client_secret: ""
Then manually add each variable to vars_application_schema.yml:
vault_postgres_password:
  env_name: POSTGRES_PASSWORD
  description: "PostgreSQL admin password"
  added: "2026-02-25"
Recommendation: Use ccat secrets add, which keeps schema and vault in sync automatically.
Test on staging first:
ccat secrets provision --host input-staging --dry-run
# Review output, verify paths and variables
ccat secrets provision --host input-staging
# Actually deploy to staging
Verify deployment:
SSH to the staging host and check:
cat /opt/data-center/system-integration/.env
# Should show all vars populated from vault
Deploy to production:
Once confident on staging:
ccat secrets provision --host input-ccat --dry-run
ccat secrets provision --host input-ccat
Maintenance#
Rotating a secret (recommended method):
Use the CLI to rotate with a cryptographically secure random value:
ccat secrets rotate vault_redis_password --env production --dry-run
ccat secrets rotate vault_redis_password --env production
# Shows what changed, prompts for confirmation
Then re-provision:
ccat secrets provision --host input-ccat --dry-run
ccat secrets provision --host input-ccat
And restart affected services:
ccat restart redis
# Docker Compose will use the new .env
Manually updating a secret:
If you need to set a specific value (not a random token):
ccat secrets set vault_gf_github_client_secret --env production
# → Prompts for new value, shows diff, confirms before writing
Then provision and restart.
Viewing secrets (without editing):
# Show all (masked by default)
ccat secrets show --env production
# Show one specific secret
ccat secrets show vault_postgres_password --env production
# Reveal actual value (for copy/paste scenarios)
ccat secrets show vault_postgres_password --env production --reveal
Adding a new secret to production:
ccat secrets add vault_my_new_secret --env production
# → Adds to schema + vault automatically
# → CLI tab-completion auto-updates
# → Next provision will include it in .env
ccat secrets provision --host input-ccat
Removing a deprecated secret:
ccat secrets remove vault_old_unused_secret
# → Removes from schema + all vault environments
# → Next provision will no longer generate this env var
Manual vault file editing (if needed):
For bulk changes or direct manipulation:
ansible-vault edit ansible/group_vars/input_ccat/vault_application.yml
# Make changes, save, exit
# Then update the schema if you added new variables:
# Edit: ansible/vars_application_schema.yml
# Add the new variable definition
ccat secrets provision --host input-ccat
Re-encrypting after a lost vault key:
If you lose the vault key, the existing encrypted values cannot be recovered. Re-create the encrypted files and re-enter every secret:
cd ansible
rm group_vars/input_ccat/vault_application.yml
git checkout group_vars/input_ccat/vault_application.yml
# (Restores placeholder)
ansible-vault encrypt group_vars/input_ccat/vault_application.yml
# Create new password
# Then add secrets using the new key:
ccat secrets add vault_postgres_password --env production
# Repeat for each secret
Example .env Output#
After ccat secrets provision runs, the generated .env at
/opt/data-center/system-integration/.env looks like:
# Managed by Ansible — do not edit manually
# Generated: 2026-02-25T18:30:45.123456+00:00
# Schema source: .../vars_application_schema.yml
# Vault-managed variables (from vars_application_schema.yml)
POSTGRES_PASSWORD=your-strong-postgres-pw
POSTGRES_REPLICATION_PASSWORD=your-replication-pw
INFISICAL_DB_PASSWORD=your-infisical-db-pw
REDIS_PASSWORD=your-strong-redis-pw
PGADMIN_PASSWORD=your-pgadmin-pw
GF_SECURITY_ADMIN_PASSWORD=your-grafana-admin-pw
GF_AUTH_GITHUB_CLIENT_ID=your-oauth-client-id
GF_AUTH_GITHUB_CLIENT_SECRET=your-oauth-secret
GF_AUTH_GITHUB_TEAM_IDS=10458935,3503389
DOCKER_INFLUXDB_INIT_PASSWORD=your-influxdb-pw
DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=your-influxdb-token
MINIO_ROOT_PASSWORD=your-minio-secret
# Defaults (non-vaulted configuration)
POSTGRES_USER=ccat
POSTGRES_DB=ccat_ops_db
POSTGRES_REPLICATION_USER=replicator
PGADMIN_EMAIL=pgadmin@data.ccat.uni-koeln.de
GF_SECURITY_ADMIN_USER=admin
GF_AUTH_GITHUB_ALLOWED_ORGANIZATIONS=ccatobs
GF_AUTH_GITHUB_ROLE_ATTRIBUTE_PATH=contains(groups[*], '@ccatobs/datacenter') && 'Admin' || 'Viewer'
GF_SERVER_ROOT_URL=https://grafana.data.ccat.uni-koeln.de
GF_SERVER_DOMAIN=grafana.data.ccat.uni-koeln.de
DOCKER_INFLUXDB_INIT_USERNAME=admin
DOCKER_INFLUXDB_INIT_ORG=ccat
DOCKER_INFLUXDB_INIT_BUCKET=ccat_metrics
MINIO_ROOT_USER=minio
Note: The actual variables generated depend on what’s defined in vars_application_schema.yml.
Adding a new variable to the schema automatically includes it in the next .env generation.
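A quick way to sanity-check a generated file is to parse it and confirm that every fail-fast secret is present and non-empty. The sketch below is a hypothetical helper, not part of the ccat CLI; it assumes simple KEY=VALUE lines as in the example output and does not handle quoting or multi-line values.

```python
# Secrets that the production/staging compose files require (from the list above).
REQUIRED_SECRETS = (
    "REDIS_PASSWORD",
    "POSTGRES_PASSWORD",
    "POSTGRES_REPLICATION_PASSWORD",
    "INFISICAL_DB_PASSWORD",
    "PGADMIN_PASSWORD",
    "GF_SECURITY_ADMIN_PASSWORD",
    "DOCKER_INFLUXDB_INIT_PASSWORD",
    "DOCKER_INFLUXDB_INIT_ADMIN_TOKEN",
)


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def missing_secrets(text: str, required=REQUIRED_SECRETS) -> list:
    """Return the required names that are absent or empty in the .env text."""
    env = parse_env(text)
    return [name for name in required if not env.get(name)]
```

An empty list from missing_secrets means the compose fail-fast checks should pass for that file.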
Schema Management (vars_application_schema.yml)#
The vars_application_schema.yml file is the single source of truth for all vault variables.
Location: ansible/vars_application_schema.yml
Purpose:
- Define all vault variables that should be in .env
- Map vault variable names (e.g., vault_redis_password) to env var names (e.g., REDIS_PASSWORD)
- Document what each variable is for
- Auto-sync with CLI and Ansible template
Format:
variables:
  vault_redis_password:
    env_name: REDIS_PASSWORD
    description: "Redis admin password"
    added: "2026-02-25"
  vault_gf_auth_github_team_ids:
    env_name: GF_AUTH_GITHUB_TEAM_IDS
    description: "GitHub team IDs for Grafana access control"
    added: "2026-02-25"
How It Works:
- Add a variable: ccat secrets add vault_my_secret --env production
  - Prompts for env name, description, and initial value
  - Adds to schema automatically
  - Sets value in vault
- CLI reads from schema: Tab completion auto-updates
  - ccat secrets set vault_<TAB> shows all variables from schema
- Ansible reads from schema: Template generation is dynamic
  - Loops through all variables in schema
  - Pulls values from vault
  - Generates .env with all defined variables
- Remove a variable: ccat secrets remove vault_old_var
  - Removes from schema
  - Removes from vault (all environments)
  - Next provision won’t generate that env var
Key Principle: Add a variable to the schema → it appears in CLI, Ansible, and .env automatically.
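The schema-driven generation described above can be pictured with a short Python sketch. The real role renders a Jinja2 template (env.j2) inside Ansible, so this is only an approximation of the loop under that assumption; render_env is a hypothetical name and the values are placeholders.

```python
def render_env(schema: dict, vault: dict) -> str:
    """Emit one ENV_NAME=value line per schema variable, reading values from the vault."""
    lines = []
    for vault_name, meta in schema["variables"].items():
        if vault_name not in vault:
            # A variable declared in the schema but absent from the vault is an error.
            raise KeyError(f"{vault_name} is in the schema but not in the vault")
        lines.append(f"{meta['env_name']}={vault[vault_name]}")
    return "\n".join(lines) + "\n"


schema = {
    "variables": {
        "vault_redis_password": {"env_name": "REDIS_PASSWORD"},
        "vault_gf_auth_github_team_ids": {"env_name": "GF_AUTH_GITHUB_TEAM_IDS"},
    }
}
vault = {
    "vault_redis_password": "your-strong-redis-pw",
    "vault_gf_auth_github_team_ids": "10458935,3503389",
}
env_text = render_env(schema, vault)
```

Because the loop is driven entirely by the schema, adding an entry there is enough for the new variable to appear in the output, provided the vault holds a value for it.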
Troubleshooting#
Error: “Ansible directory not found at …”
The ansible/ directory is missing (should always be present in this repo):
ls ansible/ # Should show roles/, group_vars/, etc.
Error: “Vault key not found at …”
The .ansible_vault_key file is missing from the Ansible repo:
# This file is git-ignored and must be securely distributed
# Ask your team for the current vault password
# Place it at: ansible/.ansible_vault_key
Error: “Decryption failed”
The vault password is wrong or the vault key file doesn’t match the encrypted files:
# Verify you have the correct key
ansible-vault view ansible/group_vars/input_ccat/vault_application.yml
# Enter the correct password when prompted
# If still failing, check the files are valid YAML
# and re-encrypt if necessary
Error: “.env not created on the host”
After running ccat secrets provision, check:
SSH to the target host and check if the file exists:
ls -la /opt/data-center/system-integration/.env
Check Ansible logs for errors:
# Re-run in verbose mode
cd ansible
ansible-playbook \
-i inventory.ini \
--vault-password-file .ansible_vault_key \
--tags env \
-vvv \
playbook_setup_vms.yml
Verify permissions on the destination directory:
ls -ld /opt/data-center/system-integration/
# Should be owned by root or appropriate system_user
Redis TLS Certificate Lifecycle#
Redis connections in production and staging use mutual TLS (mTLS). Certificates are
managed by the ccat redis-certs CLI and distributed via Ansible.
Cert Variants#
Each variant corresponds to a Redis instance:
All variants distribute client certs (ca.crt, client.crt, client.key) to
their respective hosts. Server hosts additionally receive redis.crt and redis.key.
Generating Certs#
Generate a fresh set of certificates for a variant:
ccat redis-certs generate main
ccat redis-certs generate develop --force # overwrite existing
This runs 8 openssl steps (CA key → CA cert → server key/CSR/cert → client key/CSR/cert)
and sets local permissions (*.crt/client.key = 644, ca.key/redis.key = 600).
For variants with a Grafana mapping (main, develop), the corresponding Grafana
datasource provisioning YAML is automatically updated with the new certs.
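The local permission scheme described above (644 for *.crt and client.key, 600 for ca.key and redis.key) can be checked with a few lines of Python. This is a hypothetical audit helper for a local cert directory on a POSIX system, not something the ccat CLI is known to provide.

```python
import stat
from pathlib import Path

# Private keys that must stay owner-only, per the scheme described above.
OWNER_ONLY = {"ca.key", "redis.key"}


def expected_mode(filename: str) -> int:
    """Return the expected permission bits for a cert or key file."""
    return 0o600 if filename in OWNER_ONLY else 0o644


def find_wrong_modes(cert_dir: Path) -> dict:
    """Map filename -> actual mode (octal string) for files that deviate."""
    wrong = {}
    for path in sorted(cert_dir.iterdir()):
        if path.suffix not in {".crt", ".key"}:
            continue
        actual = stat.S_IMODE(path.stat().st_mode)
        if actual != expected_mode(path.name):
            wrong[path.name] = oct(actual)
    return wrong
```

An empty result means every certificate and key in the directory matches the documented modes.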
Distributing Certs#
Push generated certs to remote hosts via Ansible:
ccat redis-certs distribute --variant main
ccat redis-certs distribute # all variants (with confirmation)
ccat redis-certs distribute --variant develop --dry-run
Rotating Certs#
Rotation regenerates certs locally and immediately distributes them:
ccat redis-certs rotate --variant main
ccat redis-certs rotate # all variants
Warning
Rotating certs breaks Redis connections until distribution completes on all machines and containers are restarted. Plan a brief maintenance window.
After rotation:
- Grafana datasource YAMLs are auto-updated for mapped variants (main → production, develop → staging)
- Commit the updated YAMLs and deploy
- Restart Redis and Grafana containers on affected hosts
Full rotation workflow:
# 1. Rotate (generates + distributes)
ccat redis-certs rotate --variant main
# 2. Commit the auto-updated Grafana YAML
git add grafana/provisioning/production/datasources/redis.yaml
git commit -m "Rotate production Redis TLS certs"
git push
# 3. Restart containers on the target host
ssh input-b 'cd /opt/data-center/system-integration && \
git pull && \
docker compose restart redis grafana'
Checking Status#
Compare local vs remote cert fingerprints and Grafana sync status:
ccat redis-certs status # all variants
ccat redis-certs status --variant main --verbose
Output includes:
Per-host table: local vs remote CA fingerprint with IN SYNC / MISMATCH status
Grafana sync: whether the embedded CA cert in the provisioning YAML matches the local cert (IN SYNC / MISMATCH / YAML MISSING)
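A fingerprint comparison of this kind can be sketched with the Python standard library: decode the PEM certificate to DER and hash it with SHA-256. How ccat redis-certs status actually computes and fetches fingerprints is not documented here, so the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import ssl


def cert_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint (hex) of a single PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha256(der).hexdigest()


def sync_status(local_pem: str, remote_pem: str) -> str:
    """Compare two certificates by fingerprint and report the status label."""
    same = cert_fingerprint(local_pem) == cert_fingerprint(remote_pem)
    return "IN SYNC" if same else "MISMATCH"
```

Hashing the DER bytes rather than the PEM text makes the comparison insensitive to whitespace or line-wrapping differences between the two copies.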
Cleaning Cruft#
Audit remote cert folders for unexpected files (e.g. ca.key that should never be
deployed, or redis.key on a client-only host):
ccat redis-certs clean --dry-run # inspect only
ccat redis-certs clean --variant main # clean specific variant
Grafana Redis Datasource Sync#
Grafana connects to Redis via TLS using the redis-datasource plugin. TLS certificates
(CA, client cert, client key) are embedded inline in the datasource provisioning YAML
files under secureJsonData. These files are git-tracked and mounted read-only into
the Grafana container.
Mapped files:
| Variant | Grafana Provisioning YAML |
|---|---|
| main | grafana/provisioning/production/datasources/redis.yaml |
| develop | |
How it works:
- ccat redis-certs generate and ccat redis-certs rotate auto-update the Grafana YAML for mapped variants after cert generation
- ccat redis-certs status checks whether the embedded cert matches the local cert
- ccat redis-certs update-grafana regenerates YAMLs on demand without regenerating certs
Standalone update (without rotation):
# Update all mapped variants
ccat redis-certs update-grafana
# Update only production
ccat redis-certs update-grafana --variant main
Password injection:
The provisioning YAML uses ${REDIS_PASSWORD} which Grafana resolves from its
container environment. The production and staging docker-compose.*.input-b.yml files
pass this variable to Grafana:
grafana:
  environment:
    - REDIS_PASSWORD=${REDIS_PASSWORD:?REDIS_PASSWORD must be set in .env}
This means Grafana’s Redis datasource password stays in sync with the .env file
managed by ccat secrets provision — no hardcoded passwords in the YAML.