# Secrets Management & .env Setup

## Overview

CCAT Data Center automates secrets provisioning through Ansible Vault. This replaces manual `.env` file management with a declarative, auditable, encrypted-in-git approach.

**Key principles:**

- Production/staging compose files **fail immediately** if secrets are missing (no silent fallback)
- Dev compose retains fallback values for developer convenience
- All secrets are stored in Ansible vault files (encrypted at rest)
- Deployment is idempotent and can be run repeatedly without side effects

### Issue

Prior to this implementation (Issue #49), hardcoded secrets were scattered across:

1. **Category A**: fully hardcoded literals in compose files (MinIO, Infisical DB, pgadmin, replication passwords)
2. **Category B**: silent `${VAR:-weak_default}` fallbacks in production (users never noticed a missing `.env`)
3. **Category C**: plaintext user passwords in `data-center/users/users.toml` (tracked separately)

Without automation, operators left compose fallbacks active and never created `.env` files.

## Solution

Three components work together:

1. **Ansible role** (`ansible/roles/application_env/`)
   - Templates the `.env` file with vault variables
   - Deploys to `/opt/data-center/system-integration/.env` on each host
   - Idempotent and tagged for selective execution
2. **Vault storage** (`ansible/group_vars/input_*/vault_application.yml`)
   - Encrypted YAML files with production and staging secrets
   - Distributed securely via Ansible vault key
   - Never committed to git in plaintext
3. **CLI command** (`ccat secrets provision`)
   - Operator-friendly wrapper around Ansible
   - Supports `--dry-run` for preview and `--host` for targeting
   - Part of the `ctl` management tool

## Architecture

### Deployment Flow

```
Developer / Operator
    |
    | runs: ccat secrets provision [--host HOST]
    v
ctl script (system-integration/ctl)
    |
    | locates ansible at: REPO_ROOT / "ansible"
    | reads vault key: ansible/.ansible_vault_key
    v
ansible-playbook playbook_setup_vms.yml
    |
    | loads: vars_application_schema.yml (defines all variables)
    | loads: group_vars/input_*/vault_application.yml (vault decryption)
    | loads: group_vars/input_*/vars_application.yml (non-secret config)
    v
application_env role
    |
    | loops through schema variables
    | pulls values from vault + defaults
    | generates .env dynamically from schema
    v
Host: /opt/data-center/system-integration/.env ✓ provisioned
```

### Variable Sources

The `.env` file combines three sources:

1. **Vault variables** (secrets, encrypted)
   - Location: `group_vars/{input_ccat,input_staging}/vault_application.yml`
   - Schema: `vars_application_schema.yml` (defines all vault variables)
   - Examples: `vault_postgres_password`, `vault_redis_password`, `vault_minio_password`
   - **ENCRYPTED** in git (must be decrypted with `.ansible_vault_key`)
   - Auto-managed via `ccat secrets add/set/remove` commands
2. **Config variables** (non-secret)
   - Location: `group_vars/{input_ccat,input_staging}/vars_application.yml`
   - Examples: `gf_server_root_url`, `influxdb_org`, `gf_github_allowed_orgs`
   - Plain YAML, readable in git
3. **Template defaults** (fallback)
   - In `roles/application_env/templates/env.j2`
   - Examples: `default('admin')`, `default('ccat_metrics')`
   - Used only if the config var is not provided

**Single Source of Truth**

The `vars_application_schema.yml` file is the schema that:

- Defines all possible vault variables with their env var names
- Auto-syncs with CLI tab completion
- Drives dynamic Ansible template generation
- When you add a variable to the schema, it automatically appears in `.env` generation

## Compose File Changes

### Dev Compose (`docker-compose.yml`)

Dev retains fallback values for developer convenience:

```yaml
minio:
  environment:
    MINIO_ROOT_USER: ${MINIO_ROOT_USER:-minio_access_key}
    MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:-minio_secret_key}
```

Developers can override with `.env` or environment variables. If neither exists, the fallback is used. This is intentional: dev should work without `.env` setup.

### Production & Staging Compose

Production and staging **fail immediately** if required secrets are missing:

```yaml
redis:
  command: >
    redis-server
    --requirepass ${REDIS_PASSWORD:?REDIS_PASSWORD must be set in .env}
    ...

grafana:
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD:?GF_SECURITY_ADMIN_PASSWORD must be set in .env}
```

The `${VAR:?message}` form follows shell parameter-expansion syntax, which Docker Compose supports in its variable interpolation. The variable must be set and non-empty; otherwise Docker Compose exits with the given error message before starting any container.

**Supported fail-fast secrets:**

- `REDIS_PASSWORD`
- `POSTGRES_PASSWORD`
- `POSTGRES_REPLICATION_PASSWORD`
- `INFISICAL_DB_PASSWORD`
- `PGADMIN_PASSWORD`
- `GF_SECURITY_ADMIN_PASSWORD`
- `DOCKER_INFLUXDB_INIT_PASSWORD`
- `DOCKER_INFLUXDB_INIT_ADMIN_TOKEN`

### Dynaconf Bridge Variables

Application containers (ops-db-api, data-transfer services) use [Dynaconf](https://www.dynaconf.com/) for settings. Dynaconf looks for prefixed environment variables like `CCAT_OPS_DB_API_MAIN_DB_PASSWORD`.
To allow `.env` secrets to be picked up by Dynaconf, we bridge them in the compose file:

```yaml
ops-db-api:
  environment:
    # Unquoted on purpose: in list form, quote characters become part of the value
    - ENV_FOR_DYNACONF=production
    - CCAT_OPS_DB_API_MAIN_DB_PASSWORD=${POSTGRES_PASSWORD}
    - CCAT_OPS_DB_API_LOCAL_DB_PASSWORD=${POSTGRES_PASSWORD}
    - CCAT_OPS_DB_API_REDIS_PASSWORD=${REDIS_PASSWORD}
  env_file:
    - .env
```

## CLI Usage

### Schema Management (Register Variables)

**Register a new vault variable:**

```bash
ccat secrets add vault_my_secret --env production
```

Prompts for:

- Environment variable name (e.g., `MY_SECRET`)
- Description
- Initial value

Adds to schema and vault in one step.

**Update an existing secret:**

```bash
ccat secrets set vault_postgres_password --env production
```

If the variable is new (not in schema), offers to add it.

**Rotate a secret with a secure random value:**

```bash
ccat secrets rotate vault_redis_password --env production
```

Generates a cryptographically secure token and updates the vault.

**Remove a secret:**

```bash
ccat secrets remove vault_old_secret
```

Removes from both schema and vault (all environments).

**View all secrets:**

```bash
ccat secrets show --env production                                  # masked by default
ccat secrets show vault_redis_password --env production --reveal   # show actual value
```

### Provision to Hosts

**Provision all hosts:**

Deploy `.env` on all inventory hosts:

```bash
ccat secrets provision
```

This runs:

```
ansible-playbook \
  -i ansible/inventory.ini \
  --vault-password-file ansible/.ansible_vault_key \
  --tags env \
  ansible/playbook_setup_vms.yml
```

### Provision a Specific Host

Deploy `.env` only to input-staging:

```bash
ccat secrets provision --host input-staging
```

Translates to:

```
ansible-playbook \
  ... \
  --limit input-staging \
  ...
```

### Dry-Run (Preview)

Show what would be deployed without making changes:

```bash
ccat secrets provision --dry-run
```

Or for a specific host:

```bash
ccat secrets provision --host input-ccat --dry-run
```

This adds `--check` mode to Ansible, which:

- Reads the vault and template
- Shows what would be written
- Does NOT actually write to disk

## Setup & Deployment

### Initial Setup (One-Time)

1. **Encrypt vault files:**

   ```bash
   cd ansible
   ansible-vault encrypt group_vars/input_ccat/vault_application.yml
   ansible-vault encrypt group_vars/input_staging/vault_application.yml
   ```

   You'll be prompted for a vault password. This password is stored in `.ansible_vault_key` (git-ignored, only shared securely with operators).

2. **Add secrets using the CLI (recommended):**

   The easiest way is to use `ccat secrets add` for each variable:

   ```bash
   # Add database password
   ccat secrets add vault_postgres_password --env production
   # → Prompts for env var name (POSTGRES_PASSWORD), description, and value

   # Add redis password
   ccat secrets add vault_redis_password --env production

   # Add other secrets...
   ccat secrets add vault_minio_password --env production
   ccat secrets add vault_pgadmin_password --env production
   ccat secrets add vault_gf_admin_password --env production
   # etc.
   ```

   Each command adds the variable to both the schema and vault automatically.
   **Or, manually edit the vault file:**

   If you prefer manual editing:

   ```bash
   ansible-vault edit group_vars/input_ccat/vault_application.yml
   ```

   Add real passwords/tokens:

   ```yaml
   vault_postgres_password: "your-strong-postgres-pw"
   vault_redis_password: "your-strong-redis-pw"
   vault_minio_password: "your-minio-secret"
   vault_pgadmin_password: "your-pgadmin-pw"
   vault_gf_admin_password: "your-grafana-admin-pw"
   vault_infisical_db_password: "your-infisical-db-pw"
   vault_postgres_replication_password: "your-replication-pw"
   vault_influxdb_password: "your-influxdb-pw"
   vault_influxdb_token: "your-influxdb-token"
   vault_gf_github_client_id: ""
   vault_gf_github_client_secret: ""
   ```

   Then manually add each variable to `vars_application_schema.yml`:

   ```yaml
   # Entry goes under the top-level "variables:" key of the schema
   vault_postgres_password:
     env_name: POSTGRES_PASSWORD
     description: "PostgreSQL admin password"
     added: "2026-02-25"
   ```

   **Recommendation:** Use `ccat secrets add`, which keeps schema and vault in sync automatically.

3. **Test on staging first:**

   ```bash
   ccat secrets provision --host input-staging --dry-run
   # Review output, verify paths and variables

   ccat secrets provision --host input-staging
   # Actually deploy to staging
   ```

4. **Verify deployment:**

   SSH to the staging host and check:

   ```bash
   cat /opt/data-center/system-integration/.env
   # Should show all vars populated from vault
   ```

5. **Deploy to production:**

   Once confident on staging:

   ```bash
   ccat secrets provision --host input-ccat --dry-run
   ccat secrets provision --host input-ccat
   ```

### Maintenance

**Rotating a secret (recommended method):**

Use the CLI to rotate with a cryptographically secure random value:

```bash
ccat secrets rotate vault_redis_password --env production --dry-run
ccat secrets rotate vault_redis_password --env production
# Shows what changed, prompts for confirmation
```

Then re-provision:

```bash
ccat secrets provision --host input-ccat --dry-run
ccat secrets provision --host input-ccat
```

And restart affected services:

```bash
ccat restart redis
# Docker Compose will use the new .env
```

**Manually updating a secret:**

If you need to set a specific value (not a random token):

```bash
ccat secrets set vault_gf_github_client_secret --env production
# → Prompts for new value, shows diff, confirms before writing
```

Then provision and restart.

**Viewing secrets (without editing):**

```bash
# Show all (masked by default)
ccat secrets show --env production

# Show one specific secret
ccat secrets show vault_postgres_password --env production

# Reveal actual value (for copy/paste scenarios)
ccat secrets show vault_postgres_password --env production --reveal
```

**Adding a new secret to production:**

```bash
ccat secrets add vault_my_new_secret --env production
# → Adds to schema + vault automatically
# → CLI tab-completion auto-updates
# → Next provision will include it in .env

ccat secrets provision --host input-ccat
```

**Removing a deprecated secret:**

```bash
ccat secrets remove vault_old_unused_secret
# → Removes from schema + all vault environments
# → Next provision will no longer generate this env var
```

**Manual vault file editing (if needed):**

For bulk changes or direct manipulation:

```bash
ansible-vault edit ansible/group_vars/input_ccat/vault_application.yml
# Make changes, save, exit

# Then update the schema if you added new variables:
# Edit: ansible/vars_application_schema.yml
# Add the new variable definition

ccat secrets provision --host input-ccat
```

**Re-encrypting after a lost vault key:**

If you lose the vault key, you'll need to re-create encrypted files:

```bash
cd ansible
rm group_vars/input_ccat/vault_application.yml
git checkout group_vars/input_ccat/vault_application.yml
# (Restores placeholder)

ansible-vault encrypt group_vars/input_ccat/vault_application.yml
# Create new password

# Then add secrets using the new key:
ccat secrets add vault_postgres_password --env production
# Repeat for each secret
```

## Example .env Output

After `ccat secrets provision` runs, the generated `.env` at `/opt/data-center/system-integration/.env` looks like:

```bash
# Managed by Ansible — do not edit manually
# Generated: 2026-02-25T18:30:45.123456+00:00
# Schema source: .../vars_application_schema.yml

# Vault-managed variables (from vars_application_schema.yml)
POSTGRES_PASSWORD=your-strong-postgres-pw
POSTGRES_REPLICATION_PASSWORD=your-replication-pw
INFISICAL_DB_PASSWORD=your-infisical-db-pw
REDIS_PASSWORD=your-strong-redis-pw
PGADMIN_PASSWORD=your-pgadmin-pw
GF_SECURITY_ADMIN_PASSWORD=your-grafana-admin-pw
GF_AUTH_GITHUB_CLIENT_ID=your-oauth-client-id
GF_AUTH_GITHUB_CLIENT_SECRET=your-oauth-secret
GF_AUTH_GITHUB_TEAM_IDS=10458935,3503389
DOCKER_INFLUXDB_INIT_PASSWORD=your-influxdb-pw
DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=your-influxdb-token
MINIO_ROOT_PASSWORD=your-minio-secret

# Defaults (non-vaulted configuration)
POSTGRES_USER=ccat
POSTGRES_DB=ccat_ops_db
POSTGRES_REPLICATION_USER=replicator
PGADMIN_EMAIL=pgadmin@data.ccat.uni-koeln.de
GF_SECURITY_ADMIN_USER=admin
GF_AUTH_GITHUB_ALLOWED_ORGANIZATIONS=ccatobs
GF_AUTH_GITHUB_ROLE_ATTRIBUTE_PATH=contains(groups[*], '@ccatobs/datacenter') && 'Admin' || 'Viewer'
GF_SERVER_ROOT_URL=https://grafana.data.ccat.uni-koeln.de
GF_SERVER_DOMAIN=grafana.data.ccat.uni-koeln.de
DOCKER_INFLUXDB_INIT_USERNAME=admin
DOCKER_INFLUXDB_INIT_ORG=ccat
DOCKER_INFLUXDB_INIT_BUCKET=ccat_metrics
MINIO_ROOT_USER=minio
```

**Note:** The actual variables generated depend on what's defined in `vars_application_schema.yml`. Adding a new variable to the schema automatically includes it in the next `.env` generation.

## Schema Management (vars_application_schema.yml)

The `vars_application_schema.yml` file is the **single source of truth** for all vault variables.

**Location:** `ansible/vars_application_schema.yml`

**Purpose:**

- Define all vault variables that should be in `.env`
- Map vault variable names (e.g., `vault_redis_password`) to env var names (e.g., `REDIS_PASSWORD`)
- Document what each variable is for
- Auto-sync with CLI and Ansible template

**Format:**

```yaml
variables:
  vault_redis_password:
    env_name: REDIS_PASSWORD
    description: "Redis admin password"
    added: "2026-02-25"
  vault_gf_auth_github_team_ids:
    env_name: GF_AUTH_GITHUB_TEAM_IDS
    description: "GitHub team IDs for Grafana access control"
    added: "2026-02-25"
```

**How It Works:**

1. **Add a variable:** `ccat secrets add vault_my_secret --env production`
   - Prompts for env name, description, and initial value
   - Adds to schema automatically
   - Sets value in vault
2. **CLI reads from schema:** Tab completion auto-updates
   - `ccat secrets set vault_` shows all variables from schema
3. **Ansible reads from schema:** Template generation is dynamic
   - Loops through all variables in schema
   - Pulls values from vault
   - Generates `.env` with all defined variables
4. **Remove a variable:** `ccat secrets remove vault_old_var`
   - Removes from schema
   - Removes from vault (all environments)
   - Next provision won't generate that env var

**Key Principle:** Add a variable to the schema → it appears in CLI, Ansible, and `.env` automatically.

## Troubleshooting

**Error: "Ansible directory not found at ..."**

The `ansible/` directory is missing (should always be present in this repo):

```bash
ls ansible/
# Should show roles/, group_vars/, etc.
```

**Error: "Vault key not found at ..."**

The `.ansible_vault_key` file is missing from the Ansible repo:

```bash
# This file is git-ignored and must be securely distributed
# Ask your team for the current vault password
# Place it at: ansible/.ansible_vault_key
```

**Error: "Decryption failed"**

The vault password is wrong or the vault key file doesn't match the encrypted files:

```bash
# Verify you have the correct key
ansible-vault view ansible/group_vars/input_ccat/vault_application.yml
# Enter the correct password when prompted

# If still failing, check the files are valid YAML
# and re-encrypt if necessary
```

**Error: ".env not created on the host"**

After running `ccat secrets provision`, check:

1. SSH to the target host and check if the file exists:

   ```bash
   ls -la /opt/data-center/system-integration/.env
   ```

2. Check Ansible logs for errors:

   ```bash
   # Re-run in verbose mode
   cd ansible
   ansible-playbook \
     -i inventory.ini \
     --vault-password-file .ansible_vault_key \
     --tags env \
     -vvv \
     playbook_setup_vms.yml
   ```

3. Verify permissions on the destination directory:

   ```bash
   ls -ld /opt/data-center/system-integration/
   # Should be owned by root or appropriate system_user
   ```

## Redis TLS Certificate Lifecycle

Redis connections in production and staging use mutual TLS (mTLS). Certificates are managed by the `ccat redis-certs` CLI and distributed via Ansible.

### Cert Variants

Each variant corresponds to a Redis instance.

All variants distribute client certs (`ca.crt`, `client.crt`, `client.key`) to their respective hosts. Server hosts additionally receive `redis.crt` and `redis.key`.

### Generating Certs

Generate a fresh set of certificates for a variant:

```bash
ccat redis-certs generate main
ccat redis-certs generate develop --force   # overwrite existing
```

This runs 8 openssl steps (CA key → CA cert → server key/CSR/cert → client key/CSR/cert) and sets local permissions (`*.crt`/`client.key` = 644, `ca.key`/`redis.key` = 600).
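
As a rough illustration of the eight steps listed above, the sketch below builds the corresponding `openssl` command lines without running them. The key sizes, validity period, subject names, and the helper name `build_cert_commands` are assumptions made for illustration; the actual `ccat redis-certs generate` implementation may use different options.

```python
from pathlib import Path


def build_cert_commands(out_dir: str, days: int = 365) -> list[list[str]]:
    """Return eight openssl invocations for one cert variant (hypothetical helper)."""
    d = Path(out_dir)
    ca_key, ca_crt = str(d / "ca.key"), str(d / "ca.crt")
    cmds = [
        # Steps 1-2: CA private key, then a self-signed CA certificate
        ["openssl", "genrsa", "-out", ca_key, "4096"],
        ["openssl", "req", "-x509", "-new", "-key", ca_key, "-sha256",
         "-days", str(days), "-subj", "/CN=example-redis-ca", "-out", ca_crt],
    ]
    # Steps 3-8: key, CSR, and CA-signed cert for the server ("redis") and the client
    for name in ("redis", "client"):
        key, csr, crt = (str(d / f"{name}.{ext}") for ext in ("key", "csr", "crt"))
        cmds += [
            ["openssl", "genrsa", "-out", key, "2048"],
            ["openssl", "req", "-new", "-key", key, "-subj", f"/CN={name}", "-out", csr],
            ["openssl", "x509", "-req", "-in", csr, "-CA", ca_crt, "-CAkey", ca_key,
             "-CAcreateserial", "-out", crt, "-days", str(days), "-sha256"],
        ]
    return cmds


# Local file modes as described in the text above (octal), keyed by file name.
LOCAL_MODES = {
    "ca.crt": 0o644, "redis.crt": 0o644, "client.crt": 0o644, "client.key": 0o644,
    "ca.key": 0o600, "redis.key": 0o600,
}

if __name__ == "__main__":
    for step, cmd in enumerate(build_cert_commands("certs/main"), start=1):
        print(step, " ".join(cmd))
```

Running the module prints the numbered commands so they can be reviewed before anything touches real key material; the output directory `certs/main` is likewise only a placeholder.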

For variants with a Grafana mapping (`main`, `develop`), the corresponding Grafana datasource provisioning YAML is automatically updated with the new certs.

### Distributing Certs

Push generated certs to remote hosts via Ansible:

```bash
ccat redis-certs distribute --variant main
ccat redis-certs distribute                      # all variants (with confirmation)
ccat redis-certs distribute --variant develop --dry-run
```

### Rotating Certs

Rotation regenerates certs locally and immediately distributes them:

```bash
ccat redis-certs rotate --variant main
ccat redis-certs rotate                          # all variants
```

:::{warning}
Rotating certs **breaks Redis connections** until distribution completes on all machines and containers are restarted. Plan a brief maintenance window.
:::

After rotation:

1. Grafana datasource YAMLs are auto-updated for mapped variants (`main` → production, `develop` → staging)
2. Commit the updated YAMLs and deploy
3. Restart Redis and Grafana containers on affected hosts

Full rotation workflow:

```bash
# 1. Rotate (generates + distributes)
ccat redis-certs rotate --variant main

# 2. Commit the auto-updated Grafana YAML
git add grafana/provisioning/production/datasources/redis.yaml
git commit -m "Rotate production Redis TLS certs"
git push

# 3. Restart containers on the target host
ssh input-b 'cd /opt/data-center/system-integration && \
  git pull && \
  docker compose restart redis grafana'
```

### Checking Status

Compare local vs remote cert fingerprints and Grafana sync status:

```bash
ccat redis-certs status                          # all variants
ccat redis-certs status --variant main --verbose
```

Output includes:

- **Per-host table**: local vs remote CA fingerprint with IN SYNC / MISMATCH status
- **Grafana sync**: whether the embedded CA cert in the provisioning YAML matches the local cert (IN SYNC / MISMATCH / YAML MISSING)

### Cleaning Cruft

Audit remote cert folders for unexpected files (e.g. `ca.key` that should never be deployed, or `redis.key` on a client-only host):

```bash
ccat redis-certs clean --dry-run                 # inspect only
ccat redis-certs clean --variant main            # clean specific variant
```

### Grafana Redis Datasource Sync

Grafana connects to Redis via TLS using the `redis-datasource` plugin. TLS certificates (CA, client cert, client key) are embedded inline in the datasource provisioning YAML files under `secureJsonData`. These files are git-tracked and mounted read-only into the Grafana container.

**Mapped files:**

| Variant | Grafana Provisioning YAML                                |
| ------- | -------------------------------------------------------- |
| main    | `grafana/provisioning/production/datasources/redis.yaml` |
| develop | `grafana/provisioning/staging/datasources/redis.yaml`    |

**How it works:**

- `ccat redis-certs generate` and `ccat redis-certs rotate` auto-update the Grafana YAML for mapped variants after cert generation
- `ccat redis-certs status` checks whether the embedded cert matches the local cert
- `ccat redis-certs update-grafana` regenerates YAMLs on demand without regenerating certs

**Standalone update (without rotation):**

```bash
# Update all mapped variants
ccat redis-certs update-grafana

# Update only production
ccat redis-certs update-grafana --variant main
```

**Password injection:**

The provisioning YAML uses `${REDIS_PASSWORD}`, which Grafana resolves from its container environment. The production and staging `docker-compose.*.input-b.yml` files pass this variable to Grafana:

```yaml
grafana:
  environment:
    - REDIS_PASSWORD=${REDIS_PASSWORD:?REDIS_PASSWORD must be set in .env}
```

This means Grafana's Redis datasource password stays in sync with the `.env` file managed by `ccat secrets provision`, with no hardcoded passwords in the YAML.
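
To make the fail-fast behaviour concrete, the sketch below emulates the two interpolation forms used in this document, `${VAR:-default}` and `${VAR:?message}`, in plain Python. It only illustrates the semantics (unset or empty values trigger the fallback or the error) and is not Docker Compose's implementation; the function name `interpolate` and the error wording are hypothetical.

```python
import re

# Matches ${NAME}, ${NAME:-default} and ${NAME:?message}
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:(?P<op>:-|:\?)(?P<arg>[^}]*))?\}"
)


def interpolate(text: str, env: dict[str, str]) -> str:
    """Expand variable references in text against env (illustrative only)."""

    def expand(match: re.Match) -> str:
        name, op, arg = match.group("name", "op", "arg")
        value = env.get(name, "")
        if value:
            return value
        if op == ":-":
            return arg  # dev-style fallback when unset or empty
        if op == ":?":
            # production-style: refuse to continue without a value
            raise ValueError(f"required variable {name} is missing a value: {arg}")
        return ""  # plain ${NAME}: unset expands to an empty string

    return _PATTERN.sub(expand, text)


if __name__ == "__main__":
    line = "REDIS_PASSWORD=${REDIS_PASSWORD:?REDIS_PASSWORD must be set in .env}"
    print(interpolate(line, {"REDIS_PASSWORD": "example-value"}))
    try:
        interpolate(line, {})
    except ValueError as exc:
        print("aborted:", exc)
```

With a populated environment the reference is replaced by the value; with an empty one the call raises instead of silently substituting a weak default, which mirrors the difference between the production and dev compose files.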

## Related Documentation

- {doc}`deployment`: General deployment procedures
- {doc}`configuration`: Configuration file reference
- [Ansible Vault](https://docs.ansible.com/ansible/latest/user_guide/vault.html): Official Ansible Vault docs
- [Dynaconf Settings](https://www.dynaconf.com/): Settings management library used by CCAT apps