# TLS, Certificates, and Public Key Infrastructure

```{contents} On this page
:depth: 2
:local: true
```

This document explains how TLS certificates and PKI (Public Key Infrastructure)
work, using the CCAT Data Center's Redis mTLS setup as a concrete, running
example. By the end you should understand what each file does, what is secret,
what is public, and how the pieces fit together.

## The Trust Chain — Certificates as a Notary System

Think of a **Certificate Authority (CA)** as a notary. When a service presents a
certificate, the other side checks: "was this signed by a notary I trust?" If
yes, the certificate is accepted — no prior relationship needed.

This is the core idea behind all TLS. Your browser does it thousands of times a
day when connecting to HTTPS websites.

## The Files

When we run `ccat redis-certs generate main`, eight openssl commands produce the
following files:

```text
redis/main/certs/
├── ca.key          # CA's PRIVATE key — the notary's stamp die
├── ca.crt          # CA's PUBLIC certificate — "here's who the notary is"
├── ca.srl          # Serial number counter (bookkeeping, not security-relevant)
│
├── redis.key       # Server's PRIVATE key — Redis's secret
├── redis.csr       # Certificate Signing Request (temporary, used during signing)
├── redis.crt       # Server's PUBLIC certificate — "I am Redis, signed by CA"
│
├── client.key      # Client's PRIVATE key — the connecting app's secret
├── client.csr      # Certificate Signing Request (temporary)
└── client.crt      # Client's PUBLIC certificate — "I am a legitimate client, signed by CA"
```

There are three keypairs (CA, server, client), each consisting of a private
`.key` file and a public `.crt` certificate. The `.csr` files are intermediate
artifacts used only during signing and can be deleted afterwards.

## Public vs Private — The Golden Rule

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 15 10 25 50

   * - File
     - Secret?
     - Who has it
     - Purpose
   * - ``ca.key``
     - **YES — most critical**
     - Only the machine that signs certs.
       Never deployed to any server.
     - Signs new certificates. If stolen, an attacker can forge any
       certificate in the trust domain.
   * - ``ca.crt``
     - No — public
     - Everyone. All hosts, all clients.
     - "This is the CA I trust." Used to verify signatures on other
       certificates.
   * - ``redis.key``
     - **YES**
     - Only the Redis server host
     - Proves "I am the real Redis server" during the TLS handshake.
   * - ``redis.crt``
     - No — public
     - Anyone can see it
     - Contains Redis's public key plus the CA's signature confirming it is
       legitimate.
   * - ``client.key``
     - **YES**
     - Only the client machines (ops-db-api, data-transfer, Grafana, etc.)
     - Proves "I am a legitimate client" during mTLS handshake.
   * - ``client.crt``
     - No — public
     - Anyone can see it
     - Contains the client's public key plus the CA's signature.
   * - ``*.csr``
     - No — temporary
     - Deleted after signing
     - A "please sign this" request containing the public key and identity
       info.
```

:::{important}
The private key (`*.key`) **never** leaves the machine it belongs to. The
certificate (`*.crt`) is freely distributable — it contains only the public key
and the CA's signature. Compromise of a private key means that entity can be
impersonated; compromise of the CA key means **any** entity can be forged.
:::

## What Is Actually Inside a Certificate?
A `.crt` file is a signed document containing:

```text
┌──────────────────────────────────────────┐
│  Subject:     CN=Redis Server            │  ← Who this cert belongs to
│  Issuer:      CN=Redis CA                │  ← Who signed it (the CA)
│  Valid:       2026-03-27 to 2036-03-25   │  ← Expiry window
│  Public Key:  [4096-bit RSA key]         │  ← The public half of redis.key
│  SANs:        redis, input-b, 134.95.x.x │  ← Hostnames/IPs this cert covers
│  ──────────────────────────────────────  │
│  Signature:   [bytes signed by ca.key]   │  ← Proof the CA approved this
└──────────────────────────────────────────┘
```

You can inspect any certificate yourself:

```bash
openssl x509 -in redis/main/certs/redis.crt -text -noout
```

Key fields to look for:

- **Subject / Issuer** — identity chain (who is this, who vouches for them)
- **Validity** — `Not Before` / `Not After` dates
- **Subject Alternative Names (SANs)** — the hostnames and IPs the cert is valid
  for. A client will reject the cert if the hostname it connected to is not in
  the SANs list.
- **Signature Algorithm** — should be SHA-256 or better (never SHA-1)

## How a TLS Connection Works

Here is what happens when ops-db-api connects to Redis with mTLS:

```text
ops-db-api                              Redis
(has: ca.crt, client.crt, client.key)   (has: ca.crt, redis.crt, redis.key)
    │                                                │
    │─── 1. "Hello, I want to connect" ─────────────>│
    │                                                │
    │<── 2. "Here's my redis.crt" ───────────────────│
    │                                                │
    │  3. Verify: is redis.crt                       │
    │     signed by the CA in my ca.crt?             │
    │     YES → server is legitimate                 │
    │                                                │
    │─── 4. "Here's my client.crt" ─────────────────>│
    │                                                │
    │            5. Verify: is client.crt            │
    │               signed by the CA in my ca.crt?   │
    │               YES → client is legitimate       │
    │                                                │
    │<══ 6. Encrypted connection established ═══════>│
```

- **Steps 1–3** are standard TLS (the same thing your browser does for HTTPS).
- **Steps 4–5** are the **mutual** part of mTLS — the server also verifies the
  client. That is why Redis is configured with `--tls-ca-cert-file`: it uses the
  CA certificate to validate incoming client certificates.
- **Step 6** establishes an encrypted channel.
  From this point on, all data is encrypted with a session key that was
  negotiated during the handshake. Even if someone captures the network
  traffic, they cannot read it.

:::{note}
Regular TLS (without the "m") only verifies the server. The client is typically
authenticated by other means (passwords, tokens). mTLS adds client certificate
verification, which means **both sides prove their identity cryptographically**
before any data is exchanged. We use mTLS for Redis so that a service without a
valid client certificate and key cannot connect, even if it somehow obtains the
Redis password.
:::

## TLS vs mTLS — When to Use Which

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Mode
     - What it verifies
     - Use when
   * - **TLS** (one-way)
     - Client verifies the server is who it claims to be. Server does not
       verify the client.
     - Public-facing web services, APIs with token-based auth.
       Example: Grafana behind nginx with Let's Encrypt.
   * - **mTLS** (mutual)
     - Both sides verify each other via certificates.
     - Internal service-to-service communication where you want cryptographic
       identity on both ends.
       Example: Redis, PostgreSQL connections between backend services.
```

## The 8-Step Certificate Generation Process

Here is what `ccat redis-certs generate` does under the hood, mapped to the
concepts above:

```text
Step 1: openssl genrsa          → ca.key
        Create the CA's private key (a randomly generated 4096-bit RSA key)

Step 2: openssl req -x509       → ca.crt
        Self-sign the CA's own certificate
        ("I am a CA, and I vouch for myself")
        This is a ROOT certificate.

Step 3: openssl genrsa          → redis.key
        Create the server's private key

Step 4: openssl req -new        → redis.csr
        Create a signing request ("please certify me as Redis")

Step 5: openssl x509 -req       → redis.crt
        CA signs the request → server cert
        Uses ca.key to sign; references ca.crt.
Step 6: openssl genrsa          → client.key
        Create the client's private key

Step 7: openssl req -new        → client.csr
        Create a signing request ("please certify me as a client")

Step 8: openssl x509 -req       → client.crt
        CA signs the request → client cert
```

Steps 1–2 create the CA itself. Steps 3–5 produce the server certificate.
Steps 6–8 produce the client certificate. The CSR files (steps 4 and 7) are
intermediaries that can be deleted after signing.

## File Permissions and Deployment

Not all files are deployed to all hosts. The deployment rules reflect the
public/private distinction:

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 15 15 15 55

   * - File
     - Permission
     - Deployed to
     - Rationale
   * - ``ca.key``
     - ``0600``
     - **Nowhere** — stays on the signing machine only
     - Most critical secret. If this leaks, all certs can be forged.
   * - ``ca.crt``
     - ``0644``
     - All hosts (servers and clients)
     - Public. Everyone needs it to verify certificates.
   * - ``redis.key``
     - ``0600``, owned by UID 999 (Redis user)
     - Redis server host only (e.g. input-b)
     - Only Redis needs its own private key.
   * - ``redis.crt``
     - ``0644``
     - Redis server host only
     - Public, but only the server presents it.
   * - ``client.key``
     - ``0644``
     - All client hosts (input-a, input-c, reuna, etc.)
     - Needs to be readable by multiple container UIDs. This is a pragmatic
       trade-off; ideally each client would have its own keypair.
   * - ``client.crt``
     - ``0644``
     - All client hosts
     - Public half of the client identity.
```

:::{note}
`client.key` is `0644` (world-readable) because multiple containers running as
different UIDs need to read it. In a CA-managed setup, each service would get
its own unique client certificate, avoiding this shared-key pattern.
:::

## Root Certificates, Intermediates, and Trust Hierarchies

Our current Redis setup uses a **flat, single-tier CA**: one CA key signs
everything directly. This is simple but has a drawback — if the CA key is
compromised, you must replace it and re-issue every certificate.
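
As a rough illustration of the flat layout, the sketch below builds a throwaway CA in a scratch directory, signs one server and one client certificate with it, and confirms that both verify against the same `ca.crt`. It follows the eight openssl steps in spirit only: the subject names, the 2048-bit key size (chosen for speed), the 30-day lifetime, and the exact flags are assumptions for the demo and not necessarily what `ccat redis-certs generate` runs.

```bash
set -eu

# Scratch directory so nothing touches real key material.
workdir="$(mktemp -d)"
cd "$workdir"

# Steps 1-2: CA private key and self-signed root certificate.
openssl genrsa -out ca.key 2048 2>/dev/null
openssl req -x509 -new -key ca.key -sha256 -days 30 \
    -subj "/CN=Demo CA" -out ca.crt

# Steps 3-5: server key, signing request, CA-signed certificate.
openssl genrsa -out redis.key 2048 2>/dev/null
openssl req -new -key redis.key -subj "/CN=Demo Redis Server" -out redis.csr
openssl x509 -req -in redis.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
    -sha256 -days 30 -out redis.crt 2>/dev/null

# Steps 6-8: the same three commands for the client.
openssl genrsa -out client.key 2048 2>/dev/null
openssl req -new -key client.key -subj "/CN=Demo Client" -out client.csr
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
    -sha256 -days 30 -out client.crt 2>/dev/null

# Both leaf certificates chain directly to the single CA.
verify_output="$(openssl verify -CAfile ca.crt redis.crt client.crt)"
echo "$verify_output"
```

Both certificates should verify against the one `ca.crt`, which is also why a leaked `ca.key` undermines trust in every certificate it has signed.
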
Production PKI systems use a **two-tier hierarchy**:

```text
Current (flat, per-service):          With a CA hierarchy:

Redis CA ──┬── redis.crt              Offline Root CA
           └── client.crt                  │
                                      Intermediate CA (online, in HSM)
(one independent CA per variant;        ├── redis.crt
 4 separate trust roots)                ├── redis-client.crt
                                        ├── postgres.crt
                                        ├── influxdb.crt
                                        ├── loki.crt
                                        ├── SSH host certificates
                                        └── SSH user certificates
```

**Root CA:**
: Generated once on an air-gapped (offline) machine. Signs only the
  intermediate certificate. Stored in a safe (encrypted USB drive or similar).
  Never connected to the network.

**Intermediate CA:**
: The day-to-day signing key. Lives on the CA server, protected by a hardware
  or software security module. If compromised, you revoke it and sign a new
  intermediate from the offline root — services only need to trust the root,
  which never changes.

The benefit: one root of trust for the entire infrastructure. Adding TLS to a
new service (PostgreSQL, InfluxDB, Loki) is issuing one more certificate from
the same CA, not building a new CA from scratch.

## Certificate Lifecycle

Certificates are not permanent. They have a validity window and must be renewed
before they expire.

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 25 20 55

   * - Scenario
     - Typical lifetime
     - Renewal approach
   * - Current Redis certs
     - 10 years (``-days 3650``)
     - Manual rotation via ``ccat redis-certs rotate``
   * - CA-issued server certs (ACME)
     - 90 days
     - Automatic renewal via systemd timer or step agent
   * - CA-issued SSH user certs
     - 16 hours
     - User runs ``step ssh login`` daily (opens browser for GitHub SSO)
   * - CA-issued SSH host certs
     - 7 days
     - Automatic renewal via SSHPOP provisioner + systemd timer
```

Short-lived certificates are a security feature, not an inconvenience. A stolen
10-year credential (certificate plus private key) is useful for up to 10 years.
A stolen 16-hour credential is useful for at most 16 hours. The operational
cost of short lifetimes is offset by automating renewal.
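
One way to keep an eye on the validity window is `openssl x509 -checkend`, which exits non-zero if a certificate expires within a given number of seconds. The sketch below is a minimal example run against a throwaway self-signed certificate; the `check_expiry` helper, the file names, and the 2-day lifetime are hypothetical and only serve the demonstration.

```bash
set -eu

workdir="$(mktemp -d)"

# Throwaway self-signed certificate valid for 2 days (stand-in for a real cert).
openssl req -x509 -newkey rsa:2048 -nodes -days 2 -subj "/CN=expiry-demo" \
    -keyout "$workdir/demo.key" -out "$workdir/demo.crt" 2>/dev/null

# Show the expiry date recorded in the certificate.
openssl x509 -in "$workdir/demo.crt" -noout -enddate

# Hypothetical helper: prints "ok" if the certificate is still valid
# <days> days from now, "renew" otherwise.
check_expiry() {
    cert="$1"
    days="$2"
    if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null; then
        echo "ok"
    else
        echo "renew"
    fi
}

status_1d="$(check_expiry "$workdir/demo.crt" 1)"
status_30d="$(check_expiry "$workdir/demo.crt" 30)"
echo "still valid in 1 day:   $status_1d"
echo "still valid in 30 days: $status_30d"
```

Pointed at `redis/main/certs/redis.crt` with a suitable threshold, a helper like this could feed a cron job or systemd timer that warns before a rotation is due.
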
## SSH Certificates — How They Differ

SSH certificates use OpenSSH's own format (not X.509), but the concept is
identical:

- A **CA** signs a user's or host's public key, producing a certificate.
- SSH servers are configured to trust the CA (`TrustedUserCAKeys`).
- Users present their certificate instead of registering individual public keys
  in `authorized_keys`.

```text
Traditional SSH:                     Certificate-based SSH:

User generates keypair               User authenticates via GitHub SSO
User sends public key to admin       CA issues a short-lived certificate
Admin adds to authorized_keys        SSH server trusts the CA
Key valid until manually removed     Certificate expires in 16 hours

Problem: keys accumulate,            Benefit: no key management,
no expiry, painful offboarding       access revoked by removing from
                                     GitHub org — cert expires on its own
```

The daily workflow for a developer:

```bash
# One-time setup (~5 minutes)
step ca bootstrap --ca-url https://ca.data.ccat.uni-koeln.de \
    --fingerprint <fingerprint>  # placeholder: the value is not given in this document

# Daily: get a certificate (opens browser → GitHub login)
step ssh login yourname@github.com --provisioner CCAT-GitHub

# Then just SSH normally — the cert is in your SSH agent
ssh input-b
```

## Key Concepts Glossary

:::{glossary}
:sorted: true

CA (Certificate Authority)
: An entity that signs certificates, vouching for the identity of the
  certificate holder. Analogous to a notary.

Certificate (`.crt`)
: A signed document binding a public key to an identity (hostname, username,
  organization). Contains the public key, identity information, validity
  period, and the CA's signature.

Private Key (`.key`)
: The secret half of a keypair. Must never leave the machine it belongs to.
  Used to prove ownership of the corresponding certificate during TLS
  handshakes.

CSR (Certificate Signing Request, `.csr`)
: A request sent to a CA containing a public key and identity information. The
  CA verifies the request and returns a signed certificate. The CSR is a
  temporary artifact.

mTLS (Mutual TLS)
: A TLS connection where both sides (client and server) present and verify
  certificates. Provides cryptographic identity for both parties.

SAN (Subject Alternative Name)
: A certificate field listing the hostnames and IP addresses the certificate is
  valid for. Clients reject certificates whose SANs do not match the host they
  connected to.

Root Certificate
: A self-signed CA certificate at the top of the trust chain. Generated offline
  and stored securely. Clients and servers are configured to trust this
  certificate.

Intermediate Certificate
: A CA certificate signed by the root CA. Used for day-to-day signing. Can be
  revoked and replaced without changing the root trust anchor.

PKI (Public Key Infrastructure)
: The system of CAs, certificates, and policies that manages digital
  identities. Encompasses everything from certificate issuance to revocation
  and renewal.

HSM (Hardware Security Module)
: A dedicated device that stores cryptographic keys and performs signing
  operations. The key cannot be extracted from the device, only used through
  its interface.

SoftHSM2
: A software emulation of an HSM, providing the same PKCS#11 interface. Keys
  are stored in encrypted files rather than on dedicated hardware. Suitable for
  environments where USB hardware is not available.

PKCS#11
: A standard API for communicating with hardware and software security
  modules. Both real HSMs and SoftHSM2 expose this interface, making them
  interchangeable from the application's perspective.

ACME (Automatic Certificate Management Environment)
: A protocol for automating certificate issuance and renewal. Originally
  designed for Let's Encrypt; also supported by step-ca for internal
  certificate management.

FIDO2 / WebAuthn
: A standard for hardware-based authentication. FIDO2 security keys (YubiKey,
  Nitrokey) provide phishing-resistant two-factor authentication and can
  generate SSH keys that are bound to the physical device.
:::

## Further Reading

- {doc}`../secrets-management` — Operational guide for managing secrets in the
  CCAT Data Center
- [OpenSSL Cookbook](https://www.feistyduck.com/library/openssl-cookbook/) —
  Comprehensive guide to practical TLS and certificate operations
- [Smallstep Practical Zero Trust](https://smallstep.com/practical-zero-trust/) —
  Background on certificate-based infrastructure
- [SSH Certificates (man ssh-keygen, CERTIFICATES section)](https://man.openbsd.org/ssh-keygen#CERTIFICATES) —
  OpenSSH certificate format reference