# TLS, Certificates, and Public Key Infrastructure

```{contents} On this page
:depth: 2
:local: true
```

This document explains how TLS certificates and PKI (Public Key Infrastructure)
work, using the CCAT Data Center's Redis mTLS setup as a concrete, running
example. By the end you should understand what each file does, what is secret,
what is public, and how the pieces fit together.

## The Trust Chain — Certificates as a Notary System

Think of a **Certificate Authority (CA)** as a notary. When a service presents
a certificate, the other side checks: "was this signed by a notary I trust?"
If yes, the certificate is accepted — no prior relationship needed.

This is the core idea behind all TLS. Your browser does it thousands of times a
day when connecting to HTTPS websites.

## The Files

When we run `ccat redis-certs generate main`, eight openssl commands produce
the following files:

```text
redis/main/certs/
├── ca.key          # CA's PRIVATE key — the notary's stamp die
├── ca.crt          # CA's PUBLIC certificate — "here's who the notary is"
├── ca.srl          # Serial number counter (bookkeeping, not security-relevant)
│
├── redis.key       # Server's PRIVATE key — Redis's secret
├── redis.csr       # Certificate Signing Request (temporary, used during signing)
├── redis.crt       # Server's PUBLIC certificate — "I am Redis, signed by CA"
│
├── client.key      # Client's PRIVATE key — the connecting app's secret
├── client.csr      # Certificate Signing Request (temporary)
└── client.crt      # Client's PUBLIC certificate — "I am a legitimate client, signed by CA"
```

There are three keypairs (CA, server, client), each consisting of a private
`.key` file and a public `.crt` certificate. The `.csr` files are
intermediate artifacts used only during signing and can be deleted afterwards.

## Public vs Private — The Golden Rule

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 15 10 25 50

   * - File
     - Secret?
     - Who has it
     - Purpose
   * - ``ca.key``
     - **YES — most critical**
     - Only the machine that signs certs. Never deployed to any server.
     - Signs new certificates. If stolen, an attacker can forge any certificate
       in the trust domain.
   * - ``ca.crt``
     - No — public
     - Everyone. All hosts, all clients.
     - "This is the CA I trust." Used to verify signatures on other certificates.
   * - ``redis.key``
     - **YES**
     - Only the Redis server host
     - Proves "I am the real Redis server" during the TLS handshake.
   * - ``redis.crt``
     - No — public
     - Anyone can see it
     - Contains Redis's public key plus the CA's signature confirming it is
       legitimate.
   * - ``client.key``
     - **YES**
     - Only the client machines (ops-db-api, data-transfer, Grafana, etc.)
     - Proves "I am a legitimate client" during mTLS handshake.
   * - ``client.crt``
     - No — public
     - Anyone can see it
     - Contains the client's public key plus the CA's signature.
   * - ``*.csr``
     - No — temporary
     - Deleted after signing
     - A "please sign this" request containing the public key and identity info.
```

:::{important}
A private key (`*.key`) should exist only on the machines that need it and is
never handed to anyone else. The certificate (`*.crt`) is freely distributable:
it contains only the public key, identity information, and the CA's signature.
Compromise of a private key means that entity can be impersonated; compromise
of the CA key means **any** entity can be forged.
:::
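To see that a certificate carries only the public half of a keypair, derive the public key from both files and compare them. The following is a minimal sketch, assuming OpenSSL is installed. It works in a throwaway directory on a self-signed stand-in certificate (the `demo.*` names are hypothetical), not on the real files under `redis/main/certs/`.

```bash
#!/usr/bin/env bash
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Stand-in keypair and self-signed certificate (2048-bit only to keep the demo fast).
openssl genrsa -out "$workdir/demo.key" 2048 2>/dev/null
openssl req -x509 -new -key "$workdir/demo.key" -subj "/CN=demo" \
  -days 1 -sha256 -out "$workdir/demo.crt"

# Public key derived from the private key file.
key_pub="$(openssl pkey -in "$workdir/demo.key" -pubout)"
# Public key embedded in the certificate.
crt_pub="$(openssl x509 -in "$workdir/demo.crt" -pubkey -noout)"

if [ "$key_pub" = "$crt_pub" ]; then
  match="yes"
else
  match="no"
fi
echo "certificate matches private key: $match"
```

The same comparison, pointed at `redis.key` and `redis.crt`, is a quick sanity check after a rotation that the deployed key and certificate belong together.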

## What Is Actually Inside a Certificate?

A `.crt` file is a signed document containing:

```text
┌──────────────────────────────────────────┐
│  Subject: CN=Redis Server                │  ← Who this cert belongs to
│  Issuer:  CN=Redis CA                    │  ← Who signed it (the CA)
│  Valid:   2026-03-27 to 2036-03-25       │  ← Expiry window
│  Public Key: [4096-bit RSA key]          │  ← The public half of redis.key
│  SANs: redis, input-b, 134.95.x.x       │  ← Hostnames/IPs this cert covers
│  ──────────────────────────────────────  │
│  Signature: [bytes signed by ca.key]     │  ← Proof the CA approved this
└──────────────────────────────────────────┘
```

You can inspect any certificate yourself:

```bash
openssl x509 -in redis/main/certs/redis.crt -text -noout
```

Key fields to look for:

- **Subject / Issuer** — identity chain (who is this, who vouches for them)
- **Validity** — `Not Before` / `Not After` dates
- **Subject Alternative Names (SANs)** — the hostnames and IPs the cert is valid
  for. A client that verifies hostnames rejects the cert if the name it
  connected to is not in the SANs list.
- **Signature Algorithm** — should be SHA-256 or better (never SHA-1)
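The fields listed above can also be pulled out individually instead of reading the full `-text` dump. This is a sketch under assumptions: it needs OpenSSL 1.1.1 or newer (for `-addext` and `-ext`), and it generates a self-signed stand-in certificate named `demo.crt` (a hypothetical name) so that it runs anywhere.

```bash
#!/usr/bin/env bash
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# Stand-in certificate with two SAN entries (self-signed, demo only).
openssl genrsa -out "$workdir/demo.key" 2048 2>/dev/null
openssl req -x509 -new -key "$workdir/demo.key" -subj "/CN=Redis Server" \
  -addext "subjectAltName=DNS:redis,DNS:input-b" \
  -days 30 -sha256 -out "$workdir/demo.crt"

# Subject, issuer, validity window, and SANs in one call.
summary="$(openssl x509 -in "$workdir/demo.crt" -noout \
  -subject -issuer -dates -ext subjectAltName)"
echo "$summary"

# First "Signature Algorithm" line of the text dump; keep only the algorithm name.
sig_alg="$(openssl x509 -in "$workdir/demo.crt" -noout -text \
  | awk '/Signature Algorithm/ && !seen { print $NF; seen = 1 }')"
echo "signature algorithm: $sig_alg"
```

Replacing `"$workdir/demo.crt"` with `redis/main/certs/redis.crt` gives the same summary for the real server certificate.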

## How a TLS Connection Works

Here is what happens when ops-db-api connects to Redis with mTLS:

```text
ops-db-api                                    Redis
(has: ca.crt, client.crt, client.key)        (has: ca.crt, redis.crt, redis.key)
    │                                             │
    │─── 1. "Hello, I want to connect" ─────────>│
    │                                             │
    │<── 2. "Here's my redis.crt" ───────────────│
    │                                             │
    │  3. Verify: is redis.crt                    │
    │     signed by the CA in my ca.crt?          │
    │     YES → server is legitimate              │
    │                                             │
    │─── 4. "Here's my client.crt" ─────────────>│
    │                                             │
    │     5. Verify: is client.crt                │
    │        signed by the CA in my ca.crt?       │
    │        YES → client is legitimate           │
    │                                             │
    │<══ 6. Encrypted connection established ════>│
```

- **Steps 1–3** are standard TLS (the same thing your browser does for HTTPS).
- **Steps 4–5** are the **mutual** part of mTLS — the server also verifies the
  client. That is why Redis is configured with `--tls-ca-cert-file`: it uses
  the CA certificate to validate incoming client certificates.
- **Step 6** establishes an encrypted channel. From this point on, all data is
  encrypted with a session key that was negotiated during the handshake. Even if
  someone captures the network traffic, they cannot read it.

:::{note}
Regular TLS (without the "m") only verifies the server. The client is
typically authenticated by other means (passwords, tokens). mTLS adds client
certificate verification, which means **both sides prove their identity
cryptographically** before any data is exchanged. We use mTLS for Redis
because a service without a CA-signed client certificate and the matching
private key cannot connect, even if it somehow obtains the Redis password.
:::
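The verification in steps 3 and 5 can be reproduced offline with `openssl verify`. The sketch below builds two throwaway CAs and one server certificate, then checks the certificate against each CA. It assumes OpenSSL is installed. All files live in a temporary directory, and the second CA (`other-ca`) is a hypothetical stand-in for "a notary I do not trust".

```bash
#!/usr/bin/env bash
set -eu

d="$(mktemp -d)"
trap 'rm -rf "$d"' EXIT

make_ca() {  # $1 = file prefix, $2 = common name
  openssl genrsa -out "$d/$1.key" 2048 2>/dev/null
  openssl req -x509 -new -key "$d/$1.key" -subj "/CN=$2" \
    -days 30 -sha256 -out "$d/$1.crt"
}
make_ca ca "Redis CA"
make_ca other-ca "Some Other CA"

# Server certificate signed by the first CA only.
openssl genrsa -out "$d/redis.key" 2048 2>/dev/null
openssl req -new -key "$d/redis.key" -subj "/CN=Redis Server" -out "$d/redis.csr"
openssl x509 -req -in "$d/redis.csr" -CA "$d/ca.crt" -CAkey "$d/ca.key" \
  -CAserial "$d/ca.srl" -CAcreateserial -days 30 -sha256 \
  -out "$d/redis.crt" 2>/dev/null

check() {  # $1 = CA certificate to trust; prints accepted or rejected
  if openssl verify -CAfile "$1" "$d/redis.crt" >/dev/null 2>&1; then
    echo "accepted"
  else
    echo "rejected"
  fi
}
trusted_result="$(check "$d/ca.crt")"
untrusted_result="$(check "$d/other-ca.crt")"
echo "against signing CA: $trusted_result"
echo "against other CA:   $untrusted_result"
```

A real handshake does more than this signature check (for example, each side also proves possession of its private key), but the trust decision itself is the same comparison against `ca.crt`.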

## TLS vs mTLS — When to Use Which

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Mode
     - What it verifies
     - Use when
   * - **TLS** (one-way)
     - Client verifies the server is who it claims to be.
       Server does not verify the client.
     - Public-facing web services, APIs with token-based auth.
       Example: Grafana behind nginx with Let's Encrypt.
   * - **mTLS** (mutual)
     - Both sides verify each other via certificates.
     - Internal service-to-service communication where you want
       cryptographic identity on both ends. Example: Redis, PostgreSQL
       connections between backend services.

```

## The 8-Step Certificate Generation Process

Here is what `ccat redis-certs generate` does under the hood, mapped to the
concepts above:

```text
Step 1: openssl genrsa → ca.key            Create the CA's private key
                                             (a randomly generated 4096-bit RSA key)

Step 2: openssl req -x509 → ca.crt         Self-sign the CA's own certificate
                                             ("I am a CA, and I vouch for myself")
                                             This is a ROOT certificate.

Step 3: openssl genrsa → redis.key          Create the server's private key

Step 4: openssl req -new → redis.csr        Create a signing request
                                             ("please certify me as Redis")

Step 5: openssl x509 -req → redis.crt       CA signs the request → server cert
                                             Uses ca.key to sign; references ca.crt.

Step 6: openssl genrsa → client.key         Create the client's private key

Step 7: openssl req -new → client.csr       Create a signing request
                                             ("please certify me as a client")

Step 8: openssl x509 -req → client.crt      CA signs the request → client cert
```

Steps 1–2 create the CA itself. Steps 3–5 produce the server certificate.
Steps 6–8 produce the client certificate. The CSR files (steps 4 and 7) are
intermediaries that can be deleted after signing.
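The eight steps can be sketched as a plain shell script. This is an illustration under assumptions, not the actual implementation of `ccat redis-certs generate`: the exact flags are guesses, the subject names follow the certificate diagram earlier on this page, the client CN `Redis Client` is hypothetical, and the SAN list omits the IP address because it is elided in this document. Output goes to a temporary directory rather than `redis/main/certs/`.

```bash
#!/usr/bin/env bash
set -eu

out="$(mktemp -d)"              # stand-in for redis/main/certs/
trap 'rm -rf "$out"' EXIT
KEY_BITS="${KEY_BITS:-4096}"    # key size used on this page
DAYS=3650                       # 10 years, as in the lifecycle table

# Step 1: CA private key
openssl genrsa -out "$out/ca.key" "$KEY_BITS" 2>/dev/null
# Step 2: self-signed CA (root) certificate
openssl req -x509 -new -key "$out/ca.key" -subj "/CN=Redis CA" \
  -days "$DAYS" -sha256 -out "$out/ca.crt"

# Step 3: server private key
openssl genrsa -out "$out/redis.key" "$KEY_BITS" 2>/dev/null
# Step 4: server signing request
openssl req -new -key "$out/redis.key" -subj "/CN=Redis Server" -out "$out/redis.csr"
# Step 5: CA signs the server request; SANs come from a small extension file
printf 'subjectAltName = DNS:redis, DNS:input-b\n' > "$out/redis.ext"
openssl x509 -req -in "$out/redis.csr" -CA "$out/ca.crt" -CAkey "$out/ca.key" \
  -CAserial "$out/ca.srl" -CAcreateserial -extfile "$out/redis.ext" \
  -days "$DAYS" -sha256 -out "$out/redis.crt" 2>/dev/null

# Step 6: client private key
openssl genrsa -out "$out/client.key" "$KEY_BITS" 2>/dev/null
# Step 7: client signing request
openssl req -new -key "$out/client.key" -subj "/CN=Redis Client" -out "$out/client.csr"
# Step 8: CA signs the client request
openssl x509 -req -in "$out/client.csr" -CA "$out/ca.crt" -CAkey "$out/ca.key" \
  -CAserial "$out/ca.srl" -CAcreateserial \
  -days "$DAYS" -sha256 -out "$out/client.crt" 2>/dev/null

# Both leaf certificates should chain to the CA.
openssl verify -CAfile "$out/ca.crt" "$out/redis.crt" "$out/client.crt"
ls "$out"
```

Setting `KEY_BITS=2048` makes the sketch run faster for experiments. The `ca.srl` file appears as a side effect of step 5, which is why it shows up in the file listing without a step of its own.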

## File Permissions and Deployment

Not all files are deployed to all hosts. The deployment rules reflect the
public/private distinction:

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 15 15 15 55

   * - File
     - Permission
     - Deployed to
     - Rationale
   * - ``ca.key``
     - ``0600``
     - **Nowhere** — stays on the signing machine only
     - Most critical secret. If this leaks, all certs can be forged.
   * - ``ca.crt``
     - ``0644``
     - All hosts (servers and clients)
     - Public. Everyone needs it to verify certificates.
   * - ``redis.key``
     - ``0600``, owned by UID 999 (Redis user)
     - Redis server host only (e.g. input-b)
     - Only Redis needs its own private key.
   * - ``redis.crt``
     - ``0644``
     - Redis server host only
     - Public, but only the server presents it.
   * - ``client.key``
     - ``0644``
     - All client hosts (input-a, input-c, reuna, etc.)
     - Needs to be readable by multiple container UIDs. This is a
       pragmatic trade-off; ideally each client would have its own keypair.
   * - ``client.crt``
     - ``0644``
     - All client hosts
     - Public half of the client identity.
```

:::{note}
`client.key` is `0644` (world-readable) because multiple containers
running as different UIDs need to read it. In a CA-managed setup, each
service would get its own unique client certificate, avoiding this shared-key
pattern.
:::
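The permission column can be applied and audited with standard tools. The sketch below uses empty placeholder files in a temporary directory (a hypothetical stand-in for a deployed certs directory) and skips the `chown` to UID 999 because that requires root. The audit at the end lists every private key that is readable by group or others.

```bash
#!/usr/bin/env bash
set -eu

certs="$(mktemp -d)"    # stand-in for a deployed certs directory
trap 'rm -rf "$certs"' EXIT

# Empty placeholder files; only the permission bits matter here.
for f in ca.crt redis.key redis.crt client.key client.crt; do
  : > "$certs/$f"
done

chmod 0644 "$certs/ca.crt" "$certs/redis.crt" "$certs/client.crt"
chmod 0600 "$certs/redis.key"     # on the real host also: chown 999 (needs root)
chmod 0644 "$certs/client.key"    # the documented trade-off for shared containers

# Audit: private keys with the group-read or other-read bit set.
exposed="$(find "$certs" -name '*.key' \( -perm -040 -o -perm -004 \) \
  -exec basename {} \; | sort)"
echo "keys readable beyond their owner: ${exposed:-none}"
```

With the table's settings the audit reports only `client.key`, which matches the trade-off described in the note above. Any other key showing up in that list would be a deployment mistake.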

## Root Certificates, Intermediates, and Trust Hierarchies

Our current Redis setup uses a **flat, single-tier CA**: one CA key signs
everything directly. This is simple but has a drawback — if the CA key is
compromised, you must replace it and re-issue every certificate.

Production PKI systems typically use a **two-tier hierarchy**:

```text
Current (flat, per-service):           With a CA hierarchy:

Redis CA ──┬── redis.crt                Offline Root CA
           └── client.crt                     │
                                         Intermediate CA (online, in HSM)
(one independent CA per variant;          ├── redis.crt
 4 separate trust roots)                  ├── redis-client.crt
                                          ├── postgres.crt
                                          ├── influxdb.crt
                                          ├── loki.crt
                                          ├── SSH host certificates
                                          └── SSH user certificates
```

**Root CA:**

: Generated once on an air-gapped (offline) machine. Signs only the intermediate
  certificate. Stored in a safe (encrypted USB drive or similar). Never connected
  to the network.

**Intermediate CA:**

: The day-to-day signing key. Lives on the CA server, protected by a hardware or
  software security module. If compromised, you revoke it and sign a new
  intermediate from the offline root — services only need to trust the root,
  which never changes.

The benefit: one root of trust for the entire infrastructure. Adding TLS to a
new service (PostgreSQL, InfluxDB, Loki) is issuing one more certificate from
the same CA, not building a new CA from scratch.
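A two-tier chain can be built and verified with plain OpenSSL to see why services only need the root. This is a sketch under assumptions: all names (`Demo Offline Root CA`, `Demo Intermediate CA`) and lifetimes are hypothetical, keys are ordinary files rather than HSM-backed, and everything lives in a temporary directory.

```bash
#!/usr/bin/env bash
set -eu

d="$(mktemp -d)"
trap 'rm -rf "$d"' EXIT

# Minimal config for the root: self-signed and marked as a CA.
cat > "$d/root.cnf" <<'EOF'
[req]
distinguished_name = dn
x509_extensions = v3_ca
prompt = no
[dn]
CN = Demo Offline Root CA
[v3_ca]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
EOF
# Extensions for the intermediate: a CA that may sign leaf certificates only.
printf '%s\n' 'basicConstraints = critical, CA:TRUE, pathlen:0' \
  'keyUsage = critical, keyCertSign, cRLSign' > "$d/intermediate.ext"

# Root CA (would be created on the air-gapped machine).
openssl genrsa -out "$d/root.key" 2048 2>/dev/null
openssl req -x509 -new -key "$d/root.key" -config "$d/root.cnf" \
  -days 3650 -sha256 -out "$d/root.crt"

# Intermediate CA, signed by the root.
openssl genrsa -out "$d/intermediate.key" 2048 2>/dev/null
openssl req -new -key "$d/intermediate.key" -subj "/CN=Demo Intermediate CA" \
  -out "$d/intermediate.csr"
openssl x509 -req -in "$d/intermediate.csr" -CA "$d/root.crt" -CAkey "$d/root.key" \
  -CAserial "$d/root.srl" -CAcreateserial -extfile "$d/intermediate.ext" \
  -days 1825 -sha256 -out "$d/intermediate.crt" 2>/dev/null

# Leaf certificate, signed by the intermediate.
openssl genrsa -out "$d/redis.key" 2048 2>/dev/null
openssl req -new -key "$d/redis.key" -subj "/CN=Redis Server" -out "$d/redis.csr"
openssl x509 -req -in "$d/redis.csr" -CA "$d/intermediate.crt" \
  -CAkey "$d/intermediate.key" -CAserial "$d/intermediate.srl" -CAcreateserial \
  -days 90 -sha256 -out "$d/redis.crt" 2>/dev/null

# Trust only the root; the intermediate is supplied as an untrusted helper.
if openssl verify -CAfile "$d/root.crt" -untrusted "$d/intermediate.crt" \
     "$d/redis.crt" >/dev/null 2>&1; then
  with_intermediate="accepted"
else
  with_intermediate="rejected"
fi
# Without the intermediate the chain to the root cannot be built.
if openssl verify -CAfile "$d/root.crt" "$d/redis.crt" >/dev/null 2>&1; then
  without_intermediate="accepted"
else
  without_intermediate="rejected"
fi
echo "root + intermediate: $with_intermediate"
echo "root only:           $without_intermediate"
```

The verifier is configured with `root.crt` alone. Replacing a compromised intermediate therefore means issuing a new `intermediate.crt` and new leaf certificates, while the trust anchor on every host stays the same.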

## Certificate Lifecycle

Certificates are not permanent. They have a validity window and must be renewed
before they expire.

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 25 20 55

   * - Scenario
     - Typical lifetime
     - Renewal approach
   * - Current Redis certs
     - 10 years (``-days 3650``)
     - Manual rotation via ``ccat redis-certs rotate``
   * - CA-issued server certs (ACME)
     - 90 days
     - Automatic renewal via systemd timer or step agent
   * - CA-issued SSH user certs
     - 16 hours
     - User runs ``step ssh login`` daily (opens browser for GitHub SSO)
   * - CA-issued SSH host certs
     - 7 days
     - Automatic renewal via SSHPOP provisioner + systemd timer
```

Short-lived certificates are a security feature, not an inconvenience. A stolen
private key is useful for as long as its certificate is valid: 10 years for a
10-year certificate, but only until end-of-day for a 16-hour certificate. The
operational cost of short lifetimes is offset by automating renewal.
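Whether a certificate is due for renewal can be checked with `openssl x509 -checkend`, which exits non-zero if the certificate expires within the given number of seconds. The sketch below wraps that in a small helper. The helper name `check_expiry`, the 7-day demo certificate, and the warning windows are all hypothetical choices for illustration.

```bash
#!/usr/bin/env bash
set -eu

d="$(mktemp -d)"
trap 'rm -rf "$d"' EXIT

# Demo certificate that expires in 7 days.
openssl genrsa -out "$d/demo.key" 2048 2>/dev/null
openssl req -x509 -new -key "$d/demo.key" -subj "/CN=demo" \
  -days 7 -sha256 -out "$d/demo.crt"

check_expiry() {  # $1 = certificate path, $2 = warning window in days
  if openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null; then
    echo "ok"       # still valid at the end of the window
  else
    echo "renew"    # expires inside the window
  fi
}

within_1_day="$(check_expiry "$d/demo.crt" 1)"
within_30_days="$(check_expiry "$d/demo.crt" 30)"
echo "1-day window:  $within_1_day"
echo "30-day window: $within_30_days"
```

A check like this, run from a systemd timer or cron job against `redis.crt`, would give early warning before a manual rotation is due.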

## SSH Certificates — How They Differ

SSH certificates use OpenSSH's own format (not X.509), but the concept is
identical:

- A **CA** signs a user's or host's public key, producing a certificate.
- SSH servers are configured to trust the CA (`TrustedUserCAKeys`).
- Users present their certificate instead of registering individual public keys
  in `authorized_keys`.

```text
Traditional SSH:                      Certificate-based SSH:

User generates keypair               User authenticates via GitHub SSO
User sends public key to admin       CA issues a short-lived certificate
Admin adds to authorized_keys        SSH server trusts the CA
Key valid until manually removed     Certificate expires in 16 hours

Problem: keys accumulate,             Benefit: no key management,
no expiry, painful offboarding        access revoked by removing from
                                       GitHub org — cert expires on its own
```
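The signing step itself can be tried with plain OpenSSH, independent of any CA service. The sketch below is an illustration only and is not the `step` workflow used here: it creates a throwaway user CA, signs a user key for the hypothetical principal `alice` with a 16-hour validity, and prints the resulting certificate. It assumes `ssh-keygen` from OpenSSH is installed.

```bash
#!/usr/bin/env bash
set -eu

d="$(mktemp -d)"
trap 'rm -rf "$d"' EXIT

# Throwaway CA keypair and user keypair (no passphrases, demo only).
ssh-keygen -q -t ed25519 -N '' -C 'demo user CA' -f "$d/user_ca"
ssh-keygen -q -t ed25519 -N '' -C 'alice' -f "$d/id_alice"

# The CA signs the user's PUBLIC key: key ID, allowed principal, 16-hour validity.
ssh-keygen -q -s "$d/user_ca" -I 'alice@example' -n alice -V +16h "$d/id_alice.pub"

# The result is written next to the public key as id_alice-cert.pub.
cert_info="$(ssh-keygen -L -f "$d/id_alice-cert.pub")"
echo "$cert_info"

# On a server, trusting this CA would be one sshd_config line such as:
#   TrustedUserCAKeys /etc/ssh/user_ca.pub   (path is an assumption)
```

The `Valid:` line in the output shows the expiry window, and the `Principals:` line shows which login names the certificate may be used for.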

The daily workflow for a developer:

```bash
# One-time setup (~5 minutes)
step ca bootstrap --ca-url https://ca.data.ccat.uni-koeln.de \
  --fingerprint <root-ca-fingerprint>

# Daily: get a certificate (opens browser → GitHub login)
step ssh login yourname@github.com --provisioner CCAT-GitHub

# Then just SSH normally — the cert is in your SSH agent
ssh input-b
```

## Key Concepts Glossary

:::{glossary}
:sorted: true

CA (Certificate Authority)

: An entity that signs certificates, vouching for the identity of the
  certificate holder. Analogous to a notary.

Certificate (`.crt`)

: A signed document binding a public key to an identity (hostname, username,
  organization). Contains the public key, identity information, validity
  period, and the CA's signature.

Private Key (`.key`)

: The secret half of a keypair. Must never leave the machine it belongs to.
  Used to prove ownership of the corresponding certificate during TLS
  handshakes.

CSR (Certificate Signing Request, `.csr`)

: A request sent to a CA containing a public key and identity information.
  The CA verifies the request and returns a signed certificate. The CSR is
  a temporary artifact.

mTLS (Mutual TLS)

: A TLS connection where both sides (client and server) present and verify
  certificates. Provides cryptographic identity for both parties.

SAN (Subject Alternative Name)

: A certificate field listing the hostnames and IP addresses the certificate
  is valid for. Clients reject certificates whose SANs do not match the
  host they connected to.

Root Certificate

: A self-signed CA certificate at the top of the trust chain. In a hierarchical
  PKI its private key is generated offline and stored securely. Clients and
  servers are configured to trust this certificate.

Intermediate Certificate

: A CA certificate signed by the root CA. Used for day-to-day signing. Can
  be revoked and replaced without changing the root trust anchor.

PKI (Public Key Infrastructure)

: The system of CAs, certificates, and policies that manages digital
  identities. Encompasses everything from certificate issuance to
  revocation and renewal.

HSM (Hardware Security Module)

: A dedicated device that stores cryptographic keys and performs signing
  operations. The key cannot be extracted from the device, only used
  through its interface.

SoftHSM2

: A software emulation of an HSM, providing the same PKCS#11 interface.
  Keys are stored in encrypted files rather than on dedicated hardware.
  Suitable for environments where USB hardware is not available.

PKCS#11

: A standard API for communicating with hardware and software security
  modules. Both real HSMs and SoftHSM2 expose this interface, making them
  interchangeable from the application's perspective.

ACME (Automatic Certificate Management Environment)

: A protocol for automating certificate issuance and renewal. Originally
  designed for Let's Encrypt; also supported by step-ca for internal
  certificate management.

FIDO2 / WebAuthn

: A standard for hardware-based authentication. FIDO2 security keys
  (YubiKey, Nitrokey) provide phishing-resistant two-factor authentication
  and can generate SSH keys that are bound to the physical device.
:::

## Further Reading

- {doc}`../secrets-management` — Operational guide for managing secrets
  in the CCAT Data Center
- [OpenSSL Cookbook](https://www.feistyduck.com/library/openssl-cookbook/) —
  Comprehensive guide to practical TLS and certificate operations
- [Smallstep Practical Zero Trust](https://smallstep.com/practical-zero-trust/) —
  Background on certificate-based infrastructure
- [SSH Certificates (man ssh-keygen, CERTIFICATES section)](https://man.openbsd.org/ssh-keygen#CERTIFICATES) — OpenSSH certificate
  format reference