Distributed Architecture#
The ops-db-api operates in a distributed environment with a single authoritative database and multiple read-only replicas. Understanding this topology is essential for reasoning about the system's behavior.
Database Topology#
The system consists of:
One authoritative database (main site - Cologne, Germany)
One or more read-only replicas (secondary sites - observatory in Chile, potentially others)
One API instance per site (each connecting to its local database)
graph TB
    subgraph "Main Site (Cologne)"
        API_Main[API Server<br/>SITE_TYPE=main]
        DB_Main[(Main Database<br/>Read + Write)]
        Redis_Main[Redis<br/>Caching only]
    end
    subgraph "Observatory Site (Chile)"
        API_Obs[API Server<br/>SITE_TYPE=secondary]
        DB_Replica[(Replica Database<br/>Read-only)]
        Redis_Obs[Redis<br/>Transaction Buffer]
    end
    API_Main -->|Direct Writes| DB_Main
    API_Main -->|Direct Reads| DB_Main
    API_Main -->|Cache| Redis_Main
    API_Obs -->|Buffered Writes| Redis_Obs
    Redis_Obs -->|Background Process| DB_Main
    API_Obs -->|Reads| DB_Replica
    API_Obs -->|Merge Buffer| Redis_Obs
    DB_Main -.->|Streaming Replication<br/>WAL| DB_Replica
    style DB_Main fill:#90EE90
    style DB_Replica fill:#FFB6C1
    style Redis_Obs fill:#FFD700
PostgreSQL Streaming Replication#
How It Works#
PostgreSQL replication uses the Write-Ahead Log (WAL):
Main database writes all changes to WAL before applying them
WAL segments are shipped to replicas (continuously or in batches)
Replicas replay WAL segments to stay synchronized
Replication lag is the time between main write and replica apply
sequenceDiagram
    participant Client
    participant Main DB
    participant WAL
    participant Replica
    Client->>Main DB: INSERT observation
    Main DB->>WAL: Write WAL record (LSN: 0/12345678)
    Main DB->>Main DB: Apply to database
    Main DB-->>Client: Success
    Note over WAL,Replica: Continuous streaming
    WAL->>Replica: Stream WAL record
    Replica->>Replica: Replay WAL (apply changes)
    Replica->>Replica: Update replay LSN
    Note over Replica: Now has data at LSN 0/12345678
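To watch this stream from the outside, the main database lists each connected standby in the pg_stat_replication view, together with the WAL positions it has sent and the standby has replayed. The sketch below is illustrative only and is not ops-db-api code; the connection parameters are placeholders borrowed from the configuration examples later on this page.

# Sketch: list the standbys streaming from the main database and how far
# each one lags, in bytes. Connection parameters are placeholders.
import asyncio
import asyncpg

async def show_standbys() -> None:
    main = await asyncpg.connect(
        host="main-db.example.com", port=5432,
        user="ccat_ops_user", password="***", database="ccat_ops_db",
    )
    try:
        rows = await main.fetch(
            "SELECT application_name, state, sent_lsn::text, replay_lsn::text, "
            "       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes "
            "FROM pg_stat_replication"
        )
        for row in rows:
            print(dict(row))  # one row per standby currently streaming WAL
    finally:
        await main.close()

asyncio.run(show_standbys())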
WAL (Write-Ahead Log)#
The WAL is PostgreSQL’s transaction log:
What it contains:
Every INSERT, UPDATE, DELETE operation
Transaction commits and aborts
Schema changes (DDL)
Sequence increments
Why it’s powerful:
Guaranteed durability (written before data changes)
Point-in-time recovery possible
Replication uses the same mechanism as crash recovery
Binary format - efficient to transmit
LSN (Log Sequence Number)#
The LSN is a unique identifier for each WAL record:
Format: two hexadecimal numbers separated by a slash (e.g., 0/12345678), together encoding a 64-bit byte position in the WAL
Properties:
Monotonically increasing (never decreases)
Identifies a single byte position in the WAL, so any two LSNs can be compared and ordered
Consistent across main and replicas (the same WAL record has the same LSN everywhere)
How we use it:
# LSNs are strings like '0/12345678'; convert them before comparing or subtracting
def lsn_to_int(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# After writing to the main database
lsn_after_write = execute_query("SELECT pg_current_wal_lsn()")
# Returns: '0/12345678'

# Check if the replica has caught up (run against the replica)
replica_lsn = execute_query("SELECT pg_last_wal_replay_lsn()")
# Returns: '0/12345600' (slightly behind)

if lsn_to_int(replica_lsn) >= lsn_to_int(lsn_after_write):
    print("Replica has our data!")
else:
    lag_bytes = lsn_to_int(lsn_after_write) - lsn_to_int(replica_lsn)
    print(f"Replica is {lag_bytes} bytes behind")
See LSN Tracking for implementation details.
Replication Lag#
Lag is the time between a write on the main database and its availability on the replica:
Factors affecting lag:
Network bandwidth (WAL transmission speed)
Network latency (round-trip time)
Replica load (CPU, disk I/O)
WAL volume (many writes = more to replay)
Typical lag:
Good network: Milliseconds to seconds
Slow network: Seconds to minutes
Network down: Unbounded (lag accumulates until the connection is restored)
How we handle it:
Buffer writes locally (observatory doesn’t block on lag)
Track LSN precisely (know when data has replicated)
Merge buffered + replicated data on reads (consistent view)
Extend cache TTL if replication takes longer than expected
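For a concrete picture of what "lag" means in practice, the replica itself can report both its replay position and the age of the last transaction it has replayed. A minimal sketch, illustrative only and not the ops-db-api lag monitoring; connection parameters are placeholders:

# Sketch: ask the local replica how far behind it is.
import asyncio
import asyncpg

async def show_replica_lag() -> None:
    replica = await asyncpg.connect(
        host="localhost", port=5432,
        user="ccat_ops_user", password="***", database="ccat_ops_db",
    )
    try:
        replay_lsn = await replica.fetchval("SELECT pg_last_wal_replay_lsn()::text")
        # Age of the newest replayed transaction. Note that this also grows
        # when the main simply has nothing new to send.
        time_lag = await replica.fetchval(
            "SELECT now() - pg_last_xact_replay_timestamp()"
        )
        print(f"replayed up to {replay_lsn}; last replayed commit was {time_lag} ago")
    finally:
        await replica.close()

asyncio.run(show_replica_lag())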
Site Types: MAIN vs SECONDARY#
The API behavior changes dramatically based on site type:
MAIN Site (Cologne)#
Configuration:
SITE_TYPE=main
MAIN_DB_HOST=localhost # Same as LOCAL_DB_HOST
LOCAL_DB_HOST=localhost
Behavior:
Writes: Direct to database (no buffering)
Reads: Direct from database
Redis: Used only for caching, not buffering
Background Processor: Disabled or minimal activity
LSN Tracking: Not needed (the main site's own reads never wait on replication)
Why:
Main database is local (low latency)
No replication lag (reads immediately consistent)
Buffering overhead unnecessary
When to use:
Production main site (Cologne)
Development when testing direct database operations
Any site where main database is accessible with low latency
SECONDARY Site (Observatory)#
Configuration:
SITE_TYPE=secondary
MAIN_DB_HOST=main-db.example.com # Remote main database
LOCAL_DB_HOST=localhost # Local read-only replica
Behavior:
Critical Writes: Buffered in Redis, processed asynchronously
Non-Critical Writes: Direct to local database (fails if read-only)
Reads: From local replica + merge with buffered data
Redis: Transaction buffer + read buffer + caching
Background Processor: Active, processes buffered transactions
LSN Tracking: Active, monitors replication state
Why:
Main database is remote (high latency or unavailable)
Replication lag expected (merge buffer with DB)
Operations must never block (buffer ensures reliability)
When to use:
Production observatory site
Development when testing transaction buffering
Any site where main database is unreliable
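Conceptually, the difference between the two modes is a single branch on the write path. The sketch below is purely illustrative: SiteType, write_to_main_db, and buffer_in_redis are made-up names standing in for the real components (transaction executor and Redis buffer), not actual ops-db-api interfaces.

# Illustrative site-aware write dispatch; all names here are hypothetical.
import json
import os
from enum import Enum

class SiteType(str, Enum):
    MAIN = "main"
    SECONDARY = "secondary"

SITE_TYPE = SiteType(os.environ.get("SITE_TYPE", "main"))

async def write_to_main_db(transaction: dict) -> None:
    # Stand-in for a direct write through the main-database connection pool
    print("executing on main DB:", transaction)

async def buffer_in_redis(transaction: dict) -> None:
    # Stand-in for pushing the transaction onto the Redis transaction buffer
    print("buffering:", json.dumps(transaction))

async def submit_write(transaction: dict) -> str:
    if SITE_TYPE is SiteType.MAIN:
        # Main site: the authoritative database is local, write directly
        await write_to_main_db(transaction)
        return "committed"
    # Secondary site: never block on the remote main database; the background
    # processor drains the buffer whenever the main database is reachable
    await buffer_in_redis(transaction)
    return "buffered"

The important property is that submit_write returns quickly in both modes; at a secondary site the caller only learns that the transaction has been durably buffered, not that it has already been committed on the main database.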
Connection Configuration#
The API maintains two database connections:
Main Database Connection#
Purpose: Execute write operations (INSERT, UPDATE, DELETE)
Configuration:
MAIN_DB_HOST=main-db.example.com
MAIN_DB_PORT=5432
MAIN_DB_USER=ccat_ops_user
MAIN_DB_PASSWORD=***
MAIN_DB_NAME=ccat_ops_db
Used by:
Background processor (executing buffered transactions)
Transaction executor (at main site for direct writes)
Connection pool:
Async connection (asyncpg)
Separate pool for background processing
Configurable pool size (default: 10 connections)
Local Database Connection#
Purpose: Execute read operations (SELECT)
Configuration:
LOCAL_DB_HOST=localhost # Replica at secondary, main at main site
LOCAL_DB_PORT=5432
LOCAL_DB_USER=ccat_ops_user
LOCAL_DB_PASSWORD=***
LOCAL_DB_NAME=ccat_ops_db
Used by:
All read endpoints
Smart query manager (merge with buffered data)
Non-critical operations (at main site)
Connection pool:
Async connection (asyncpg)
Shared across all API requests
Configurable pool size (default: 20 connections)
Why Two Connections?#
Separation of concerns:
Reads can go to local replica (fast, no network)
Writes go to main database (authoritative)
Background processor doesn’t interfere with read traffic
Flexibility:
Main site: Both connections point to same database
Secondary site: Different databases (main remote, local replica)
Testing: Can point to different databases for verification
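As a rough sketch of what this looks like in practice, the two asyncpg pools can be built directly from the environment variables above. This is illustrative only (the function name and error handling are simplified); the pool sizes mirror the defaults mentioned in this section.

# Sketch: create the write (main) and read (local) pools from the
# environment variables documented above.
import os
import asyncpg

async def create_pools():
    main_pool = await asyncpg.create_pool(
        host=os.environ["MAIN_DB_HOST"],
        port=int(os.environ.get("MAIN_DB_PORT", 5432)),
        user=os.environ["MAIN_DB_USER"],
        password=os.environ["MAIN_DB_PASSWORD"],
        database=os.environ["MAIN_DB_NAME"],
        max_size=10,  # background processing / writes
    )
    local_pool = await asyncpg.create_pool(
        host=os.environ["LOCAL_DB_HOST"],
        port=int(os.environ.get("LOCAL_DB_PORT", 5432)),
        user=os.environ["LOCAL_DB_USER"],
        password=os.environ["LOCAL_DB_PASSWORD"],
        database=os.environ["LOCAL_DB_NAME"],
        max_size=20,  # shared by all read endpoints
    )
    return main_pool, local_pool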
Network Partition Scenarios#
Understanding how the system behaves during network issues:
Scenario 1: Observatory Loses Connection#
What happens:
The observatory API already buffers critical writes, so its write path does not change
Critical operations continue to buffer in Redis
Reads continue from the local replica (data grows staler the longer the link is down)
Buffer accumulates in Redis (monitored for size)
Background processor retries periodically
When network restored: buffer drains automatically
Data consistency:
Buffered writes not visible in local reads (until replicated)
Smart query manager merges buffer + DB (consistent view)
Read buffer manager handles updates to buffered records
Recovery:
Automatic - no manual intervention needed
Background processor processes entire buffer
LSN tracking confirms when data reaches replica
Cache cleanup occurs after replication confirmed
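The recovery step can be pictured as a loop that replays the oldest buffered entries first and stops as soon as the main database becomes unreachable again. The sketch below only illustrates that idea: the Redis key name, the payload layout (sql plus params), and the error handling are invented here and do not describe the real background processor.

# Illustration of the drain idea; key name and payload layout are made up.
import json
import asyncpg
import redis.asyncio as redis

async def drain_buffer_once(r: redis.Redis, main_pool: asyncpg.Pool) -> int:
    """Replay buffered transactions oldest-first; stop on the first failure."""
    drained = 0
    while True:
        raw = await r.lindex("ops_db_api:buffer", 0)  # peek at the oldest entry
        if raw is None:
            break  # buffer is empty
        tx = json.loads(raw)
        try:
            async with main_pool.acquire() as conn:
                await conn.execute(tx["sql"], *tx.get("params", []))
        except (asyncpg.PostgresError, OSError):
            break  # main database unreachable again; retry on the next cycle
        await r.lpop("ops_db_api:buffer")  # remove only after a successful replay
        drained += 1
    return drained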
Scenario 2: Main Site Database Down#
What happens:
Background processor can’t execute buffered transactions
Buffer continues to accumulate in Redis
Observatory operations continue unaffected
When database restored: buffer drains automatically
Data loss risk:
Redis with AOF: Minimal risk (persisted to disk)
Redis memory-only: Risk if Redis also fails
Recommendation: Always use Redis AOF at observatory
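One way to enforce that recommendation is a startup check against the local Redis instance. A minimal sketch, illustrative only and not part of ops-db-api; connection parameters are placeholders:

# Sketch: refuse to start if the observatory Redis cannot persist the buffer.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
if r.config_get("appendonly").get("appendonly") != "yes":
    raise RuntimeError(
        "Redis AOF persistence is disabled; buffered transactions would be "
        "lost if Redis restarts"
    )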
Scenario 3: Replica Falls Behind#
What happens:
Local reads become increasingly stale
Smart query manager still merges the buffer (which partially mitigates the staleness)
LSN tracking detects lag
Cache TTL extended (preserve buffered data longer)
When critical:
Rare - replicas usually catch up quickly
If persistent: Check replica performance, network bandwidth
Can temporarily point reads to main database (high latency)
Why Not Application-Level Sync?#
Alternative approach: Application manages synchronization
# Application-level sync (we don't do this)
def sync_observations():
    # Get observations from local DB modified since last sync
    local_obs = db_local.query(
        Observation
    ).filter(
        Observation.modified_at > last_sync_time
    )

    # Upload to main database
    for obs in local_obs:
        db_main.merge(obs)  # What if already modified at main?

    # Download observations from main DB
    main_obs = db_main.query(
        Observation
    ).filter(
        Observation.modified_at > last_sync_time
    )

    # Apply to local DB
    for obs in main_obs:
        db_local.merge(obs)  # Conflict resolution needed!
Problems with this approach:
Conflict Resolution: If both sites modify same record, how to merge?
Foreign Keys: Need to track and resolve cross-table relationships
Partial Sync: Some tables sync, others don’t - inconsistent state
Transaction Atomicity: Multi-table transactions can sync partially
Schema Changes: Every migration needs sync logic update
Performance: Full table scans or complex change tracking needed
PostgreSQL streaming replication avoids all of these problems at the database engine level: there is a single writer (the main database), and the replicas simply replay the same WAL byte for byte.
Summary#
The distributed architecture provides:
Single source of truth: Main database (Cologne)
Local replicas: Fast reads at each site (PostgreSQL streaming replication)
Site-aware behavior: MAIN (direct writes) vs SECONDARY (buffered writes)
Automatic replication: PostgreSQL handles data synchronization
Precise tracking: LSN tells us exactly when data has replicated
Network resilience: Operations continue during network issues
This architecture is optimized for reliability at the cost of:
Additional complexity (buffering, LSN tracking, smart queries)
Eventual consistency (replica reads may lag by seconds to minutes)
Single point of write failure (main database)
These trade-offs are acceptable for the observatory use case where reliability trumps immediate consistency.
Next Steps#
Reliability First - How we ensure operations never fail
Database Topology - Technical details of database setup
LSN Tracking - How LSN tracking works