# Design Rationale

```{eval-rst}
.. verified:: 2025-11-12
   :reviewer: Christof Buchbender
```

This document explains the key design decisions behind the ops-db-api architecture and why alternative approaches were rejected.

```{contents} Table of Contents
:depth: 2
:local: true
```

## The Remote Observatory Challenge

Understanding why this architecture exists requires understanding the environment:

### Physical Constraints

The CCAT observatory site:

- **Altitude**: 5600m (18,400 feet) - extreme environment
- **Location**: Atacama Desert, Chile - remote, harsh conditions
- **Distance**: 11,000+ km from main database in Cologne, Germany
- **Connectivity**: Fiber
- **Bandwidth**: Limited when available
- **Latency**: 200-500ms when connected, infinite when not

### The Unacceptable Solution

**What doesn't work**: Direct writes to the remote database

```python
# This fails during a network outage
def record_observation(obs_data):
    try:
        db.insert(obs_data)  # Blocks until remote DB responds
        db.commit()          # Fails if network down
    except NetworkError:
        # What now? Observation already happened!
        # Data is lost or requires manual intervention
        raise
```

**Problems**:

1. Blocks observation workflow on network latency
2. Fails completely during outages
3. No recovery mechanism
4. Lost data requires manual intervention

## Why Redis for Buffering?

We chose Redis over other buffering solutions for specific reasons:

### Redis Advantages

**1. In-Memory Speed**

> - Sub-millisecond write latency
> - No disk I/O blocking
> - Observation recording never waits

**2. Persistence Options**

> - AOF (Append-Only File) for durability
> - Configurable fsync policies
> - Survives process restart

**3. Data Structures**

> - Lists for transaction queue (LPUSH/RPOP)
> - Hashes for transaction metadata
> - Sets for tracking state
> - Pub/Sub for real-time updates

**4. Simple Deployment**

> - Single binary, minimal dependencies
> - Low resource overhead
> - Proven reliability

**5. Atomic Operations**

> - Transaction queuing is atomic
> - No race conditions
> - Safe for concurrent access

### Alternatives Considered

**RabbitMQ / Kafka**

> - **Pro**: Purpose-built for message queuing
> - **Pro**: Better durability guarantees
> - **Con**: Much heavier weight
> - **Con**: More complex deployment
> - **Con**: Overkill for single-site buffering

**Local SQLite**

> - **Pro**: SQL interface
> - **Pro**: ACID guarantees
> - **Con**: Disk I/O latency
> - **Con**: Locking under concurrent access
> - **Con**: No built-in pub/sub for updates

**PostgreSQL Queue Table**

> - **Pro**: Same database system
> - **Pro**: ACID guarantees
> - **Con**: Adds load to the database
> - **Con**: Defeats the purpose (still requires a DB connection)
> - **Con**: Slower than in-memory

**File System Queue**

> - **Pro**: Simple, no dependencies
> - **Pro**: Durable by default
> - **Con**: No atomicity guarantees
> - **Con**: Race conditions on concurrent access
> - **Con**: No built-in monitoring

**Decision**: Redis provides the best balance of speed, reliability, and simplicity for our use case.
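To make this concrete, the sketch below combines the structures listed above (a list as the transaction queue, a hash for per-transaction metadata, pub/sub for live updates). It is a minimal illustration under stated assumptions, not the actual ops-db-api implementation: it assumes the `redis-py` client, invented key names (`transactions:pending`, `transaction:<id>`), and a placeholder `db_insert` callable standing in for the real write path to Cologne.

```python
"""Illustrative buffering sketch (assumed names, not the real ops-db-api code)."""
import json
import uuid

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)


def record_observation(obs_data: dict) -> str:
    """Queue an observation write locally; never blocks on the remote DB."""
    tx_id = str(uuid.uuid4())

    # Hash: per-transaction metadata, kept for inspection and recovery.
    r.hset(f"transaction:{tx_id}", mapping={
        "status": "pending",
        "payload": json.dumps(obs_data),
    })

    # List: FIFO transaction queue (LPUSH here, RPOP in the forwarder).
    r.lpush("transactions:pending", tx_id)

    # Pub/Sub: notify listeners (e.g., a monitoring UI) in real time.
    r.publish("transactions:events", json.dumps({"id": tx_id, "event": "queued"}))
    return tx_id


def forward_pending(db_insert) -> None:
    """Background worker: drain the queue whenever the main site is reachable."""
    while (tx_id := r.rpop("transactions:pending")) is not None:
        payload = json.loads(r.hget(f"transaction:{tx_id}", "payload"))
        try:
            db_insert(payload)  # the actual write to the Cologne database
        except Exception:
            # Network or database failure: requeue and retry on the next pass.
            r.rpush("transactions:pending", tx_id)
            break
        r.hset(f"transaction:{tx_id}", "status", "synced")
```

Because `LPUSH` and `RPOP` are atomic, concurrent producers can enqueue safely, and a transaction that cannot be forwarded is simply pushed back onto the queue for the next attempt.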
## Why PostgreSQL Streaming Replication?

We use PostgreSQL's built-in streaming replication rather than application-level sync:

### Streaming Replication Benefits

**1. Built-in Mechanism**

> - Native PostgreSQL feature
> - Battle-tested in production
> - No custom sync logic needed

**2. Log Sequence Numbers (LSN)**

> - Precise replication state tracking
> - Know exactly what data has replicated
> - Enable smart cache management

**3. Read-Only Replicas**

> - Safe by design (can't accidentally write)
> - Multiple replicas possible
> - No conflict resolution needed

**4. Continuous Sync**

> - Changes stream as they happen
> - No batch delays
> - Minimal lag when network available

**5. Automatic Catchup**

> - Replica catches up automatically after network restoration
> - Uses the WAL (Write-Ahead Log) for efficiency
> - No manual intervention required

### Why Not Application-Level Sync?

**Alternative**: Sync tables at the application level (e.g., with timestamps)

**Problems**:

1. **Conflict Resolution**: What if the same record is modified at both sites?
2. **Cascading Updates**: Foreign key relationships become complex
3. **Schema Changes**: Every database change needs a sync logic update
4. **Partial Failure**: Some tables sync, others don't - inconsistent state
5. **Performance**: Full table scans or complex timestamp tracking

**PostgreSQL streaming replication** avoids all these issues by replicating at the database engine level.

## Why Eventual Consistency?

We explicitly choose eventual consistency over strong consistency:

### The Trade-Off

**Strong Consistency** (rejected):

> - Every write waits for confirmation from all replicas
> - Guarantees immediate consistency everywhere
> - **Blocks when network down** ❌

**Eventual Consistency** (chosen):

> - Writes succeed locally, sync in background
> - Data becomes consistent "eventually" (seconds to minutes)
> - **Never blocks operations** ✅

### Why This Is Acceptable

**For Observatory Operations**:

- An observation recorded at 00:00:00 doesn't need to be visible in Cologne at 00:00:01
- Scientists querying data hours or days later don't care about seconds of lag
- **Reliability > Immediate Consistency**

**For UI Users**:

- Transfer monitoring: seconds of lag is imperceptible
- Data browsing: minutes-old data is fine for most use cases
- Real-time updates: WebSockets provide immediate local feedback

**Critical**: We track replication state precisely with LSNs, so we *know* when consistency is achieved.

## Why Not Multi-Master?

**Multi-Master Replication** (rejected): Both sites can write, and changes sync bidirectionally.

**Sounds Good, But**:

1. **Conflict Resolution**: What if both sites create the same ID?

   ```sql
   -- Site A creates:
   INSERT INTO obs (id, status) VALUES (123, 'running');
   -- Simultaneously, Site B creates:
   INSERT INTO obs (id, status) VALUES (123, 'completed');
   -- Which one wins when they sync?
   ```

2. **Complex Logic**: Custom resolution is needed for every conflict type
3. **Partial Updates**: Site A updates field X, Site B updates field Y - merge how?
4. **Schema Evolution**: Database changes must work on both sites simultaneously
5. **Testing Complexity**: Every scenario needs a conflict resolution test

**Our Single-Master Approach**:

- All writes go to Cologne (authoritative)
- Observatory buffers and forwards
- No conflicts possible
- Simple, predictable behavior

### Cost of This Approach

We pay a cost for this simplicity:

- The observatory can't immediately see writes made in Cologne (its copy is a read-only replica that lags behind)
- A network partition means buffered data accumulates at the observatory
- A main-site failure blocks all writes (but buffering prevents data loss)

**We accept these costs** because the alternative (conflict resolution) is far more complex and error-prone. Writes made in Cologne are not needed at the observatory immediately.
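The LSN tracking mentioned above also makes the first of these costs measurable: the replica can report exactly how far it lags behind the authoritative database. The sketch below is illustrative only; it assumes `psycopg2` and placeholder connection strings, and relies on the standard PostgreSQL functions `pg_current_wal_lsn()`, `pg_last_wal_replay_lsn()`, and `pg_wal_lsn_diff()` (PostgreSQL 10+).

```python
"""Illustrative replication-lag check (placeholder DSNs, not production config)."""
import psycopg2

PRIMARY_DSN = "host=primary.example dbname=ops user=monitor"  # placeholder
REPLICA_DSN = "host=replica.example dbname=ops user=monitor"  # placeholder


def replication_lag_bytes() -> int:
    """Return how many WAL bytes the replica still has to replay."""
    # Current write position on the authoritative (primary) database.
    with psycopg2.connect(PRIMARY_DSN) as primary, primary.cursor() as cur:
        cur.execute("SELECT pg_current_wal_lsn()")
        primary_lsn = cur.fetchone()[0]

    # Replay position on the read-only replica, compared against the primary.
    with psycopg2.connect(REPLICA_DSN) as replica, replica.cursor() as cur:
        cur.execute(
            "SELECT pg_wal_lsn_diff(%s::pg_lsn, pg_last_wal_replay_lsn())",
            (primary_lsn,),
        )
        lag = cur.fetchone()[0]

    return int(lag)


if __name__ == "__main__":
    print(f"replica is {replication_lag_bytes()} bytes behind (0 = caught up)")
```

A check like this is what makes "eventually" quantifiable: when the reported lag reaches zero, the replica is known to be consistent with Cologne up to the measured LSN.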
## Technology Choices

### Framework: FastAPI

**Why FastAPI**:

- Modern async Python framework
- Automatic OpenAPI documentation
- Type hints and validation (Pydantic)
- WebSocket support built-in
- High performance (comparable to Go, Node.js)

**Why not Django/Flask**:

- Django: Too heavyweight, REST framework less modern
- Flask: Lacks async support, no automatic docs

### ORM: SQLAlchemy

**Why SQLAlchemy**:

- Industry-standard Python ORM
- Powerful query builder
- Async support (SQLAlchemy 1.4+)
- Used by the ops-db package (consistency)

**Why not Django ORM**:

- Tied to the Django framework
- Less flexible for complex queries

### Database: PostgreSQL

**Why PostgreSQL**:

- Streaming replication built-in
- LSN tracking for replication state
- JSON/JSONB for flexible schemas
- Proven at scale
- Excellent ops-db package support

**Why not MySQL/MariaDB**:

- Replication less sophisticated
- No LSN equivalent
- Weaker JSON support

### Cache/Buffer: Redis

**Why Redis**:

- Discussed in detail above
- In-memory speed
- Versatile data structures
- Simple deployment

### Language: Python

**Why Python**:

- Astronomy/science community standard
- Rich ecosystem (astropy, numpy, etc.)
- Easy for scientists to extend
- Same language as the ops-db package

## Summary

Every major design decision has a rationale:

- **Unified API**: Shared infrastructure, planned split later
- **Redis Buffering**: Speed + reliability + simplicity
- **PostgreSQL Replication**: Built-in, LSN tracking, proven
- **Eventual Consistency**: Reliability over immediate consistency
- **Single Master**: Simplicity over conflict resolution
- **FastAPI + SQLAlchemy + PostgreSQL + Redis**: Modern, proven, Python ecosystem

These decisions optimize for the **critical requirement**: **Observatory operations must never fail due to network issues.**

## Next Steps

- {doc}`distributed-architecture` - How the database topology works
- {doc}`reliability-first` - Deep dive into reliability mechanisms