Design Philosophy#
Understanding the “why” behind the ops-db-api architecture is crucial for working effectively with the system. This section explains the design decisions, challenges, and trade-offs that shaped the API.
Core Principles#
The ops-db-api is built on four foundational principles:
Reliability Over Latency
Observatory operations must never fail due to network issues. We accept the cost of additional complexity and occasional latency in exchange for guaranteed operation.
Eventual Consistency
Data doesn’t need to be immediately consistent everywhere. What matters is that it eventually becomes consistent, and that we know when it has.
Transparent Complexity
The system’s complexity should be invisible to endpoint developers and API consumers. Endpoints look like normal REST APIs; buffering happens automatically.
LSN-Based Precision
We don’t guess about replication state. PostgreSQL’s Log Sequence Numbers (LSN) tell us exactly when data has reached each replica.
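As a rough illustration of the LSN principle: PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (for example `16/B374D848`), which together form a 64-bit position in the write-ahead log. The sketch below shows how a write's LSN could be compared with a replica's replay position. The function names `parse_lsn` and `is_replicated` are hypothetical and are not taken from the ops-db-api code base.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual PostgreSQL LSN such as '16/B374D848' to an integer."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after it the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def is_replicated(write_lsn: str, replica_replay_lsn: str) -> bool:
    """Return True once the replica has replayed the WAL up to (or past) the write."""
    return parse_lsn(replica_replay_lsn) >= parse_lsn(write_lsn)
```

Comparing the parsed integers rather than the raw strings matters: as text, `F/FFFFFFFF` sorts after `10/0`, although it is the earlier position in the log.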
The Challenge#
The CCAT observatory operates at an altitude of 5,600 m in the Atacama Desert of Chile, while the main database is in Cologne, Germany. This separation presents unique challenges:
- Physical Reality
  - 11,000+ km distance between sites
  - Network connectivity is unreliable
  - Bandwidth may be limited
  - Latency is high when connected
- Operational Reality
  - Telescope observations can’t wait for the network
  - Data generation is continuous
  - Operations staff need immediate feedback
- Our Approach
  - Buffer writes locally in Redis (fast, reliable)
  - Process asynchronously when the network is available
  - Track replication with LSN (know exactly when data is replicated)
  - Merge reads from database + buffer (consistent view)
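The approach above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual implementation: an in-memory `deque` stands in for Redis, and the names `LocalWriteBuffer`, `drain`, and `merged_read`, as well as the `id` key on each record, are hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class LocalWriteBuffer:
    """In-memory stand-in for the Redis-backed write buffer (illustration only)."""

    def __init__(self) -> None:
        self._pending: deque[dict] = deque()

    def enqueue(self, record: dict) -> None:
        # Accepting a write never touches the network, so it cannot block on it.
        self._pending.append(record)

    def drain(self, send: Callable[[dict], None]) -> int:
        """Forward pending records in order; stop at the first network failure."""
        sent = 0
        while self._pending:
            record = self._pending[0]
            try:
                send(record)
            except ConnectionError:
                break  # network is down: keep the record for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def pending(self) -> list[dict]:
        return list(self._pending)


def merged_read(persisted: list[dict], buffer: LocalWriteBuffer) -> list[dict]:
    """Combine rows already in the database with rows still waiting in the buffer."""
    by_id = {row["id"]: row for row in persisted}
    for row in buffer.pending():
        by_id[row["id"]] = row  # the buffered version is the newer one
    return list(by_id.values())
```

In this sketch a failed `drain` leaves the queue untouched, so a later attempt resumes from the same record and write order is preserved.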
Why This Architecture?#
The architecture described in this documentation solves real problems encountered in production:
- Problem: Network failures block observations
Solution: Local transaction buffering with Redis ensures operations never block
- Problem: No way to know whether data reached the main database
Solution: LSN tracking provides precise confirmation
- Problem: Stale data on local reads after writes
Solution: Smart query manager merges buffered + persisted data
- Problem: No way to update buffered records
Solution: Read buffer manager tracks mutable updates
See Design Rationale for detailed reasoning.
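To make the last problem concrete, here is one way updates to not-yet-replicated records could be tracked and applied at read time. It is a sketch only: the class `ReadBufferOverlay`, its methods, and the `id` field are assumptions made for illustration and do not describe the real read buffer manager.

```python
from __future__ import annotations


class ReadBufferOverlay:
    """Hypothetical overlay holding field updates for records not yet persisted."""

    def __init__(self) -> None:
        self._updates: dict[str, dict] = {}

    def record_update(self, record_id: str, changes: dict) -> None:
        # Later updates to the same record are merged field by field.
        self._updates.setdefault(record_id, {}).update(changes)

    def apply(self, record: dict) -> dict:
        """Return a copy of the record with any tracked updates applied."""
        return {**record, **self._updates.get(record["id"], {})}

    def clear(self, record_id: str) -> None:
        # Once the record and its updates are replicated, the entry can be dropped.
        self._updates.pop(record_id, None)
```

Keeping updates in a separate overlay, rather than rewriting the queued write, leaves the buffered record untouched while reads still see the latest values.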
Who This API Serves#
The API has two distinct user groups with different needs:
UI Users (Scientists & Operators)#
Need: Real-time visibility into observatory operations
Tolerance: Can accept slight staleness (seconds to minutes)
Authentication: GitHub OAuth with personal accounts
Priority: Rich queries, dashboards, data exploration
Observatory Services (Automated Scripts)#
Need: Reliable recording and lookup of observations and data
Tolerance: Cannot tolerate failures, but accept eventual consistency
Authentication: API tokens for service accounts
Priority: High reliability, buffering, automatic retry
This dual nature explains why we have:
- Two authentication methods (GitHub OAuth + API tokens)
- Two operation types (critical + non-critical)
- Two site types (main + secondary)
Evolution and Future#
The current architecture is intentional but not final:
- Current State (2025)
  - A single API source tree serves both the UI and operations
- Planned Evolution
  - Split the source tree into separate UI and Operations APIs