Design Philosophy

Documentation verified. Last checked: 2025-10-16. Reviewer: Christof Buchbender

Understanding the “why” behind the ops-db-api architecture is crucial for working effectively with the system. This section explains the design decisions, challenges, and trade-offs that shaped the API.

Core Principles

The ops-db-api is built on four foundational principles:

Reliability Over Latency

Observatory operations must never fail due to network issues. We accept the cost of additional complexity and occasional latency in exchange for guaranteed operation.

Eventual Consistency

Data doesn’t need to be immediately consistent everywhere. What matters is that it eventually becomes consistent, and that we know when it has.

Transparent Complexity

The system’s complexity should be invisible to endpoint developers and API consumers. Endpoints look like normal REST APIs; buffering happens automatically.

LSN-Based Precision

We don’t guess about replication state. PostgreSQL’s Log Sequence Numbers (LSN) tell us exactly when data has reached each replica.
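
As a rough illustration of LSN-based tracking: PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (for example 16/B374D848), so positions have to be parsed into integers before they can be compared. The sketch below assumes the LSN of a write and the replay LSN of a replica have already been fetched from the database. The function names parse_lsn and is_replicated are hypothetical and are not taken from the ops-db-api source.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer.

    The part before the slash is the high 32 bits, the part after it the
    low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_replicated(write_lsn: str, replica_replay_lsn: str) -> bool:
    """Return True once the replica has replayed WAL up to or past the write."""
    return parse_lsn(replica_replay_lsn) >= parse_lsn(write_lsn)
```

Comparing the raw strings would be wrong: "9/0" sorts after "10/0" as text, although it is the earlier position. Parsing first avoids that pitfall.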

The Challenge

The CCAT observatory operates at 5600 m altitude in the Atacama Desert of Chile, while the main database is in Cologne, Germany. This separation presents unique challenges:

Physical Reality

11,000+ km distance between sites
Network connectivity is unreliable
Bandwidth may be limited
Latency is high when connected

Operational Reality

Telescope observations can’t wait for the network
Data generation is continuous
Operations staff need immediate feedback

Our Approach

Buffer writes locally in Redis (fast, reliable)
Process asynchronously when the network is available
Track replication with LSN (know exactly when data is replicated)
Merge reads from database + buffer (consistent view)
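
The first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual ops-db-api implementation: an in-memory deque stands in for Redis so the example runs without external services, and the names WriteBuffer, send, link, and main_db are all hypothetical.

```python
from collections import deque
from typing import Callable


class WriteBuffer:
    """In-memory stand-in for a Redis-backed write buffer (hypothetical)."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._queue: deque = deque()
        self._send = send  # forwards one record to the main database

    def write(self, record: dict) -> None:
        # Always succeeds locally; the caller never waits on the network.
        self._queue.append(record)

    def flush(self) -> int:
        """Forward buffered records in order; stop at the first network failure."""
        sent = 0
        while self._queue:
            try:
                self._send(self._queue[0])
            except ConnectionError:
                break  # keep the record and retry on the next flush
            self._queue.popleft()
            sent += 1
        return sent

    def pending(self) -> list:
        """Records that have not yet reached the main database."""
        return list(self._queue)


# Simulated link to the main site and a fake main database (assumptions).
link = {"up": False}
main_db: list = []


def send(record: dict) -> None:
    if not link["up"]:
        raise ConnectionError("link to main site is down")
    main_db.append(record)
```

With the link down, write() still succeeds and flush() forwards nothing. Once link["up"] is set to True, the next flush() delivers the buffered records in their original order. A record is removed from the buffer only after send returns without error, so a failure mid-flush loses nothing.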

Why This Architecture?

The architecture described in this documentation solves real problems encountered in production:

Problem: Network failures block observations. Solution: Local transaction buffering with Redis ensures operations never block.
Problem: No way to know whether data had reached the main database. Solution: LSN tracking provides precise confirmation.
Problem: Stale data on local reads after writes. Solution: The smart query manager merges buffered and persisted data.
Problem: No way to update buffered records. Solution: The read buffer manager tracks mutable updates.

See Design Rationale for detailed reasoning.
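
To make the last two solutions concrete, here is a small sketch of a merged read. It assumes every record carries an "id" key and that pending updates are stored as a mapping from id to changed fields. The function name merged_view and this data layout are illustrative assumptions, not the actual smart query manager or read buffer manager.

```python
def merged_view(persisted: list, buffered: list, updates: dict) -> list:
    """Combine persisted rows with buffered rows, then apply pending updates.

    A buffered row replaces a persisted row with the same id, and an update
    overrides individual fields of whichever row is present.
    """
    rows = {row["id"]: dict(row) for row in persisted}  # copy, don't mutate input
    for row in buffered:
        rows[row["id"]] = dict(row)
    for row_id, changes in updates.items():
        if row_id in rows:
            rows[row_id].update(changes)
    return [rows[key] for key in sorted(rows)]
```

A reader at the observatory site would then see a record that is still waiting in the buffer, together with any later change made to it, alongside the rows that have already been persisted.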

Who This API Serves

The API has two distinct user groups with different needs:

UI Users (Scientists & Operators)

Need: Real-time visibility into observatory operations
Tolerance: Can accept slight staleness (seconds to minutes)
Authentication: GitHub OAuth with personal accounts
Priority: Rich queries, dashboards, data exploration

Observatory Services (Automated Scripts)

Need: Reliable recording and lookup of observations and data
Tolerance: Cannot tolerate failures but accepts eventual consistency
Authentication: API tokens for service accounts
Priority: High reliability, buffering, automatic retry

This dual nature explains why we have:

Two authentication methods (GitHub OAuth + API tokens)
Two operation types (critical + non-critical)
Two site types (main + secondary)

Evolution and Future

The current architecture is intentional but not final:

Current State (2025)

A single API tree in the source code serves both the UI and operations

Planned Evolution

Split into separate UI and Operations API trees in the source code