Design Philosophy#

Documentation Verified. Last checked: 2025-10-16. Reviewer: Christof Buchbender

Understanding the “why” behind the ops-db-api architecture is crucial for working effectively with the system. This section explains the design decisions, challenges, and trade-offs that shaped the API.

Core Principles#

The ops-db-api is built on four foundational principles:

  1. Reliability Over Latency

    Observatory operations must never fail due to network issues. We accept the cost of additional complexity and occasional latency in exchange for guaranteed operation.

  2. Eventual Consistency

    Data doesn’t need to be immediately consistent everywhere. What matters is that it eventually becomes consistent, and we know when it has.

  3. Transparent Complexity

    The system’s complexity should be invisible to endpoint developers and API consumers. Endpoints look like normal REST APIs; buffering happens automatically.

  4. LSN-Based Precision

    We don’t guess about replication state. PostgreSQL’s Log Sequence Numbers (LSN) tell us exactly when data has reached each replica.
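As a minimal sketch of what an LSN comparison involves: PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (for example `16/B374D848`), the high and low 32 bits of a 64-bit WAL position. A write is known to have reached a replica once the replica's replay position is at or past the LSN recorded for that write. The helper names `parse_lsn` and `is_replicated` are hypothetical and not taken from the ops-db-api code. In practice the two values would typically come from `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the replica.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_replicated(write_lsn: str, replica_replay_lsn: str) -> bool:
    """Return True once the replica has replayed WAL up to or past the write."""
    return parse_lsn(replica_replay_lsn) >= parse_lsn(write_lsn)


# The replica is still behind the write:
print(is_replicated("16/B374D848", "16/B374D000"))  # False
# The replica has moved past the write:
print(is_replicated("16/B374D848", "17/0"))  # True
```

Comparing integers rather than strings matters here: a plain string comparison would order `9/0` after `10/0`.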

The Challenge#

The CCAT observatory operates at an altitude of 5,600 m in the Atacama Desert of Chile. The main database is in Cologne, Germany. This presents unique challenges:

Physical Reality
  • 11,000+ km distance between sites

  • Network connectivity is unreliable

  • Bandwidth may be limited

  • Latency is high when connected

Operational Reality
  • Telescope observations can’t wait for the network

  • Data generation is continuous

  • Operations staff need immediate feedback

Our Approach
  • Buffer writes locally in Redis (fast, reliable)

  • Process buffered writes asynchronously when the network is available

  • Track replication with LSN (know exactly when data is replicated)

  • Merge reads from database + buffer (consistent view)
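The first two points of this approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual ops-db-api implementation: an in-memory `deque` stands in for the Redis buffer so the example runs without a server, and the names `WriteBuffer`, `send`, and `obs_id` are hypothetical.

```python
import json
from collections import deque
from typing import Callable


class WriteBuffer:
    """Queues writes locally and forwards them once the remote is reachable."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._queue = deque()  # stand-in for the Redis buffer (assumption)
        self._send = send

    def write(self, record: dict) -> None:
        # Returns immediately; the network is never touched here.
        self._queue.append(json.dumps(record))

    def flush(self) -> int:
        """Forward queued records in order; stop at the first network failure."""
        sent = 0
        while self._queue:
            record = json.loads(self._queue[0])
            try:
                self._send(record)
            except ConnectionError:
                break  # keep the record at the head for the next attempt
            self._queue.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        return len(self._queue)


# Simulated link to the main database.
link = {"up": False}
delivered = []


def send(record: dict) -> None:
    if not link["up"]:
        raise ConnectionError("link down")
    delivered.append(record)


buffer = WriteBuffer(send)
buffer.write({"obs_id": 1})
buffer.write({"obs_id": 2})
print(buffer.flush(), buffer.pending())  # 0 2 (link down, nothing lost)
link["up"] = True
print(buffer.flush(), buffer.pending())  # 2 0 (delivered in order)
```

A record is removed from the queue only after it has been sent successfully, so a failure in the middle of a flush leaves the remaining writes queued in their original order.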

Why This Architecture?#

The architecture described in this documentation solves real problems encountered in production:

Problem: Network failures block observations

Solution: Local transaction buffering with Redis ensures operations never block

Problem: No way to know whether data had reached the main database

Solution: LSN tracking provides precise confirmation

Problem: Stale data on local reads after writes

Solution: Smart query manager merges buffered + persisted data

Problem: No way to update buffered records

Solution: Read buffer manager tracks mutable updates

See Design Rationale for detailed reasoning.
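To make the last two problems concrete, the following sketch merges persisted rows with buffered inserts and buffered updates into one view. It is only an illustration: the function name `merged_view`, the `id` and `status` fields, and the rule that a persisted row wins over a buffered insert with the same id are all assumptions, not the behavior of the actual smart query manager or read buffer manager.

```python
def merged_view(persisted, buffered_inserts, buffered_updates):
    """Combine database rows with not-yet-replicated inserts and updates."""
    # Start from what the local database already holds.
    rows = {row["id"]: dict(row) for row in persisted}

    # Add buffered records that have not reached the database yet.
    for row in buffered_inserts:
        rows.setdefault(row["id"], dict(row))

    # Apply buffered updates on top, so readers see the latest values.
    for record_id, changes in buffered_updates.items():
        if record_id in rows:
            rows[record_id].update(changes)

    return sorted(rows.values(), key=lambda row: row["id"])


persisted = [{"id": 1, "status": "done"}]
buffered_inserts = [{"id": 2, "status": "running"}]
buffered_updates = {2: {"status": "done"}}

for row in merged_view(persisted, buffered_inserts, buffered_updates):
    print(row)
# {'id': 1, 'status': 'done'}
# {'id': 2, 'status': 'done'}
```

Record 2 exists only in the buffer, yet it appears in the result with its updated status. A read issued right after a write should therefore not return stale data.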

Who This API Serves#

The API has two distinct user groups with different needs:

UI Users (Scientists & Operators)#

  • Need: Real-time visibility into observatory operations

  • Tolerance: Can accept slight staleness (seconds to minutes)

  • Authentication: GitHub OAuth with personal accounts

  • Priority: Rich queries, dashboards, data exploration

Observatory Services (Automated Scripts)#

  • Need: Reliable recording and lookup of observations and data

  • Tolerance: Cannot tolerate failures; accept eventual consistency

  • Authentication: API tokens for service accounts

  • Priority: High reliability, buffering, automatic retry

This dual nature explains why we have:

  • Two authentication methods (GitHub OAuth + API tokens)

  • Two operation types (critical + non-critical)

  • Two site types (main + secondary)

Evolution and Future#

The current architecture is intentional but not final:

Current State (2025)
  • A single API tree in the source code serves both the UI and operations

Planned Evolution
  • Split into separate UI and Operations API trees in the source code