# Design Philosophy

```{eval-rst}
.. verified:: 2025-10-16
   :reviewer: Christof Buchbender
```

Understanding the "why" behind the ops-db-api architecture is crucial for working effectively with the system. This section explains the design decisions, challenges, and trade-offs that shaped the API.

```{contents} Table of Contents
:depth: 2
:local: true
```

## Core Principles

The ops-db-api is built on four foundational principles:

1. **Reliability Over Latency**

   Observatory operations must never fail due to network issues. We accept the cost of additional complexity and occasional latency in exchange for guaranteed operation.

2. **Eventual Consistency**

   Data doesn't need to be immediately consistent everywhere. What matters is that it *eventually* becomes consistent, and we know *when* it has.

3. **Transparent Complexity**

   The system's complexity should be invisible to endpoint developers and API consumers. Endpoints look like normal REST APIs; buffering happens automatically.

4. **LSN-Based Precision**

   We don't guess about replication state. PostgreSQL's Log Sequence Numbers (LSN) tell us exactly when data has reached each replica.

## The Challenge

The CCAT observatory operates at 5600 m altitude in the Atacama Desert of Chile. The main database is in Cologne, Germany. This presents unique challenges:

**Physical Reality**
: - 11,000+ km distance between sites
  - Network connectivity is unreliable
  - Bandwidth may be limited
  - Latency is high when connected

**Operational Reality**
: - Telescope observations can't wait for network
  - Data generation is continuous
  - Operations staff need immediate feedback

**Our Approach**
: - Buffer writes locally in Redis (fast, reliable)
  - Process asynchronously when network available
  - Track replication with LSN (know exactly when data is replicated)
  - Merge reads from database + buffer (consistent view)

## Why This Architecture?

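As a rough illustration of the approach listed above, the following sketch buffers writes locally, flushes them when the network is available, merges buffered and persisted records on read, and compares LSNs to confirm replication. It is not the ops-db-api implementation: `LocalBuffer`, `merged_read`, `is_replicated`, and the record layout are hypothetical names chosen for this example, and a plain in-memory list stands in for Redis. The only PostgreSQL-specific fact it relies on is the textual LSN format, two hexadecimal numbers separated by a slash.

```python
from dataclasses import dataclass, field


def parse_lsn(text: str) -> int:
    """Convert PostgreSQL's textual LSN ('hi/lo' in hex) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def is_replicated(write_lsn: str, replica_replay_lsn: str) -> bool:
    """A write has reached a replica once the replica's replay LSN is at or past it."""
    return parse_lsn(replica_replay_lsn) >= parse_lsn(write_lsn)


@dataclass
class LocalBuffer:
    """Hypothetical in-memory stand-in for the local Redis write buffer."""

    pending: list = field(default_factory=list)

    def write(self, record: dict) -> None:
        # Accepting the write never depends on the network.
        self.pending.append(record)

    def flush(self, database: dict, network_up: bool) -> int:
        """Move buffered records into the database; return how many were flushed."""
        if not network_up:
            return 0
        flushed = len(self.pending)
        for record in self.pending:
            database[record["id"]] = record
        self.pending.clear()
        return flushed


def merged_read(database: dict, buffer: LocalBuffer) -> list:
    """Overlay buffered records on persisted ones (the buffer wins on conflict)."""
    view = dict(database)
    for record in buffer.pending:
        view[record["id"]] = record
    return [view[key] for key in sorted(view)]


# Example: a record written during an outage is still visible to local reads.
database = {1: {"id": 1, "status": "done"}}
buffer = LocalBuffer()
buffer.write({"id": 2, "status": "running"})
visible = merged_read(database, buffer)
```

In this sketch the buffer takes precedence over persisted rows, so a local read immediately reflects a local write even before it has been flushed.
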
The architecture described in this documentation solves real problems encountered in production:

**Problem**: Network failures block observations
: **Solution**: Local transaction buffering with Redis ensures operations never block

**Problem**: No way to know whether data reached the main database
: **Solution**: LSN tracking provides precise confirmation

**Problem**: Stale data on local reads after writes
: **Solution**: Smart query manager merges buffered + persisted data

**Problem**: No way to update buffered records
: **Solution**: Read buffer manager tracks mutable updates

See {doc}`design-rationale` for detailed reasoning.

## Who This API Serves

The API has two distinct user groups with different needs:

### UI Users (Scientists & Operators)

- **Need**: Real-time visibility into observatory operations
- **Tolerance**: Can accept slight staleness (seconds to minutes)
- **Authentication**: GitHub OAuth with personal accounts
- **Priority**: Rich queries, dashboards, data exploration

### Observatory Services (Automated Scripts)

- **Need**: Reliable recording and lookup of observations and data
- **Tolerance**: Cannot tolerate failures, accepts eventual consistency
- **Authentication**: API tokens for service accounts
- **Priority**: High reliability, buffering, automatic retry

This dual nature explains why we have:

- Two authentication methods (GitHub OAuth + API tokens)
- Two operation types (critical + non-critical)
- Two site types (main + secondary)

## Evolution and Future

The current architecture is intentional but not final:

**Current State (2025)**
: - A single API tree in the source code serves both UI and operations

**Planned Evolution**
: - Split the source code into separate UI and Operations API trees

## Navigation

Explore the philosophy in detail:

```{toctree}
:maxdepth: 1

design-rationale
distributed-architecture
reliability-first
```

## Related Sections

- {doc}`../architecture/system-overview` - How the system is built
- {doc}`../deep-dive/transaction-buffering/overview` - How buffering works
- {doc}`../tutorials/observatory-integration/recording-observations` - Using the API at the observatory