Architecture Overview#

Documentation verified. Last checked: 2025-11-12. Reviewer: Christof Buchbender

This section provides a technical deep dive into the ops-db-api architecture, including system design, database topology, site configuration, authentication, and endpoint categorization.

Introduction#

The ops-db-api is built on a distributed database architecture designed so that observatory operations do not fail when the network link to the main database is slow or unavailable. This architecture combines:

  • FastAPI as the asynchronous Python web framework

  • PostgreSQL as the relational database, with streaming replication

  • Redis for transaction buffering and caching

  • SQLAlchemy for ORM and database abstraction

  • Custom transaction buffering for network resilience

Key Architectural Features#

  1. Site-Aware Behavior

    The API behaves differently based on site type (MAIN vs SECONDARY), automatically routing operations appropriately.

  2. Transaction Buffering

    Critical operations at secondary sites are buffered in Redis and executed asynchronously against the main database.

  3. LSN-Based Replication Tracking

    PostgreSQL Log Sequence Numbers provide precise knowledge of replication state for smart cache management.

  4. Smart Query Management

    Reads merge data from the database, the transaction buffer, and the read buffer, so views stay consistent even during replication lag.

  5. Dual Authentication

    Supports both GitHub OAuth (for UI users) and API tokens (for service scripts) through a unified interface.
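As a rough illustration of site-aware behavior, the sketch below maps a site type to a write path for a critical operation. The names `SiteType` and `choose_write_path` and the returned labels are hypothetical and not taken from the ops-db-api code. It also assumes that the main site writes directly to its local database, which the text above implies but does not state.

```python
from enum import Enum


class SiteType(Enum):
    """Site types named in the text; the string values are assumptions."""
    MAIN = "main"
    SECONDARY = "secondary"


def choose_write_path(site_type: SiteType) -> str:
    """Return how a critical write would be handled at the given site.

    Hypothetical helper for illustration only.
    """
    if site_type is SiteType.MAIN:
        # Assumption: the main site sits next to the authoritative database,
        # so it can write directly.
        return "direct"
    # Secondary sites buffer critical operations in Redis and let a
    # background process apply them to the main database later.
    return "buffered"


print(choose_write_path(SiteType.SECONDARY))
```

Keeping this decision in one place would let route handlers stay identical across sites, with only the configured site type changing.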

Architecture Diagram#

High-level system architecture:

        graph TB
    subgraph "Client Layer"
        UI[Web Frontend<br/>ops-db-ui]
        Scripts[Observatory<br/>Scripts]
    end

    subgraph "API Layer"
        FastAPI[FastAPI Application]
        Routers[Routers<br/>transfer, obs_unit,<br/>executed_obs_units, etc.]
        Auth[Authentication<br/>GitHub OAuth + API Tokens]
    end

    subgraph "Business Logic"
        TxBuilder[Transaction Builder]
        TxManager[Transaction Manager]
        SmartQuery[Smart Query Manager]
    end

    subgraph "Infrastructure"
        Redis[Redis<br/>Buffer + Cache]
        BgProcessor[Background Processor]
        LSNTracker[LSN Tracker]
    end

    subgraph "Data Layer"
        MainDB[(Main Database<br/>Cologne)]
        ReplicaDB[(Replica Database<br/>Observatory)]
    end

    UI -->|HTTP/WS| FastAPI
    Scripts -->|HTTP| FastAPI
    FastAPI --> Auth
    FastAPI --> Routers
    Routers --> TxBuilder
    Routers --> SmartQuery
    TxBuilder --> TxManager
    TxManager --> Redis
    TxManager --> BgProcessor
    BgProcessor --> MainDB
    BgProcessor --> LSNTracker
    LSNTracker --> ReplicaDB
    SmartQuery --> ReplicaDB
    SmartQuery --> Redis
    MainDB -.->|Replication| ReplicaDB

    style FastAPI fill:#90EE90
    style Redis fill:#FFD700
    style MainDB fill:#87CEEB
    style ReplicaDB fill:#FFB6C1
    

Component Responsibilities#

API Layer#

FastAPI Application (main.py):

  • Application lifecycle management

  • Router registration

  • CORS configuration

  • WebSocket connection tracking

  • Startup/shutdown hooks

Routers:

  • UI-focused: transfer, observing_program, sources, visibility, instruments

  • Operations-focused: executed_obs_units, raw_data_files, raw_data_package, staging

  • Shared: auth, github_auth, api_tokens, site, demo

Authentication:

  • Unified token validation (JWT + API tokens)

  • Role-based access control (RBAC)

  • Permission-based authorization

  • Usage tracking for API tokens

Business Logic Layer#

Transaction Builder:

  • Constructs multi-step database transactions

  • Generates pre-allocated IDs

  • Manages dependencies between steps

  • Supports CREATE, UPDATE, DELETE, BULK_CREATE operations
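The responsibilities above can be sketched as a small builder that assigns IDs before anything reaches the database. This is a minimal sketch, not the actual Transaction Builder: the class names, field names, and table names (`executed_obs_unit`, `raw_data_file`) are illustrative assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """One operation inside a multi-step transaction (hypothetical shape)."""
    operation: str                      # e.g. "CREATE", "UPDATE", "DELETE", "BULK_CREATE"
    table: str
    data: dict[str, Any]
    depends_on: list[str] = field(default_factory=list)
    step_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class TransactionSketch:
    """Collects steps and pre-allocates record IDs on the client side."""

    def __init__(self) -> None:
        self.transaction_id = str(uuid.uuid4())
        self.steps: list[Step] = []

    def create(self, table: str, data: dict[str, Any],
               depends_on: Optional[list[str]] = None) -> Step:
        # The ID is generated here, so the caller can use it immediately,
        # before the row exists in the main database.
        record = {**data, "id": str(uuid.uuid4())}
        step = Step("CREATE", table, record, depends_on or [])
        self.steps.append(step)
        return step


tx = TransactionSketch()
parent = tx.create("executed_obs_unit", {"status": "running"})
child = tx.create(
    "raw_data_file",
    {"executed_obs_unit_id": parent.data["id"]},
    depends_on=[parent.step_id],
)
```

Because the parent ID is known up front, the child step can reference it without waiting for a database round trip, which is what makes buffering multi-step writes practical.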

Transaction Manager:

  • Buffers transactions to Redis

  • Manages retry logic and failed queue

  • Provides transaction status queries

  • Implements write-through caching

Smart Query Manager:

  • Merges data from the database, the transaction buffer, and the read buffer

  • Handles type conversion for filtering

  • Retrieves related records via foreign keys

  • Deduplicates and prioritizes fresher data
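One way to picture the merge is a dictionary keyed by record ID, filled from the stalest source to the freshest. The sketch below assumes a priority order of database rows, then buffered records, then read-buffer updates. That ordering and the function name `merge_records` are assumptions for illustration, not the documented behavior of the Smart Query Manager.

```python
from typing import Any

Record = dict[str, Any]


def merge_records(
    db_rows: list[Record],
    buffered_rows: list[Record],
    read_buffer_updates: dict[str, Record],
) -> list[Record]:
    """Merge three sources into one deduplicated list, keyed by "id"."""
    merged: dict[str, Record] = {}

    # 1. Start from what the local database already has.
    for row in db_rows:
        merged[row["id"]] = dict(row)

    # 2. Overlay records that are still waiting in the transaction buffer.
    for row in buffered_rows:
        merged[row["id"]] = {**merged.get(row["id"], {}), **row}

    # 3. Apply later updates to records that are already known.
    for record_id, changes in read_buffer_updates.items():
        if record_id in merged:
            merged[record_id].update(changes)

    return list(merged.values())


rows = merge_records(
    db_rows=[{"id": "a", "status": "done"}],
    buffered_rows=[{"id": "b", "status": "running"}],
    read_buffer_updates={"b": {"status": "finished"}},
)
print(rows)
```

Each ID appears once in the result, and the inputs are copied rather than modified, so the same buffered data can be reused for later queries.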

Infrastructure Layer#

Redis:

  • Transaction buffer (list: LPUSH/RPOP)

  • Transaction status (hash with TTL)

  • Write-through cache (generated IDs)

  • Buffered data cache (for smart queries)

  • Read buffer (mutable updates to buffered records)
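Pushing with LPUSH and popping with RPOP makes the Redis list behave as a first-in, first-out queue. The sketch below demonstrates that ordering with an in-memory stand-in so it runs without a Redis server. The class `InMemoryRedisList`, the key name `transaction_buffer`, and the JSON payload format are assumptions. A redis-py client exposes `lpush` and `rpop` methods with the same names and could be passed in instead.

```python
import json
from collections import deque
from typing import Optional


class InMemoryRedisList:
    """Stand-in that mimics only the LPUSH and RPOP list commands."""

    def __init__(self) -> None:
        self._lists: dict[str, deque] = {}

    def lpush(self, key: str, value: str) -> int:
        items = self._lists.setdefault(key, deque())
        items.appendleft(value)          # LPUSH adds at the head
        return len(items)

    def rpop(self, key: str) -> Optional[str]:
        items = self._lists.get(key)
        return items.pop() if items else None   # RPOP removes from the tail


BUFFER_KEY = "transaction_buffer"  # hypothetical key name


def enqueue(store, transaction: dict) -> None:
    """Serialize a transaction and push it onto the buffer."""
    store.lpush(BUFFER_KEY, json.dumps(transaction))


def dequeue(store) -> Optional[dict]:
    """Pop the oldest transaction, or return None if the buffer is empty."""
    raw = store.rpop(BUFFER_KEY)
    return json.loads(raw) if raw is not None else None


store = InMemoryRedisList()
enqueue(store, {"seq": 1})
enqueue(store, {"seq": 2})
first = dequeue(store)   # the transaction that was enqueued first
```

Preserving insertion order matters here because later transactions may depend on rows created by earlier ones.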

Background Processor:

  • Polls transaction buffer continuously

  • Executes buffered transactions on main DB

  • Implements retry with exponential backoff

  • Health monitoring and statistics
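Retry with exponential backoff can be sketched as follows. The base delay, the cap, the attempt limit, and the choice of `ConnectionError` as the retryable error are all assumptions. The actual processor may use different values and may move exhausted transactions to the failed queue rather than raising.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before the next retry: base * 2^attempt, limited to cap seconds."""
    return min(cap, base * 2 ** attempt)


def run_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller decide what to do,
                # for example move the transaction to a failed queue.
                raise
            sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the delays instead of waiting for them.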

LSN Tracker:

  • Captures LSN after main DB writes

  • Polls replica for replication progress

  • Determines when to clean up caches

  • Extends cache TTL if replication is delayed
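PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, representing the high and low 32 bits of a 64-bit WAL position. Comparing LSNs as plain strings gives wrong answers, so a tracker would typically convert them to integers first. The helper names below are hypothetical. The sketch only shows the comparison, not how the LSN Tracker obtains the values.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as "0/12345678" to a 64-bit integer position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_replicated(write_lsn: str, replay_lsn: str) -> bool:
    """True once the replica has replayed WAL up to the write position."""
    return parse_lsn(replay_lsn) >= parse_lsn(write_lsn)


# The replica is slightly behind the write, so caches must be kept for now.
print(is_replicated("0/12345678", "0/12345600"))   # False

# String comparison would get this case wrong: "0/10" sorts before "0/9",
# but numerically 0x10 is ahead of 0x9.
print(is_replicated("0/9", "0/10"))                # True
```

When `is_replicated` returns `False`, the buffered copy of the record is still the only local source for that data, which is why the cache TTL is extended instead of letting the entry expire.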

Data Layer#

Main Database (PostgreSQL):

  • Single authoritative source of truth

  • Accepts all write operations

  • Generates WAL for replication

  • Located in Cologne, Germany

Replica Database (PostgreSQL):

  • Read-only streaming replica

  • Receives WAL from main database

  • Serves local reads at secondary sites

  • Located at the observatory (Chile) and potentially other sites

Request Flow Examples#

UI Read Request#

        sequenceDiagram
    participant UI as Web Frontend
    participant API as FastAPI
    participant Auth as Authentication
    participant Router as Transfer Router
    participant DB as Local Database

    UI->>API: GET /api/transfer/overview
    API->>Auth: Verify JWT token
    Auth-->>API: User authenticated
    API->>Router: Route to handler
    Router->>DB: Query transfers
    DB-->>Router: Transfer data
    Router-->>API: Format response
    API-->>UI: JSON response
    

Observatory Write Request (Buffered)#

        sequenceDiagram
    participant Script as Observatory Script
    participant API as FastAPI
    participant Auth as Authentication
    participant Router as Executed Obs Router
    participant Builder as Transaction Builder
    participant Manager as Transaction Manager
    participant Redis as Redis Buffer

    Script->>API: POST /executed_obs_units/start
    API->>Auth: Verify API token
    Auth-->>API: Service authenticated
    API->>Router: Route to handler (@critical_operation)
    Router->>Builder: Build transaction
    Builder->>Builder: Generate UUID
    Builder-->>Router: Transaction with pre-gen ID
    Router->>Manager: Buffer transaction
    Manager->>Redis: LPUSH to buffer
    Redis-->>Manager: OK
    Manager-->>Router: Transaction ID
    Router-->>API: 201 Created
    API-->>Script: {"id": "uuid", "status": "buffered"}
    

Background Processing#

        sequenceDiagram
    participant BG as Background Processor
    participant Redis as Redis Buffer
    participant Executor as Transaction Executor
    participant MainDB as Main Database
    participant LSN as LSN Tracker
    participant Replica as Replica Database

    loop Every 1 second
        BG->>Redis: RPOP from buffer
        Redis-->>BG: Transaction
        BG->>Executor: Execute transaction
        Executor->>MainDB: INSERT/UPDATE/DELETE
        MainDB-->>Executor: Success
        Executor->>MainDB: SELECT pg_current_wal_lsn()
        MainDB-->>Executor: LSN: 0/12345678
        Executor-->>BG: Success + LSN
        BG->>LSN: Check replication (LSN: 0/12345678)
        LSN->>Replica: SELECT pg_last_wal_replay_lsn()
        Replica-->>LSN: LSN: 0/12345600 (behind)
        LSN-->>BG: Not yet replicated
        BG->>Redis: Extend cache TTL
    end
    

Key Takeaways#

The architecture is designed with several key principles:

  1. Network Resilience: Transaction buffering keeps critical operations from failing when the network is slow or unavailable

  2. Precise Replication Tracking: LSN-based tracking eliminates guesswork

  3. Consistent Views: Smart queries merge multiple data sources

  4. Flexible Authentication: Supports both interactive users and automation

  5. Site-Aware Behavior: Automatically adapts to site type (main vs secondary)

This architecture enables reliable operation in challenging network environments while maintaining data consistency and providing responsive user experiences.