Architecture Overview
This section provides a technical deep dive into the ops-db-api architecture, including system design, database topology, site configuration, authentication, and endpoint categorization.
Introduction
The ops-db-api is built on a distributed database architecture designed to keep observatory operations running through network outages. The architecture combines:
FastAPI as the async Python web framework
PostgreSQL as the relational database, with streaming replication
Redis for transaction buffering and caching
SQLAlchemy for the ORM and database abstraction
Custom transaction buffering for network resilience
Key Architectural Features
Site-Aware Behavior
The API behaves differently based on site type (MAIN vs SECONDARY), automatically routing operations appropriately.
Transaction Buffering
Critical operations at secondary sites are buffered in Redis and executed asynchronously against the main database.
LSN-Based Replication Tracking
PostgreSQL Log Sequence Numbers provide precise knowledge of replication state for smart cache management.
Smart Query Management
Reads merge data from the database, the transaction buffer, and the read buffer, giving a consistent view even during replication lag.
Dual Authentication
Supports both GitHub OAuth (for UI users) and API tokens (for service scripts) through a unified interface.
Architecture Diagram
High-level system architecture:
graph TB
subgraph "Client Layer"
UI[Web Frontend<br/>ops-db-ui]
Scripts[Observatory<br/>Scripts]
end
subgraph "API Layer"
FastAPI[FastAPI Application]
Routers[Routers<br/>transfer, obs_unit,<br/>executed_obs_units, etc.]
Auth[Authentication<br/>GitHub OAuth + API Tokens]
end
subgraph "Business Logic"
TxBuilder[Transaction Builder]
TxManager[Transaction Manager]
SmartQuery[Smart Query Manager]
end
subgraph "Infrastructure"
Redis[Redis<br/>Buffer + Cache]
BgProcessor[Background Processor]
LSNTracker[LSN Tracker]
end
subgraph "Data Layer"
MainDB[(Main Database<br/>Cologne)]
ReplicaDB[(Replica Database<br/>Observatory)]
end
UI -->|HTTP/WS| FastAPI
Scripts -->|HTTP| FastAPI
FastAPI --> Auth
FastAPI --> Routers
Routers --> TxBuilder
Routers --> SmartQuery
TxBuilder --> TxManager
TxManager --> Redis
TxManager --> BgProcessor
BgProcessor --> MainDB
BgProcessor --> LSNTracker
LSNTracker --> ReplicaDB
SmartQuery --> ReplicaDB
SmartQuery --> Redis
MainDB -.->|Replication| ReplicaDB
style FastAPI fill:#90EE90
style Redis fill:#FFD700
style MainDB fill:#87CEEB
style ReplicaDB fill:#FFB6C1
Component Responsibilities
API Layer
FastAPI Application (main.py):
Application lifecycle management
Router registration
CORS configuration
WebSocket connection tracking
Startup/shutdown hooks
Routers:
UI-focused: transfer, observing_program, sources, visibility, instruments
Operations-focused: executed_obs_units, raw_data_files, raw_data_package, staging
Shared: auth, github_auth, api_tokens, site, demo
Authentication:
Unified token validation (JWT + API tokens)
Role-based access control (RBAC)
Permission-based authorization
Usage tracking for API tokens
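The unified validation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that API tokens are stored as SHA-256 digests and that a JWT is recognized by its three dot-separated segments. `Principal`, `API_TOKENS`, `authenticate`, and the permission strings are hypothetical names, not the project's actual API, and JWT verification is injected as a callable rather than implemented here.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str
    kind: str  # "user" (JWT) or "service" (API token)
    permissions: frozenset


# Hypothetical token store: SHA-256 digest of the token -> principal.
API_TOKENS = {
    hashlib.sha256(b"svc-secret-token").hexdigest(): Principal(
        "telescope-script", "service", frozenset({"executed_obs_units:write"})
    ),
}


def looks_like_jwt(token: str) -> bool:
    # A compact JWT has three non-empty, dot-separated segments.
    parts = token.split(".")
    return len(parts) == 3 and all(parts)


def verify_api_token(token: str):
    digest = hashlib.sha256(token.encode()).hexdigest()
    for stored, principal in API_TOKENS.items():
        # Constant-time comparison of the stored and presented digests.
        if hmac.compare_digest(stored, digest):
            return principal
    return None


def authenticate(token: str, verify_jwt):
    """Single entry point: dispatch to JWT or API-token validation.

    verify_jwt is an injected callable (for example a wrapper around a
    JWT library) that returns a Principal or None.
    """
    if looks_like_jwt(token):
        return verify_jwt(token)
    return verify_api_token(token)


def has_permission(principal, permission: str) -> bool:
    return principal is not None and permission in principal.permissions
```

Routing both credential types through one function means handlers only ever see a `Principal`, so permission checks do not need to know how the caller logged in.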
Business Logic Layer
Transaction Builder:
Constructs multi-step database transactions
Generates pre-allocated IDs
Manages dependencies between steps
Supports CREATE, UPDATE, DELETE, BULK_CREATE operations
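As a rough illustration of these responsibilities, the sketch below builds a two-step transaction in which the second step references the pre-allocated ID of the first. The class shape, the `step-N` naming, and the table names are assumptions made for this example; only the operation types and the idea of pre-generated UUIDs come from the description above.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class Op(str, Enum):
    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"
    BULK_CREATE = "bulk_create"


@dataclass
class Step:
    step_id: str
    op: Op
    table: str
    data: dict
    depends_on: list = field(default_factory=list)


class TransactionBuilder:
    """Hypothetical builder; the real class may differ."""

    def __init__(self):
        self.transaction_id = str(uuid.uuid4())
        self.steps = []

    def create(self, table, data, depends_on=()):
        # Pre-allocate the record ID so callers can use it immediately,
        # before the row exists in the main database.
        record_id = str(uuid.uuid4())
        step = Step(
            step_id=f"step-{len(self.steps) + 1}",
            op=Op.CREATE,
            table=table,
            data={**data, "id": record_id},
            depends_on=list(depends_on),
        )
        self.steps.append(step)
        return step

    def build(self):
        # Every dependency must refer to an earlier step.
        seen = set()
        for step in self.steps:
            missing = [d for d in step.depends_on if d not in seen]
            if missing:
                raise ValueError(f"{step.step_id} depends on unknown steps: {missing}")
            seen.add(step.step_id)
        return {
            "transaction_id": self.transaction_id,
            "steps": [
                {
                    "step_id": s.step_id,
                    "op": s.op.value,
                    "table": s.table,
                    "data": s.data,
                    "depends_on": s.depends_on,
                }
                for s in self.steps
            ],
        }
```

The result of `build()` is a plain dictionary, so it can be serialized to JSON before being buffered.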
Transaction Manager:
Buffers transactions to Redis
Manages retry logic and failed queue
Provides transaction status queries
Implements write-through caching
Smart Query Manager:
Merges database + buffered + read buffer data
Handles type conversion for filtering
Retrieves related records via foreign keys
Deduplicates and prioritizes fresher data
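The merge step can be sketched as follows. This is an illustration under one stated assumption: freshness is ordered replica row, then buffered record, then read-buffer update, so later sources overwrite earlier ones. The function name and the record shapes are hypothetical.

```python
def merge_records(db_rows, buffered_rows, read_buffer_updates, key="id"):
    """Merge three sources into one deduplicated list of records.

    db_rows:             rows already visible in the local database
    buffered_rows:       records waiting in the transaction buffer
    read_buffer_updates: mapping of record ID -> changed fields
    """
    merged = {}
    # Start from the database view.
    for row in db_rows:
        merged[row[key]] = dict(row)
    # Buffered records are assumed fresher, so their fields win.
    for row in buffered_rows:
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    # Read-buffer updates are applied last, only to known records.
    for record_id, changes in read_buffer_updates.items():
        if record_id in merged:
            merged[record_id].update(changes)
    return list(merged.values())
```

Keying the intermediate dictionary on the record ID is what removes duplicates: a record present in both the replica and the buffer appears once in the output.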
Infrastructure Layer
Redis:
Transaction buffer (list: LPUSH/RPOP)
Transaction status (hash with TTL)
Write-through cache (generated IDs)
Buffered data cache (for smart queries)
Read buffer (mutable updates to buffered records)
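Pairing LPUSH with RPOP turns a Redis list into a first-in, first-out queue: new items enter at the head and the oldest item leaves from the tail. The sketch below shows that ordering with an in-memory stand-in so it runs without a Redis server. `InMemoryList`, `enqueue`, and `dequeue` are illustrative names, not part of the project or of any Redis client.

```python
import json
from collections import deque


class InMemoryList:
    """Stand-in for a single Redis list, supporting only LPUSH and RPOP."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        # Like LPUSH: insert at the head and return the new length.
        self._items.appendleft(value)
        return len(self._items)

    def rpop(self):
        # Like RPOP: remove from the tail, or return None when empty.
        return self._items.pop() if self._items else None


def enqueue(buffer, transaction: dict) -> int:
    return buffer.lpush(json.dumps(transaction))


def dequeue(buffer):
    raw = buffer.rpop()
    return json.loads(raw) if raw is not None else None
```

Because transactions are processed in arrival order, a step that depends on an earlier transaction is not executed ahead of it.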
Background Processor:
Polls transaction buffer continuously
Executes buffered transactions on main DB
Implements retry with exponential backoff
Health monitoring and statistics
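Retry with exponential backoff can be sketched as below. The base delay, growth factor, cap, and retry count are assumptions chosen for the example, as is the use of `ConnectionError` to signal an unreachable main database. The failed queue is modeled as a plain list.

```python
import time


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, max_retries=5):
    """Delays before each retry: base, base*factor, ..., capped at max_delay."""
    return [min(base * factor ** attempt, max_delay) for attempt in range(max_retries)]


def execute_with_retry(execute, transaction, failed_queue, delays, sleep=time.sleep):
    """Run execute(transaction), retrying after each delay on ConnectionError.

    Returns True on success. After the final retry fails, the transaction
    is appended to failed_queue and False is returned.
    """
    last_error = None
    for attempt in range(len(delays) + 1):
        try:
            execute(transaction)
            return True
        except ConnectionError as exc:
            last_error = exc
            if attempt < len(delays):
                sleep(delays[attempt])
    failed_queue.append({"transaction": transaction, "error": str(last_error)})
    return False
```

Passing `sleep` as a parameter keeps the function testable: a test can record the requested delays instead of actually waiting.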
LSN Tracker:
Captures LSN after main DB writes
Polls replica for replication progress
Determines when to clean up caches
Extends cache TTL if replication is delayed
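The comparison behind these decisions can be shown in a few lines. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, representing the high and low 32 bits of a 64-bit WAL position, so the strings must be parsed before they are compared. The function names and the `"cleanup"` / `"extend_ttl"` labels are illustrative only.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/12345678' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_replicated(write_lsn: str, replay_lsn: str) -> bool:
    """True once the replica has replayed WAL up to or past the write."""
    return parse_lsn(replay_lsn) >= parse_lsn(write_lsn)


def cache_action(write_lsn: str, replay_lsn: str) -> str:
    # Cached entries can go once the replica has the data; otherwise
    # they need to live longer so reads stay consistent.
    return "cleanup" if is_replicated(write_lsn, replay_lsn) else "extend_ttl"
```

Comparing the raw strings would be wrong: `"1/0"` is a later position than `"0/FFFFFFFF"`, but the two have different lengths and do not order reliably as text. On the database side, `pg_wal_lsn_diff` performs the equivalent subtraction.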
Data Layer
Main Database (PostgreSQL):
Single authoritative source of truth
Accepts all write operations
Generates WAL for replication
Located in Cologne, Germany
Replica Database (PostgreSQL):
Read-only streaming replica
Receives WAL from main database
Serves local reads at secondary sites
Located at observatory (Chile) and potentially other sites
Request Flow Examples
UI Read Request
sequenceDiagram
participant UI as Web Frontend
participant API as FastAPI
participant Auth as Authentication
participant Router as Transfer Router
participant DB as Local Database
UI->>API: GET /api/transfer/overview
API->>Auth: Verify JWT token
Auth-->>API: User authenticated
API->>Router: Route to handler
Router->>DB: Query transfers
DB-->>Router: Transfer data
Router-->>API: Format response
API-->>UI: JSON response
Observatory Write Request (Buffered)
sequenceDiagram
participant Script as Observatory Script
participant API as FastAPI
participant Auth as Authentication
participant Router as Executed Obs Router
participant Builder as Transaction Builder
participant Manager as Transaction Manager
participant Redis as Redis Buffer
Script->>API: POST /executed_obs_units/start
API->>Auth: Verify API token
Auth-->>API: Service authenticated
API->>Router: Route to handler (@critical_operation)
Router->>Builder: Build transaction
Builder->>Builder: Generate UUID
Builder-->>Router: Transaction with pre-gen ID
Router->>Manager: Buffer transaction
Manager->>Redis: LPUSH to buffer
Redis-->>Manager: OK
Manager-->>Router: Transaction ID
Router-->>API: 201 Created
API-->>Script: {"id": "uuid", "status": "buffered"}
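The buffered write flow above can be approximated with a decorator. This is a sketch under assumptions, not the project's `@critical_operation`: it assumes that a SECONDARY site buffers the transaction and returns immediately, while a MAIN site executes it directly. The decorator signature, the `"committed"` status, and the list used as a buffer are hypothetical.

```python
import functools
import uuid
from enum import Enum


class SiteType(Enum):
    MAIN = "main"
    SECONDARY = "secondary"


def critical_operation(site_type, buffer, execute_direct):
    """Wrap a transaction-building function with site-aware dispatch.

    buffer:         object with append(), standing in for the Redis buffer
    execute_direct: callable that runs a transaction on the main database
    """

    def decorator(build_transaction):
        @functools.wraps(build_transaction)
        def handler(payload):
            # Pre-generate the ID so the caller gets it without waiting.
            record_id = str(uuid.uuid4())
            transaction = build_transaction(payload, record_id)
            if site_type is SiteType.SECONDARY:
                buffer.append(transaction)
                return {"id": record_id, "status": "buffered"}
            # Assumed behavior at the main site: no buffering needed.
            execute_direct(transaction)
            return {"id": record_id, "status": "committed"}

        return handler

    return decorator


def build_start(payload, record_id):
    # Hypothetical transaction shape for starting an executed obs unit.
    return {
        "op": "create",
        "table": "executed_obs_unit",
        "data": {**payload, "id": record_id},
    }
```

A handler for a secondary site would then be created with `critical_operation(SiteType.SECONDARY, buffer, execute)(build_start)`. The caller receives the pre-generated ID either way, so a script does not need to know which site it is talking to.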
Background Processing
sequenceDiagram
participant BG as Background Processor
participant Redis as Redis Buffer
participant Executor as Transaction Executor
participant MainDB as Main Database
participant LSN as LSN Tracker
participant Replica as Replica Database
loop Every 1 second
BG->>Redis: RPOP from buffer
Redis-->>BG: Transaction
BG->>Executor: Execute transaction
Executor->>MainDB: INSERT/UPDATE/DELETE
MainDB-->>Executor: Success
Executor->>MainDB: SELECT pg_current_wal_lsn()
MainDB-->>Executor: LSN: 0/12345678
Executor-->>BG: Success + LSN
BG->>LSN: Check replication (LSN: 0/12345678)
LSN->>Replica: SELECT pg_last_wal_replay_lsn()
Replica-->>LSN: LSN: 0/12345600 (behind)
LSN-->>BG: Not yet replicated
BG->>Redis: Extend cache TTL
end
Key Takeaways
The architecture is designed around several key principles:
Network Resilience: Transaction buffering keeps operations from failing when the network is unavailable
Precise Replication Tracking: LSN-based tracking eliminates guesswork
Consistent Views: Smart queries merge multiple data sources
Flexible Authentication: Supports both interactive users and automation
Site-Aware Behavior: Automatically adapts to site type (main vs secondary)
This architecture enables reliable operation in challenging network environments while maintaining data consistency and providing responsive user experiences.