Transaction Buffering
=====================

The transaction buffering system is the core innovation that enables reliable
observatory operations despite unreliable network connectivity.

.. contents:: Table of Contents
   :local:
   :depth: 2

Introduction
------------

**The Problem**: Observatory operations must never block or fail due to
network issues.

**The Solution**: Buffer write operations locally in Redis and execute them
asynchronously against the main database when the network is available.

**The Innovation**: LSN-based replication tracking provides precise knowledge
of when buffered data has reached the local replica, enabling smart cache
management.

System Components
-----------------

The buffering system consists of eight interconnected components:

1. **Site Configuration**: Determines buffering behavior based on site type
2. **Transaction Builder**: Constructs multi-step atomic operations with dependencies
3. **Transaction Manager**: Buffers transactions to Redis and manages state
4. **Background Processor**: Continuously processes buffered transactions
5. **Transaction Executor**: Executes transactions against the main database
6. **LSN Tracker**: Monitors PostgreSQL replication progress
7. **Smart Query Manager**: Merges database, buffered, and read buffer data
8. **Read Buffer Manager**: Tracks mutable updates to buffered records
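The buffer-then-drain discipline at the heart of these components can be
sketched in miniature. This is an illustration only: it uses an in-memory
``deque`` as a stand-in for the Redis list (``appendleft`` for LPUSH,
``pop`` for RPOP), and the helper names ``buffer_write`` and ``drain_one``
are hypothetical, not taken from the codebase.

.. code-block:: python

   import json
   import uuid
   from collections import deque

   # In-memory stand-in for the Redis list at
   # site:{site_name}:transaction_buffer (LPUSH on write, RPOP on drain).
   transaction_buffer = deque()

   def buffer_write(endpoint, data):
       """Buffer a write locally and return immediately with a generated ID."""
       transaction = {
           "transaction_id": str(uuid.uuid4()),  # pre-generated, returned to client
           "endpoint": endpoint,
           "status": "pending",
           "data": data,
       }
       # LPUSH equivalent: newest transactions go on the left.
       transaction_buffer.appendleft(json.dumps(transaction))
       return transaction["transaction_id"]

   def drain_one():
       """RPOP equivalent: oldest transaction comes off the right (FIFO)."""
       if not transaction_buffer:
           return None
       return json.loads(transaction_buffer.pop())

   tx_id = buffer_write("create_observation", {"target": "M31"})
   oldest = drain_one()
   assert oldest["transaction_id"] == tx_id

Because the client gets its ID back as soon as the LPUSH succeeds, the write
path never waits on the main database.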
.. mermaid::

   graph TB
       Client[Client Request]
       Decorator[@critical_operation]
       Builder[Transaction Builder]
       Manager[Transaction Manager]
       Redis[(Redis Buffer)]
       BG[Background Processor]
       Executor[Transaction Executor]
       MainDB[(Main Database)]
       LSN[LSN Tracker]
       Replica[(Local Replica)]
       Smart[Smart Query Manager]
       ReadBuf[Read Buffer Manager]

       Client --> Decorator
       Decorator --> Builder
       Builder --> Manager
       Manager --> Redis
       Redis --> BG
       BG --> Executor
       Executor --> MainDB
       Executor --> LSN
       LSN --> Replica
       LSN --> Manager
       MainDB -.Replication.-> Replica
       Smart --> Replica
       Smart --> Redis
       Smart --> ReadBuf
       Client -.Query.-> Smart

       style Redis fill:#FFD700
       style MainDB fill:#90EE90
       style Replica fill:#FFB6C1

Complete Lifecycle
------------------

From client request to replicated data:

.. mermaid::

   sequenceDiagram
       participant Client
       participant API
       participant Builder
       participant Manager
       participant Redis
       participant BG as Background Processor
       participant Main as Main DB
       participant LSN as LSN Tracker
       participant Replica

       Note over Client,Replica: Phase 1: Buffering
       Client->>API: POST /executed_obs_units/start
       API->>Builder: Build transaction
       Builder->>Builder: Generate UUID
       Builder-->>API: Transaction
       API->>Manager: Buffer transaction
       Manager->>Redis: LPUSH to buffer
       Manager->>Redis: Cache generated IDs
       Manager->>Redis: Cache buffered data
       Redis-->>Manager: OK
       Manager-->>API: Transaction ID
       API-->>Client: 201 Created (UUID)

       Note over Client,Replica: Phase 2: Processing
       loop Every 1 second
           BG->>Redis: RPOP from buffer
           Redis-->>BG: Transaction
           BG->>Main: Execute transaction
           Main-->>BG: Success
           BG->>Main: SELECT pg_current_wal_lsn()
           Main-->>BG: LSN: 0/12345678
       end

       Note over Client,Replica: Phase 3: Replication
       Main->>Replica: Stream WAL
       Replica->>Replica: Replay WAL

       Note over Client,Replica: Phase 4: Confirmation
       loop Every 0.1 seconds
           LSN->>Replica: SELECT pg_last_wal_replay_lsn()
           Replica-->>LSN: Current LSN
           LSN->>LSN: Compare with target LSN
       end

       Note over Client,Replica: Phase 5: Cleanup
       LSN-->>BG: Replicated
       BG->>Redis: DELETE cached data
       BG->>Redis: DELETE from write-through cache

Data Structures
---------------

Key Redis Data Structures
~~~~~~~~~~~~~~~~~~~~~~~~~

**Transaction Buffer** (List):

.. code-block:: text

   Key:     site:{site_name}:transaction_buffer
   Type:    List (LPUSH/RPOP)
   Content: JSON-serialized SQLAlchemyBufferedTransaction

**Transaction Status** (String with TTL):

.. code-block:: text

   Key:     site:{site_name}:transaction:{transaction_id}
   Type:    String
   TTL:     3600 seconds (1 hour)
   Content: JSON-serialized transaction status

**Write-Through Cache** (String):

.. code-block:: text

   Key:     site:{site_name}:cache:ids:{model}:{id}
   Type:    String
   TTL:     Varies (extended until replicated)
   Content: Pre-generated ID
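For illustration, the key patterns above can be generated with simple
helpers. This is a sketch only: the function names are hypothetical, and the
commented-out call shows roughly how a real Redis client would apply the
1-hour TTL with ``SET`` and an expiry.

.. code-block:: python

   import json

   def buffer_key(site_name):
       return f"site:{site_name}:transaction_buffer"

   def status_key(site_name, transaction_id):
       return f"site:{site_name}:transaction:{transaction_id}"

   def id_cache_key(site_name, model, record_id):
       return f"site:{site_name}:cache:ids:{model}:{record_id}"

   STATUS_TTL_SECONDS = 3600  # 1 hour, per the Transaction Status entry above

   def status_payload(transaction_id, status):
       # JSON-serialized status stored under status_key(); fields illustrative.
       return json.dumps({"transaction_id": transaction_id, "status": status})

   # With a real redis-py client this would look roughly like:
   #   r.set(status_key("observatory", tx_id),
   #         status_payload(tx_id, "pending"),
   #         ex=STATUS_TTL_SECONDS)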
**Buffered Data Cache** (String):

.. code-block:: text

   Key:     site:{site_name}:buffered:{model}:{id}
   Type:    String
   TTL:     Varies (deleted when replicated)
   Content: JSON-serialized record data

**Read Buffer** (Hash):

.. code-block:: text

   Key:    site:{site_name}:read_buffer:{model}:{id}
   Type:   Hash
   Fields: Updated field values
   TTL:    Varies (deleted when replicated)

Transaction Structure
~~~~~~~~~~~~~~~~~~~~~

``SQLAlchemyBufferedTransaction`` model:

.. code-block:: python

   class SQLAlchemyBufferedTransaction:
       transaction_id: str   # UUID
       endpoint: str         # e.g., "create_observation"
       site: str             # e.g., "observatory"
       timestamp: datetime
       status: str           # "pending", "processing", "completed", "failed"
       retry_count: int
       steps: List[SQLAlchemyTransactionStep]

``SQLAlchemyTransactionStep`` model:

.. code-block:: python

   class SQLAlchemyTransactionStep:
       step_id: str
       operation: str              # "create", "update", "delete", "bulk_create"
       model_class: str            # "ExecutedObsUnit"
       data: Dict[str, Any]        # Record data
       conditions: Dict[str, Any]  # For updates/deletes
       dependencies: List[str]     # step_ids this step depends on

Performance Characteristics
---------------------------

Latency
~~~~~~~

**Buffering operation** (at secondary site):

* Transaction building: < 10ms
* Redis LPUSH: < 1ms
* Cache writes: < 5ms
* Response generation: < 1ms
* **Total**: < 20ms (typical)

**Background processing**:

* Transaction execution: 50-500ms (depends on complexity)
* LSN capture: < 5ms
* LSN comparison: < 10ms
* **Total**: 100ms - 1s per transaction

**Replication lag**:

* Good network: 1-10 seconds
* Slow network: 10-60 seconds
* Network down: lag accumulates until connectivity returns

Throughput
~~~~~~~~~~

**Single background processor**:

* Simple transactions: ~10-20/second
* Complex transactions: ~2-5/second
* Bulk operations: ~1-2/second

**Bottlenecks**:

* Main database connection latency
* Transaction complexity
* Network bandwidth (replication)

Reliability
~~~~~~~~~~~

**Buffer persistence**:

* Redis AOF (Append-Only File) enabled
* Survives process restarts
* Data loss risk: last 1 second of writes (fsync policy)

**Retry mechanism**:

* Configurable attempts (default: 3)
* Exponential backoff (5s, 10s, 20s)
* Failed queue for manual intervention

**Monitoring**:

* Buffer size tracking
* Processing rate metrics
* Failed transaction alerts

Section Contents
----------------

Explore each component in detail:

.. toctree::
   :maxdepth: 1

   overview
   transaction-builder
   transaction-manager
   background-processor
   lsn-tracking
   smart-query-manager
   read-buffer-manager

Quick Links
-----------

* :doc:`overview` - Detailed system overview
* :doc:`transaction-builder` - Building complex transactions
* :doc:`lsn-tracking` - PostgreSQL replication tracking
* :doc:`smart-query-manager` - Merging data sources
* :doc:`../../tutorials/complex-endpoints/buffered-critical-operations` - Using in endpoints
* :doc:`../../development/debugging-buffering` - Troubleshooting

Related Topics
--------------

* :doc:`../../philosophy/reliability-first` - Why we need buffering
* :doc:`../../architecture/database-topology` - Database replication architecture
* :doc:`../../architecture/site-configuration` - Site configuration details
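As a closing illustration of the retry mechanism described under Reliability:
the default schedule of 3 attempts with delays of 5s, 10s, and 20s is a
doubling series from a 5-second base. A minimal sketch, with illustrative
names (``retry_delays`` is not from the codebase):

.. code-block:: python

   BASE_DELAY_SECONDS = 5
   DEFAULT_ATTEMPTS = 3

   def retry_delays(attempts=DEFAULT_ATTEMPTS, base=BASE_DELAY_SECONDS):
       # Delay before retry n is base * 2**n, doubling on each attempt.
       return [base * (2 ** n) for n in range(attempts)]

   assert retry_delays() == [5, 10, 20]
   # After the final attempt, the transaction moves to the failed queue
   # for manual intervention instead of being retried again.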