Transaction Buffering#
The transaction buffering system is the core innovation that enables reliable observatory operations despite unreliable network connectivity.
Introduction#
The Problem: Observatory operations must never block or fail due to network issues.
The Solution: Buffer write operations locally in Redis and execute them asynchronously against the main database when the network is available.
The Innovation: LSN-based replication tracking provides precise knowledge of when buffered data has reached the local replica, enabling smart cache management.
System Components#
The buffering system consists of seven interconnected components:
Site Configuration: Determines buffering behavior based on site type
Transaction Builder: Constructs multi-step atomic operations with dependencies
Transaction Manager: Buffers transactions to Redis and manages state
Background Processor: Continuously processes buffered transactions
Transaction Executor: Executes transactions against the main database
LSN Tracker: Monitors PostgreSQL replication progress
Smart Query Manager: Merges database + buffered + read buffer data
Read Buffer Manager: Tracks mutable updates to buffered records
graph TB
Client[Client Request]
Decorator[@critical_operation]
Builder[Transaction Builder]
Manager[Transaction Manager]
Redis[(Redis Buffer)]
BG[Background Processor]
Executor[Transaction Executor]
MainDB[(Main Database)]
LSN[LSN Tracker]
Replica[(Local Replica)]
Smart[Smart Query Manager]
ReadBuf[Read Buffer Manager]
Client --> Decorator
Decorator --> Builder
Builder --> Manager
Manager --> Redis
Redis --> BG
BG --> Executor
Executor --> MainDB
Executor --> LSN
LSN --> Replica
LSN --> Manager
MainDB -.Replication.-> Replica
Smart --> Replica
Smart --> Redis
Smart --> ReadBuf
Client -.Query.-> Smart
style Redis fill:#FFD700
style MainDB fill:#90EE90
style Replica fill:#FFB6C1
Complete Lifecycle#
From client request to replicated data:
sequenceDiagram
participant Client
participant API
participant Builder
participant Manager
participant Redis
participant BG as Background<br/>Processor
participant Main as Main DB
participant LSN as LSN Tracker
participant Replica
Note over Client,Replica: Phase 1: Buffering
Client->>API: POST /executed_obs_units/start
API->>Builder: Build transaction
Builder->>Builder: Generate UUID
Builder-->>API: Transaction
API->>Manager: Buffer transaction
Manager->>Redis: LPUSH to buffer
Manager->>Redis: Cache generated IDs
Manager->>Redis: Cache buffered data
Redis-->>Manager: OK
Manager-->>API: Transaction ID
API-->>Client: 201 Created (UUID)
Note over Client,Replica: Phase 2: Processing
loop Every 1 second
BG->>Redis: RPOP from buffer
Redis-->>BG: Transaction
BG->>Main: Execute transaction
Main-->>BG: Success
BG->>Main: SELECT pg_current_wal_lsn()
Main-->>BG: LSN: 0/12345678
end
Note over Client,Replica: Phase 3: Replication
Main->>Replica: Stream WAL
Replica->>Replica: Replay WAL
Note over Client,Replica: Phase 4: Confirmation
loop Every 0.1 seconds
LSN->>Replica: SELECT pg_last_wal_replay_lsn()
Replica-->>LSN: Current LSN
LSN->>LSN: Compare with target LSN
end
Note over Client,Replica: Phase 5: Cleanup
LSN-->>BG: Replicated
BG->>Redis: DELETE cached data
BG->>Redis: DELETE from write-through cache
Data Structures#
Key Redis Data Structures#
Transaction Buffer (List):
Key: site:{site_name}:transaction_buffer
Type: List (LPUSH/RPOP)
Content: JSON-serialized SQLAlchemyBufferedTransaction
Transaction Status (String with TTL):
Key: site:{site_name}:transaction:{transaction_id}
Type: String
TTL: 3600 seconds (1 hour)
Content: JSON-serialized transaction status
Write-Through Cache (Hash):
Key: site:{site_name}:cache:ids:{model}:{id}
Type: String
TTL: Varies (extended until replicated)
Content: Pre-generated ID
Buffered Data Cache (String):
Key: site:{site_name}:buffered:{model}:{id}
Type: String
TTL: Varies (deleted when replicated)
Content: JSON-serialized record data
Read Buffer (Hash):
Key: site:{site_name}:read_buffer:{model}:{id}
Type: Hash
Fields: Updated field values
TTL: Varies (deleted when replicated)
Transaction Structure#
SQLAlchemyBufferedTransaction model:
class SQLAlchemyBufferedTransaction:
transaction_id: str # UUID
endpoint: str # e.g., "create_observation"
site: str # e.g., "observatory"
timestamp: datetime
status: str # "pending", "processing", "completed", "failed"
retry_count: int
steps: List[SQLAlchemyTransactionStep]
SQLAlchemyTransactionStep model:
class SQLAlchemyTransactionStep:
step_id: str
operation: str # "create", "update", "delete", "bulk_create"
model_class: str # "ExecutedObsUnit"
data: Dict[str, Any] # Record data
conditions: Dict[str, Any] # For updates/deletes
dependencies: List[str] # step_ids this depends on
Performance Characteristics#
Latency#
Buffering operation (at secondary site):
Transaction building: < 10ms
Redis LPUSH: < 1ms
Cache writes: < 5ms
Response generation: < 1ms
Total: < 20ms (typical)
Background processing:
Transaction execution: 50-500ms (depends on complexity)
LSN capture: < 5ms
LSN comparison: < 10ms
Total: 100ms - 1s per transaction
Replication lag:
Good network: 1-10 seconds
Slow network: 10-60 seconds
Network down: Accumulates
Throughput#
Single background processor:
Simple transactions: ~10-20/second
Complex transactions: ~2-5/second
Bulk operations: ~1-2/second
Bottlenecks:
Main database connection latency
Transaction complexity
Network bandwidth (replication)
Reliability#
Buffer persistence:
Redis AOF (Append-Only File) enabled
Survives process restart
Data loss risk: Last 1 second of writes (fsync policy)
Retry mechanism:
Configurable attempts (default: 3)
Exponential backoff (5s, 10s, 20s)
Failed queue for manual intervention
Monitoring:
Buffer size tracking
Processing rate metrics
Failed transaction alerts
Section Contents#
Explore each component in detail:
Quick Links#
Transaction Buffering Overview - Detailed system overview
Transaction Builder - Building complex transactions
LSN Tracking - PostgreSQL replication tracking
Smart Query Manager - Merging data sources
Buffered Critical Operations - Using in endpoints
Debugging Transaction Buffering - Troubleshooting