# Data Transfer System

```{eval-rst}
.. verified:: 2025-10-16
   :reviewer: Christof Buchbender
```

The CCAT Data Transfer System orchestrates the complete lifecycle of observatory data, from telescope acquisition through long-term archival storage (LTA) to cleanup of upstream data after successful archival. The system is designed to handle peak data rates of 8 TB/day across geographically distributed sites while ensuring data integrity, availability, and efficient resource utilization.

```{toctree}
:caption: 'Contents:'
:hidden: true
:maxdepth: 2

source/philosophy
source/transfer_route_CCAT_Cologne
source/concepts
source/pipeline
source/routing
source/monitoring
source/lifecycle
source/api/index
```

## Overview

**What it does:**

The Data Transfer System manages the automated flow of raw astronomical data through multiple processing stages:

- Package raw data files from instrument computers
- Transfer data packages between geographically distributed sites
- Verify data integrity at each stage
- Archive data to long-term storage (LTA)
- Stage data for scientific processing
- Clean up original and temporary storage based on retention policies

**Why it matters:**

With peak data rates of 8 TB/day and operations spanning multiple continents, manual data management would be error-prone, inefficient, and unable to meet scientific requirements.
The automated system ensures:

- **Reliability**: Data safely reaches long-term archives (LTA) without human intervention
- **Efficiency**: Intelligent routing and parallel processing maximize throughput
- **Integrity**: Multi-layer checksums catch corruption early
- **Resilience**: Automatic retry and recovery handle transient failures
- **Visibility**: Comprehensive monitoring provides operational insight

## Target Audience

This documentation is written for:

- **Developers** working on the data transfer system or integrating with it
- **CCAT team members** who want to understand how data flows through the observatory
- **Operations staff** who need a conceptual understanding for troubleshooting

For related documentation and in-depth guides on using and interfacing with the system:

- **Instrument teams**: See {doc}`/source/instrument/integration` for API usage and data filing
- **Scientists**: See {doc}`/source/scientists/guide` for accessing archived data
- **DevOps/Infrastructure**: See {doc}`/source/operations/datacenter_operations` for deployment

## System Context

The Data Transfer System is one component of the larger CCAT Data Center:

```{eval-rst}
.. mermaid::

   graph TD
       subgraph Observatory["CCAT Observatory (Chile)"]
           PrimeCam["Prime-Cam"]
           CHAI["CHAI"]
           InstComp["Instrument Computers"]
           PrimeCam --> InstComp
           CHAI --> InstComp
       end

       InstComp -->|Data Flow| DTS["Data Transfer System<br/>• Package creation<br/>• Site-to-site transfer<br/>• Archive management"]

       DTS --> Cologne["Cologne LTA<br/>(Germany)<br/>• Long-term archive<br/>• Processing"]
       DTS --> Cornell["Optional, e.g. Cornell LTA<br/>(USA)<br/>• Long-term archive<br/>• Processing"]

       style Observatory fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
       style DTS fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
       style Cologne fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
       style Cornell fill:#fce4ec,stroke:#c2185b,stroke-width:2px
```

The system integrates with:

- {doc}`ops-db `: Tracks all data locations, operations, and metadata
- {doc}`ops-db-api `: Instruments use this to file new observations
- {doc}`ops-db-ui `: Provides visibility into system state
- {doc}`influxdb `: Stores transfer metrics and performance data
- {doc}`redis `: Coordinates distributed task execution

## Next Steps

**For Developers:**

- {doc}`source/api/index` - Complete API reference for all modules and functions
- {doc}`source/philosophy` - Understand the design principles and architectural decisions
- {doc}`source/concepts` - Learn the key concepts: Sites, DataLocations, Operations, Managers

**For Operations:**

- {doc}`source/pipeline` - Explore the 7-stage data processing pipeline
- {doc}`source/routing` - See how the system intelligently routes work to the right place
- {doc}`source/monitoring` - Learn about health checks, failure recovery, and observability
- {doc}`source/lifecycle` - Understand deletion policies and data lifecycle management