Data Transfer System
====================

.. verified:: 2025-10-16
   :reviewer: Christof Buchbender

The CCAT Data Transfer System orchestrates the complete lifecycle of observatory data, from telescope acquisition through long-term archival storage (LTA) to the cleanup of upstream data after successful archival. The system is designed to handle peak data rates of 8 TB/day across geographically distributed sites while ensuring data integrity, availability, and efficient resource utilization.

.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :hidden:

   source/philosophy
   source/transfer_route_CCAT_Cologne
   source/concepts
   source/pipeline
   source/routing
   source/monitoring
   source/lifecycle
   source/api/index

Overview
--------

**What it does:**

The Data Transfer System manages the automated flow of raw astronomical data through multiple processing stages:

* Package raw data files from instrument computers
* Transfer data packages between geographically distributed sites
* Verify data integrity at each stage
* Archive data to long-term storage (LTA)
* Stage data for scientific processing
* Clean up original and temporary storage based on retention policies

**Why it matters:**

With peak data rates of 8 TB/day and operations spanning multiple continents, manual data management would be error-prone, inefficient, and unable to meet scientific requirements.
The automated system ensures:

* **Reliability**: Data safely reaches long-term archives (LTA) without human intervention
* **Efficiency**: Intelligent routing and parallel processing maximize throughput
* **Integrity**: Multi-layer checksums catch corruption early
* **Resilience**: Automatic retry and recovery handle transient failures
* **Visibility**: Comprehensive monitoring provides operational insight

Target Audience
---------------

This documentation is written for:

* **Developers** working on the data transfer system or integrating with it
* **CCAT team members** who want to understand how data flows through the observatory
* **Operations staff** who need conceptual understanding for troubleshooting

For related documentation and in-depth guides on how to use and interface with the system:

* **Instrument teams**: See :doc:`/source/instrument/integration` for API usage and data filing
* **Scientists**: See :doc:`/source/scientists/guide` for accessing archived data
* **DevOps/Infrastructure**: See :doc:`/source/operations/datacenter_operations` for deployment

System Context
--------------

The Data Transfer System is one component of the larger CCAT Data Center:

.. mermaid::

   graph TD
       subgraph Observatory["CCAT Observatory (Chile)"]
           PrimeCam["Prime-Cam"]
           CHAI["CHAI"]
           InstComp["Instrument Computers"]
           PrimeCam --> InstComp
           CHAI --> InstComp
       end

       InstComp -->|Data Flow| DTS["Data Transfer System<br/>• Package creation<br/>• Site-to-site transfer<br/>• Archive management"]

       DTS --> Cologne["Cologne LTA<br/>(Germany)<br/>• Long-term archive<br/>• Processing"]
       DTS --> Cornell["Optional, e.g. Cornell LTA<br/>(USA)<br/>• Long-term archive<br/>• Processing"]

       style Observatory fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
       style DTS fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
       style Cologne fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
       style Cornell fill:#fce4ec,stroke:#c2185b,stroke-width:2px

The system integrates with:

* :doc:`ops-db `: Tracks all data locations, operations, and metadata
* :doc:`ops-db-api `: Instruments use this to file new observations
* :doc:`ops-db-ui `: Provides visibility into system state
* :doc:`influxdb `: Stores transfer metrics and performance data
* :doc:`redis `: Coordinates distributed task execution

Next Steps
----------

**For Developers:**

* :doc:`source/api/index` - Complete API reference for all modules and functions
* :doc:`source/philosophy` - Understand the design principles and architectural decisions
* :doc:`source/concepts` - Learn the key concepts: Sites, DataLocations, Operations, Managers

**For Operations:**

* :doc:`source/pipeline` - Explore the 7-stage data processing pipeline
* :doc:`source/routing` - See how the system intelligently routes work to the right place
* :doc:`source/monitoring` - Learn about health checks, failure recovery, and observability
* :doc:`source/lifecycle` - Understand deletion policies and data lifecycle management
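
To put the peak rate of 8 TB/day quoted in the overview into perspective, the short calculation below estimates the sustained bandwidth it implies. It is only a back-of-the-envelope sketch: it assumes decimal terabytes (8 × 10\ :sup:`12` bytes) spread evenly over 24 hours, and the name ``sustained_rate`` is purely illustrative and not part of the Data Transfer System.

.. code-block:: python

   # Assumption: "8 TB/day" means 8 decimal terabytes (8e12 bytes) per 24 hours,
   # delivered at a constant rate. Real traffic is burstier than this.
   PEAK_BYTES_PER_DAY = 8 * 10**12
   SECONDS_PER_DAY = 24 * 60 * 60


   def sustained_rate(bytes_per_day: int) -> tuple[float, float]:
       """Return the average rate as (megabytes per second, megabits per second)."""
       bytes_per_second = bytes_per_day / SECONDS_PER_DAY
       return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


   mb_per_s, mbit_per_s = sustained_rate(PEAK_BYTES_PER_DAY)
   print(f"{mb_per_s:.1f} MB/s, {mbit_per_s:.1f} Mbit/s")  # 92.6 MB/s, 740.7 Mbit/s

This figure is an average only. Any link carrying the data would need headroom above it to absorb retries and to catch up after outages.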