Data Transfer System
====================

.. verified:: 2025-10-16
   :reviewer: Christof Buchbender

The CCAT Data Transfer System orchestrates the complete lifecycle of observatory data,
from acquisition at the telescope through storage in the long-term archive (LTA) to the
cleanup of upstream copies once archival has succeeded. The system is designed to handle
peak data rates of 8 TB/day across geographically distributed sites while ensuring data
integrity, availability, and efficient resource utilization.
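
To put the peak rate in perspective, the short calculation below converts a daily volume into the sustained throughput it implies. It is only an illustrative sketch: it assumes decimal units (1 TB = 10^12 bytes) and a transfer spread evenly over 24 hours, and the function name ``sustained_rate`` is hypothetical rather than part of the system.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(terabytes_per_day: float) -> tuple[float, float]:
    """Return the (MB/s, Mbit/s) needed to move a daily volume evenly.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    bytes_per_second = terabytes_per_day * 1e12 / SECONDS_PER_DAY
    megabytes_per_second = bytes_per_second / 1e6
    megabits_per_second = bytes_per_second * 8 / 1e6
    return megabytes_per_second, megabits_per_second


mb_per_s, mbit_per_s = sustained_rate(8)
print(f"{mb_per_s:.1f} MB/s, {mbit_per_s:.1f} Mbit/s")  # 92.6 MB/s, 740.7 Mbit/s
```

Real links rarely run at a constant rate, so the bandwidth actually provisioned would need headroom above this average.
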

.. toctree::
   :maxdepth: 2
   :caption: Contents:
   :hidden:

   source/philosophy
   source/transfer_route_CCAT_Cologne
   source/concepts
   source/pipeline
   source/routing
   source/monitoring
   source/lifecycle
   source/api/index

Overview
--------
**What it does:**
The Data Transfer System manages the automated flow of raw astronomical data through
multiple processing stages:
* Package raw data files from instrument computers
* Transfer data packages between geographically distributed sites
* Verify data integrity at each stage
* Archive data to long-term storage (LTA)
* Stage data for scientific processing
* Clean up original and temporary storage based on retention policies
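
The idea of verifying integrity after every step, and stopping before anything downstream runs, can be sketched as follows. This is a simplified illustration, not the system's implementation: the step names are taken loosely from the list above, their order is an assumption, and ``run_steps`` with its ``execute`` and ``verify`` callbacks is hypothetical.

```python
from typing import Callable

# Hypothetical step names derived from the list above; the real pipeline
# may split or order its stages differently.
STEPS = ["package", "transfer", "archive", "stage", "clean_up"]


def run_steps(
    execute: Callable[[str], bool],
    verify: Callable[[str], bool],
) -> list[str]:
    """Run each step in order and verify it before moving on.

    Stops at the first step that fails to execute or to verify and
    returns the names of the steps that completed successfully.
    """
    completed: list[str] = []
    for step in STEPS:
        if not (execute(step) and verify(step)):
            break
        completed.append(step)
    return completed


# A verification failure after "archive" prevents staging and clean-up.
done = run_steps(
    execute=lambda step: True,
    verify=lambda step: step != "archive",
)
print(done)  # ['package', 'transfer']
```

Placing clean-up last in this sketch means that source data is only removed after every earlier step has been verified.
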
**Why it matters:**
With peak data rates of 8 TB/day and operations spanning multiple continents, manual data
management would be error-prone, inefficient, and unable to meet scientific
requirements. The automated system ensures:
* **Reliability**: Data safely reaches long-term archives (LTA) without human intervention
* **Efficiency**: Intelligent routing and parallel processing maximize throughput
* **Integrity**: Multi-layer checksums catch corruption early
* **Resilience**: Automatic retry and recovery handle transient failures
* **Visibility**: Comprehensive monitoring provides operational insight
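
The integrity and resilience points can be illustrated with a small checksum check that retries transient read errors. This is a minimal sketch under stated assumptions: SHA-256 is chosen here only for illustration (the algorithm the system uses is not specified on this page), and the helper names ``sha256_of`` and ``verify_with_retry`` are hypothetical.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_with_retry(
    path: Path,
    expected_hex: str,
    attempts: int = 3,
    base_delay_s: float = 0.1,
) -> bool:
    """Return True if the file matches the expected digest.

    A mismatch is reported immediately as False (likely corruption).
    Only I/O errors are treated as transient and retried, with
    exponential backoff; the last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return sha256_of(path) == expected_hex
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay_s * 2 ** attempt)
    return False  # not reached; keeps the return type explicit
```

A caller would compute the digest once at the source, ship it alongside the package, and call ``verify_with_retry`` at the destination. Treating a mismatch as final while retrying only I/O errors keeps genuine corruption from being masked by repeated attempts.
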
Target Audience
---------------
This documentation is written for:
* **Developers** working on the data transfer system or integrating with it
* **CCAT team members** who want to understand how data flows through the observatory
* **Operations staff** who need conceptual understanding for troubleshooting
For related documentation and in-depth guides on using and interfacing with the
system:
* **Instrument teams**: See :doc:`/source/instrument/integration` for API usage and data
filing
* **Scientists**: See :doc:`/source/scientists/guide` for accessing archived data
* **DevOps/Infrastructure**: See :doc:`/source/operations/datacenter_operations` for
deployment
System Context
--------------
The Data Transfer System is one component of the larger CCAT Data Center:

.. mermaid::

   graph TD
       subgraph Observatory["CCAT Observatory (Chile)"]
           PrimeCam["Prime-Cam"]
           CHAI["CHAI"]
           InstComp["Instrument Computers"]
           PrimeCam --> InstComp
           CHAI --> InstComp
       end

       InstComp -->|Data Flow| DTS["Data Transfer System<br/>• Package creation<br/>• Site-to-site transfer<br/>• Archive management"]
       DTS --> Cologne["Cologne LTA<br/>(Germany)<br/>• Long-term archive<br/>• Processing"]
       DTS --> Cornell["optional e.g. Cornell LTA<br/>(USA)<br/>• Long-term archive<br/>• Processing"]

       style Observatory fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
       style DTS fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
       style Cologne fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
       style Cornell fill:#fce4ec,stroke:#c2185b,stroke-width:2px

The system integrates with:
* :doc:`ops-db `: Tracks all data locations, operations, and metadata
* :doc:`ops-db-api `: Instruments use this to file new observations
* :doc:`ops-db-ui `: Provides visibility into system state
* :doc:`influxdb `: Stores transfer metrics and performance data
* :doc:`redis `: Coordinates distributed task execution
Next Steps
----------
**For Developers:**
* :doc:`source/api/index` - Complete API reference for all modules and functions
* :doc:`source/philosophy` - Understand the design principles and architectural decisions
* :doc:`source/concepts` - Learn the key concepts: Sites, DataLocations, Operations, Managers
**For Operations:**
* :doc:`source/pipeline` - Explore the 7-stage data processing pipeline
* :doc:`source/routing` - See how the system intelligently routes work to the right place
* :doc:`source/monitoring` - Learn about health checks, failure recovery, and observability
* :doc:`source/lifecycle` - Understand deletion policies and data lifecycle management