Data Transfer System#

Documentation verified. Last checked: 2025-10-16. Reviewer: Christof Buchbender

The CCAT Data Transfer System orchestrates the complete lifecycle of observatory data, from acquisition at the telescope through long-term archive (LTA) storage to cleanup of upstream copies once archival has succeeded. It is designed to handle peak data rates of 8 TB/day across geographically distributed sites while ensuring data integrity, availability, and efficient resource utilization.
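To put the peak rate quoted above in network terms, the following sketch converts a daily volume into the sustained throughput a link would need if the data were spread evenly over 24 hours. It assumes decimal terabytes (10^12 bytes); the source does not say whether TB or TiB is meant, and real traffic is unlikely to be perfectly even.

```python
# Assumption: "TB" means decimal terabytes (10**12 bytes).
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(bytes_per_day: float) -> tuple[float, float]:
    """Return the average rate as (megabytes per second, megabits per second)."""
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


mb_s, mbit_s = sustained_rate(8 * TB)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")  # prints: 92.6 MB/s, 741 Mbit/s
```

In other words, a peak day corresponds to roughly three quarters of a gigabit per second sustained, before any protocol overhead or retransmission.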

Overview#

What it does:

The Data Transfer System manages the automated flow of raw astronomical data through multiple processing stages:

  • Package raw data files from instrument computers

  • Transfer data packages between geographically distributed sites

  • Verify data integrity at each stage

  • Archive data to long-term storage (LTA)

  • Stage data for scientific processing

  • Clean up original and temporary storage based on retention policies
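The ordering constraint implied by the list above (source data is only cleaned up after archival has succeeded) can be sketched as a small state model. The stage names and helper functions here are hypothetical and deliberately simplified: in the real system verification happens at each stage rather than once, and staging for processing is not modelled.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical lifecycle stages of a data package, in order."""
    PACKAGED = 1
    TRANSFERRED = 2
    VERIFIED = 3
    ARCHIVED = 4


def advance(stage: Stage) -> Stage:
    """Move a package to the next stage; ARCHIVED is terminal."""
    return Stage(min(stage + 1, Stage.ARCHIVED))


def may_delete_source(stage: Stage) -> bool:
    """Upstream copies become eligible for cleanup only after archival."""
    return stage >= Stage.ARCHIVED
```

Modelling the stages as an ordered enum keeps the cleanup rule to a single comparison, which makes it hard to delete upstream data by accident.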

Why it matters:

With peak data rates of 8 TB/day and operations spanning multiple continents, manual data management would be error-prone, inefficient, and unable to meet scientific requirements. The automated system ensures:

  • Reliability: Data safely reaches long-term archives (LTA) without human intervention

  • Efficiency: Intelligent routing and parallel processing maximize throughput

  • Integrity: Multi-layer checksums catch corruption early

  • Resilience: Automatic retry and recovery handle transient failures

  • Visibility: Comprehensive monitoring provides operational insight
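As an illustration of the integrity and resilience points above, here is a minimal sketch of checksum verification combined with retry and exponential backoff. It is not the system's actual implementation: the choice of SHA-256, the function names, and the retry parameters are all assumptions made for the example.

```python
import hashlib
import time


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large packages never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare the file's checksum with the one recorded at the source."""
    return sha256_of(path) == expected_hex


def with_retry(operation, attempts: int = 3, base_delay: float = 1.0):
    """Run operation(), retrying on OSError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** attempt)
```

A transfer step could then be wrapped as `with_retry(lambda: copy_package(src, dst))`, followed by `verify(dst, recorded_checksum)`, where `copy_package` and `recorded_checksum` are placeholders for whatever the deployment provides.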

Target Audience#

This documentation is written for:

  • Developers working on the data transfer system or integrating with it

  • CCAT team members who want to understand how data flows through the observatory

  • Operations staff who need conceptual understanding for troubleshooting

For related documentation and in-depth guides on using and interfacing with the system:

  • Instrument teams: See /source/instrument/integration for API usage and data filing

  • Scientists: See /source/scientists/guide for accessing archived data

  • DevOps/Infrastructure: See /source/operations/datacenter_operations for deployment

System Context#

The Data Transfer System is one component of the larger CCAT Data Center:

    graph TD
    subgraph Observatory["CCAT Observatory (Chile)"]
        PrimeCam["Prime-Cam"]
        CHAI["CHAI"]
        InstComp["Instrument Computers"]

        PrimeCam --> InstComp
        CHAI --> InstComp
    end

    InstComp -->|Data Flow| DTS["Data Transfer System<br/>• Package creation<br/>• Site-to-site transfer<br/>• Archive management"]

    DTS --> Cologne["Cologne LTA<br/>(Germany)<br/>• Long-term archive<br/>• Processing"]
    DTS --> Cornell["Optional, e.g. Cornell LTA<br/>(USA)<br/>• Long-term archive<br/>• Processing"]

    style Observatory fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    style DTS fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style Cologne fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    style Cornell fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    

The system integrates with:

  • ops-db: Tracks all data locations, operations, and metadata

  • ops-db-api: Instruments use this to file new observations

  • ops-db-ui: Provides visibility into system state

  • influxdb: Stores transfer metrics and performance data

  • redis: Coordinates distributed task execution
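To make the metrics integration more concrete, the sketch below formats a single transfer measurement in InfluxDB line protocol, the text format InfluxDB accepts for writes. The measurement name, tag, and field names are hypothetical and not taken from the actual system. The helper also skips escaping of spaces and commas in tag values, which line protocol requires.

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Format one point as InfluxDB line protocol (simplified: no tag escaping)."""

    def fmt(value):
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integers carry an 'i' suffix
        if isinstance(value, float):
            return repr(value)
        return '"' + str(value).replace('"', '\\"') + '"'  # strings are quoted

    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={fmt(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical metric: one completed transfer to the Cologne site.
line = to_line_protocol(
    "transfer",
    {"site": "cologne"},
    {"bytes": 1024, "seconds": 2.5},
    1700000000000000000,
)
print(line)  # prints: transfer,site=cologne bytes=1024i,seconds=2.5 1700000000000000000
```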

Next Steps#

For Developers:

For Operations: