Core Concepts ============= .. verified:: 2025-11-20 :reviewer: Christof Buchbender This section explains the fundamental architecture and data flow of the CCAT Data Center, providing essential context for all users. System Overview --------------- The CCAT Data Center is built around a central :doc:`/ops-db/docs/index` that tracks the complete lifecycle of astronomical observations and their associated data. Think of ops-db as the "brain" of the system - it knows what observations have been executed, where data is physically located, what state it's in, and who has access to it. The system handles massive data volumes from the Fred Young Submillimeter Telescope (FYST), automatically moving data through stages from initial capture at the telescope through long-term archival storage and eventual staging for scientific analysis. **Key Components:** * **ops-db** - Central PostgreSQL database tracking all observations, data packages, and locations * **data-transfer** - Automated workflows that move data through its lifecycle * **ops-db-api** - RESTful API for instruments to register observations and data * **ops-db-ui** - Web interface for browsing observations and monitoring system state * **Storage Infrastructure** - Multi-tier storage across sites (Chile, Germany) * **HPC Integration** - Connection to RAMSES cluster for data processing Data Center Architecture ------------------------ .. mermaid:: graph TB subgraph Chile["Telescope Site (Chile)"] INST[Instruments
Prime-Cam, CHAI] SOURCE_CL[SOURCE Storage
Instrument Computers] BUFFER_CL[BUFFER Storage
Transfer Staging] LOCAL_API[ops-db-api
REST API] end subgraph Cologne["CCAT-DC (Cologne)"] BUFFER_COL[BUFFER Storage
Transfer Staging] OPSDB[(ops-db
PostgreSQL)] API[ops-db-api
REST API] UI[ops-db-ui
Web Interface] TRANSFER[data-transfer
Automated Workflows] LTA[LONG_TERM_ARCHIVE
S3 Object Storage] HPC[RAMSES HPC Cluster
PROCESSING Storage] end subgraph Monitoring["Monitoring"] INFLUX[InfluxDB
Metrics] GRAFANA[Grafana
Dashboards] end INST -->|write data| SOURCE_CL INST -->|file observations| LOCAL_API SOURCE_CL -->|package| BUFFER_CL BUFFER_CL -->|transfer| BUFFER_COL BUFFER_COL -->|archive| LTA LOCAL_API --> OPSDB UI --> API API --> OPSDB TRANSFER <-->|track state| OPSDB TRANSFER -->|orchestrate movement| BUFFER_CL TRANSFER -->|orchestrate movement| BUFFER_COL TRANSFER -->|stage for processing| HPC TRANSFER -->|metrics| INFLUX INFLUX --> GRAFANA Scientists -->|browse| UI Scientists -->|request data| API Scientists -->|analyze| HPC The ops-db sits at the center, with all other components updating or querying it. Automated workflows (:doc:`/data-transfer/docs/index`) orchestrate data movement between site buffers, while APIs and web interfaces provide access for humans and instruments. **Current Configuration:** Data flows from Chile to Cologne. The system is designed for multi-site architecture and can be extended to additional sites (e.g., Cornell) as needed. Data Flow Through the System ----------------------------- Data moves through several stages from telescope to scientific publication. Each stage serves a specific purpose in ensuring data integrity, accessibility, and long-term preservation. .. mermaid:: sequenceDiagram participant Telescope participant SOURCE as SOURCE
Storage
(Chile) participant BUFFER_CL as BUFFER
Storage
(Chile) participant Transfer as data-transfer
workflows participant BUFFER_COL as BUFFER
Storage
(Cologne) participant Archive as LONG_TERM
ARCHIVE
(Cologne) participant HPC as RAMSES
HPC Cluster participant Scientist Telescope->>SOURCE: Write raw data files Telescope->>Transfer: Register observation via API Note over Transfer,BUFFER_CL: Stage 1: Packaging Transfer->>SOURCE: Scan for new files Transfer->>Transfer: Create RawDataPackage (tar archive) Transfer->>BUFFER_CL: Copy package to Chile buffer Note over BUFFER_CL,BUFFER_COL: Stage 2: Inter-Site Transfer Transfer->>Transfer: Bundle packages for efficiency Transfer->>BUFFER_CL: Transfer from Chile buffer Transfer->>BUFFER_COL: To Cologne buffer (BBCP) Note over Transfer,Archive: Stage 3: Archive Transfer->>BUFFER_COL: Verify checksums Transfer->>BUFFER_COL: Unpack DataTransferPackages into RawDataPackages Transfer->>Archive: Move RawDataPackages to long-term storage Transfer->>Transfer: Update ops-db with locations Note over Scientist,HPC: Stage 4: Processing (on demand) Scientist->>Transfer: Request data via UI/API Transfer->>Archive: Retrieve archived data Transfer->>HPC: Stage to PROCESSING location Scientist->>HPC: Run analysis pipelines HPC->>Archive: Store results Stage 1: Observation & Capture ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data originates at :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE** - the instrument computers at the telescope in Chile. When an observation executes: 1. Instrument writes raw data files (detector timestreams, housekeeping data) 2. Instrument files the observation via :doc:`/ops-db-api/docs/index`, providing metadata (target, timing, configuration) 3. ops-db creates an ExecutedObsUnit record linking the observation to its data files **Key Concept:** *Filing* means registering an observation in :doc:`/ops-db/docs/index`. This doesn't move data, it just tells the system "this observation happened, and these are its files." Stage 2: Packaging ^^^^^^^^^^^^^^^^^^^ The packaging workflow automatically discovers new files at :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE** and prepares them for efficient transfer: 1. Related files (from one observation and instrument module) are grouped together 2. Files are bundled into tar archives (RawDataPackages, max ~50GB each) 3. Packages are copied to the :py:class:`~ccat_ops_db.models.DataLocation` of type **BUFFER** storage at the Chile site for transfer staging 4. Checksums are computed for integrity verification **Why package?** Thousands of small files are inefficient to transfer and manage. Packaging consolidates them into manageable units while preserving directory structure and metadata. **Why buffer?** The buffer provides a staging area that isolates transfer operations from active observations. :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE** storage can continue capturing new data while the buffer handles transfer workloads. Stage 3: Inter-Site Transfer ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Multiple :py:class:`~ccat_ops_db.models.RawDataPackage` are aggregated into :py:class:`~ccat_ops_db.models.DataTransferPackage` and transferred between site buffers: 1. Packages waiting in Chile's buffer are intelligently grouped by size and priority 2. High-performance transfer protocols (e.g. BBCP) move data from Chile buffer to Cologne buffer 3. Automated retry mechanisms handle transient network failures 4. Transfer metrics track bandwidth utilization and completion rates **Key Concept:** The system transfers buffer-to-buffer. This provides a reliable handoff point between sites - data is considered "received" once it reaches the destination site's buffer, even before final archiving. 
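The grouping step described in the list above behaves like a simple bin-packing pass over whatever is waiting in the buffer. The sketch below is a minimal illustration of that idea, not the actual data-transfer implementation; the class names, attributes, and the 10 TiB target size are assumptions made for this example.

.. code-block:: python

    from dataclasses import dataclass, field

    # Illustrative stand-ins only -- the real models in ccat_ops_db.models
    # have different attributes.
    @dataclass
    class BufferedPackage:
        name: str
        size_bytes: int
        priority: int          # lower number = higher priority

    @dataclass
    class TransferBundle:
        packages: list = field(default_factory=list)

        @property
        def size_bytes(self) -> int:
            return sum(p.size_bytes for p in self.packages)

    def bundle_for_transfer(waiting, target_size=10 * 1024**4):
        """Greedily group buffered packages into bundles of roughly target_size.

        Packages are taken in priority order (largest first within a priority),
        and a bundle is closed once adding another package would overshoot
        the target size.
        """
        bundles, current = [], TransferBundle()
        for pkg in sorted(waiting, key=lambda p: (p.priority, -p.size_bytes)):
            if current.packages and current.size_bytes + pkg.size_bytes > target_size:
                bundles.append(current)
                current = TransferBundle()
            current.packages.append(pkg)
        if current.packages:
            bundles.append(current)
        return bundles

In the real workflows the bundle that crosses the network is a :py:class:`~ccat_ops_db.models.DataTransferPackage`, and the target size is tuned by benchmarking the long-distance link (see the Buffer-Based Transfer Model section below).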
**Multi-Site Design:** While currently configured for Chile → Cologne transfers, the architecture supports multi-tiered routing (e.g., Chile → Cologne → Cornell) for geographic redundancy. Stage 4: Data Integrity & Archiving ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Once data arrives at Cologne's buffer, it moves to permanent storage: 1. DataTransferPackages in Cologne buffer are verified via checksums 2. DataTransferPackages are unpacked into :py:class:`~ccat_ops_db.models.RawDataPackage` 3. :py:class:`~ccat_ops_db.models.RawDataPackage` are transferred to :py:class:`~ccat_ops_db.models.DataLocation` of type **LONG_TERM_ARCHIVE** storage (S3-compatible Coscine) 4. :py:class:`~ccat_ops_db.models.PhysicalCopy` records track all locations in ops-db 5. IVOA-compliant metadata makes :py:class:`~ccat_ops_db.models.RawDataPackage` discoverable **Storage Technologies:** Archives use :py:class:`~ccat_ops_db.models.DataLocation` of type **LONG_TERM_ARCHIVE** storage (S3-compatible Coscine), providing virtually unlimited capacity, built-in redundancy, and REST API access. Stage 5: Processing (On Demand) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When scientists need data for analysis, the staging workflow retrieves it from archives: 1. Scientist requests specific observations via UI or API 2. Staging manager downloads data from long-term storage 3. Data is unpacked to :py:class:`~ccat_ops_db.models.DataLocation` of type **PROCESSING** locations on RAMSES HPC cluster 4. Scientists run their analysis pipelines 5. Processed results are stored back to archives or published **Key Concept:** Processing storage is temporary. Data is staged on demand and may be cleaned up after analysis to free space. Stage 6: Cleanup ^^^^^^^^^^^^^^^^^ After processing is complete, the data on the :py:class:`~ccat_ops_db.models.DataLocation` of type **PROCESSING**, the :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE**, and the :py:class:`~ccat_ops_db.models.DataLocation` of type **BUFFER** is deleted automatically. This occurs only if the data is replicated in the :py:class:`~ccat_ops_db.models.DataLocation` of type **LONG_TERM_ARCHIVE** and if the customizable retention policies at the origin :py:class:`~ccat_ops_db.models.DataLocation` are met. This ensures that operators do not have to delete data manually. Storage Locations & Types -------------------------- The data center uses a **location-based model** to track where data physically exists. Each location has a type that defines its role in the data lifecycle: Location Types ^^^^^^^^^^^^^^ **SOURCE** Instrument computers where data originates (Chile telescope site). Data here is transient - once packaged and transferred to the site buffer, it may be deleted to free space for new observations. **BUFFER** Intermediate staging areas at each site for transfer operations. Buffers hold packaged data during inter-site transfers and provide working space for data transformation workflows. Each site has its own buffer: * Chile buffer: Receives packages from :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE**, stages for transfer to Cologne * Cologne buffer: Receives packages from :py:class:`~ccat_ops_db.models.DataLocation` of type **BUFFER** in Chile, stages for archiving A Site can have multiple BUFFER locations, each with a different priority and active flag.
The system uses: * **Priority** (lower number = higher priority): Determines which location to use first * **Active** flag: Allows temporarily disabling locations for maintenance **LONG_TERM_ARCHIVE** Permanent storage systems designed for data preservation (currently at Cologne). Archives provide high capacity and durability, with potential for multiple geographic copies for redundancy. **PROCESSING** Temporary storage on HPC clusters where scientists analyze data (RAMSES at Cologne). Processing locations provide high-performance access for compute-intensive workflows. Storage Technologies ^^^^^^^^^^^^^^^^^^^^ Different storage technologies are used based on performance, capacity, and cost requirements: **Disk** Traditional filesystem storage (local or network-mounted). Used for SOURCE, BUFFER, and PROCESSING locations where performance matters. **S3-Compatible Object Storage** Cloud-native storage with REST API access. Used for LONG_TERM_ARCHIVE locations (Coscine), providing virtually unlimited capacity and built-in redundancy. **Tape Libraries** High-capacity sequential storage for archival. Economical for long-term preservation of large datasets that are accessed infrequently. Supported by the architecture but not currently deployed. Geographic Distribution ^^^^^^^^^^^^^^^^^^^^^^^ Storage locations currently span two sites: * **CCAT Observatory (Chile)** - SOURCE and BUFFER locations at telescope * **University of Cologne (Germany)** - BUFFER, LONG_TERM_ARCHIVE, and PROCESSING (RAMSES) **Future Expansion:** The architecture supports additional sites (e.g., Cornell for secondary archive) and multi-tiered transfer routing. Data Models & Relationships ---------------------------- Understanding the Data Lifecycle ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The CCAT Data Center tracks three parallel hierarchies that converge during observations: **Observational Hierarchy:** How observations are organized scientifically :: Observatory → Telescope → Instrument → InstrumentModule ↓ ObservingProgram → ObsUnit → ExecutedObsUnit **Data Hierarchy:** How data files are packaged and tracked :: RawDataFile → RawDataPackage → DataTransferPackage ↓ ↓ ↓ PhysicalCopy (tracks locations) **Storage Hierarchy:** Where data physically exists :: Site → DataLocation → PhysicalCopy These hierarchies connect when an observation executes: an :py:class:`~ccat_ops_db.models.ExecutedObsUnit` produces :py:class:`~ccat_ops_db.models.RawDataFile`, which are packaged into :py:class:`~ccat_ops_db.models.RawDataPackage`, which have :py:class:`~ccat_ops_db.models.PhysicalCopy` at various DataLocations across different Sites.
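To make these connections concrete, the sketch below wires the three hierarchies together with plain dataclasses and walks from an executed observation down to every site that holds a copy of its data. It is illustrative only: the class names mirror the ops-db models, but the attribute and relationship names are assumptions for this example and do not match the actual schema in ``ccat_ops_db.models``.

.. code-block:: python

    from __future__ import annotations
    from dataclasses import dataclass, field

    # Class names mirror the ops-db models; the attributes below are
    # illustrative only and do not match the real schema.
    @dataclass
    class Site:
        name: str                          # e.g. "Chile", "Cologne"

    @dataclass
    class DataLocation:
        name: str                          # e.g. "cologne-archive"
        location_type: str                 # SOURCE / BUFFER / LONG_TERM_ARCHIVE / PROCESSING
        site: Site

    @dataclass
    class PhysicalCopy:
        location: DataLocation             # where this particular copy lives

    @dataclass
    class RawDataPackage:
        copies: list[PhysicalCopy] = field(default_factory=list)

    @dataclass
    class RawDataFile:
        package: RawDataPackage | None = None    # set once the file is packaged

    @dataclass
    class ExecutedObsUnit:
        raw_data_files: list[RawDataFile] = field(default_factory=list)

    def sites_holding(obs: ExecutedObsUnit) -> set[str]:
        """Walk observation -> files -> packages -> copies -> locations -> sites."""
        return {
            copy.location.site.name
            for f in obs.raw_data_files
            if f.package is not None
            for copy in f.package.copies
        }

In production this question is answered by querying :doc:`/ops-db/docs/index` (directly or through the API) rather than by traversing in-memory objects.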
Key Terminology ^^^^^^^^^^^^^^^ **ObsUnit** A planned observation - what the telescope *will* observe (target, duration, instrument configuration) **ExecutedObsUnit** An actual observation execution - what the telescope *did* observe (includes start/end times, conditions, quality assessment) **RawDataFile** An individual file produced by an instrument during an observation **RawDataPackage** A tar archive bundling related files from one observation for efficient management **Site** A physical location with data center infrastructure (Chile, Cologne) **DataLocation** A specific storage system at a site (e.g., "chile-buffer-01", "cologne-archive") **PhysicalCopy** A record tracking that a specific file or package exists at a specific location **Filing** The process of registering an observation and its data in ops-db via the API **Staging** Retrieving archived data and making it available at a :py:class:`~ccat_ops_db.models.DataLocation` of type **PROCESSING** Key Principles & Patterns -------------------------- Single Source of Truth ^^^^^^^^^^^^^^^^^^^^^^ :doc:`/ops-db/docs/index` is the authoritative record for all metadata. Rather than duplicating information across systems, all components query or update the central database. This ensures consistency and simplifies data governance. **Example:** When a :py:class:`~ccat_ops_db.models.RawDataPackage` is archived, the :doc:`/data-transfer/docs/index` workflow updates its state in :doc:`/ops-db/docs/index`. The UI, API, and monitoring systems all reflect this change immediately. Buffer-Based Transfer Model ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Inter-site transfers operate buffer-to-buffer rather than directly from source to archive: * Isolates transfer operations from active observations * Provides reliable handoff points between sites * Enables retry and recovery * Supports multi-tiered routing for complex site topologies **Example:** Chile's buffer accumulates packages and transfers to Cologne's buffer when enough data for an efficient :py:class:`~ccat_ops_db.models.DataTransferPackage` is ready. For long-distance transfers, optimal package sizes lie in the range of 10-50 TB; the exact size can be determined through benchmarking. Automated Lifecycle Management ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Human intervention should be the exception, not the rule. The data-transfer workflows handle routine operations automatically: * New observations trigger packaging workflows * Packages automatically transfer between site buffers when ready * Integrity checks verify data at each stage * Failed operations retry with exponential backoff * Only persistent failures require human attention Geographic Distribution ^^^^^^^^^^^^^^^^^^^^^^^ Data can be replicated across multiple sites (not currently in production) for redundancy: * Network issues can be worked around via alternative routes * Future expansion can add additional archive sites (e.g.
Cornell) Comprehensive Tracking ^^^^^^^^^^^^^^^^^^^^^^ Every piece of data is tracked from creation through deletion: * :py:class:`~ccat_ops_db.models.PhysicalCopy` records provide a complete audit trail * Checksums verify integrity at every stage * Transfer metrics enable performance analysis via :doc:`/data-transfer/docs/monitoring` * State tracking shows exactly what's happening to data in :doc:`/ops-db/docs/index` IVOA Compliance ^^^^^^^^^^^^^^^ The data center follows International Virtual Observatory Alliance (IVOA) standards where possible: * Metadata enables cross-observatory data discovery * Astronomical concepts (coordinates, sources, lines) use standard models * Integration with the broader astronomical data ecosystem Fault Tolerance ^^^^^^^^^^^^^^^ The system is designed to handle failures gracefully: * Retry mechanisms with exponential backoff * Circuit breakers prevent cascade failures * Degraded operation when components fail * Clear error reporting for investigation * Recovery procedures for common failure modes Data Model Reference ^^^^^^^^^^^^^^^^^^^^ The following models are defined in the ops-db package. For complete technical details including all attributes, relationships, and methods, see the :doc:`/ops-db/docs/index` documentation. .. data-model-glossary:: :module: ccat_ops_db.models :sort: alphabetical Next Steps ---------- Now that you understand the core architecture and data flow: * **Scientists:** See :doc:`../scientists/guide` to learn how to access and analyze data * **Instrument Teams:** See :doc:`../instrument/integration` to integrate your instrument * **Operators:** See :doc:`../operations/datacenter_operations` for deployment and monitoring * **Developers:** See :doc:`../components/developers` for component technical documentation