Core Concepts
=============
.. verified:: 2025-11-20
:reviewer: Christof Buchbender
This section explains the fundamental architecture and data flow of the CCAT Data
Center, providing essential context for all users.
System Overview
---------------
The CCAT Data Center is built around a central
:doc:`/ops-db/docs/index` that tracks the complete lifecycle of astronomical
observations and their associated data. Think of ops-db as the "brain" of the system -
it knows what observations have been executed, where data is physically located, what
state it's in, and who has access to it.
The system handles massive data volumes from the Fred Young Submillimeter Telescope
(FYST), automatically moving data through stages from initial capture at the telescope
through long-term archival storage and eventual staging for scientific analysis.
**Key Components:**
* **ops-db** - Central PostgreSQL database tracking all observations, data packages, and locations
* **data-transfer** - Automated workflows that move data through its lifecycle
* **ops-db-api** - RESTful API for instruments to register observations and data
* **ops-db-ui** - Web interface for browsing observations and monitoring system state
* **Storage Infrastructure** - Multi-tier storage across sites (Chile, Germany)
* **HPC Integration** - Connection to RAMSES cluster for data processing
Data Center Architecture
------------------------
.. mermaid::
graph TB
subgraph Chile["Telescope Site (Chile)"]
INST[Instruments
Prime-Cam, CHAI]
SOURCE_CL[SOURCE Storage
Instrument Computers]
BUFFER_CL[BUFFER Storage
Transfer Staging]
LOCAL_API[ops-db-api
REST API]
end
subgraph Cologne["CCAT-DC (Cologne)"]
BUFFER_COL[BUFFER Storage
Transfer Staging]
OPSDB[(ops-db
PostgreSQL)]
API[ops-db-api
REST API]
UI[ops-db-ui
Web Interface]
TRANSFER[data-transfer
Automated Workflows]
LTA[LONG_TERM_ARCHIVE
S3 Object Storage]
HPC[RAMSES HPC Cluster
PROCESSING Storage]
end
subgraph Monitoring["Monitoring"]
INFLUX[InfluxDB
Metrics]
GRAFANA[Grafana
Dashboards]
end
INST -->|write data| SOURCE_CL
INST -->|file observations| LOCAL_API
SOURCE_CL -->|package| BUFFER_CL
BUFFER_CL -->|transfer| BUFFER_COL
BUFFER_COL -->|archive| LTA
LOCAL_API --> OPSDB
UI --> API
API --> OPSDB
TRANSFER <-->|track state| OPSDB
TRANSFER -->|orchestrate movement| BUFFER_CL
TRANSFER -->|orchestrate movement| BUFFER_COL
TRANSFER -->|stage for processing| HPC
TRANSFER -->|metrics| INFLUX
INFLUX --> GRAFANA
Scientists -->|browse| UI
Scientists -->|request data| API
Scientists -->|analyze| HPC
The ops-db sits at the center, with all other components updating or querying it.
Automated workflows (:doc:`/data-transfer/docs/index`) orchestrate data movement between
site buffers, while APIs and web interfaces provide access for humans and instruments.
**Current Configuration:** Data flows from Chile to Cologne. The system is designed for
multi-site architecture and can be extended to additional sites (e.g., Cornell) as
needed.
Data Flow Through the System
-----------------------------
Data moves through several stages from telescope to scientific publication. Each stage
serves a specific purpose in ensuring data integrity, accessibility, and long-term
preservation.
.. mermaid::
sequenceDiagram
participant Telescope
participant SOURCE as SOURCE
Storage
(Chile)
participant BUFFER_CL as BUFFER
Storage
(Chile)
participant Transfer as data-transfer
workflows
participant BUFFER_COL as BUFFER
Storage
(Cologne)
participant Archive as LONG_TERM
ARCHIVE
(Cologne)
participant HPC as RAMSES
HPC Cluster
participant Scientist
Telescope->>SOURCE: Write raw data files
Telescope->>Transfer: Register observation via API
Note over Transfer,BUFFER_CL: Stage 2: Packaging
Transfer->>SOURCE: Scan for new files
Transfer->>Transfer: Create RawDataPackage (tar archive)
Transfer->>BUFFER_CL: Copy package to Chile buffer
Note over BUFFER_CL,BUFFER_COL: Stage 3: Inter-Site Transfer
Transfer->>Transfer: Bundle packages for efficiency
Transfer->>BUFFER_CL: Transfer from Chile buffer
Transfer->>BUFFER_COL: To Cologne buffer (BBCP)
Note over Transfer,Archive: Stage 4: Archive
Transfer->>BUFFER_COL: Verify checksums
Transfer->>BUFFER_COL: Unpack DataTransferPackages into RawDataPackages
Transfer->>Archive: Move RawDataPackages to long-term storage
Transfer->>Transfer: Update ops-db with locations
Note over Scientist,HPC: Stage 5: Processing (on demand)
Scientist->>Transfer: Request data via UI/API
Transfer->>Archive: Retrieve archived data
Transfer->>HPC: Stage to PROCESSING location
Scientist->>HPC: Run analysis pipelines
HPC->>Archive: Store results
Stage 1: Observation & Capture
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Data originates at :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE** - the
instrument computers at the telescope in Chile. When an observation executes:
1. Instrument writes raw data files (detector timestreams, housekeeping data)
2. Instrument files the observation via :doc:`/ops-db-api/docs/index`, providing
metadata (target, timing, configuration)
3. ops-db creates an ExecutedObsUnit record linking the observation to its data files
**Key Concept:** *Filing* means registering an observation in :doc:`/ops-db/docs/index`. Filing doesn't
move any data; it simply tells the system "this observation happened, and these are its files."
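
From an instrument computer, filing might look roughly like the sketch below. The base
URL, endpoint path, and payload fields are illustrative assumptions, not the actual
:doc:`/ops-db-api/docs/index` schema:

.. code-block:: python

   import requests

   # Hypothetical base URL and route; consult the ops-db-api documentation for
   # the real endpoints and required metadata fields.
   API_BASE = "https://ops-db-api.example.org/api/v1"

   observation = {
       "obs_unit_id": 4711,                      # planned ObsUnit being executed
       "instrument_module": "prime-cam-280ghz",  # module that produced the data
       "start_time": "2025-11-20T03:14:00Z",
       "end_time": "2025-11-20T03:44:00Z",
       "files": [
           "/data/prime-cam/280ghz/obs4711/timestream_000.h5",
           "/data/prime-cam/280ghz/obs4711/housekeeping.h5",
       ],
   }

   # Filing only registers the observation and its files; no data is moved.
   response = requests.post(
       f"{API_BASE}/executed-obs-units", json=observation, timeout=30
   )
   response.raise_for_status()
   print("Filed ExecutedObsUnit", response.json().get("id"))
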
Stage 2: Packaging
^^^^^^^^^^^^^^^^^^^
The packaging workflow automatically discovers new files at
:py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE** and prepares them for
efficient transfer:
1. Related files (from one observation and instrument module) are grouped together
2. Files are bundled into tar archives (RawDataPackages, max ~50GB each)
3. Packages are copied to the :py:class:`~ccat_ops_db.models.DataLocation` of type **BUFFER** storage at the Chile site for transfer staging
4. Checksums are computed for integrity verification
**Why package?** Thousands of small files are inefficient to transfer and manage.
Packaging consolidates them into manageable units while preserving directory structure
and metadata.
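
As a rough illustration of the packaging step, here is a minimal sketch that bundles one
group of related files into a tar archive and computes its checksum; the helper name and
the placement of the size check are assumptions, not the actual data-transfer
implementation:

.. code-block:: python

   import hashlib
   import tarfile
   from pathlib import Path

   def create_package(files: list[Path], package_path: Path,
                      max_size: int = 50 * 10**9) -> str:
       """Bundle related files (up to ~50 GB total) into a tar archive and
       return the archive's SHA-256 checksum."""
       if sum(f.stat().st_size for f in files) > max_size:
           raise ValueError("File group exceeds the per-package size cap")

       with tarfile.open(package_path, mode="w") as tar:
           for f in files:
               tar.add(f, arcname=f.name)   # keep only the relative name inside the archive

       sha256 = hashlib.sha256()
       with open(package_path, "rb") as fh:
           for chunk in iter(lambda: fh.read(1024 * 1024), b""):
               sha256.update(chunk)
       return sha256.hexdigest()
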
**Why buffer?** The buffer provides a staging area that isolates transfer operations
from active observations. :py:class:`~ccat_ops_db.models.DataLocation` of type
**SOURCE** storage can continue capturing new data while the buffer handles transfer
workloads.
Stage 3: Inter-Site Transfer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Multiple :py:class:`~ccat_ops_db.models.RawDataPackage` are aggregated into
:py:class:`~ccat_ops_db.models.DataTransferPackage` and transferred between site
buffers:
1. Packages waiting in Chile's buffer are grouped into transfer bundles by size and priority (see the sketch after this list)
2. High-performance transfer protocols (e.g. BBCP) move data from Chile buffer to
Cologne buffer
3. Automated retry mechanisms handle transient network failures
4. Transfer metrics track bandwidth utilization and completion rates
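
A minimal sketch of the grouping in step 1, assuming a simple greedy fill by priority
and size; the class and attribute names are illustrative, not the actual workflow code:

.. code-block:: python

   from dataclasses import dataclass

   @dataclass
   class WaitingPackage:
       """Simplified stand-in for a RawDataPackage queued in the Chile buffer."""
       name: str
       size_bytes: int
       priority: int             # lower number = transfer sooner

   def build_transfer_packages(waiting: list[WaitingPackage],
                               max_size: int = 50 * 10**12) -> list[list[WaitingPackage]]:
       """Greedily fill DataTransferPackage-sized bundles, highest priority first.
       The ~50 TB cap is illustrative (see Buffer-Based Transfer Model below)."""
       bundles: list[list[WaitingPackage]] = []
       current: list[WaitingPackage] = []
       current_size = 0
       for pkg in sorted(waiting, key=lambda p: (p.priority, -p.size_bytes)):
           if current and current_size + pkg.size_bytes > max_size:
               bundles.append(current)      # close the current bundle and start a new one
               current, current_size = [], 0
           current.append(pkg)
           current_size += pkg.size_bytes
       if current:
           bundles.append(current)
       return bundles
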
**Key Concept:** The system transfers buffer-to-buffer. This provides a reliable handoff
point between sites - data is considered "received" once it reaches the destination
site's buffer, even before final archiving.
**Multi-Site Design:** While currently configured for Chile → Cologne transfers, the
architecture supports multi-tiered routing (e.g., Chile → Cologne → Cornell) for
geographic redundancy.
Stage 4: Data Integrity & Archiving
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once data arrives at Cologne's buffer, it moves to permanent storage:
1. DataTransferPackages in Cologne buffer are verified via checksums
2. DataTransferPackages are unpacked into :py:class:`~ccat_ops_db.models.RawDataPackage`
3. :py:class:`~ccat_ops_db.models.RawDataPackage` are transferred to
:py:class:`~ccat_ops_db.models.DataLocation` of type
**LONG_TERM_ARCHIVE** storage (S3-compatible Coscine)
4. :py:class:`~ccat_ops_db.models.PhysicalCopy` records track all locations in ops-db
5. IVOA-compliant metadata makes :py:class:`~ccat_ops_db.models.RawDataPackage`
discoverable
**Storage Technologies:** Archives use :py:class:`~ccat_ops_db.models.DataLocation` of type
**LONG_TERM_ARCHIVE** storage (S3-compatible Coscine), providing
virtually unlimited capacity, built-in redundancy, and REST API access.
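
As a sketch of the verification and archiving steps, the snippet below re-computes a
package checksum and uploads the package with ``boto3``; the endpoint URL, bucket, and
key layout are placeholders rather than the actual Coscine configuration:

.. code-block:: python

   import hashlib

   import boto3

   def archive_package(package_path: str, expected_sha256: str,
                       bucket: str, key: str) -> None:
       """Verify a package checksum, then upload it to S3-compatible archive storage."""
       sha256 = hashlib.sha256()
       with open(package_path, "rb") as fh:
           for chunk in iter(lambda: fh.read(1024 * 1024), b""):
               sha256.update(chunk)
       if sha256.hexdigest() != expected_sha256:
           raise ValueError(f"Checksum mismatch for {package_path}")

       # The endpoint URL is a placeholder for the S3-compatible archive service.
       s3 = boto3.client("s3", endpoint_url="https://archive.example.org")
       s3.upload_file(package_path, bucket, key)
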
Stage 5: Processing (On Demand)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When scientists need data for analysis, the staging workflow retrieves it from archives:
1. Scientist requests specific observations via UI or API
2. Staging manager downloads data from long-term storage
3. Data is unpacked to a :py:class:`~ccat_ops_db.models.DataLocation` of type **PROCESSING** on the RAMSES HPC cluster
4. Scientists run their analysis pipelines
5. Processed results are stored back to archives or published
**Key Concept:** Processing storage is temporary. Data is staged on demand and may be
cleaned up after analysis to free space.
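
A hedged sketch of what an on-demand staging request could look like from the
scientist's side; the routes, payload fields, and state values are hypothetical and
only illustrate the request-and-poll pattern:

.. code-block:: python

   import time

   import requests

   API_BASE = "https://ops-db-api.example.org/api/v1"   # placeholder URL

   def stage_observation(executed_obs_unit_id: int) -> str:
       """Ask the staging workflow to make an observation's data available on RAMSES."""
       # Hypothetical staging route and payload.
       resp = requests.post(
           f"{API_BASE}/staging-requests",
           json={"executed_obs_unit_id": executed_obs_unit_id},
           timeout=30,
       )
       resp.raise_for_status()
       request_id = resp.json()["id"]

       # Poll until the data has been unpacked at a PROCESSING location.
       while True:
           status = requests.get(f"{API_BASE}/staging-requests/{request_id}",
                                 timeout=30).json()
           if status["state"] == "STAGED":
               return status["processing_path"]
           time.sleep(60)
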
Stage 6: Cleanup
^^^^^^^^^^^^^^^^^
After processing is complete, data on the :py:class:`~ccat_ops_db.models.DataLocation`
of type **PROCESSING**, as well as on the :py:class:`~ccat_ops_db.models.DataLocation`
of type **SOURCE** and **BUFFER**, is deleted automatically. Deletion only happens once
the data is replicated at a :py:class:`~ccat_ops_db.models.DataLocation` of type
**LONG_TERM_ARCHIVE** and the customizable retention policy of the origin
:py:class:`~ccat_ops_db.models.DataLocation` is satisfied. This ensures that operators
do not have to delete data manually.
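
The retention rule can be pictured with the small sketch below; the ``Copy`` class and
its attributes are simplified stand-ins for the real
:py:class:`~ccat_ops_db.models.PhysicalCopy` model:

.. code-block:: python

   from dataclasses import dataclass
   from datetime import datetime, timedelta, timezone

   @dataclass
   class Copy:
       """Simplified stand-in for a PhysicalCopy record (illustrative attributes only)."""
       location_type: str      # "SOURCE", "BUFFER", "PROCESSING", or "LONG_TERM_ARCHIVE"
       created_at: datetime

   def eligible_for_cleanup(copy: Copy, archive_copies: int, retention_days: int) -> bool:
       """A copy may be removed only if the data is archived and retention has expired."""
       replicated = archive_copies >= 1
       age = datetime.now(timezone.utc) - copy.created_at
       return (copy.location_type in {"SOURCE", "BUFFER", "PROCESSING"}
               and replicated
               and age > timedelta(days=retention_days))
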
Storage Locations & Types
--------------------------
The data center uses a **location-based model** to track where data physically exists.
Each location has a type that defines its role in the data lifecycle:
Location Types
^^^^^^^^^^^^^^
**SOURCE**
Instrument computers where data originates (Chile telescope site). Data here is
transient - once packaged and transferred to the site buffer, it may be deleted to
free space for new observations.
**BUFFER**
Intermediate staging areas at each site for transfer operations. Buffers hold
packaged data during inter-site transfers and provide working space for data
transformation workflows. Each site has its own buffer:
* Chile buffer: Receives packages from :py:class:`~ccat_ops_db.models.DataLocation`
of type **SOURCE**, stages for transfer to Cologne
* Cologne buffer: Receives packages from :py:class:`~ccat_ops_db.models.DataLocation`
of type **BUFFER** in Chile, stages for archiving
A Site can have multiple BUFFER locations, each with a different priority and
active flag (see the selection sketch at the end of this section). The system uses:
* **Priority** (lower number = higher priority): Determines which location to use first
* **Active** flag: Allows temporarily disabling locations for maintenance
**LONG_TERM_ARCHIVE**
Permanent storage systems designed for data preservation (currently at Cologne).
Archives provide high capacity and durability, with potential for multiple geographic
copies for redundancy.
**PROCESSING**
Temporary storage on HPC clusters where scientists analyze data (RAMSES at Cologne).
Processing locations provide high-performance access for compute-intensive workflows.
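
The priority and active flags described for BUFFER locations above can be sketched as
follows; ``BufferLocation`` is a simplified stand-in for the real
:py:class:`~ccat_ops_db.models.DataLocation` model:

.. code-block:: python

   from dataclasses import dataclass

   @dataclass
   class BufferLocation:
       """Simplified stand-in for a DataLocation of type BUFFER (illustrative fields)."""
       name: str
       priority: int     # lower number = higher priority
       active: bool      # inactive locations are skipped, e.g. during maintenance

   def select_buffer(locations: list[BufferLocation]) -> BufferLocation:
       """Pick the active buffer with the highest priority (lowest number)."""
       candidates = [loc for loc in locations if loc.active]
       if not candidates:
           raise RuntimeError("No active BUFFER location available at this site")
       return min(candidates, key=lambda loc: loc.priority)

   buffers = [
       BufferLocation("chile-buffer-01", priority=10, active=True),
       BufferLocation("chile-buffer-02", priority=20, active=True),
   ]
   print(select_buffer(buffers).name)   # -> chile-buffer-01

Disabling a location for maintenance then only requires clearing its active flag;
traffic falls back to the next location in priority order.
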
Storage Technologies
^^^^^^^^^^^^^^^^^^^^
Different storage technologies are used based on performance, capacity, and cost requirements:
**Disk**
Traditional filesystem storage (local or network-mounted). Used for SOURCE, BUFFER,
and PROCESSING locations where performance matters.
**S3-Compatible Object Storage**
Cloud-native storage with REST API access. Used for LONG_TERM_ARCHIVE locations
(Coscine), providing virtually unlimited capacity and built-in redundancy.
**Tape Libraries**
High-capacity sequential storage for archival. Economical for long-term preservation
of large datasets that are accessed infrequently. Supported by the architecture, but
not currently deployed.
Geographic Distribution
^^^^^^^^^^^^^^^^^^^^^^^
Storage locations currently span two sites:
* **CCAT Observatory (Chile)** - SOURCE and BUFFER locations at telescope
* **University of Cologne (Germany)** - BUFFER, LONG_TERM_ARCHIVE, and PROCESSING
(RAMSES)
**Future Expansion:** The architecture supports additional sites (e.g., Cornell for
secondary archive) and multi-tiered transfer routing.
Data Models & Relationships
----------------------------
Understanding the Data Lifecycle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The CCAT Data Center tracks three parallel hierarchies that converge during
observations:
**Observational Hierarchy:** How observations are organized scientifically
::
Observatory → Telescope → Instrument → InstrumentModule
↓
ObservingProgram → ObsUnit → ExecutedObsUnit
**Data Hierarchy:** How data files are packaged and tracked
::
RawDataFile → RawDataPackage → DataTransferPackage
↓ ↓ ↓
PhysicalCopy (tracks locations)
**Storage Hierarchy:** Where data physically exists
::
Site → DataLocation → PhysicalCopy
These hierarchies connect when an observation executes: an
:py:class:`~ccat_ops_db.models.ExecutedObsUnit` produces
:py:class:`~ccat_ops_db.models.RawDataFile`, which are packaged into
:py:class:`~ccat_ops_db.models.RawDataPackage`, which have
:py:class:`~ccat_ops_db.models.PhysicalCopy` at various DataLocations across
different Sites.
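
As an illustration of how the three hierarchies meet, the sketch below walks from an
executed observation down to the physical copies of its data; the relationship names
are hypothetical, and the authoritative attributes are listed in the data model
reference below:

.. code-block:: python

   def locate_observation_data(executed_obs_unit) -> None:
       """Walk from an ExecutedObsUnit down to the physical copies of its data.

       The relationship names used here (raw_data_files, raw_data_package,
       physical_copies, data_location, site) are hypothetical placeholders."""
       for raw_file in executed_obs_unit.raw_data_files:      # observational -> data hierarchy
           package = raw_file.raw_data_package                # file is bundled into a package
           for copy in package.physical_copies:               # data -> storage hierarchy
               loc = copy.data_location
               print(f"{raw_file.name}: {loc.site.name} / {loc.name} ({loc.location_type})")
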
Key Terminology
^^^^^^^^^^^^^^^
**ObsUnit**
A planned observation - what the telescope *will* observe (target, duration,
instrument configuration)
**ExecutedObsUnit**
An actual observation execution - what the telescope *did* observe (includes
start/end times, conditions, quality assessment)
**RawDataFile**
An individual file produced by an instrument during an observation
**RawDataPackage**
A tar archive bundling related files from one observation for efficient management
**Site**
A physical location with data center infrastructure (Chile, Cologne)
**DataLocation**
A specific storage system at a site (e.g., "chile-buffer-01", "cologne-archive")
**PhysicalCopy**
A record tracking that a specific file or package exists at a specific location
**Filing**
The process of registering an observation and its data in ops-db via the API
**Staging**
Retrieving archived data and making it available at a
:py:class:`~ccat_ops_db.models.DataLocation` of type **PROCESSING**
Key Principles & Patterns
--------------------------
Single Source of Truth
^^^^^^^^^^^^^^^^^^^^^^
:doc:`/ops-db/docs/index` is the authoritative record for all metadata. Rather than
duplicating information across systems, all components query or update the central
database. This ensures consistency and simplifies data governance.
**Example:** When a :py:class:`~ccat_ops_db.models.RawDataPackage` is archived, the
:doc:`/data-transfer/docs/index` workflow updates its state in :doc:`/ops-db/docs/index`.
The UI, API, and monitoring systems all reflect this change immediately.
Buffer-Based Transfer Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Inter-site transfers operate buffer-to-buffer rather than directly from source to
archive:
* Isolates transfer operations from active observations
* Provides reliable handoff points between sites
* Enables retry and recovery
* Supports multi-tiered routing for complex site topologies
**Example:** Chile's buffer accumulates packages and transfers to Cologne's buffer
when enough data for an efficient :py:class:`~ccat_ops_db.models.DataTransferPackage`
is ready. For long-distance transfers, optimal package sizes lie in the range of
10-50 TB; the exact size can be determined through benchmarking.
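
A minimal sketch of the trigger condition, assuming a plain size threshold taken from
the range above (the real workflow may also weigh priority and age):

.. code-block:: python

   # Lower end of the 10-50 TB range quoted above; tune via transfer benchmarks.
   TARGET_BUNDLE_SIZE = 10 * 10**12

   def ready_to_transfer(waiting_package_sizes: list[int]) -> bool:
       """Start an inter-site transfer only once enough data has accumulated."""
       return sum(waiting_package_sizes) >= TARGET_BUNDLE_SIZE
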
Automated Lifecycle Management
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Human intervention should be the exception, not the rule. The data-transfer workflows
handle routine operations automatically:
* New observations trigger packaging workflows
* Packages automatically transfer between site buffers when ready
* Integrity checks verify data at each stage
* Failed operations retry with exponential backoff (sketched below)
* Only persistent failures require human attention
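
The retry behaviour can be sketched as a small helper; the defaults below are
illustrative, not production values:

.. code-block:: python

   import random
   import time

   def retry_with_backoff(operation, max_attempts: int = 5, base_delay: float = 2.0):
       """Run an operation, retrying transient failures with exponential backoff."""
       for attempt in range(1, max_attempts + 1):
           try:
               return operation()
           except Exception as exc:          # production code catches specific error types
               if attempt == max_attempts:
                   raise                     # persistent failure: escalate to an operator
               delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
               print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
               time.sleep(delay)
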
Geographic Distribution
^^^^^^^^^^^^^^^^^^^^^^^
Data can be replicated across multiple sites (not currently in production) for
redundancy:
* Network issues can be worked around via alternative routes
* Future expansion can add additional archive sites (e.g. Cornell)
Comprehensive Tracking
^^^^^^^^^^^^^^^^^^^^^^
Every piece of data is tracked from creation through deletion:
* :py:class:`~ccat_ops_db.models.PhysicalCopy` records provide complete audit trail
* Checksums verify integrity at every stage
* Transfer metrics enable performance analysis via :doc:`/data-transfer/docs/monitoring`
* State tracking shows exactly what's happening to data in :doc:`/ops-db/docs/index`
IVOA Compliance
^^^^^^^^^^^^^^^
The data center follows International Virtual Observatory Alliance (IVOA) standards
where possible:
* Metadata enables cross-observatory data discovery
* Astronomical concepts (coordinates, sources, lines) use standard models
* Integration with broader astronomical data ecosystem
Fault Tolerance
^^^^^^^^^^^^^^^
The system is designed to handle failures gracefully:
* Retry mechanisms with exponential backoff
* Circuit breakers prevent cascade failures (sketched below)
* Degraded operation when components fail
* Clear error reporting for investigation
* Recovery procedures for common failure modes
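
For illustration, a minimal circuit breaker might look like the sketch below; the
threshold and cool-down values are illustrative, not the production configuration:

.. code-block:: python

   import time

   class CircuitBreaker:
       """Minimal circuit breaker: after repeated failures, stop calling a failing
       dependency for a cool-down period instead of letting failures cascade."""

       def __init__(self, failure_threshold: int = 5, reset_after: float = 300.0):
           self.failure_threshold = failure_threshold
           self.reset_after = reset_after      # cool-down in seconds
           self.failures = 0
           self.opened_at = None               # time the breaker opened, if any

       def call(self, operation):
           if self.opened_at is not None:
               if time.monotonic() - self.opened_at < self.reset_after:
                   raise RuntimeError("Circuit open: dependency recently failing")
               # Cool-down elapsed: close the breaker and probe the dependency again.
               self.opened_at = None
               self.failures = 0
           try:
               result = operation()
           except Exception:
               self.failures += 1
               if self.failures >= self.failure_threshold:
                   self.opened_at = time.monotonic()
               raise
           self.failures = 0
           return result
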
Data Model Reference
^^^^^^^^^^^^^^^^^^^^
The following models are defined in the ops-db package. For complete technical details
including all attributes, relationships, and methods, see the :doc:`/ops-db/docs/index`
documentation.
.. data-model-glossary::
:module: ccat_ops_db.models
:sort: alphabetical
Next Steps
----------
Now that you understand the core architecture and data flow:
* **Scientists:** See :doc:`../scientists/guide` to learn how to access and analyze data
* **Instrument Teams:** See :doc:`../instrument/integration` to integrate your instrument
* **Operators:** See :doc:`../operations/datacenter_operations` for deployment and monitoring
* **Developers:** See :doc:`../components/developers` for component technical documentation