Core Concepts ============= .. verified:: 2025-11-20 :reviewer: Christof Buchbender This section explains the fundamental architecture and data flow of the CCAT Data Center, providing essential context for all users. System Overview --------------- The CCAT Data Center is built around a central :doc:`/ops-db/docs/index` that tracks the complete lifecycle of astronomical observations and their associated data. Think of ops-db as the "brain" of the system - it knows what observations have been executed, where data is physically located, what state it's in, and who has access to it. The system handles massive data volumes from the Fred Young Submillimeter Telescope (FYST), automatically moving data through stages from initial capture at the telescope through long-term archival storage and eventual staging for scientific analysis. **Key Components:** * **ops-db** - Central PostgreSQL database tracking all observations, data packages, and locations * **data-transfer** - Automated workflows that move data through its lifecycle * **ops-db-api** - RESTful API for instruments to register observations and data * **ops-db-ui** - Web interface for browsing observations and monitoring system state * **Storage Infrastructure** - Multi-tier storage across sites (Chile, Germany) * **HPC Integration** - Connection to RAMSES cluster for data processing Data Center Architecture ------------------------ .. mermaid:: graph TB subgraph Chile["Telescope Site (Chile)"] INST[Instruments
Prime-Cam, CHAI] SOURCE_CL[SOURCE Storage
Instrument Computers] BUFFER_CL[BUFFER Storage
Transfer Staging] LOCAL_API[ops-db-api
REST API] end subgraph Cologne["CCAT-DC (Cologne)"] BUFFER_COL[BUFFER Storage
Transfer Staging] OPSDB[(ops-db
PostgreSQL)] API[ops-db-api
REST API] UI[ops-db-ui
Web Interface] TRANSFER[data-transfer
Automated Workflows] LTA[LONG_TERM_ARCHIVE
S3 Object Storage] HPC[RAMSES HPC Cluster
PROCESSING Storage] end subgraph Monitoring["Monitoring"] INFLUX[InfluxDB
Metrics] GRAFANA[Grafana
Dashboards] end INST -->|write data| SOURCE_CL INST -->|file observations| LOCAL_API SOURCE_CL -->|package| BUFFER_CL BUFFER_CL -->|transfer| BUFFER_COL BUFFER_COL -->|archive| LTA LOCAL_API --> OPSDB UI --> API API --> OPSDB TRANSFER <-->|track state| OPSDB TRANSFER -->|orchestrate movement| BUFFER_CL TRANSFER -->|orchestrate movement| BUFFER_COL TRANSFER -->|stage for processing| HPC TRANSFER -->|metrics| INFLUX INFLUX --> GRAFANA Scientists -->|browse| UI Scientists -->|request data| API Scientists -->|analyze| HPC The ops-db sits at the center, with all other components updating or querying it. Automated workflows (:doc:`/data-transfer/docs/index`) orchestrate data movement between site buffers, while APIs and web interfaces provide access for humans and instruments. **Current Configuration:** Data flows from Chile to Cologne. The system is designed for multi-site architecture and can be extended to additional sites (e.g., Cornell) as needed. Data Flow Through the System ----------------------------- Data moves through several stages from telescope to scientific publication. Each stage serves a specific purpose in ensuring data integrity, accessibility, and long-term preservation. .. mermaid:: sequenceDiagram participant Telescope participant SOURCE as SOURCE
Storage
(Chile) participant BUFFER_CL as BUFFER
Storage
(Chile) participant Transfer as data-transfer
workflows participant BUFFER_COL as BUFFER
Storage
(Cologne) participant Archive as LONG_TERM
ARCHIVE
(Cologne) participant HPC as RAMSES
HPC Cluster participant Scientist Telescope->>SOURCE: Write raw data files Telescope->>Transfer: Register observation via API Note over Transfer,BUFFER_CL: Stage 1: Packaging Transfer->>SOURCE: Scan for new files Transfer->>Transfer: Create RawDataPackage (tar archive) Transfer->>BUFFER_CL: Copy package to Chile buffer Note over BUFFER_CL,BUFFER_COL: Stage 2: Inter-Site Transfer Transfer->>Transfer: Bundle packages for efficiency Transfer->>BUFFER_CL: Transfer from Chile buffer Transfer->>BUFFER_COL: To Cologne buffer (BBCP) Note over Transfer,Archive: Stage 3: Archive Transfer->>BUFFER_COL: Verify checksums Transfer->>BUFFER_COL: Unpack DataTransferPackages into RawDataPackages Transfer->>Archive: Move RawDataPackages to long-term storage Transfer->>Transfer: Update ops-db with locations Note over Scientist,HPC: Stage 4: Processing (on demand) Scientist->>Transfer: Request data via UI/API Transfer->>Archive: Retrieve archived data Transfer->>HPC: Stage to PROCESSING location Scientist->>HPC: Run analysis pipelines HPC->>Archive: Store results Stage 1: Observation & Capture ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data originates at :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE** - the instrument computers at the telescope in Chile. When an observation executes: 1. Instrument writes raw data files (detector timestreams, housekeeping data) 2. Instrument files the observation via :doc:`/ops-db-api/docs/index`, providing metadata (target, timing, configuration) 3. ops-db creates an ExecutedObsUnit record linking the observation to its data files **Key Concept:** *Filing* means registering an observation in :doc:`/ops-db/docs/index`. This doesn't move data, it just tells the system "this observation happened, and these are its files." Stage 2: Packaging ^^^^^^^^^^^^^^^^^^^ The packaging workflow automatically discovers new files at :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE** and prepares them for efficient transfer: 1. Related files (from one observation and instrument module) are grouped together 2. Files are bundled into tar archives (RawDataPackages, max ~50GB each) 3. Packages are copied to the :py:class:`~ccat_ops_db.models.DataLocation` of type **BUFFER** storage at the Chile site for transfer staging 4. Checksums are computed for integrity verification **Why package?** Thousands of small files are inefficient to transfer and manage. Packaging consolidates them into manageable units while preserving directory structure and metadata. **Why buffer?** The buffer provides a staging area that isolates transfer operations from active observations. :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE** storage can continue capturing new data while the buffer handles transfer workloads. Stage 3: Inter-Site Transfer ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Multiple :py:class:`~ccat_ops_db.models.RawDataPackage` are aggregated into :py:class:`~ccat_ops_db.models.DataTransferPackage` and transferred between site buffers: 1. Packages waiting in Chile's buffer are intelligently grouped by size and priority 2. High-performance transfer protocols (e.g. BBCP) move data from Chile buffer to Cologne buffer 3. Automated retry mechanisms handle transient network failures 4. Transfer metrics track bandwidth utilization and completion rates **Key Concept:** The system transfers buffer-to-buffer. This provides a reliable handoff point between sites - data is considered "received" once it reaches the destination site's buffer, even before final archiving. 
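The grouping step described in the list above behaves like a simple bin-packing pass over whatever is waiting in the buffer. The sketch below is a minimal illustration of that idea, not the actual data-transfer implementation; the class names, attributes, and the 10 TiB target size are assumptions made for this example.

.. code-block:: python

    from dataclasses import dataclass, field

    # Illustrative stand-ins only -- the real models in ccat_ops_db.models
    # have different attributes.
    @dataclass
    class BufferedPackage:
        name: str
        size_bytes: int
        priority: int          # lower number = higher priority

    @dataclass
    class TransferBundle:
        packages: list = field(default_factory=list)

        @property
        def size_bytes(self) -> int:
            return sum(p.size_bytes for p in self.packages)

    def bundle_for_transfer(waiting, target_size=10 * 1024**4):
        """Greedily group buffered packages into bundles of roughly target_size.

        Packages are taken in priority order (largest first within a priority),
        and a bundle is closed once adding another package would overshoot
        the target size.
        """
        bundles, current = [], TransferBundle()
        for pkg in sorted(waiting, key=lambda p: (p.priority, -p.size_bytes)):
            if current.packages and current.size_bytes + pkg.size_bytes > target_size:
                bundles.append(current)
                current = TransferBundle()
            current.packages.append(pkg)
        if current.packages:
            bundles.append(current)
        return bundles

In the real workflows the bundle that crosses the network is a :py:class:`~ccat_ops_db.models.DataTransferPackage`, and the target size is tuned by benchmarking the long-distance link (see the Buffer-Based Transfer Model section below).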
**Multi-Site Design:** While currently configured for Chile → Cologne transfers, the architecture supports multi-tiered routing (e.g., Chile → Cologne → Cornell) for geographic redundancy. Stage 4: Data Integrity & Archiving ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Once data arrives at Cologne's buffer, it moves to permanent storage: 1. DataTransferPackages in Cologne buffer are verified via checksums 2. DataTransferPackages are unpacked into :py:class:`~ccat_ops_db.models.RawDataPackage` 3. :py:class:`~ccat_ops_db.models.RawDataPackage` are transferred to :py:class:`~ccat_ops_db.models.DataLocation` of type **LONG_TERM_ARCHIVE** storage (S3-compatible Coscine) 4. :py:class:`~ccat_ops_db.models.PhysicalCopy` records track all locations in ops-db 5. IVOA-compliant metadata makes :py:class:`~ccat_ops_db.models.RawDataPackage` discoverable **Storage Technologies:** Archives use :py:class:`~ccat_ops_db.models.DataLocation` of type **LONG_TERM_ARCHIVE** storage (S3-compatible Coscine), providing virtually unlimited capacity, built-in redundancy, and REST API access. Stage 5: Processing (On Demand) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When scientists need data for analysis, the staging workflow retrieves it from archives: 1. Scientist requests specific observations via UI or API 2. Staging manager downloads data from long-term storage 3. Data is unpacked to :py:class:`~ccat_ops_db.models.DataLocation` of type **PROCESSING** locations on RAMSES HPC cluster 4. Scientists run their analysis pipelines 5. Processed results are stored back to archives or published **Key Concept:** Processing storage is temporary. Data is staged on demand and may be cleaned up after analysis to free space. Stage 6: Cleanup ^^^^^^^^^^^^^^^^^ After processing is complete, the data on the :py:class:`~ccat_ops_db.models.DataLocation` of type **PROCESSING**, the :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE**, and the :py:class:`~ccat_ops_db.models.DataLocation` of type **BUFFER** is deleted automatically. This occurs only if the data is replicated in the :py:class:`~ccat_ops_db.models.DataLocation` of type **LONG_TERM_ARCHIVE** and if the customizable retention policies at the origin :py:class:`~ccat_ops_db.models.DataLocation` are met. This ensures that operators do not have to delete data manually. Storage Locations & Types -------------------------- The data center uses a **location-based model** to track where data physically exists. Each location has a type that defines its role in the data lifecycle: Location Types ^^^^^^^^^^^^^^ **SOURCE** Instrument computers where data originates (Chile telescope site). Data here is transient - once packaged and transferred to the site buffer, it may be deleted to free space for new observations. **BUFFER** Intermediate staging areas at each site for transfer operations. Buffers hold packaged data during inter-site transfers and provide working space for data transformation workflows. Each site has its own buffer: * Chile buffer: Receives packages from :py:class:`~ccat_ops_db.models.DataLocation` of type **SOURCE**, stages for transfer to Cologne * Cologne buffer: Receives packages from :py:class:`~ccat_ops_db.models.DataLocation` of type **BUFFER** in Chile, stages for archiving A Site can have multiple BUFFER locations, each with a different priority and active flag.
The system uses: * **Priority** (lower number = higher priority): Determines which location to use first * **Active** flag: Allows temporarily disabling locations for maintenance **LONG_TERM_ARCHIVE** Permanent storage systems designed for data preservation (currently at Cologne). Archives provide high capacity and durability, with potential for multiple geographic copies for redundancy. **PROCESSING** Temporary storage on HPC clusters where scientists analyze data (RAMSES at Cologne). Processing locations provide high-performance access for compute-intensive workflows. Storage Technologies ^^^^^^^^^^^^^^^^^^^^ Different storage technologies are used based on performance, capacity, and cost requirements: **Disk** Traditional filesystem storage (local or network-mounted). Used for SOURCE, BUFFER, and PROCESSING locations where performance matters. **S3-Compatible Object Storage** Cloud-native storage with REST API access. Used for LONG_TERM_ARCHIVE locations (Coscine), providing virtually unlimited capacity and built-in redundancy. **Tape Libraries** High-capacity sequential storage for archival. Economical for long-term preservation of large datasets that are accessed infrequently. Supported by the architecture but not currently deployed. Geographic Distribution ^^^^^^^^^^^^^^^^^^^^^^^ Storage locations currently span two sites: * **CCAT Observatory (Chile)** - SOURCE and BUFFER locations at telescope * **University of Cologne (Germany)** - BUFFER, LONG_TERM_ARCHIVE, and PROCESSING (RAMSES) **Future Expansion:** The architecture supports additional sites (e.g., Cornell for secondary archive) and multi-tiered transfer routing. Data Models & Relationships ---------------------------- Understanding the Data Lifecycle ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The CCAT Data Center tracks three parallel hierarchies that converge during observations: **Observational Hierarchy:** How observations are organized scientifically :: Observatory → Telescope → Instrument → InstrumentModule ↓ ObservingProgram → ObsUnit → ExecutedObsUnit **Data Hierarchy:** How data files are packaged and tracked :: RawDataFile → RawDataPackage → DataTransferPackage ↓ ↓ ↓ PhysicalCopy (tracks locations) **Storage Hierarchy:** Where data physically exists :: Site → DataLocation → PhysicalCopy These hierarchies connect when an observation executes: an :py:class:`~ccat_ops_db.models.ExecutedObsUnit` produces :py:class:`~ccat_ops_db.models.RawDataFile`, which are packaged into :py:class:`~ccat_ops_db.models.RawDataPackage`, which have :py:class:`~ccat_ops_db.models.PhysicalCopy` at various DataLocations across different Sites.
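To make these connections concrete, the sketch below wires the three hierarchies together with plain dataclasses and walks from an executed observation down to every site that holds a copy of its data. It is illustrative only: the class names mirror the ops-db models, but the attribute and relationship names are assumptions for this example and do not match the actual schema in ``ccat_ops_db.models``.

.. code-block:: python

    from __future__ import annotations
    from dataclasses import dataclass, field

    # Class names mirror the ops-db models; the attributes below are
    # illustrative only and do not match the real schema.
    @dataclass
    class Site:
        name: str                          # e.g. "Chile", "Cologne"

    @dataclass
    class DataLocation:
        name: str                          # e.g. "cologne-archive"
        location_type: str                 # SOURCE / BUFFER / LONG_TERM_ARCHIVE / PROCESSING
        site: Site

    @dataclass
    class PhysicalCopy:
        location: DataLocation             # where this particular copy lives

    @dataclass
    class RawDataPackage:
        copies: list[PhysicalCopy] = field(default_factory=list)

    @dataclass
    class RawDataFile:
        package: RawDataPackage | None = None    # set once the file is packaged

    @dataclass
    class ExecutedObsUnit:
        raw_data_files: list[RawDataFile] = field(default_factory=list)

    def sites_holding(obs: ExecutedObsUnit) -> set[str]:
        """Walk observation -> files -> packages -> copies -> locations -> sites."""
        return {
            copy.location.site.name
            for f in obs.raw_data_files
            if f.package is not None
            for copy in f.package.copies
        }

In production this question is answered by querying :doc:`/ops-db/docs/index` (directly or through the API) rather than by traversing in-memory objects.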
Key Terminology ^^^^^^^^^^^^^^^ **ObsUnit** A planned observation - what the telescope *will* observe (target, duration, instrument configuration) **ExecutedObsUnit** An actual observation execution - what the telescope *did* observe (includes start/end times, conditions, quality assessment) **RawDataFile** An individual file produced by an instrument during an observation **RawDataPackage** A tar archive bundling related files from one observation for efficient management **Site** A physical location with data center infrastructure (Chile, Cologne) **DataLocation** A specific storage system at a site (e.g., "chile-buffer-01", "cologne-archive") **PhysicalCopy** A record tracking that a specific file or package exists at a specific location **Filing** The process of registering an observation and its data in ops-db via the API **Staging** Retrieving archived data and making it available at a :py:class:`~ccat_ops_db.models.DataLocation` of type **PROCESSING** Key Principles & Patterns -------------------------- Single Source of Truth ^^^^^^^^^^^^^^^^^^^^^^ :doc:`/ops-db/docs/index` is the authoritative record for all metadata. Rather than duplicating information across systems, all components query or update the central database. This ensures consistency and simplifies data governance. **Example:** When a :py:class:`~ccat_ops_db.models.RawDataPackage` is archived, the :doc:`/data-transfer/docs/index` workflow updates its state in :doc:`/ops-db/docs/index`. The UI, API, and monitoring systems all reflect this change immediately. Buffer-Based Transfer Model ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Inter-site transfers operate buffer-to-buffer rather than directly from source to archive: * Isolates transfer operations from active observations * Provides reliable handoff points between sites * Enables retry and recovery * Supports multi-tiered routing for complex site topologies **Example:** Chile's buffer accumulates packages and transfers to Cologne's buffer when enough data for an efficient :py:class:`~ccat_ops_db.models.DataTransferPackage` is ready. For long-distance transfers, optimal package sizes lie in the range of 10-50 TB; the exact size can be determined through benchmarking. Automated Lifecycle Management ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Human intervention should be the exception, not the rule. The data-transfer workflows handle routine operations automatically: * New observations trigger packaging workflows * Packages automatically transfer between site buffers when ready * Integrity checks verify data at each stage * Failed operations retry with exponential backoff * Only persistent failures require human attention Geographic Distribution ^^^^^^^^^^^^^^^^^^^^^^^ Data can be replicated across multiple sites (not currently in production) for redundancy: * Network issues can be worked around via alternative routes * Future expansion can add additional archive sites (e.g.
Cornell) Comprehensive Tracking ^^^^^^^^^^^^^^^^^^^^^^ Every piece of data is tracked from creation through deletion: * :py:class:`~ccat_ops_db.models.PhysicalCopy` records provide a complete audit trail * Checksums verify integrity at every stage * Transfer metrics enable performance analysis via :doc:`/data-transfer/docs/monitoring` * State tracking shows exactly what's happening to data in :doc:`/ops-db/docs/index` IVOA Compliance ^^^^^^^^^^^^^^^ The data center follows International Virtual Observatory Alliance (IVOA) standards where possible: * Metadata enables cross-observatory data discovery * Astronomical concepts (coordinates, sources, lines) use standard models * Integration with the broader astronomical data ecosystem Fault Tolerance ^^^^^^^^^^^^^^^ The system is designed to handle failures gracefully: * Retry mechanisms with exponential backoff * Circuit breakers prevent cascade failures * Degraded operation when components fail * Clear error reporting for investigation * Recovery procedures for common failure modes Data Model Reference ^^^^^^^^^^^^^^^^^^^^ The following models are defined in the ops-db package. For complete technical details including all attributes, relationships, and methods, see the :doc:`/ops-db/docs/index` documentation. .. data-model-glossary:: :module: ccat_ops_db.models :sort: alphabetical Next Steps ---------- Now that you understand the core architecture and data flow: * **Scientists:** See :doc:`../scientists/guide` to learn how to access and analyze data * **Instrument Teams:** See :doc:`../instrument/integration` to integrate your instrument * **Operators:** See :doc:`../operations/datacenter_operations` for deployment and monitoring * **Developers:** See :doc:`../components/developers` for component technical documentation