## Overview
The Workflow Manager is the science pipeline orchestration system for the CCAT Data Center. It takes archived raw observation data, runs reduction software against it on HPC infrastructure, and produces calibrated data products with full provenance tracking.
## What Problem Does It Solve?
After the data-transfer system archives raw observations, scientists need to:

- Select which data to process (by source, instrument, observation configuration, etc.)
- Run calibration, baseline subtraction, gridding, and map-making software
- Track which raw data produced which outputs (provenance)
- Re-process when new data arrives or software improves
- Manage intermediate products and disk space
The Workflow Manager automates all of this. Scientists define what to process and how to process it. The system handles the when and where automatically.
## Key Abstractions
The system is built around a small number of core concepts:
- **Pipeline**: A DAG (directed acyclic graph) of reduction steps. The top-level object that defines what data to process, where to process it, and when to trigger. See Pipeline Hierarchy.
- **ReductionStep**: One execution unit within a Pipeline. Ties a piece of reduction software to its configuration and scheduling parameters. Steps are ordered and can depend on upstream steps.
- **DataGrouping**: Defines "what data" to process using declarative filter rules. Shared across pipelines: different pipelines can use the same grouping with different execution granularity. See Data Grouping & Filter Engine.
- **ExecutedReductionStep**: A concrete run of a ReductionStep for a specific sub-group of data. Tracks status, HPC job ID, execution metrics, and links to input/output data products.
- **DataProduct**: An output file produced by a run. Carries a type classification (SCIENCE, QA, PLOT, etc.), checksums, quality metrics, and provenance lineage back to raw data.
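The relationships between these abstractions can be sketched as a set of dataclasses. This is an illustrative model only: the class and field names below are assumptions, not the real ops-db schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

# Hypothetical enums; actual status and product-type values may differ.
class RunStatus(Enum):
    PENDING = "PENDING"
    SUBMITTED = "SUBMITTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"

class ProductType(Enum):
    SCIENCE = "SCIENCE"
    QA = "QA"
    PLOT = "PLOT"

@dataclass
class DataGrouping:
    name: str
    # Declarative filter rules, e.g. {"instrument": "CHAI", "state": "archived"}
    filters: dict

@dataclass
class ReductionStep:
    name: str
    software: str                                   # reduction software reference
    config: dict = field(default_factory=dict)
    depends_on: list = field(default_factory=list)  # upstream step names (the DAG edges)

@dataclass
class Pipeline:
    name: str
    grouping: DataGrouping      # what data
    steps: list                 # ordered ReductionSteps forming a DAG
    trigger: str = "MANUAL"     # or "CONTINUOUS"

@dataclass
class ExecutedReductionStep:
    step: ReductionStep
    subgroup_id: str            # the specific sub-group of data this run covers
    status: RunStatus = RunStatus.PENDING
    hpc_job_id: Optional[str] = None

@dataclass
class DataProduct:
    path: str
    product_type: ProductType
    checksum: str
    parent_run: ExecutedReductionStep   # provenance link back toward raw data
```

Note how provenance falls out of the links: each DataProduct points at the run that produced it, and each run points at its step and data sub-group.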
## Two Operational Modes
**Standing Pipelines** (always-on, autonomous, incremental):

- Broad DataGrouping (e.g., "all CHAI data, state=archived")
- Trigger: `CONTINUOUS`; the system continuously evaluates processing gaps per step
- Per-step cooldown: calibration fires quickly, map-making fires daily
- Intermediate products are cached in the workspace and purged under disk pressure
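The per-step cooldown logic can be sketched as a single predicate: a step fires only when its cooldown has elapsed and unprocessed data exists. This is a minimal illustration of the idea; the function name and arguments are assumptions, not the trigger-manager's actual interface.

```python
from datetime import datetime, timedelta

def should_fire(last_fired: datetime, cooldown: timedelta,
                unprocessed_count: int, now: datetime) -> bool:
    """Fire a step only if a data gap exists and its cooldown has elapsed."""
    return unprocessed_count > 0 and (now - last_fired) >= cooldown

now = datetime(2025, 1, 2, 12, 0)

# Calibration: short cooldown, fires as soon as new data appears.
assert should_fire(now - timedelta(minutes=10), timedelta(minutes=5),
                   unprocessed_count=3, now=now)

# Map-making: daily cooldown holds the step back even though a gap exists.
assert not should_fire(now - timedelta(hours=6), timedelta(days=1),
                       unprocessed_count=3, now=now)
```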
**Ad-hoc Pipelines** (scientist one-offs):

- Specific DataGrouping (e.g., "NGC253, CO43, specific observation dates")
- Trigger: `MANUAL`
- Custom configuration tweaks per ReductionStep
- Run once, then done
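An ad-hoc pipeline request might look like the payload below, combining a narrow DataGrouping with per-step configuration tweaks. The endpoint, field names, and filter syntax are all illustrative assumptions; the real ops-db-api contract may differ.

```python
import json

# Hypothetical request body for creating a one-off pipeline via ops-db-api.
payload = {
    "name": "ngc253-co43-adhoc",
    "trigger": "MANUAL",
    "grouping": {
        "filters": {
            "source": "NGC253",
            "line": "CO43",
            "obs_date": {"gte": "2025-01-01", "lte": "2025-01-07"},
        }
    },
    "steps": [
        {"name": "calibration", "software": "chai-cal:1.4", "config": {}},
        {"name": "map", "software": "chai-map:2.0",
         "config": {"grid_size_arcsec": 4.0},   # custom per-step tweak
         "depends_on": ["calibration"]},
    ],
}

body = json.dumps(payload)
# e.g. POST {api_base}/pipelines with this body (run once, then done)
```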
## How Data Flows
```mermaid
sequenceDiagram
    participant Sci as Scientist
    participant API as ops-db-api
    participant TM as trigger-manager
    participant WM as workflow-manager
    participant HPC as HPC Backend
    participant RM as result-manager
    participant DB as ops-db
    Sci->>API: Create Pipeline + Steps
    Note over TM: Poll loop
    TM->>DB: Resolve DataGrouping
    TM->>DB: Evaluate gap per step
    TM->>DB: Create ExecutedReductionStep (PENDING)
    Note over WM: Poll loop
    WM->>DB: Pick up PENDING runs
    WM->>HPC: Submit Apptainer job
    WM->>DB: Update status (SUBMITTED)
    Note over RM: Poll loop
    RM->>HPC: Check job status
    RM->>DB: Scan output directory
    RM->>DB: Create DataProduct records
    RM->>DB: Update status (COMPLETED)
```
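The three poll loops can be sketched as independent ticks that each advance a run one state. This is a simplified toy, assuming an in-memory "database" and a fake HPC backend; the function and field names are illustrative, not the real services.

```python
# Simplified sketch of the run lifecycle: PENDING -> SUBMITTED -> COMPLETED.

def trigger_manager_tick(db):
    # Create a run for a detected gap (here: one hardcoded gap).
    db.append({"status": "PENDING", "job_id": None, "products": []})

def workflow_manager_tick(db, hpc):
    for run in db:
        if run["status"] == "PENDING":
            run["job_id"] = hpc.submit(run)     # submit containerized job
            run["status"] = "SUBMITTED"

def result_manager_tick(db, hpc):
    for run in db:
        if run["status"] == "SUBMITTED" and hpc.is_done(run["job_id"]):
            # Register outputs found in the job's output directory.
            run["products"].append({"type": "SCIENCE", "path": "out/map.fits"})
            run["status"] = "COMPLETED"

class FakeHPC:
    def submit(self, run): return "job-001"
    def is_done(self, job_id): return True

db = []
hpc = FakeHPC()
trigger_manager_tick(db)
workflow_manager_tick(db, hpc)
result_manager_tick(db, hpc)
assert db[0]["status"] == "COMPLETED"
```

Because each service only polls shared state and advances it one step, the services stay decoupled: any of them can restart without losing the overall pipeline state.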