# Workflow Manager
The Workflow Manager orchestrates science reduction pipelines for the CCAT Data Center. It picks up where data-transfer leaves off: it takes archived and staged raw data, runs containerized reduction software against it on HPC infrastructure, and produces calibrated data products with full provenance tracking back to the raw observations.
This documentation covers the pipeline orchestration system — not the database models (see Operations Database (ops-db)), the REST API (see Operations Database API (ops-db-api)), or the deployment infrastructure (see CCAT System Integration Documentation).
The Workflow Manager is the backend for:
- `ops-db-ui` - Pipeline dashboard showing run status, data products, and lineage
- `ops-db-api` - Pipeline endpoints for creating and managing pipelines via REST
- `ops-db` - All pipeline models (`Pipeline`, `ReductionStep`, `DataProduct`, etc.)
## What It Does
The Workflow Manager automates the complete science pipeline lifecycle:
- **Trigger** enabled pipelines when new data arrives or on schedule
- **Resolve** data groupings into sub-groups via a generic filter engine
- **Stage** raw data and upstream intermediates to HPC storage
- **Execute** containerized reduction software via Apptainer on HPC backends
- **Collect** output data products with convention-based directory discovery
- **Track** full provenance lineage from raw observations to final products
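The lifecycle above can be pictured as a run moving through an ordered set of states, one per manager stage. This is a minimal sketch only; the status names (`PENDING`, `STAGED`, `RUNNING`, `COLLECTED`) and the `advance` helper are illustrative, not the actual ops-db schema.

```python
from enum import Enum


class RunStatus(Enum):
    # Hypothetical lifecycle states; the real ops-db models may differ
    PENDING = "pending"      # trigger-manager created the run
    STAGED = "staged"        # raw data copied to HPC storage
    RUNNING = "running"      # containerized job submitted to the backend
    COLLECTED = "collected"  # result-manager registered the outputs


# Allowed forward transitions in this sketch
TRANSITIONS = {
    RunStatus.PENDING: RunStatus.STAGED,
    RunStatus.STAGED: RunStatus.RUNNING,
    RunStatus.RUNNING: RunStatus.COLLECTED,
}


def advance(status: RunStatus) -> RunStatus:
    """Move a run to its next lifecycle stage."""
    return TRANSITIONS[status]
```

Each of the three manager processes drives one of these transitions, which is why the system can be restarted at any point without losing track of in-flight runs.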
## Documentation Structure

- **Concepts** - Pipeline hierarchy, execution model, and key abstractions
- **Architecture** - Manager/worker pattern, HPC backends, and system design
- **API Reference** - Python API for managers, tasks, and core modules
- **Operations** - Configuration, deployment, and monitoring
- **Container Contract** - How to build pipeline containers for CCAT
- **Integration** - How this fits with other CCAT components
## System Context

```mermaid
graph TD
    RDP["RawDataPackage<br/>(ops-db)"]
    DT["data-transfer<br/>Archive & Stage"]
    subgraph WM["Workflow Manager"]
        TM["trigger-manager<br/>Evaluate gaps, create runs"]
        WMgr["workflow-manager<br/>Build commands, submit jobs"]
        RM["result-manager<br/>Collect outputs, track lineage"]
        TM --> WMgr --> RM
    end
    RDP -->|archived data| DT
    DT -->|staged files| WM
    WM -->|DataProduct records| DB["ops-db"]
    style WM fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style DT fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    style DB fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
```
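The provenance tracking shown in the diagram boils down to parent links between data products: every `DataProduct` record can be walked back to the raw observations it derives from. As a rough sketch, assuming a hypothetical stand-in model (the field names here are illustrative, not the actual ops-db schema):

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical stand-in for the ops-db DataProduct model."""
    name: str
    parents: list["DataProduct"] = field(default_factory=list)


def raw_ancestors(product: DataProduct) -> set[str]:
    """Walk parent links back to products with no parents
    (the raw observations, in this sketch)."""
    if not product.parents:
        return {product.name}
    roots: set[str] = set()
    for parent in product.parents:
        roots |= raw_ancestors(parent)
    return roots
```

For example, a final map derived from a calibrated intermediate, which was in turn derived from `obs-001`, resolves back to `obs-001` as its raw ancestor.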
## Quick Links

- Operations Database (ops-db) - Database models and schema
- Operations Database API (ops-db-api) - REST API endpoints for pipelines
- CCAT System Integration Documentation - Deployment and infrastructure
- Data Transfer System - Data transfer and staging
## Getting Started for Developers
To use workflow-manager in your environment:
Install the package:

```bash
pip install -e /path/to/workflow-manager
```
Configure via environment variables (Dynaconf prefix `CCAT_WORKFLOW_MANAGER_`):

```bash
export CCAT_WORKFLOW_MANAGER_HPC_BACKEND=local
export CCAT_WORKFLOW_MANAGER_PIPELINE_BASE_DIR=/data/pipelines
export CCAT_WORKFLOW_MANAGER_PROCESSING_LOCATION_ID=1
```
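Dynaconf collects every environment variable carrying that prefix into the settings object. As a rough illustration of that lookup (using a hand-rolled helper, not the actual Dynaconf implementation):

```python
import os

# The prefix matches the Dynaconf envvar_prefix used by workflow-manager
PREFIX = "CCAT_WORKFLOW_MANAGER_"


def load_settings(environ=os.environ) -> dict:
    """Collect prefixed environment variables into a flat settings dict,
    stripping the prefix and lowercasing the key."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }
```

With the exports above in place, `load_settings()` would yield keys like `hpc_backend`, `pipeline_base_dir`, and `processing_location_id`; unprefixed variables are ignored.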
Start the three manager processes:

```bash
ccat_workflow_manager trigger-manager
ccat_workflow_manager workflow-manager
ccat_workflow_manager result-manager
```
For detailed configuration, see Configuration.