# Workflow Manager
The Workflow Manager orchestrates science reduction pipelines for the CCAT Data Center. It picks up where data-transfer leaves off: it takes archived and staged raw data, runs containerized reduction software against it on HPC infrastructure, and produces calibrated data products with full provenance tracking back to the raw observations.
This documentation covers the pipeline orchestration system — not the database models (see Operations Database (ops-db)), the REST API (see Operations Database API (ops-db-api)), or the deployment infrastructure (see CCAT System Integration Documentation).
The Workflow Manager is the backend for:
- `ops-db-ui` - Pipeline dashboard showing run status, data products, and lineage
- `ops-db-api` - Pipeline endpoints for creating and managing pipelines via REST
- `ops-db` - All pipeline models (`Pipeline`, `ReductionStep`, `DataProduct`, etc.)
## What It Does
The Workflow Manager automates the complete science pipeline lifecycle:
- **Trigger** enabled pipelines when new data arrives or on schedule
- **Resolve** data groupings into sub-groups via a generic filter engine
- **Stage** raw data and upstream intermediates to HPC storage
- **Execute** containerized reduction software via Apptainer on HPC backends
- **Collect** output data products with convention-based directory discovery
- **Track** full provenance lineage from raw observations to final products
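The lifecycle above can be pictured as a run moving through an ordered set of states, one per manager stage. This is a minimal sketch only; the status names (`PENDING`, `STAGED`, `RUNNING`, `COLLECTED`) and the `advance` helper are illustrative, not the actual ops-db schema.

```python
from enum import Enum


class RunStatus(Enum):
    # Hypothetical lifecycle states; the real ops-db models may differ
    PENDING = "pending"      # trigger-manager created the run
    STAGED = "staged"        # raw data copied to HPC storage
    RUNNING = "running"      # containerized job submitted to the backend
    COLLECTED = "collected"  # result-manager registered the outputs


# Allowed forward transitions in this sketch
TRANSITIONS = {
    RunStatus.PENDING: RunStatus.STAGED,
    RunStatus.STAGED: RunStatus.RUNNING,
    RunStatus.RUNNING: RunStatus.COLLECTED,
}


def advance(status: RunStatus) -> RunStatus:
    """Move a run to its next lifecycle stage."""
    return TRANSITIONS[status]
```

Each of the three manager processes drives one of these transitions, which is why the system can be restarted at any point without losing track of in-flight runs.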
## Documentation Structure

- **Concepts** - Pipeline hierarchy, execution model, and key abstractions
- **Architecture** - Manager/worker pattern, HPC backends, and system design
- **API Reference** - Python API for managers, tasks, and core modules
- **Operations** - Configuration, deployment, and monitoring
- **Container Contract** - How to build pipeline containers for CCAT
- **Integration** - How this fits with other CCAT components
## System Context

```mermaid
graph TD
    RDP["RawDataPackage<br/>(ops-db)"]
    DT["data-transfer<br/>Archive & Stage"]
    subgraph WM["Workflow Manager"]
        TM["trigger-manager<br/>Evaluate gaps, create runs"]
        WMgr["workflow-manager<br/>Build commands, submit jobs"]
        RM["result-manager<br/>Collect outputs, track lineage"]
        TM --> WMgr --> RM
    end
    RDP -->|archived data| DT
    DT -->|staged files| WM
    WM -->|DataProduct records| DB["ops-db"]
    style WM fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style DT fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    style DB fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
```
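The provenance tracking shown in the diagram boils down to parent links between data products: every `DataProduct` record can be walked back to the raw observations it derives from. As a rough sketch, assuming a hypothetical stand-in model (the field names here are illustrative, not the actual ops-db schema):

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical stand-in for the ops-db DataProduct model."""
    name: str
    parents: list["DataProduct"] = field(default_factory=list)


def raw_ancestors(product: DataProduct) -> set[str]:
    """Walk parent links back to products with no parents
    (the raw observations, in this sketch)."""
    if not product.parents:
        return {product.name}
    roots: set[str] = set()
    for parent in product.parents:
        roots |= raw_ancestors(parent)
    return roots
```

For example, a final map derived from a calibrated intermediate, which was in turn derived from `obs-001`, resolves back to `obs-001` as its raw ancestor.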
## Quick Links

- Operations Database (ops-db) - Database models and schema
- Operations Database API (ops-db-api) - REST API endpoints for pipelines
- CCAT System Integration Documentation - Deployment and infrastructure
- Data Transfer System - Data transfer and staging
## Getting Started for Developers
To use workflow-manager in your environment:
Install the package:

```bash
pip install -e /path/to/workflow-manager
```
Configure via environment variables (Dynaconf prefix `CCAT_WORKFLOW_MANAGER_`):

```bash
export CCAT_WORKFLOW_MANAGER_HPC_BACKEND=local
export CCAT_WORKFLOW_MANAGER_PIPELINE_BASE_DIR=/data/pipelines
export CCAT_WORKFLOW_MANAGER_PROCESSING_LOCATION_ID=1
```
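Dynaconf collects every environment variable carrying that prefix into the settings object. As a rough illustration of that lookup (using a hand-rolled helper, not the actual Dynaconf implementation):

```python
import os

# The prefix matches the Dynaconf envvar_prefix used by workflow-manager
PREFIX = "CCAT_WORKFLOW_MANAGER_"


def load_settings(environ=os.environ) -> dict:
    """Collect prefixed environment variables into a flat settings dict,
    stripping the prefix and lowercasing the key."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }
```

With the exports above in place, `load_settings()` would yield keys like `hpc_backend`, `pipeline_base_dir`, and `processing_location_id`; unprefixed variables are ignored.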
Start the three manager processes:

```bash
ccat_workflow_manager trigger-manager
ccat_workflow_manager workflow-manager
ccat_workflow_manager result-manager
```
For detailed configuration, see Configuration.