Workflow Manager#

*Documentation verified: last checked 2026-03-07 by Christof Buchbender.*

The Workflow Manager orchestrates science reduction pipelines for the CCAT Data Center. It picks up where data-transfer leaves off: it takes archived or staged raw data, runs containerized reduction software against it on HPC infrastructure, and produces calibrated data products with full provenance tracking back to the raw observations.

This documentation covers the pipeline orchestration system — not the database models (see Operations Database (ops-db)), the REST API (see Operations Database API (ops-db-api)), or the deployment infrastructure (see CCAT System Integration Documentation).

The Workflow Manager is the backend for:

  • ops-db-ui - Pipeline dashboard showing run status, data products, and lineage

  • ops-db-api - Pipeline endpoints for creating and managing pipelines via REST

  • ops-db - All pipeline models (Pipeline, ReductionStep, DataProduct, etc.)

What It Does#

The Workflow Manager automates the complete science pipeline lifecycle:

  • Trigger enabled pipelines when new data arrives or on schedule

  • Resolve data groupings into sub-groups via a generic filter engine

  • Stage raw data and upstream intermediates to HPC storage

  • Execute containerized reduction software via Apptainer on HPC backends

  • Collect output data products with convention-based directory discovery

  • Track full provenance lineage from raw observations to final products
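The sub-group resolution step can be pictured as a small declarative filter engine. The sketch below is illustrative only; the field names (`instrument`, `obs_type`) and the equality-only matching are assumptions, not the actual filter schema:

```python
# Hedged sketch of a generic filter engine that partitions a data
# grouping into named sub-groups. Field names are illustrative only.

def matches(record: dict, filters: dict) -> bool:
    """Return True if every filter key/value pair matches the record."""
    return all(record.get(key) == value for key, value in filters.items())

def resolve_subgroups(records: list[dict], rules: dict[str, dict]) -> dict[str, list[dict]]:
    """Partition records into named sub-groups via declarative filter rules."""
    return {name: [r for r in records if matches(r, f)] for name, f in rules.items()}

raw = [
    {"id": 1, "instrument": "prime-cam", "obs_type": "science"},
    {"id": 2, "instrument": "prime-cam", "obs_type": "calibration"},
    {"id": 3, "instrument": "chai", "obs_type": "science"},
]

groups = resolve_subgroups(raw, {
    "science": {"obs_type": "science"},
    "cal": {"obs_type": "calibration"},
})
# groups["science"] holds records 1 and 3; groups["cal"] holds record 2
```

A record may land in several sub-groups if it satisfies more than one rule, which is the behavior one would want when, say, a calibration scan feeds multiple reductions.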

Documentation Structure#

  • Concepts - Pipeline hierarchy, execution model, and key abstractions (see Overview)

  • Architecture - Manager/worker pattern, HPC backends, and system design (see Manager/Worker Pattern)

  • API Reference - Python API for managers, tasks, and core modules

  • Operations - Configuration, deployment, and monitoring (see Configuration)

  • Container Contract - How to build pipeline containers for CCAT

  • Integration - How this fits with other CCAT components (see Related Components)

System Context#

    graph TD
    RDP["RawDataPackage<br/>(ops-db)"]
    DT["data-transfer<br/>Archive & Stage"]

    subgraph WM["Workflow Manager"]
        TM["trigger-manager<br/>Evaluate gaps, create runs"]
        WMgr["workflow-manager<br/>Build commands, submit jobs"]
        RM["result-manager<br/>Collect outputs, track lineage"]

        TM --> WMgr --> RM
    end

    RDP -->|archived data| DT
    DT -->|staged files| WM
    WM -->|DataProduct records| DB["ops-db"]

    style WM fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style DT fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    style DB fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
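
The hand-off between the three manager stages in the diagram can be sketched with plain functions. All names, record fields, and the container command are illustrative assumptions; the real managers run as separate polling processes against ops-db:

```python
# Hedged sketch of the trigger -> workflow -> result hand-off.
# Each stage is modeled as a plain function for illustration only.

def trigger_manager(staged_file_sets):
    # Evaluate gaps and create one pending run per staged file set.
    return [{"run_id": i, "inputs": files, "status": "pending"}
            for i, files in enumerate(staged_file_sets)]

def workflow_manager(runs):
    # Build the container command and "submit" the job.
    for run in runs:
        run["command"] = f"apptainer run pipeline.sif {' '.join(run['inputs'])}"
        run["status"] = "running"
    return runs

def result_manager(runs):
    # Collect outputs and record lineage back to the inputs.
    return [{"run_id": r["run_id"], "products": ["calibrated.fits"],
             "lineage": r["inputs"]} for r in runs]

products = result_manager(workflow_manager(trigger_manager([["obs_001.raw"]])))
# products[0]["lineage"] == ["obs_001.raw"]
```

Keeping the lineage field pointing back at the original inputs is what lets the dashboard trace any DataProduct record to its raw observations.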
    

Getting Started for Developers#

To use workflow-manager in your environment:

  1. Install the package:

    pip install -e /path/to/workflow-manager
    
  2. Configure via environment variables (Dynaconf prefix CCAT_WORKFLOW_MANAGER_):

    export CCAT_WORKFLOW_MANAGER_HPC_BACKEND=local
    export CCAT_WORKFLOW_MANAGER_PIPELINE_BASE_DIR=/data/pipelines
    export CCAT_WORKFLOW_MANAGER_PROCESSING_LOCATION_ID=1
    
  3. Start the three manager processes:

    ccat_workflow_manager trigger-manager
    ccat_workflow_manager workflow-manager
    ccat_workflow_manager result-manager
    

For detailed configuration, see Configuration.
