# Workflow Manager

```{eval-rst}
.. verified:: 2026-03-07
   :reviewer: Christof Buchbender
```

The Workflow Manager orchestrates science reduction pipelines for the CCAT Data Center. It picks up where data-transfer leaves off: takes archived/staged raw data, runs containerized reduction software against it on HPC infrastructure, and produces calibrated data products with full provenance tracking back to raw observations.

This documentation covers the **pipeline orchestration system**, not the database models (see {doc}`/ops-db/docs/index`), the REST API (see {doc}`/ops-db-api/docs/index`), or the deployment infrastructure (see {doc}`/system-integration/docs/index`).

The Workflow Manager is the backend for:

- **ops-db-ui** - Pipeline dashboard showing run status, data products, and lineage
- **ops-db-api** - Pipeline endpoints for creating and managing pipelines via REST
- **ops-db** - All pipeline models (Pipeline, ReductionStep, DataProduct, etc.)

## What It Does

The Workflow Manager automates the complete science pipeline lifecycle:

- **Trigger** enabled pipelines when new data arrives or on schedule
- **Resolve** data groupings into sub-groups via a generic filter engine
- **Stage** raw data and upstream intermediates to HPC storage
- **Execute** containerized reduction software via Apptainer on HPC backends
- **Collect** output data products with convention-based directory discovery
- **Track** full provenance lineage from raw observations to final products

## Documentation Structure

```{eval-rst}
.. grid:: 1 2 2 2
   :gutter: 2

   .. grid-item-card::
      :text-align: center
      :link: source/concepts/overview
      :link-type: doc

      **Concepts**
      ^^^^^^^^^^^^
      Pipeline hierarchy, execution model, and key abstractions

   .. grid-item-card::
      :text-align: center
      :link: source/architecture/manager_worker
      :link-type: doc

      **Architecture**
      ^^^^^^^^^^^^^^^^
      Manager/worker pattern, HPC backends, and system design

   .. grid-item-card::
      :text-align: center
      :link: source/api_reference/index
      :link-type: doc

      **API Reference**
      ^^^^^^^^^^^^^^^^^
      Python API for managers, tasks, and core modules

   .. grid-item-card::
      :text-align: center
      :link: source/operations/configuration
      :link-type: doc

      **Operations**
      ^^^^^^^^^^^^^^
      Configuration, deployment, and monitoring

   .. grid-item-card::
      :text-align: center
      :link: source/concepts/container_contract
      :link-type: doc

      **Container Contract**
      ^^^^^^^^^^^^^^^^^^^^^^
      How to build pipeline containers for CCAT

   .. grid-item-card::
      :text-align: center
      :link: source/integration/related_components
      :link-type: doc

      **Integration**
      ^^^^^^^^^^^^^^^
      How this fits with other CCAT components
```

## System Context

```{eval-rst}
.. mermaid::

   graph TD
       RDP["RawDataPackage<br/>(ops-db)"]
       DT["data-transfer<br/>Archive & Stage"]

       subgraph WM["Workflow Manager"]
           TM["trigger-manager<br/>Evaluate gaps, create runs"]
           WMgr["workflow-manager<br/>Build commands, submit jobs"]
           RM["result-manager<br/>Collect outputs, track lineage"]
           TM --> WMgr --> RM
       end

       RDP -->|archived data| DT
       DT -->|staged files| WM
       WM -->|DataProduct records| DB["ops-db"]

       style WM fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
       style DT fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
       style DB fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
```

## Quick Links

- {doc}`/ops-db/docs/index` - Database models and schema
- {doc}`/ops-db-api/docs/index` - REST API endpoints for pipelines
- {doc}`/system-integration/docs/index` - Deployment and infrastructure
- {doc}`/data-transfer/docs/index` - Data transfer and staging

## Getting Started for Developers

To use workflow-manager in your environment:

1. Install the package:

   ```
   pip install -e /path/to/workflow-manager
   ```

2. Configure via environment variables (Dynaconf prefix `CCAT_WORKFLOW_MANAGER_`):

   ```
   export CCAT_WORKFLOW_MANAGER_HPC_BACKEND=local
   export CCAT_WORKFLOW_MANAGER_PIPELINE_BASE_DIR=/data/pipelines
   export CCAT_WORKFLOW_MANAGER_PROCESSING_LOCATION_ID=1
   ```

3. Start the three manager processes:

   ```
   ccat_workflow_manager trigger-manager
   ccat_workflow_manager workflow-manager
   ccat_workflow_manager result-manager
   ```

For detailed configuration, see {doc}`source/operations/configuration`.

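
As a rough illustration of how the `CCAT_WORKFLOW_MANAGER_` prefix from step 2 maps environment variables to setting names, the sketch below uses only the Python standard library. It is not the package's actual loader: `load_prefixed_settings` and `example_env` are hypothetical names introduced here, and the real configuration is handled by Dynaconf, which may also cast value types.

```python
import os

# Prefix taken from the configuration step above.
PREFIX = "CCAT_WORKFLOW_MANAGER_"


def load_prefixed_settings(environ=None, prefix=PREFIX):
    """Hypothetical helper: collect variables that start with `prefix`,
    keyed by the variable name with the prefix removed."""
    environ = os.environ if environ is None else environ
    return {
        name[len(prefix):]: value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


# A stand-in for the process environment after the `export` lines in step 2.
example_env = {
    "CCAT_WORKFLOW_MANAGER_HPC_BACKEND": "local",
    "CCAT_WORKFLOW_MANAGER_PIPELINE_BASE_DIR": "/data/pipelines",
    "CCAT_WORKFLOW_MANAGER_PROCESSING_LOCATION_ID": "1",
    "PATH": "/usr/bin",  # unrelated variable, ignored by the helper
}

settings = load_prefixed_settings(example_env)
print(settings)
```

Values stay plain strings in this sketch, so `PROCESSING_LOCATION_ID` comes back as `"1"` rather than an integer. Calling `load_prefixed_settings()` with no arguments reads the real process environment instead of the example dictionary.
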
## Documentation Contents

```{toctree}
:caption: 'Concepts:'
:hidden: true
:maxdepth: 2

source/concepts/overview
source/concepts/pipeline_hierarchy
source/concepts/execution_flow
source/concepts/data_grouping
source/concepts/container_contract
```

```{toctree}
:caption: 'Architecture:'
:hidden: true
:maxdepth: 2

source/architecture/manager_worker
source/architecture/hpc_backends
source/architecture/filter_engine
```

```{toctree}
:caption: 'API Reference:'
:hidden: true
:maxdepth: 2

source/api_reference/index
source/api_reference/managers
source/api_reference/tasks
source/api_reference/hpc
source/api_reference/grouping
source/api_reference/execution
```

```{toctree}
:caption: 'Operations:'
:hidden: true
:maxdepth: 1

source/operations/configuration
source/operations/deployment
```

```{toctree}
:caption: 'Integration:'
:hidden: true
:maxdepth: 1

source/integration/related_components
```