# Manager/Worker Pattern

```{eval-rst}
.. verified:: 2026-03-07
   :reviewer: Christof Buchbender
```

The Workflow Manager follows the same Manager/Worker duality pattern used by the data-transfer system. This shared pattern enables consistent operational behavior across the CCAT Data Center.

## Pattern Overview

```{eval-rst}
.. mermaid::

   graph TD
       Manager["Manager Process<br/>(Python, poll loop)"]
       Queue["Celery / Redis<br/>(Task Broker)"]
       Workers["Celery Workers<br/>(Distributed)"]

       Manager -->|"Submit tasks"| Queue
       Queue -->|"Dispatch"| Workers
       Workers -->|"Update DB"| Manager

       style Manager fill:#e1f5ff,stroke:#01579b,stroke-width:2px
       style Queue fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
       style Workers fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
```

**Manager responsibilities:**

- Scan the database for work to be done (poll loop)
- Prepare operations (create DB records, validate preconditions)
- Submit Celery tasks to the appropriate queues
- Can run anywhere with database and Redis access

**Worker responsibilities:**

- Listen to Celery queues
- Perform the actual work (stage data, submit HPC jobs, collect results)
- Update the database with results
- Must run where the needed resources are accessible

## Three Manager Processes

The Workflow Manager runs three independent manager processes:

**trigger-manager**
: Evaluates enabled Pipelines, resolves DataGroupings, computes gaps, creates PENDING runs. Lightweight --- mostly database queries.

**workflow-manager**
: Picks up PENDING runs, builds execution commands, creates directory structures, submits jobs to HPC backends. Needs filesystem access to the pipeline base directory.

**result-manager**
: Polls HPC backends for job status, scans output directories, creates DataProduct records. Needs filesystem access to read outputs.

Each runs as a separate Docker container in production (see {doc}`/source/operations/deployment`).

## Poll-Dispatch-Sleep Loop

All three managers follow the same loop structure:

```
while not shutdown_requested:
    session = get_database_session()
    try:
        work_items = query_for_work(session)
        for item in work_items:
            process(item, session)
        session.commit()
    except Exception:
        logger.exception("Error in poll cycle")
        session.rollback()
    finally:
        session.close()
    sleep(poll_interval)
```

The session is acquired before the `try` block so that it is always bound when the `finally` clause closes it. The poll interval is configurable per manager via Dynaconf settings.

## Celery Integration

The Workflow Manager shares a Redis broker with data-transfer.
Celery queues are used for:

- **Staging tasks** --- reuse data-transfer's staging infrastructure
- **HPC tasks** --- submit and monitor jobs
- **Result tasks** --- collect outputs and metrics

Queue routing is configured in `setup_celery_app.py`. The `workflow.staging`, `workflow.hpc`, and `workflow.results` queues isolate workflow tasks from data-transfer tasks on the shared broker.

## Health Checks

Each manager publishes a heartbeat to Redis with a TTL. If the heartbeat expires, monitoring can detect that a manager is down. This uses the same `HealthCheck` class from data-transfer, writing to Redis keys like `health:workflow:trigger-manager`.

## Related Documentation

- {doc}`hpc_backends` - HPC backend implementations
- {doc}`/source/concepts/execution_flow` - How runs progress through statuses
- {doc}`/source/operations/deployment` - Running the managers in production
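The heartbeat scheme described under Health Checks can be sketched as follows. The Redis key name comes from this page; the TTL value and function names are illustrative assumptions, and `client` stands in for any redis-py-style client (`set`/`exists`):

```python
import time

# Key name from the documentation; TTL is an illustrative assumption and
# should exceed the manager's poll interval.
HEARTBEAT_KEY = "health:workflow:trigger-manager"
HEARTBEAT_TTL = 30  # seconds

def publish_heartbeat(client) -> None:
    # SET with EX attaches a time-to-live: if the manager stops refreshing
    # the key, it expires and monitoring flags the manager as down.
    client.set(HEARTBEAT_KEY, str(time.time()), ex=HEARTBEAT_TTL)

def manager_is_alive(client) -> bool:
    # The key exists only while heartbeats keep arriving.
    return bool(client.exists(HEARTBEAT_KEY))
```

Calling `publish_heartbeat` once per poll cycle keeps the key alive; a monitoring probe only needs `manager_is_alive` to detect a stalled manager.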