Manager/Worker Pattern#

Documentation verified · Last checked: 2026-03-07 · Reviewer: Christof Buchbender

The Workflow Manager follows the same Manager/Worker pattern used by the data-transfer system. Sharing this pattern keeps operational behavior consistent across the CCAT Data Center.

Pattern Overview#

    graph TD
    Manager["Manager Process<br/>(Python, poll loop)"]
    Queue["Celery / Redis<br/>(Task Broker)"]
    Workers["Celery Workers<br/>(Distributed)"]

    Manager -->|"Submit tasks"| Queue
    Queue -->|"Dispatch"| Workers
    Workers -->|"Update DB"| Manager

    style Manager fill:#e1f5ff,stroke:#01579b,stroke-width:2px
    style Queue fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style Workers fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px

Manager responsibilities:

  • Scan database for work to be done (poll loop)

  • Prepare operations (create DB records, validate preconditions)

  • Submit Celery tasks to appropriate queues

  • Run anywhere with database and Redis access

Worker responsibilities:

  • Listen to Celery queues

  • Perform actual work (stage data, submit HPC jobs, collect results)

  • Update database with results

  • Must run where needed resources are accessible
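
The split above can be sketched in-process. This is a minimal illustration, not the project's real code: the `Queue` stands in for Celery/Redis, the worker function for a separate worker process, and all names are hypothetical.

```python
from queue import Queue

# In-process stand-in for the Celery/Redis task broker.
task_queue: Queue = Queue()

def manager_poll(pending_items):
    """Manager side: scan for work and submit tasks.

    In the real system this would be a database query followed by
    task.delay(...) calls; here we just enqueue each item.
    """
    for item in pending_items:
        task_queue.put(item)

def worker_drain(results_db: dict):
    """Worker side: pull tasks, do the work, record results.

    The dict stands in for the database the workers update.
    """
    while not task_queue.empty():
        item = task_queue.get()
        results_db[item] = "DONE"
```

The essential property this sketch preserves is that the manager never does the work itself: it only decides *what* should happen, while workers decide *where and how* it happens.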

Three Manager Processes#

The Workflow Manager runs three independent manager processes:

trigger-manager

Evaluates enabled Pipelines, resolves DataGroupings, computes gaps, creates PENDING runs. Lightweight — mostly database queries.

workflow-manager

Picks up PENDING runs, builds execution commands, creates directory structures, submits jobs to HPC backends. Needs filesystem access to the pipeline base directory.

result-manager

Polls HPC backends for job status, scans output directories, creates DataProduct records. Needs filesystem access to read outputs.

Each runs as a separate Docker container in production (see /source/operations/deployment).

Poll-Dispatch-Sleep Loop#

All three managers follow the same loop structure:

while not shutdown_requested:
    session = None
    try:
        session = get_database_session()
        work_items = query_for_work(session)
        for item in work_items:
            process(item, session)
        session.commit()
    except Exception:
        logger.exception("Error in poll cycle")
        if session is not None:
            session.rollback()
    finally:
        if session is not None:
            session.close()
    sleep(poll_interval)

The poll interval is configurable per manager via Dynaconf settings.
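
A settings file for this could look roughly as follows; the section and key names here are illustrative assumptions, not the project's actual Dynaconf schema:

```toml
# settings.toml (illustrative fragment; real key names may differ)
[default]
trigger_manager_poll_interval = 30    # seconds; cheap DB queries, poll often
workflow_manager_poll_interval = 60   # job submission is heavier
result_manager_poll_interval = 120    # HPC status changes slowly
```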

Celery Integration#

The Workflow Manager shares a Redis broker with data-transfer. Celery queues are used for:

  • Staging tasks — reuse data-transfer’s staging infrastructure

  • HPC tasks — submit and monitor jobs

  • Result tasks — collect outputs and metrics

Queue routing is configured in setup_celery_app.py. The workflow.staging, workflow.hpc, and workflow.results queues isolate workflow tasks from data-transfer tasks on the shared broker.
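
A minimal sketch of what such a routing table might look like. The task module paths below are hypothetical; the real mapping lives in setup_celery_app.py, and Celery consumes a structure like this via its task_routes setting:

```python
# Illustrative Celery-style routing table mapping task names to queues.
# Task names are assumptions for this sketch, not the project's real ones.
TASK_ROUTES = {
    "workflow.tasks.stage_data": {"queue": "workflow.staging"},
    "workflow.tasks.submit_hpc_job": {"queue": "workflow.hpc"},
    "workflow.tasks.collect_results": {"queue": "workflow.results"},
}

def queue_for(task_name: str, default: str = "celery") -> str:
    """Resolve which queue a task would be dispatched to.

    Unrouted tasks fall through to the broker's default queue,
    which is how data-transfer tasks stay isolated from these.
    """
    return TASK_ROUTES.get(task_name, {}).get("queue", default)
```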

Health Checks#

Each manager publishes a heartbeat to Redis with a TTL. If the heartbeat expires, monitoring can detect that a manager is down.

This uses the same HealthCheck class from data-transfer, writing to Redis keys like health:workflow:trigger-manager.
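
One way the heartbeat write could look, sketched with an injected Redis client (redis-py's `set` accepts an `ex=` argument for the TTL). The key layout follows the convention named above; the function names are illustrative, not the actual HealthCheck API:

```python
import time

def heartbeat_key(component: str) -> str:
    # Key layout from the convention above,
    # e.g. health:workflow:trigger-manager.
    return f"health:workflow:{component}"

def publish_heartbeat(redis_client, component: str, ttl_seconds: int = 60) -> None:
    """Write a heartbeat key that expires unless refreshed.

    If the manager's loop stops calling this, the key's TTL runs
    out and monitoring sees the key disappear.
    """
    redis_client.set(heartbeat_key(component), str(time.time()), ex=ttl_seconds)
```

The TTL should comfortably exceed the manager's poll interval, so a single slow cycle does not register as an outage.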