Container Contract#

Documentation Verified. Last checked: 2026-03-07. Reviewer: Christof Buchbender.

This page describes the interface between the Workflow Manager and science reduction containers. Instrument teams building pipeline software must follow this contract.

Bind Mount Layout#

The Workflow Manager runs containers via apptainer exec with the following bind-mounted directory structure:

/workflow/
├── input/           # Raw data from staging (read-only)
├── workspace/       # Shared across ALL steps — intermediates live here
│   ├── calibrate/   # Step 1 output dir
│   ├── baseline/    # Step 2 output dir
│   └── ...
├── output/          # Final pipeline products ONLY (last step writes here)
│   ├── science/     # → DataProductType.SCIENCE
│   ├── qa/          # → DataProductType.QA_METRIC
│   ├── plots/       # → DataProductType.PLOT
│   ├── statistics/  # → DataProductType.STATISTICS
│   └── logs/        # → DataProductType.LOG
├── config/          # Input manifest JSON (read-only)
└── logs/            # Container logs (captured by result-manager)

On the host filesystem, this maps to:

/data/pipelines/{pipeline_id}/{sub_group_key}/
├── workspace/       # Persistent across steps, managed by retention
├── output/          # Final products, collected by result-manager
└── logs/

Raw input comes from /data/staged/... via existing staging paths (read-only bind).

Output Directory Convention#

The WORKFLOW_OUTPUT_DIR environment variable tells the container where to write:

  • Intermediate steps: WORKFLOW_OUTPUT_DIR=/workflow/workspace/{step_name}/

  • Final step: WORKFLOW_OUTPUT_DIR=/workflow/output/

The container simply writes to $WORKFLOW_OUTPUT_DIR; it does not need to know whether it is running as an intermediate or the final step.
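This pattern can be sketched in Python. The helper name output_path is illustrative, not part of the contract; only the WORKFLOW_OUTPUT_DIR environment variable is.

```python
import os
from pathlib import Path

def output_path(filename: str) -> Path:
    """Resolve a file path under the directory the Workflow Manager
    provides via WORKFLOW_OUTPUT_DIR."""
    out_dir = Path(os.environ["WORKFLOW_OUTPUT_DIR"])
    out_dir.mkdir(parents=True, exist_ok=True)  # tolerate a missing directory
    return out_dir / filename

# e.g. write a reduced spectrum without caring whether this run is an
# intermediate or the final step:
#   output_path("scan001_reduced.hdf5")
```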

For the final step, use subdirectories to classify outputs:

| Subdirectory | DataProductType |
|--------------|-----------------|
| science/     | SCIENCE         |
| qa/          | QA_METRIC       |
| plots/       | PLOT            |
| statistics/  | STATISTICS      |
| logs/        | LOG             |

Files in unrecognized subdirectories default to SCIENCE.
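A minimal sketch of this classification rule, mirroring the table above; the function and constant names are illustrative, not part of the result-manager's API:

```python
from pathlib import Path

# Mirror of the subdirectory -> DataProductType mapping from the table above.
SUBDIR_TO_TYPE = {
    "science": "SCIENCE",
    "qa": "QA_METRIC",
    "plots": "PLOT",
    "statistics": "STATISTICS",
    "logs": "LOG",
}

def classify(output_root: Path, file_path: Path) -> str:
    """Return the DataProductType name a file under the final output
    directory would be assigned."""
    rel = file_path.relative_to(output_root)
    subdir = rel.parts[0] if len(rel.parts) > 1 else ""
    # Unrecognized subdirectories (and top-level files) default to SCIENCE.
    return SUBDIR_TO_TYPE.get(subdir, "SCIENCE")
```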

Input Manifest#

The Workflow Manager writes /workflow/config/manifest.json before launching the container. This is the primary way for containers to discover their inputs and configuration.

{
  "run_id": 123,
  "step_name": "baseline",
  "sub_group_key": "source=NGC253|line=CO43",

  "input_files": [
    {"path": "/workflow/input/chai/2026-03-01/scan001.fits", "role": "science"},
    {"path": "/workflow/input/chai/2026-03-01/hot_load.fits", "role": "calibration"}
  ],

  "input_products": [
    {"path": "/workflow/workspace/calibrate/scan001_cal.hdf5", "data_product_id": 101},
    {"path": "/workflow/workspace/calibrate/scan002_cal.hdf5", "data_product_id": 102}
  ],

  "config": {
    "calibration_mode": "hot_cold",
    "baseline_order": 3
  },

  "environment": {
    "influxdb_url": "...",
    "influxdb_token": "...",
    "ops_db_api_url": "..."
  }
}

Fields:

  • input_files: Raw data files from staging. The role field is informational.

  • input_products: Intermediate DataProducts from upstream steps. The data_product_id can be used to query the API for metadata.

  • config: Parameters from the ReductionStepConfig, passed through as-is.

  • environment: Connection details for optional services (InfluxDB metrics, API access).
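Consuming the manifest is a single JSON load. A sketch, with an illustrative helper name and an example of pulling out fields shown above:

```python
import json

def load_manifest(path: str = "/workflow/config/manifest.json") -> dict:
    """Load the input manifest written by the Workflow Manager
    before the container was launched."""
    with open(path) as fh:
        return json.load(fh)

# Typical use inside a reduction step:
#   manifest = load_manifest()
#   science = [f["path"] for f in manifest["input_files"]
#              if f["role"] == "science"]
#   order = manifest["config"].get("baseline_order", 3)
```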

Metadata Sidecars#

Containers can optionally drop .metadata.json sidecar files next to their outputs. If a file named scan001_cal.hdf5.metadata.json exists next to scan001_cal.hdf5, its contents are stored in the DataProduct’s metadata JSON column.

Example sidecar:

{
  "rms_noise": 0.023,
  "integration_time_s": 3600,
  "calibration_quality": "good"
}
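Writing a sidecar is just a second JSON dump next to the product file. A sketch, assuming the illustrative helper name write_sidecar:

```python
import json
from pathlib import Path

def write_sidecar(product_path: str, metadata: dict) -> Path:
    """Write a .metadata.json sidecar next to an output file so its
    contents end up in the DataProduct's metadata JSON column."""
    sidecar = Path(str(product_path) + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar

# write_sidecar("/workflow/output/science/scan001_cal.hdf5",
#               {"rms_noise": 0.023, "calibration_quality": "good"})
```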

What Instrument Teams Need To Do#

  1. Read /workflow/config/manifest.json for input file list and configuration

  2. Read raw data from /workflow/input/...

  3. Read upstream intermediates from /workflow/workspace/{prev_step}/...

  4. Write results to $WORKFLOW_OUTPUT_DIR

  5. For the final step: use subdirectories (science/, plots/, qa/, etc.)

  6. Optionally: drop .metadata.json sidecars for extra metadata

  7. Optionally: push real-time metrics to InfluxDB (connection info in manifest)

No output manifest to write. No schema to learn. Just files in directories.
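Put together, a container entrypoint following these steps might look like the sketch below. run_reduction stands in for the instrument team's science code and is not part of the contract.

```python
import json
import os
from pathlib import Path

def main() -> None:
    # 1. Read the manifest for the input file list and configuration.
    manifest = json.loads(Path("/workflow/config/manifest.json").read_text())

    # 2./3. Collect raw inputs and upstream intermediates by path.
    raw_files = [Path(f["path"]) for f in manifest["input_files"]]
    intermediates = [Path(p["path"]) for p in manifest["input_products"]]

    # 4. Write everything to the directory assigned by the Workflow Manager.
    out_dir = Path(os.environ["WORKFLOW_OUTPUT_DIR"])

    # run_reduction is a placeholder for the team's science code; for the
    # final step it should write into science/, qa/, plots/, etc. (step 5),
    # and may drop .metadata.json sidecars next to its outputs (step 6).
    run_reduction(raw_files, intermediates, manifest["config"], out_dir)

# main() would be invoked as the container's entrypoint.
```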

Reproducibility#

The full apptainer exec command used for each run is stored in the ExecutedReductionStep.execution_command field. Combined with the input manifest and the exact container image digest (from ReductionSoftwareVersion), any run can be reproduced exactly.

CWL support is deferred to v2. For v1, the stored command + manifest provides equivalent reproducibility.