# Container Contract

```{eval-rst}
.. verified:: 2026-03-07
   :reviewer: Christof Buchbender
```

This page describes the interface between the Workflow Manager and science reduction containers. Instrument teams building pipeline software must conform to this contract.

## Bind Mount Layout

The Workflow Manager runs containers via `apptainer exec` with the following bind-mounted directory structure:

```text
/workflow/
├── input/        # Raw data from staging (read-only)
├── workspace/    # Shared across ALL steps — intermediates live here
│   ├── calibrate/   # Step 1 output dir
│   ├── baseline/    # Step 2 output dir
│   └── ...
├── output/       # Final pipeline products ONLY (last step writes here)
│   ├── science/     # → DataProductType.SCIENCE
│   ├── qa/          # → DataProductType.QA_METRIC
│   ├── plots/       # → DataProductType.PLOT
│   ├── statistics/  # → DataProductType.STATISTICS
│   └── logs/        # → DataProductType.LOG
├── config/       # Input manifest JSON (read-only)
└── logs/         # Container logs (captured by result-manager)
```

On the host filesystem, this maps to:

```text
/data/pipelines/{pipeline_id}/{sub_group_key}/
├── workspace/    # Persistent across steps, managed by retention
├── output/       # Final products, collected by result-manager
└── logs/
```

Raw input comes from `/data/staged/...` via existing staging paths (read-only bind).

## Output Directory Convention

The `WORKFLOW_OUTPUT_DIR` environment variable tells the container where to write:

- **Intermediate steps**: `WORKFLOW_OUTPUT_DIR=/workflow/workspace/{step_name}/`
- **Final step**: `WORKFLOW_OUTPUT_DIR=/workflow/output/`

The container simply writes to `$WORKFLOW_OUTPUT_DIR`; it does not need to know whether it is an intermediate or the final step.

**For the final step**, use subdirectories to classify outputs:

| Subdirectory  | DataProductType |
| ------------- | --------------- |
| `science/`    | `SCIENCE`       |
| `qa/`         | `QA_METRIC`     |
| `plots/`      | `PLOT`          |
| `statistics/` | `STATISTICS`    |
| `logs/`       | `LOG`           |

Files in unrecognized subdirectories default to `SCIENCE`.
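From the container's side, the convention above amounts to creating the classification subdirectories under `$WORKFLOW_OUTPUT_DIR` and writing into them. A minimal sketch (the helper name and file names are illustrative, not part of the contract; the subdirectory names follow the table above):

```python
import os
import tempfile
from pathlib import Path


def write_final_products(out_dir: str) -> list[Path]:
    """Write placeholder products into the classification
    subdirectories under the given output directory."""
    out = Path(out_dir)
    written = []
    # Each subdirectory maps to a DataProductType per the table above.
    for subdir, name in [
        ("science", "cube.fits"),        # -> SCIENCE
        ("qa", "rms_map.json"),          # -> QA_METRIC
        ("plots", "spectrum.png"),       # -> PLOT
        ("statistics", "summary.json"),  # -> STATISTICS
        ("logs", "reduce.log"),          # -> LOG
    ]:
        target = out / subdir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(b"")  # placeholder payload
        written.append(target)
    return written


# The container only consults the environment variable; the fallback
# tempdir is just so this sketch runs outside a real container.
out_dir = os.environ.get("WORKFLOW_OUTPUT_DIR") or tempfile.mkdtemp()
products = write_final_products(out_dir)
```

For intermediate steps the same code applies unchanged: the Workflow Manager points `WORKFLOW_OUTPUT_DIR` at `/workflow/workspace/{step_name}/` instead, and the subdirectory-to-type mapping is simply ignored.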
## Input Manifest

The Workflow Manager writes `/workflow/config/manifest.json` before launching the container. This is the primary way for containers to discover their inputs and configuration.

```json
{
  "run_id": 123,
  "step_name": "baseline",
  "sub_group_key": "source=NGC253|line=CO43",
  "input_files": [
    {"path": "/workflow/input/chai/2026-03-01/scan001.fits", "role": "science"},
    {"path": "/workflow/input/chai/2026-03-01/hot_load.fits", "role": "calibration"}
  ],
  "input_products": [
    {"path": "/workflow/workspace/calibrate/scan001_cal.hdf5", "data_product_id": 101},
    {"path": "/workflow/workspace/calibrate/scan002_cal.hdf5", "data_product_id": 102}
  ],
  "config": {
    "calibration_mode": "hot_cold",
    "baseline_order": 3
  },
  "environment": {
    "influxdb_url": "...",
    "influxdb_token": "...",
    "ops_db_api_url": "..."
  }
}
```

**Fields:**

`input_files`
: Raw data files from staging. The `role` field is informational.

`input_products`
: Intermediate DataProducts from upstream steps. The `data_product_id` can be used to query the API for metadata.

`config`
: Parameters from the ReductionStepConfig, passed through as-is.

`environment`
: Connection details for optional services (InfluxDB metrics, API access).

## Metadata Sidecars

Containers can optionally drop `.metadata.json` sidecar files next to their outputs. If a file named `scan001_cal.hdf5.metadata.json` exists next to `scan001_cal.hdf5`, its contents are stored in the DataProduct's `metadata` JSON column.

Example sidecar:

```json
{
  "rms_noise": 0.023,
  "integration_time_s": 3600,
  "calibration_quality": "good"
}
```

## What Instrument Teams Need To Do

1. Read `/workflow/config/manifest.json` for the input file list and configuration
2. Read raw data from `/workflow/input/...`
3. Read upstream intermediates from `/workflow/workspace/{prev_step}/...`
4. Write results to `$WORKFLOW_OUTPUT_DIR`
5. For the final step: use subdirectories (`science/`, `plots/`, `qa/`, etc.)
6. Optionally: drop `.metadata.json` sidecars for extra metadata
7. Optionally: push real-time metrics to InfluxDB (connection info in manifest)

No output manifest to write. No schema to learn. Just files in directories.

## Reproducibility

The full `apptainer exec` command used for each run is stored in the `ExecutedReductionStep.execution_command` field. Combined with the manifest and the exact container image digest (from ReductionSoftwareVersion), any run can be reproduced exactly.

CWL support is deferred to v2. For v1, the stored command plus manifest provides equivalent reproducibility.

## Related Documentation

- {doc}`execution_flow` - How runs progress through statuses
- {doc}`pipeline_hierarchy` - ReductionStepConfig and resource requirements
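To tie the manifest, output directory, and sidecar conventions together, here is a minimal end-to-end sketch of a container entry point. The temporary directory, file names, and placeholder payload are illustrative stand-ins so the sketch runs outside a real container; in production the paths are the bind mounts described on this page.

```python
import json
import os
import tempfile
from pathlib import Path

# --- Stand-in setup (in production, the Workflow Manager provides these) ---
root = Path(tempfile.mkdtemp())
(root / "config").mkdir()
(root / "config" / "manifest.json").write_text(json.dumps({
    "step_name": "baseline",
    "input_products": [{"path": "scan001_cal.hdf5", "data_product_id": 101}],
    "config": {"baseline_order": 3},
}))
out_dir = root / "out"
out_dir.mkdir()
os.environ["WORKFLOW_OUTPUT_DIR"] = str(out_dir)

# --- Container logic per the contract ---
# 1. Read the manifest for inputs and configuration.
manifest = json.loads((root / "config" / "manifest.json").read_text())
order = manifest["config"]["baseline_order"]

# 2-4. (Actual reduction would happen here.) Write the result
#      to $WORKFLOW_OUTPUT_DIR.
result = Path(os.environ["WORKFLOW_OUTPUT_DIR"]) / "scan001_base.hdf5"
result.write_bytes(b"")  # placeholder product

# 6. Optionally attach extra metadata via a .metadata.json sidecar,
#    named by appending the suffix to the product's file name.
sidecar = result.with_name(result.name + ".metadata.json")
sidecar.write_text(json.dumps({"baseline_order": order}))
```

Note that the container never writes an output manifest: the result-manager discovers products by scanning the output directory and picking up any adjacent sidecars.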