# Container Contract

```{eval-rst}
.. verified:: 2026-03-07
   :reviewer: Christof Buchbender
```

This page describes the interface between the Workflow Manager and science reduction containers. Instrument teams building pipeline software must conform to this contract.

## Bind Mount Layout

The Workflow Manager runs containers via `apptainer exec` with the following bind-mounted directory structure:

```text
/workflow/
├── input/        # Raw data from staging (read-only)
├── workspace/    # Shared across ALL steps — intermediates live here
│   ├── calibrate/   # Step 1 output dir
│   ├── baseline/    # Step 2 output dir
│   └── ...
├── output/       # Final pipeline products ONLY (last step writes here)
│   ├── science/     # → DataProductType.SCIENCE
│   ├── qa/          # → DataProductType.QA_METRIC
│   ├── plots/       # → DataProductType.PLOT
│   ├── statistics/  # → DataProductType.STATISTICS
│   └── logs/        # → DataProductType.LOG
├── config/       # Input manifest JSON (read-only)
└── logs/         # Container logs (captured by result-manager)
```

On the host filesystem, this maps to:

```text
/data/pipelines/{pipeline_id}/{sub_group_key}/
├── workspace/    # Persistent across steps, managed by retention
├── output/       # Final products, collected by result-manager
└── logs/
```

Raw input comes from `/data/staged/...` via existing staging paths (read-only bind).

## Output Directory Convention

The `WORKFLOW_OUTPUT_DIR` environment variable tells the container where to write:

- **Intermediate steps**: `WORKFLOW_OUTPUT_DIR=/workflow/workspace/{step_name}/`
- **Final step**: `WORKFLOW_OUTPUT_DIR=/workflow/output/`

The container simply writes to `$WORKFLOW_OUTPUT_DIR`; it does not need to know whether it is an intermediate or the final step.

**For the final step**, use subdirectories to classify outputs:

| Subdirectory  | DataProductType |
| ------------- | --------------- |
| `science/`    | `SCIENCE`       |
| `qa/`         | `QA_METRIC`     |
| `plots/`      | `PLOT`          |
| `statistics/` | `STATISTICS`    |
| `logs/`       | `LOG`           |

Files in unrecognized subdirectories default to `SCIENCE`.
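From the container's side, the convention above amounts to creating the classification subdirectories under `$WORKFLOW_OUTPUT_DIR` and writing into them. A minimal sketch (the helper name and file names are illustrative, not part of the contract; the subdirectory names follow the table above):

```python
import os
import tempfile
from pathlib import Path


def write_final_products(out_dir: str) -> list[Path]:
    """Write placeholder products into the classification
    subdirectories under the given output directory."""
    out = Path(out_dir)
    written = []
    # Each subdirectory maps to a DataProductType per the table above.
    for subdir, name in [
        ("science", "cube.fits"),        # -> SCIENCE
        ("qa", "rms_map.json"),          # -> QA_METRIC
        ("plots", "spectrum.png"),       # -> PLOT
        ("statistics", "summary.json"),  # -> STATISTICS
        ("logs", "reduce.log"),          # -> LOG
    ]:
        target = out / subdir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(b"")  # placeholder payload
        written.append(target)
    return written


# The container only consults the environment variable; the fallback
# tempdir is just so this sketch runs outside a real container.
out_dir = os.environ.get("WORKFLOW_OUTPUT_DIR") or tempfile.mkdtemp()
products = write_final_products(out_dir)
```

For intermediate steps the same code applies unchanged: the Workflow Manager points `WORKFLOW_OUTPUT_DIR` at `/workflow/workspace/{step_name}/` instead, and the subdirectory-to-type mapping is simply ignored.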
## Input Manifest

The Workflow Manager writes `/workflow/config/manifest.json` before launching the container. This is the primary way for containers to discover their inputs and configuration.

```json
{
  "run_id": 123,
  "step_name": "baseline",
  "sub_group_key": "source=NGC253|line=CO43",
  "input_files": [
    {"path": "/workflow/input/chai/2026-03-01/scan001.fits", "role": "science"},
    {"path": "/workflow/input/chai/2026-03-01/hot_load.fits", "role": "calibration"}
  ],
  "input_products": [
    {"path": "/workflow/workspace/calibrate/scan001_cal.hdf5", "data_product_id": 101},
    {"path": "/workflow/workspace/calibrate/scan002_cal.hdf5", "data_product_id": 102}
  ],
  "config": {
    "calibration_mode": "hot_cold",
    "baseline_order": 3
  },
  "environment": {
    "influxdb_url": "...",
    "influxdb_token": "...",
    "ops_db_api_url": "..."
  }
}
```

**Fields:**

`input_files`
: Raw data files from staging. The `role` field is informational.

`input_products`
: Intermediate DataProducts from upstream steps. The `data_product_id` can be used to query the API for metadata.

`config`
: Parameters from the ReductionStepConfig, passed through as-is.

`environment`
: Connection details for optional services (InfluxDB metrics, API access).

## Metadata Sidecars

Containers can optionally drop `.metadata.json` sidecar files next to their outputs. If a file named `scan001_cal.hdf5.metadata.json` exists next to `scan001_cal.hdf5`, its contents are stored in the DataProduct's `metadata` JSON column.

Example sidecar:

```json
{
  "rms_noise": 0.023,
  "integration_time_s": 3600,
  "calibration_quality": "good"
}
```

## What Instrument Teams Need To Do

1. Read `/workflow/config/manifest.json` for the input file list and configuration
2. Read raw data from `/workflow/input/...`
3. Read upstream intermediates from `/workflow/workspace/{prev_step}/...`
4. Write results to `$WORKFLOW_OUTPUT_DIR`
5. For the final step: use subdirectories (`science/`, `plots/`, `qa/`, etc.)
6. Optionally: drop `.metadata.json` sidecars for extra metadata
7. Optionally: push real-time metrics to InfluxDB (connection info in manifest)

No output manifest to write. No schema to learn. Just files in directories.

## Reproducibility

The full `apptainer exec` command used for each run is stored in the `ExecutedReductionStep.execution_command` field. Combined with the manifest and the exact container image digest (from ReductionSoftwareVersion), any run can be reproduced exactly.

CWL support is deferred to v2. For v1, the stored command plus manifest provides equivalent reproducibility.

## Related Documentation

- {doc}`execution_flow` - How runs progress through statuses
- {doc}`pipeline_hierarchy` - ReductionStepConfig and resource requirements
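To tie the manifest, output directory, and sidecar conventions together, here is a minimal end-to-end sketch of a container entry point. The temporary directory, file names, and placeholder payload are illustrative stand-ins so the sketch runs outside a real container; in production the paths are the bind mounts described on this page.

```python
import json
import os
import tempfile
from pathlib import Path

# --- Stand-in setup (in production, the Workflow Manager provides these) ---
root = Path(tempfile.mkdtemp())
(root / "config").mkdir()
(root / "config" / "manifest.json").write_text(json.dumps({
    "step_name": "baseline",
    "input_products": [{"path": "scan001_cal.hdf5", "data_product_id": 101}],
    "config": {"baseline_order": 3},
}))
out_dir = root / "out"
out_dir.mkdir()
os.environ["WORKFLOW_OUTPUT_DIR"] = str(out_dir)

# --- Container logic per the contract ---
# 1. Read the manifest for inputs and configuration.
manifest = json.loads((root / "config" / "manifest.json").read_text())
order = manifest["config"]["baseline_order"]

# 2-4. (Actual reduction would happen here.) Write the result
#      to $WORKFLOW_OUTPUT_DIR.
result = Path(os.environ["WORKFLOW_OUTPUT_DIR"]) / "scan001_base.hdf5"
result.write_bytes(b"")  # placeholder product

# 6. Optionally attach extra metadata via a .metadata.json sidecar,
#    named by appending the suffix to the product's file name.
sidecar = result.with_name(result.name + ".metadata.json")
sidecar.write_text(json.dumps({"baseline_order": order}))
```

Note that the container never writes an output manifest: the result-manager discovers products by scanning the output directory and picking up any adjacent sidecars.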