# Data Formats

The calibration engine reads L0 Zarr stores and writes L1 Zarr stores. Both use Zarr v3 with zstd compression and vlen-utf8 string arrays.

## Data Levels

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 10 25 30 35

   * - Level
     - Content
     - Producer
     - Consumer
   * - **L0**
     - Raw counts + metadata
     - zarr-fits-core
     - calibrate (cal-io)
   * - **L1**
     - Calibrated :math:`T_A^*` spectra
     - calibrate (cal-io)
     - reduction_pipeline
   * - **L1b**
     - Baseline-subtracted, flagged
     - reduction_pipeline
     - reduction_pipeline
   * - **L2**
     - Gridded maps
     - reduction_pipeline
     - Science users
```

## L0 Zarr Structure

```text
session.zarr/
├── zarr.json                                     # Zarr v3 root metadata
├── scan_025650/
│   ├── source/
│   │   ├── data_5d            [C, D, R, A, S]  i32  raw counts
│   │   ├── sobsmode           [S]              str  ON/OFF/OTF-ON/OTF-OFF
│   │   ├── mjd                [S]              f64  Modified Julian Date
│   │   ├── exptime            [S]              f32  integration time (s)
│   │   ├── elevation          [S]              f32  elevation (rad)
│   │   ├── azimuth            [S]              f32  azimuth (rad)
│   │   ├── pamb               [S]              f32  ambient pressure (Torr)
│   │   ├── tamb               [S]              f32  ambient temperature (K)
│   │   ├── signal_freq        [S]              f64  signal sideband freq (Hz)
│   │   ├── image_freq         [S]              f64  image sideband freq (Hz)
│   │   ├── freq_res           [S]              f64  channel width (Hz)
│   │   ├── freq_off           [S]              f64  frequency offset (Hz)
│   │   ├── ref_channel        [S]              f32  reference channel index
│   │   ├── lloadsn            [S]              i32  last-load scan number
│   │   ├── otf_lon            [S]              f64  OTF longitude offset (deg)
│   │   ├── otf_lat            [S]              f64  OTF latitude offset (deg)
│   │   ├── pixel_offset_lon   [R, A, S]        f64  per-pixel focal-plane offset, deg (per-subscan, issue #48)
│   │   ├── pixel_offset_lat   [R, A, S]        f64  per-pixel focal-plane offset, deg (per-subscan, issue #48)
│   │   ├── frontend_backends  [S]              str  pixel IDs (e.g. HFAV_PX00)
│   │   └── raw_fits_headers   [S]              str  original FITS headers (JSON)
│   └── calibration/
│       ├── data_5d            [C, D, R, A, S]  i32  HOT/COLD counts
│       ├── sobsmode           [S]              str  HOT/COL/COLD/SKY
│       ├── thot               [S]              f32  hot load temperature (K)
│       ├── tcold              [S]              f32  cold load temperature (K)
│       └── ...                                      (same coords as source)
└── scan_025651/
    └── ...
```

Each `scan_NNNNNN` group carries scan-level attributes: `scan_number`, `source`, `instmode`, `line`, `date_obs`, `telescope`, `observer`, `rest_freq_hz`, `velocity_source_kms`, `lloadsn`. Profiles with a `scan_metadata` section (e.g. `sofia_upgreat`) add identity / QA keywords verbatim (`mission_id`, `flight_leg`, `obs_id`, `plan_id`, `aor_id`, `aot_id`, instrument config, and atmosphere/QA scalars) for downstream flight/day grouping. See {doc}`developer/telescope-profiles`.

### 5D Array Layout

The primary data array has shape `[C, D, R, A, S]`:

```{eval-rst}
.. list-table::
   :header-rows: 1
   :widths: 10 25 65

   * - Axis
     - Name
     - Description
   * - C
     - Channels
     - Spectral channels (typically 1024–16384)
   * - D
     - Dumps
     - Time samples within a subscan
   * - R
     - Receivers
     - Pixels within an array (e.g. 7 for HFA)
   * - A
     - Arrays
     - Frontend-backend combinations (e.g. HFA, LFA, 4G1–4G4)
   * - S
     - Subscans
     - Observing phases (ON, OFF, HOT, COLD, etc.)
```

Missing dumps are padded with `i32::MIN` (converted to `NaN` on read).
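The sentinel handling described above can be sketched in a few lines of Rust. This is a minimal illustration, not the engine's actual reader: the names `MISSING_DUMP`, `counts_to_f64`, and `mean_valid` are hypothetical, and the choice of `f64` as the in-memory type is an assumption.

```rust
/// Padding value for missing dumps in L0 `data_5d` (per the text above).
/// The constant name is hypothetical.
const MISSING_DUMP: i32 = i32::MIN;

/// Convert raw L0 counts to floating point, mapping the padding
/// sentinel to NaN so that padded dumps cannot be mistaken for data.
fn counts_to_f64(raw: &[i32]) -> Vec<f64> {
    raw.iter()
        .map(|&c| if c == MISSING_DUMP { f64::NAN } else { f64::from(c) })
        .collect()
}

/// Mean of the non-NaN samples; `None` if every sample was padding.
fn mean_valid(samples: &[f64]) -> Option<f64> {
    let (sum, n) = samples
        .iter()
        .filter(|v| !v.is_nan())
        .fold((0.0_f64, 0_usize), |(s, n), &v| (s + v, n + 1));
    if n == 0 {
        None
    } else {
        Some(sum / n as f64)
    }
}
```

Any reduction over the dump axis then has to skip `NaN` explicitly, as `mean_valid` does, because a single padded dump would otherwise turn the whole average into `NaN`.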
## L1 Zarr Structure

```text
l1_output/
├── zarr.json                                 # cal_schema_version, cal_engine_version
└── scan_025650/
    ├── spectra           [C, D, R, A, S]  f64  calibrated T_A* (K)
    ├── t_sys             [C, R, A, S]     f64  system temperature (K)
    ├── t_int             [S]              f64  integration time (s)
    ├── t_rec_ssb         [C, R, A]        f64  receiver temperature SSB (K)
    ├── gamma             [C, R, A]        f64  gain calibration factor
    ├── tau_signal        [C]              f64  zenith opacity, signal (Np)
    ├── tau_image         [C]              f64  zenith opacity, image (Np)
    ├── t_sky             [C, R, A]        f64  sky temperature (K)
    ├── flags             [C, D, R, A, S]  u16  quality bitmask
    ├── signal_freqs      [C]              f64  signal frequencies (Hz)
    ├── image_freqs       [C]              f64  image frequencies (Hz)
    ├── otf_lon           [D, S]           f64  OTF longitude (deg, if OTF)
    ├── otf_lat           [D, S]           f64  OTF latitude (deg, if OTF)
    ├── otf_airmass       [D, S]           f64  per-dump airmass (if OTF)
    ├── pixel_offset_lon  [R, A, S]        f64  per-pixel focal-plane offset, deg (per-subscan, issue #48)
    └── pixel_offset_lat  [R, A, S]        f64  per-pixel focal-plane offset, deg (per-subscan, issue #48)
```

Scan-level attributes include `scan_number`, `source`, `instmode` (reflects calibrated state: OTF, OTF_DBS, TP -- not raw obs mode), `rest_freq_hz`, `pwv_mm`, `cal_strategy`, `ref_strategy`, QA metrics, and **provenance** metadata for L2-to-L0 traceability (source L0 store, calibration scan number, ATM table path, processing parameters).

Identity / QA metadata is propagated unchanged from the companion L0 scan group (schema `1.2`): `telescope`, `date_obs`, a representative scan `mjd`, and any keywords carried by the profile's `scan_metadata` section (`mission_id`, `obs_id`, `aor_id`, ...). This lets downstream tools group scans by SOFIA flight or ground day without walking back to L0.

L1 `instmode` uses the calibrated observation mode (e.g., `OTF` for OTF TotalPower, `OTF_DBS` for OTF Chopped, `TP` for TotalPower) rather than the raw FITS observation mode string.
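The mapping from calibrated mode to L1 `instmode` string can be written down as a small Rust sketch. The enum `CalibratedMode` and its methods are hypothetical names introduced for illustration. Only the three string values (`OTF`, `OTF_DBS`, `TP`) come from the text above, and the engine may well support further modes not listed here.

```rust
/// Hypothetical enum covering the three calibrated modes named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CalibratedMode {
    OtfTotalPower,
    OtfChopped,
    TotalPower,
}

impl CalibratedMode {
    /// Value written to the L1 `instmode` scan attribute.
    fn l1_instmode(self) -> &'static str {
        match self {
            CalibratedMode::OtfTotalPower => "OTF",
            CalibratedMode::OtfChopped => "OTF_DBS",
            CalibratedMode::TotalPower => "TP",
        }
    }

    /// Parse an L1 `instmode` string; anything outside the three
    /// documented values (including raw FITS mode strings) yields `None`.
    fn from_l1_instmode(s: &str) -> Option<Self> {
        match s {
            "OTF" => Some(CalibratedMode::OtfTotalPower),
            "OTF_DBS" => Some(CalibratedMode::OtfChopped),
            "TP" => Some(CalibratedMode::TotalPower),
            _ => None,
        }
    }
}
```

A consumer that parses `instmode` this way rejects raw subscan labels such as `OTF-ON` instead of silently treating them as calibrated modes.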
For multi-frontend observations, L1 output uses **original scan numbers** with band sub-groups (not virtual scan numbers, which are internal-only).

## Schema Constants

All array and group names are defined in the `cal-schema` crate (`crates/cal-schema/src/lib.rs`), the single source of truth for both the FITS→Zarr writer and the calibration reader.
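To illustrate why a shared set of names helps, here is a hypothetical stand-in for such a module. It is not the contents of `cal-schema`: the module name `schema_sketch`, the constant identifiers, and both helper functions are assumptions. The string values and the six-digit `scan_NNNNNN` pattern are taken from the layouts on this page.

```rust
/// Hypothetical sketch only; the real definitions live in
/// `crates/cal-schema/src/lib.rs` and may look different.
mod schema_sketch {
    pub const SOURCE_GROUP: &str = "source";
    pub const CALIBRATION_GROUP: &str = "calibration";
    pub const DATA_5D: &str = "data_5d";
    pub const SOBSMODE: &str = "sobsmode";

    /// Scan group name: `scan_` followed by the scan number,
    /// zero-padded to six digits (e.g. `scan_025650`).
    pub fn scan_group(scan_number: u32) -> String {
        format!("scan_{:06}", scan_number)
    }

    /// Store-relative path of an L0 array,
    /// e.g. `scan_025650/source/data_5d`.
    pub fn l0_array_path(scan_number: u32, group: &str, array: &str) -> String {
        format!("{}/{}/{}", scan_group(scan_number), group, array)
    }
}
```

With writer and reader both building paths from the same constants, a renamed array becomes a compile-time change in one place instead of a string mismatch found at run time.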