===========
Performance
===========

The engine is optimised for HPC workloads processing thousands of scans with up to 16384 channels and 7+ pixels.

Key Optimisations
-----------------

Prefetch I/O Overlap
~~~~~~~~~~~~~~~~~~~~

The pipeline overlaps disk reads with calibration compute using a dedicated prefetch thread (see :doc:`/source/architecture/pipeline-internals`). This hides NVMe latency entirely for compute-bound scans.

Vectorised Calibration
~~~~~~~~~~~~~~~~~~~~~~

``calibrate_full()`` precomputes a combined calibration factor ``gamma / (H - C) / tr_s`` with shape ``[C, R, A]`` and broadcasts it across the dump and subscan axes. The inner loop order ``C → R → A → D → S`` ensures stride-1 access on the innermost (subscan) dimension.

Zero-Copy ON Data
~~~~~~~~~~~~~~~~~

``PreparedData`` stores indices into ``ScanData.data`` rather than copying ON-source arrays. For large OTF datasets this avoids a ~1 GB allocation.

OFF Reference Deduplication
~~~~~~~~~~~~~~~~~~~~~~~~~~~

When multiple ON subscans share the same OFF reference (common in OTF), the reference average is computed once and reused via a ``HashMap`` cache.

PWV Grid Search
~~~~~~~~~~~~~~~

``residual_ss_precomputed()`` fuses transmission computation and residual summation in one loop with pre-scaled ATM coefficients. This eliminates per-evaluation heap allocation and achieves a 5–10× speedup over separate function calls.

Binary ATM Table
~~~~~~~~~~~~~~~~

Converting the text ATM table (``.dat.gz``) to binary (``.catm``) via ``calibrate convert`` reduces load time from ~15 s to <100 ms (memory-mapped I/O).

Benchmarking
------------

.. code-block:: bash

   # Criterion micro-benchmarks
   cargo bench -p cal-io

   # Full pipeline benchmark (CHESTER dataset, SLURM)
   sbatch bench_chester.slurm

Typical per-scan timing breakdown (HFAV OTF, 16384 channels, debug build):

.. code-block:: text

   load_ms=753 resolve_ms=244 atm_ms=762 prepare_ms=2399 calibrate_ms=36876 write_ms=1175

In release builds with LTO, the calibrate stage is ~5× faster.
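The OFF-reference deduplication described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the engine's actual API: the function and type names (``calibrate_subscans``, ``off_average``, the ``(off_reference_id, on_value)`` pairs) are hypothetical stand-ins, and the per-subscan arithmetic is reduced to a scalar ``(ON - OFF) / OFF`` so the caching pattern stays visible.

.. code-block:: rust

   use std::collections::HashMap;

   /// Hypothetical OFF average: mean of the raw OFF samples.
   fn off_average(samples: &[f64]) -> f64 {
       samples.iter().sum::<f64>() / samples.len() as f64
   }

   /// Calibrate each ON subscan against its OFF reference, computing each
   /// distinct reference average at most once. Returns the calibrated
   /// values plus the number of averages actually computed (for the demo).
   fn calibrate_subscans(
       on_subscans: &[(u32, f64)],           // (off_reference_id, ON value)
       off_samples: &HashMap<u32, Vec<f64>>, // raw OFF samples per reference id
   ) -> (Vec<f64>, usize) {
       let mut cache: HashMap<u32, f64> = HashMap::new();
       let mut computed = 0;
       let results = on_subscans
           .iter()
           .map(|&(off_id, on)| {
               // entry().or_insert_with() computes the average only on a
               // cache miss; later subscans with the same id hit the cache.
               let off = *cache.entry(off_id).or_insert_with(|| {
                   computed += 1;
                   off_average(&off_samples[&off_id])
               });
               (on - off) / off // scalar stand-in for (ON - OFF) / OFF
           })
           .collect();
       (results, computed)
   }

   fn main() {
       let mut offs = HashMap::new();
       offs.insert(7u32, vec![2.0, 2.0, 2.0]); // OFF average = 2.0
       // Three ON subscans share OFF reference 7 (typical OTF pattern).
       let ons = [(7u32, 4.0), (7, 6.0), (7, 2.0)];
       let (cal, computed) = calibrate_subscans(&ons, &offs);
       println!("calibrated = {:?}, averages computed = {}", cal, computed);
       assert_eq!(computed, 1); // the shared reference was averaged once
   }

In the real pipeline the cached value would be a per-channel spectrum rather than a scalar, so a cache hit saves a full ``[C]``-sized averaging pass per duplicate reference.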