Transfer Model
==============

.. verified:: 2025-11-25
   :reviewer: Christof Buchbender

ops-db tracks data movement operations but doesn't perform them. The
:doc:`/data-transfer/docs/index` package reads these records to orchestrate
actual file transfers.

This section covers the "how do we route and track transfers" infrastructure.

Routing Infrastructure
----------------------

DataTransferRoute
^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DataTransferRoute` defines how data should flow
between sites. Routes are defined at site level, with optional location-level
overrides. This decouples transfer logic from hardcoded paths and allows
dynamic routing based on network conditions.

**Example**: a "ccat_to_cologne" route with ``origin_site=CCAT``,
``destination_site=Cologne``, ``method=bbcp``.

**RouteType Enum**: ``DIRECT`` (skip destination buffer), ``RELAY`` (route
through an intermediate site), ``CUSTOM`` (location-to-location override).

For complete attribute details, see
:py:class:`~ccat_ops_db.models.DataTransferRoute`.

Transfer Packages
-----------------

DataTransferPackage
^^^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DataTransferPackage` bundles multiple
:py:class:`~ccat_ops_db.models.RawDataPackage` objects for efficient network
transfer: many small packages become fewer transfer operations. For
long-distance transfers, optimal package sizes lie in the range of 10-50 TB.

One :py:class:`~ccat_ops_db.models.DataTransferPackage` can have multiple
:py:class:`~ccat_ops_db.models.DataTransfer` records (the same bundle sent to
multiple destinations).

For complete attribute details, see
:py:class:`~ccat_ops_db.models.DataTransferPackage`.

Transfer Operations
-------------------

DataTransfer
^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DataTransfer` records a specific transfer
operation from origin to destination. It separates the transfer (moving the
archive) from unpacking (extracting at the destination), which makes it
possible to distinguish transfer failures from unpacking failures and to
retry unpacking without re-transferring.

For complete attribute details, see
:py:class:`~ccat_ops_db.models.DataTransfer`.

DataTransferLog
^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DataTransferLog` provides lightweight log
entries with references to detailed log files. Rather than storing large log
text in the database, only the path is stored: full command outputs live in
files, and detailed metrics are stored in InfluxDB.

For complete attribute details, see
:py:class:`~ccat_ops_db.models.DataTransferLog`.

Transfer Flow
-------------

.. mermaid::

   sequenceDiagram
       participant Source
       participant Buffer1
       participant DTP as DataTransferPackage
       participant DT as DataTransfer
       participant Buffer2
       participant Archive

       Source->>Buffer1: Package RawDataPackages
       Buffer1->>DTP: Create DataTransferPackage
       DTP->>DT: Create DataTransfer
       DT->>Buffer2: Transfer archive
       DT->>Buffer2: Unpack RawDataPackages
       Buffer2->>Archive: Archive RawDataPackages

Archive Operations
------------------

LongTermArchiveTransfer
^^^^^^^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.LongTermArchiveTransfer` tracks the transfer of
:py:class:`~ccat_ops_db.models.RawDataPackage` objects to permanent archive
storage. It is separate from :py:class:`~ccat_ops_db.models.DataTransfer`
because this transfer happens within a single
:py:class:`~ccat_ops_db.models.Site`, between
:py:class:`~ccat_ops_db.models.DataLocation` records of type ``BUFFER`` and
``LONG_TERM_ARCHIVE``, rather than between the ``BUFFER`` locations of two
different sites as :py:class:`~ccat_ops_db.models.DataTransfer` does.

For complete attribute details, see
:py:class:`~ccat_ops_db.models.LongTermArchiveTransfer`.

StagingJob
^^^^^^^^^^

:py:class:`~ccat_ops_db.models.StagingJob` makes archived data available for
scientific processing: download from the long-term archive → unpack → create
file access records. Multiple packages can be staged together for efficiency.
Staging is on-demand (a scientist requests data), unlike archiving, which is
fire-and-forget.

For complete attribute details, see
:py:class:`~ccat_ops_db.models.StagingJob`.

Archive and Staging Flow
------------------------

.. mermaid::

   sequenceDiagram
       participant Buffer
       participant Archive
       participant LTA as LongTermArchiveTransfer
       participant Staging
       participant Processing

       Buffer->>LTA: Create archive transfer
       LTA->>Archive: Transfer RawDataPackage
       Archive->>Staging: Scientist requests data
       Staging->>Processing: Stage packages
       Processing->>Archive: Store results

Why This Structure?
-------------------

**Separation of Routing and Execution**

Allows:

* Updating routes without affecting transfer history
* Multiple transfer strategies
* Dynamic routing based on conditions

**DataTransferPackage Bundling**

Optimizes network usage:

* Reduces the overhead of many small transfers
* Enables resumable transfers
* Optimal package sizes (10-50 TB for long distance)

**Separate Transfer and Unpacking Tracking**

Allows:

* Distinguishing transfer failures from unpacking failures
* Retrying unpacking without re-transferring
* Better error diagnosis

**LongTermArchiveTransfer and StagingJob Separation**

Different workflows:

* Archiving is fire-and-forget (automatic)
* Staging is on-demand (scientist requests)

Retry and Failure Handling
--------------------------

**DataTransfer**: Retries are handled by the data-transfer package with
exponential backoff (sketched below). Failed transfers can be manually
retried.

**LongTermArchiveTransfer**: After 3 attempts, the ``requires_intervention``
property returns ``True``, indicating that manual intervention is needed.

**StagingJob**: Retries follow the same pattern as DataTransfer, but staging
failures are less critical (the data is still in the archive).
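The exponential backoff itself lives in the data-transfer package, not in
ops-db. As a rough illustration of the pattern only (the function name,
attempt limit, and delays below are illustrative assumptions, not the
package's actual API):

.. code-block:: python

   import time

   def retry_with_backoff(run_transfer, max_attempts=5, base_delay=60.0):
       """Retry ``run_transfer``, doubling the wait after each failure.

       Illustrative sketch; names and parameters are assumptions.
       """
       for attempt in range(max_attempts):
           try:
               return run_transfer()
           except Exception:
               if attempt == max_attempts - 1:
                   raise  # budget exhausted; the transfer stays marked failed
               time.sleep(base_delay * 2 ** attempt)  # 60 s, 120 s, 240 s, ...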
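To make the ``requires_intervention`` behaviour concrete, here is a minimal
plain-Python sketch of the check described above. The attribute name
``attempt_count`` is an assumption for illustration; only the three-attempt
threshold comes from this documentation. For the actual ORM attributes, see
:py:class:`~ccat_ops_db.models.LongTermArchiveTransfer`.

.. code-block:: python

   from dataclasses import dataclass

   MAX_ARCHIVE_ATTEMPTS = 3  # documented retry budget: 3 attempts, then a human

   @dataclass
   class LongTermArchiveTransferSketch:
       """Plain-Python stand-in for the ORM model (attribute name assumed)."""

       attempt_count: int = 0

       @property
       def requires_intervention(self) -> bool:
           # True once the automatic retry budget is exhausted.
           return self.attempt_count >= MAX_ARCHIVE_ATTEMPTS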
Integration with Data Transfer Package
--------------------------------------

The :doc:`/data-transfer/docs/index` Python package implements the actual
transfer operations:

1. **Reads** these records to determine what needs to be transferred
2. **Performs** the actual file transfers using BBCP, S3, or other methods
3. **Updates** status and physical copy records as transfers complete

For detailed workflow documentation, see the :doc:`/data-transfer/docs/index`
documentation.

Related Documentation
---------------------

* Complete API reference: :doc:`../api_reference/models`
* Location model: :doc:`location_model`
* Data model: :doc:`data_model`
* Data transfer workflows: :doc:`/data-transfer/docs/index`