# Transfer Model

```{eval-rst}
.. verified:: 2025-11-25
   :reviewer: Christof Buchbender
```

ops-db tracks data movement operations but doesn't perform them. The {doc}`/data-transfer/docs/index` package reads these records to orchestrate the actual file transfers. This section covers the "how do we route and track transfers" infrastructure.

## Routing Infrastructure

### DataTransferRoute

{py:class}`~ccat_ops_db.models.DataTransferRoute` defines how data should flow between sites. Routes are defined at site level, with optional location-level overrides. This decouples transfer logic from hardcoded paths and allows dynamic routing based on network conditions.

**Example**: the "ccat_to_cologne" route: `origin_site=CCAT`, `destination_site=Cologne`, `method=bbcp`

**RouteType Enum**: `DIRECT` (skip destination buffer), `RELAY` (route through an intermediate site), `CUSTOM` (location-to-location override).

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataTransferRoute`.

## Transfer Packages

### DataTransferPackage

{py:class}`~ccat_ops_db.models.DataTransferPackage` bundles multiple {py:class}`~ccat_ops_db.models.RawDataPackage` objects for efficient network transfer: many packages → fewer transfer operations. For long-distance transfers, optimal package sizes lie in the range of 10-50 TB.

One {py:class}`~ccat_ops_db.models.DataTransferPackage` can have multiple {py:class}`~ccat_ops_db.models.DataTransfer` records (the same bundle sent to multiple destinations).

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataTransferPackage`.

## Transfer Operations

### DataTransfer

{py:class}`~ccat_ops_db.models.DataTransfer` records a specific transfer operation from origin to destination. It separates the transfer (moving the archive) from unpacking (extracting at the destination), which makes it possible to distinguish transfer failures from unpacking failures and to retry unpacking without re-transferring.

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataTransfer`.

### DataTransferLog

{py:class}`~ccat_ops_db.models.DataTransferLog` provides lightweight log entries with references to detailed log files. This avoids storing large log text in the database - only the path is stored. Full command outputs are kept in files; detailed metrics are stored in InfluxDB.

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataTransferLog`.

## Transfer Flow

```{eval-rst}
.. mermaid::

   sequenceDiagram
       participant Source
       participant Buffer1
       participant DTP as DataTransferPackage
       participant DT as DataTransfer
       participant Buffer2
       participant Archive

       Source->>Buffer1: Package RawDataPackages
       Buffer1->>DTP: Create DataTransferPackage
       DTP->>DT: Create DataTransfer
       DT->>Buffer2: Transfer archive
       DT->>Buffer2: Unpack RawDataPackages
       Buffer2->>Archive: Archive RawDataPackages
```

## Archive Operations

### LongTermArchiveTransfer

{py:class}`~ccat_ops_db.models.LongTermArchiveTransfer` tracks the transfer of {py:class}`~ccat_ops_db.models.RawDataPackage` objects to permanent archive storage. It is separate from {py:class}`~ccat_ops_db.models.DataTransfer` because this transfer happens within a single {py:class}`~ccat_ops_db.models.Site`, between {py:class}`~ccat_ops_db.models.DataLocation` records of type `BUFFER` and `LONG_TERM_ARCHIVE`, rather than between `BUFFER` locations at two different sites as {py:class}`~ccat_ops_db.models.DataTransfer` does.

For complete attribute details, see {py:class}`~ccat_ops_db.models.LongTermArchiveTransfer`.
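To make this concrete, here is a minimal sketch of queueing buffered packages for archiving. It assumes the models are SQLAlchemy ORM classes (consistent with the class references above); the connection URL and the `status` and `package` attributes are hypothetical - check {doc}`../api_reference/models` for the actual fields.

```python
# Minimal sketch: queue buffered RawDataPackages for long-term archiving.
# Assumes SQLAlchemy ORM models; field names ("status", "package") and the
# connection URL are illustrative, not the documented attributes.
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

from ccat_ops_db.models import LongTermArchiveTransfer, RawDataPackage

engine = create_engine("postgresql:///ops_db")  # hypothetical connection URL

with Session(engine) as session:
    # Find packages still sitting in the site-local buffer (assumed filter).
    buffered = session.scalars(
        select(RawDataPackage).where(RawDataPackage.status == "BUFFERED")
    ).all()

    for package in buffered:
        # One record per package; the data-transfer package performs the
        # actual move and updates the status afterwards.
        session.add(LongTermArchiveTransfer(package=package, status="PENDING"))

    session.commit()
```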
### StagingJob

{py:class}`~ccat_ops_db.models.StagingJob` makes archived data available for scientific processing: it downloads packages from the long-term archive, unpacks them, and creates file access records. Multiple packages can be staged together for efficiency. Staging is on-demand (a scientist requests data), unlike archiving, which is fire-and-forget.

For complete attribute details, see {py:class}`~ccat_ops_db.models.StagingJob`.

## Archive and Staging Flow

```{eval-rst}
.. mermaid::

   sequenceDiagram
       participant Buffer
       participant Archive
       participant LTA as LongTermArchiveTransfer
       participant Staging
       participant Processing

       Buffer->>LTA: Create archive transfer
       LTA->>Archive: Transfer RawDataPackage
       Archive->>Staging: Scientist requests data
       Staging->>Processing: Stage packages
       Processing->>Archive: Store results
```

## Why This Structure?

**Separation of Routing and Execution**

> Allows:
>
> - Updating routes without affecting transfer history
> - Multiple transfer strategies
> - Dynamic routing based on conditions

**DataTransferPackage Bundling**

> Optimizes network usage:
>
> - Reduces the overhead of many small transfers
> - Enables resumable transfers
> - Optimal package sizes (10-50 TB for long distance)

**Separate Transfer and Unpacking Tracking**

> Allows:
>
> - Distinguishing transfer failures from unpacking failures
> - Retrying unpacking without re-transferring
> - Better error diagnosis

**LongTermArchiveTransfer and StagingJob Separation**

> Different workflows:
>
> - Archiving is fire-and-forget (automatic)
> - Staging is on-demand (scientist requests)

## Retry and Failure Handling

**DataTransfer**: Retries are handled by the data-transfer package with exponential backoff. Failed transfers can be retried manually.

**LongTermArchiveTransfer**: After 3 attempts, the `requires_intervention` property returns True, indicating that manual intervention is needed (see the sketch at the end of this page).

**StagingJob**: Retries follow the same pattern as DataTransfer, but staging failures are less critical (the data is still in the archive).

## Integration with the Data Transfer Package

The {doc}`/data-transfer/docs/index` Python package implements the actual transfer operations:

1. **Reads** these records to determine what needs to be transferred
2. **Performs** the actual file transfers using BBCP, S3, or other methods
3. **Updates** status and physical copy records as transfers complete

For detailed workflow documentation, see the {doc}`/data-transfer/docs/index` documentation.

## Related Documentation

- Complete API reference: {doc}`../api_reference/models`
- Location model: {doc}`location_model`
- Data model: {doc}`data_model`
- Data transfer workflows: {doc}`/data-transfer/docs/index`
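As referenced in the "Retry and Failure Handling" section above, here is a minimal sketch of the intervention check. The `requires_intervention` property is documented on {py:class}`~ccat_ops_db.models.LongTermArchiveTransfer`; the `status` filter, the `id` attribute, and the session setup are illustrative assumptions.

```python
# Minimal sketch: flag stuck archive transfers for manual follow-up.
# "requires_intervention" is documented above; the "FAILED" status value,
# the "id" attribute, and the connection URL are illustrative assumptions.
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

from ccat_ops_db.models import LongTermArchiveTransfer

engine = create_engine("postgresql:///ops_db")  # hypothetical connection URL

with Session(engine) as session:
    failed = session.scalars(
        select(LongTermArchiveTransfer).where(
            LongTermArchiveTransfer.status == "FAILED"  # assumed status value
        )
    ).all()

    # After 3 attempts, requires_intervention returns True and the record
    # should be surfaced to an operator rather than retried automatically.
    for transfer in failed:
        if transfer.requires_intervention:
            print(f"LongTermArchiveTransfer {transfer.id} needs manual intervention")
```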