# Transfer Model

```{eval-rst}
.. verified:: 2025-11-25
   :reviewer: Christof Buchbender
```

ops-db tracks data movement operations but doesn't perform them. The {doc}`/data-transfer/docs/index` package reads these records to orchestrate the actual file transfers. This section covers the "how do we route and track transfers" infrastructure.

## Routing Infrastructure

### DataTransferRoute

{py:class}`~ccat_ops_db.models.DataTransferRoute` defines how data should flow between sites. Routes are defined at site level, with optional location-level overrides. This decouples transfer logic from hardcoded paths and allows dynamic routing based on network conditions.

**Example**: the "ccat_to_cologne" route: `origin_site=CCAT`, `destination_site=Cologne`, `method=bbcp`

**RouteType Enum**: `DIRECT` (skip destination buffer), `RELAY` (route through an intermediate site), `CUSTOM` (location-to-location override).

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataTransferRoute`.

## Transfer Packages

### DataTransferPackage

{py:class}`~ccat_ops_db.models.DataTransferPackage` bundles multiple {py:class}`~ccat_ops_db.models.RawDataPackage` objects for efficient network transfer: many packages → fewer transfer operations. For long-distance transfers, optimal package sizes lie in the range of 10-50 TB.

One {py:class}`~ccat_ops_db.models.DataTransferPackage` can have multiple {py:class}`~ccat_ops_db.models.DataTransfer` records (the same bundle sent to multiple destinations).

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataTransferPackage`.

## Transfer Operations

### DataTransfer

{py:class}`~ccat_ops_db.models.DataTransfer` records a specific transfer operation from origin to destination. It separates the transfer (moving the archive) from unpacking (extracting at the destination), which makes it possible to distinguish transfer failures from unpacking failures and to retry unpacking without re-transferring.

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataTransfer`.

### DataTransferLog

{py:class}`~ccat_ops_db.models.DataTransferLog` provides lightweight log entries with references to detailed log files. This avoids storing large log text in the database - only the path is stored. Full command outputs are kept in files; detailed metrics are stored in InfluxDB.

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataTransferLog`.

## Transfer Flow

```{eval-rst}
.. mermaid::

   sequenceDiagram
       participant Source
       participant Buffer1
       participant DTP as DataTransferPackage
       participant DT as DataTransfer
       participant Buffer2
       participant Archive

       Source->>Buffer1: Package RawDataPackages
       Buffer1->>DTP: Create DataTransferPackage
       DTP->>DT: Create DataTransfer
       DT->>Buffer2: Transfer archive
       DT->>Buffer2: Unpack RawDataPackages
       Buffer2->>Archive: Archive RawDataPackages
```

## Archive Operations

### LongTermArchiveTransfer

{py:class}`~ccat_ops_db.models.LongTermArchiveTransfer` tracks the transfer of {py:class}`~ccat_ops_db.models.RawDataPackage` objects to permanent archive storage. It is separate from {py:class}`~ccat_ops_db.models.DataTransfer` because this transfer happens within a single {py:class}`~ccat_ops_db.models.Site`, between {py:class}`~ccat_ops_db.models.DataLocation` records of type `BUFFER` and `LONG_TERM_ARCHIVE`, rather than between `BUFFER` locations at two different sites as {py:class}`~ccat_ops_db.models.DataTransfer` does.

For complete attribute details, see {py:class}`~ccat_ops_db.models.LongTermArchiveTransfer`.
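To make this concrete, here is a minimal sketch of queueing buffered packages for archiving. It assumes the models are SQLAlchemy ORM classes (consistent with the class references above); the connection URL and the `status` and `package` attributes are hypothetical - check {doc}`../api_reference/models` for the actual fields.

```python
# Minimal sketch: queue buffered RawDataPackages for long-term archiving.
# Assumes SQLAlchemy ORM models; field names ("status", "package") and the
# connection URL are illustrative, not the documented attributes.
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

from ccat_ops_db.models import LongTermArchiveTransfer, RawDataPackage

engine = create_engine("postgresql:///ops_db")  # hypothetical connection URL

with Session(engine) as session:
    # Find packages still sitting in the site-local buffer (assumed filter).
    buffered = session.scalars(
        select(RawDataPackage).where(RawDataPackage.status == "BUFFERED")
    ).all()

    for package in buffered:
        # One record per package; the data-transfer package performs the
        # actual move and updates the status afterwards.
        session.add(LongTermArchiveTransfer(package=package, status="PENDING"))

    session.commit()
```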
### StagingJob

{py:class}`~ccat_ops_db.models.StagingJob` makes archived data available for scientific processing: it downloads packages from the long-term archive, unpacks them, and creates file access records. Multiple packages can be staged together for efficiency. Staging is on-demand (a scientist requests data), unlike archiving, which is fire-and-forget.

For complete attribute details, see {py:class}`~ccat_ops_db.models.StagingJob`.

## Archive and Staging Flow

```{eval-rst}
.. mermaid::

   sequenceDiagram
       participant Buffer
       participant Archive
       participant LTA as LongTermArchiveTransfer
       participant Staging
       participant Processing

       Buffer->>LTA: Create archive transfer
       LTA->>Archive: Transfer RawDataPackage
       Archive->>Staging: Scientist requests data
       Staging->>Processing: Stage packages
       Processing->>Archive: Store results
```

## Why This Structure?

**Separation of Routing and Execution**

> Allows:
>
> - Updating routes without affecting transfer history
> - Multiple transfer strategies
> - Dynamic routing based on conditions

**DataTransferPackage Bundling**

> Optimizes network usage:
>
> - Reduces the overhead of many small transfers
> - Enables resumable transfers
> - Optimal package sizes (10-50 TB for long distance)

**Separate Transfer and Unpacking Tracking**

> Allows:
>
> - Distinguishing transfer failures from unpacking failures
> - Retrying unpacking without re-transferring
> - Better error diagnosis

**LongTermArchiveTransfer and StagingJob Separation**

> Different workflows:
>
> - Archiving is fire-and-forget (automatic)
> - Staging is on-demand (scientist requests)

## Retry and Failure Handling

**DataTransfer**: Retries are handled by the data-transfer package with exponential backoff. Failed transfers can be retried manually.

**LongTermArchiveTransfer**: After 3 attempts, the `requires_intervention` property returns True, indicating that manual intervention is needed (see the sketch at the end of this page).

**StagingJob**: Retries follow the same pattern as DataTransfer, but staging failures are less critical (the data is still in the archive).

## Integration with the Data Transfer Package

The {doc}`/data-transfer/docs/index` Python package implements the actual transfer operations:

1. **Reads** these records to determine what needs to be transferred
2. **Performs** the actual file transfers using BBCP, S3, or other methods
3. **Updates** status and physical copy records as transfers complete

For detailed workflow documentation, see the {doc}`/data-transfer/docs/index` documentation.

## Related Documentation

- Complete API reference: {doc}`../api_reference/models`
- Location model: {doc}`location_model`
- Data model: {doc}`data_model`
- Data transfer workflows: {doc}`/data-transfer/docs/index`
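As referenced in the "Retry and Failure Handling" section above, here is a minimal sketch of the intervention check. The `requires_intervention` property is documented on {py:class}`~ccat_ops_db.models.LongTermArchiveTransfer`; the `status` filter, the `id` attribute, and the session setup are illustrative assumptions.

```python
# Minimal sketch: flag stuck archive transfers for manual follow-up.
# "requires_intervention" is documented above; the "FAILED" status value,
# the "id" attribute, and the connection URL are illustrative assumptions.
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

from ccat_ops_db.models import LongTermArchiveTransfer

engine = create_engine("postgresql:///ops_db")  # hypothetical connection URL

with Session(engine) as session:
    failed = session.scalars(
        select(LongTermArchiveTransfer).where(
            LongTermArchiveTransfer.status == "FAILED"  # assumed status value
        )
    ).all()

    # After 3 attempts, requires_intervention returns True and the record
    # should be surfaced to an operator rather than retried automatically.
    for transfer in failed:
        if transfer.requires_intervention:
            print(f"LongTermArchiveTransfer {transfer.id} needs manual intervention")
```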