Transfer Model
==============

.. verified:: 2025-11-25
   :reviewer: Christof Buchbender

ops-db tracks data movement operations but doesn't perform them. The
:doc:`/data-transfer/docs/index` package reads these records to orchestrate
actual file transfers.

This section covers the "how do we route and track transfers" infrastructure.

Routing Infrastructure
----------------------

DataTransferRoute
^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DataTransferRoute` defines how data should flow
between sites. Routes are defined at site level, with optional location-level
overrides. This decouples transfer logic from hardcoded paths and allows
dynamic routing based on network conditions.

**Example**: a "ccat_to_cologne" route with ``origin_site=CCAT``,
``destination_site=Cologne``, ``method=bbcp``.

**RouteType Enum**: ``DIRECT`` (skip destination buffer), ``RELAY`` (route
through an intermediate site), ``CUSTOM`` (location-to-location override).

For complete attribute details, see
:py:class:`~ccat_ops_db.models.DataTransferRoute`.

Transfer Packages
-----------------

DataTransferPackage
^^^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DataTransferPackage` bundles multiple
:py:class:`~ccat_ops_db.models.RawDataPackage` objects for efficient network
transfer: many small packages become fewer transfer operations. For
long-distance transfers, optimal package sizes lie in the range of 10-50 TB.

One :py:class:`~ccat_ops_db.models.DataTransferPackage` can have multiple
:py:class:`~ccat_ops_db.models.DataTransfer` records (the same bundle sent to
multiple destinations).

For complete attribute details, see
:py:class:`~ccat_ops_db.models.DataTransferPackage`.

Transfer Operations
-------------------

DataTransfer
^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DataTransfer` records a specific transfer
operation from origin to destination. It separates the transfer (moving the
archive) from unpacking (extracting at the destination), which makes it
possible to distinguish transfer failures from unpacking failures and to
retry unpacking without re-transferring.

For complete attribute details, see
:py:class:`~ccat_ops_db.models.DataTransfer`.

DataTransferLog
^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DataTransferLog` provides lightweight log
entries with references to detailed log files. Rather than storing large log
text in the database, only the path is stored: full command outputs live in
files, and detailed metrics are stored in InfluxDB.

For complete attribute details, see
:py:class:`~ccat_ops_db.models.DataTransferLog`.

Transfer Flow
-------------

.. mermaid::

   sequenceDiagram
       participant Source
       participant Buffer1
       participant DTP as DataTransferPackage
       participant DT as DataTransfer
       participant Buffer2
       participant Archive

       Source->>Buffer1: Package RawDataPackages
       Buffer1->>DTP: Create DataTransferPackage
       DTP->>DT: Create DataTransfer
       DT->>Buffer2: Transfer archive
       DT->>Buffer2: Unpack RawDataPackages
       Buffer2->>Archive: Archive RawDataPackages

Archive Operations
------------------

LongTermArchiveTransfer
^^^^^^^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.LongTermArchiveTransfer` tracks the transfer of
:py:class:`~ccat_ops_db.models.RawDataPackage` objects to permanent archive
storage. It is separate from :py:class:`~ccat_ops_db.models.DataTransfer`
because this transfer happens within a single
:py:class:`~ccat_ops_db.models.Site`, between
:py:class:`~ccat_ops_db.models.DataLocation` records of type ``BUFFER`` and
``LONG_TERM_ARCHIVE``, rather than between the ``BUFFER`` locations of two
different sites as :py:class:`~ccat_ops_db.models.DataTransfer` does.

For complete attribute details, see
:py:class:`~ccat_ops_db.models.LongTermArchiveTransfer`.

StagingJob
^^^^^^^^^^

:py:class:`~ccat_ops_db.models.StagingJob` makes archived data available for
scientific processing: download from the long-term archive → unpack → create
file access records. Multiple packages can be staged together for efficiency.
Staging is on-demand (a scientist requests data), unlike archiving, which is
fire-and-forget.

For complete attribute details, see
:py:class:`~ccat_ops_db.models.StagingJob`.

Archive and Staging Flow
------------------------

.. mermaid::

   sequenceDiagram
       participant Buffer
       participant Archive
       participant LTA as LongTermArchiveTransfer
       participant Staging
       participant Processing

       Buffer->>LTA: Create archive transfer
       LTA->>Archive: Transfer RawDataPackage
       Archive->>Staging: Scientist requests data
       Staging->>Processing: Stage packages
       Processing->>Archive: Store results

Why This Structure?
-------------------

**Separation of Routing and Execution**

Allows:

* Updating routes without affecting transfer history
* Multiple transfer strategies
* Dynamic routing based on conditions

**DataTransferPackage Bundling**

Optimizes network usage:

* Reduces the overhead of many small transfers
* Enables resumable transfers
* Optimal package sizes (10-50 TB for long distance)

**Separate Transfer and Unpacking Tracking**

Allows:

* Distinguishing transfer failures from unpacking failures
* Retrying unpacking without re-transferring
* Better error diagnosis

**LongTermArchiveTransfer and StagingJob Separation**

Different workflows:

* Archiving is fire-and-forget (automatic)
* Staging is on-demand (scientist requests)

Retry and Failure Handling
--------------------------

**DataTransfer**: Retries are handled by the data-transfer package with
exponential backoff (sketched below). Failed transfers can be manually
retried.

**LongTermArchiveTransfer**: After 3 attempts, the ``requires_intervention``
property returns ``True``, indicating that manual intervention is needed.

**StagingJob**: Retries follow the same pattern as DataTransfer, but staging
failures are less critical (the data is still in the archive).
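The exponential backoff itself lives in the data-transfer package, not in
ops-db. As a rough illustration of the pattern only (the function name,
attempt limit, and delays below are illustrative assumptions, not the
package's actual API):

.. code-block:: python

   import time

   def retry_with_backoff(run_transfer, max_attempts=5, base_delay=60.0):
       """Retry ``run_transfer``, doubling the wait after each failure.

       Illustrative sketch; names and parameters are assumptions.
       """
       for attempt in range(max_attempts):
           try:
               return run_transfer()
           except Exception:
               if attempt == max_attempts - 1:
                   raise  # budget exhausted; the transfer stays marked failed
               time.sleep(base_delay * 2 ** attempt)  # 60 s, 120 s, 240 s, ...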
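To make the ``requires_intervention`` behaviour concrete, here is a minimal
plain-Python sketch of the check described above. The attribute name
``attempt_count`` is an assumption for illustration; only the three-attempt
threshold comes from this documentation. For the actual ORM attributes, see
:py:class:`~ccat_ops_db.models.LongTermArchiveTransfer`.

.. code-block:: python

   from dataclasses import dataclass

   MAX_ARCHIVE_ATTEMPTS = 3  # documented retry budget: 3 attempts, then a human

   @dataclass
   class LongTermArchiveTransferSketch:
       """Plain-Python stand-in for the ORM model (attribute name assumed)."""

       attempt_count: int = 0

       @property
       def requires_intervention(self) -> bool:
           # True once the automatic retry budget is exhausted.
           return self.attempt_count >= MAX_ARCHIVE_ATTEMPTS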
Integration with Data Transfer Package
--------------------------------------

The :doc:`/data-transfer/docs/index` Python package implements the actual
transfer operations:

1. **Reads** these records to determine what needs to be transferred
2. **Performs** the actual file transfers using BBCP, S3, or other methods
3. **Updates** status and physical copy records as transfers complete

For detailed workflow documentation, see the :doc:`/data-transfer/docs/index`
documentation.

Related Documentation
---------------------

* Complete API reference: :doc:`../api_reference/models`
* Location model: :doc:`location_model`
* Data model: :doc:`data_model`
* Data transfer workflows: :doc:`/data-transfer/docs/index`