# Location Model

```{eval-rst}
.. verified:: 2025-11-25
   :reviewer: Christof Buchbender
```

CCAT data centers are geographically distributed (Chile, Cologne, Cornell). Data must be tracked across multiple sites, each with multiple storage locations of different types.

## Site

{py:class}`~ccat_ops_db.models.Site` groups data locations that belong to the same physical or logical location.

**Examples**:
: - CCAT (short_name: ccat): Cerro Chajnantor, Chile - telescope location
  - Cologne (short_name: cologne): University of Cologne, Germany - primary archive
  - Cornell (short_name: us): Cornell University, USA - US archive

For complete attribute details, see {py:class}`~ccat_ops_db.models.Site`.

## DataLocation

{py:class}`~ccat_ops_db.models.DataLocation` is the base class for all storage locations with polymorphic storage types. It defines WHERE data can be stored within a site.

**LocationType Enum**: SOURCE (telescope instrument computers), BUFFER (intermediate storage), LONG_TERM_ARCHIVE (permanent storage), PROCESSING (temporary analysis areas).

**StorageType Enum**: DISK (traditional filesystem), S3 (object storage), TAPE (tape libraries).

For complete attribute details, see {py:class}`~ccat_ops_db.models.DataLocation`.

## Polymorphic Storage Types

The database uses polymorphic inheritance to support different storage backends:

```{eval-rst}
.. mermaid::

   graph TB
       DL[DataLocation<br/>Base Class]
       DISK[DiskDataLocation]
       S3[S3DataLocation]
       TAPE[TapeDataLocation]

       DL -->|polymorphic| DISK
       DL -->|polymorphic| S3
       DL -->|polymorphic| TAPE

       style DL fill:#e1f5ff
       style DISK fill:#fff4e1
       style S3 fill:#ffe1f5
       style TAPE fill:#e1ffe1
```

### DiskDataLocation

{py:class}`~ccat_ops_db.models.DiskDataLocation` represents filesystem-based storage (local or remote). Used for local telescope storage, network-mounted buffers, and processing areas.

**Example**: FYST source location at "telescope.ccat.cl:/data/fyst"

For complete attribute details, see {py:class}`~ccat_ops_db.models.DiskDataLocation`.

### S3DataLocation

{py:class}`~ccat_ops_db.models.S3DataLocation` represents object storage for large-scale archival. Used for long-term archives and cloud storage. Credentials are retrieved via the {py:func}`~ccat_ops_db.models.S3DataLocation.get_s3_credentials` method using environment variable patterns.

**Example**: Cologne long-term archive using Coscine S3-compatible storage

For complete attribute details, see {py:class}`~ccat_ops_db.models.S3DataLocation`.

### TapeDataLocation

{py:class}`~ccat_ops_db.models.TapeDataLocation` represents tape library systems for deep archival. Used for long-term cold storage with high capacity and low access frequency. Not currently in production, but supported by the architecture.

For complete attribute details, see {py:class}`~ccat_ops_db.models.TapeDataLocation`.

## Buffer Hierarchy and Failover

Multiple buffer locations can exist at a site, enabling failover and load distribution.

**Active Flag**: Indicates whether the location is operational

**Priority Field**: Defines failover order (lower number = higher priority)

**Use Case**: If the primary buffer is full or offline, data-transfer can route to the secondary buffer.

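
The selection rule implied by the active flag and the priority field can be sketched as follows. This is a minimal illustration under stated assumptions, not the data-transfer implementation: `BufferLocation` and `select_buffer` are hypothetical names that do not come from `ccat_ops_db`, and only the offline case is modeled (a capacity check for a full buffer is left out).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BufferLocation:
    """Hypothetical stand-in for a BUFFER-type data location."""

    name: str
    priority: int  # lower number = higher priority
    active: bool   # False while the location is disabled


def select_buffer(locations: list[BufferLocation]) -> Optional[BufferLocation]:
    """Return the active location with the lowest priority number, or None."""
    candidates = [loc for loc in locations if loc.active]
    return min(candidates, key=lambda loc: loc.priority, default=None)


buffers = [
    BufferLocation("cologne_buffer_1", priority=0, active=True),
    BufferLocation("cologne_buffer_2", priority=1, active=True),
]

# With both buffers active, the lowest priority number is chosen.
primary = select_buffer(buffers)

# Simulate disabling the primary buffer for maintenance.
fallback = select_buffer(
    [BufferLocation("cologne_buffer_1", priority=0, active=False), buffers[1]]
)
```

With both buffers active, the sketch picks `cologne_buffer_1`; once that location is marked inactive, it falls back to `cologne_buffer_2`. Returning `None` when no location is active leaves it to the caller to decide how to handle a site with no usable buffer.
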

**Example**:
: - cologne_buffer_1 (priority 0, active=True) - Primary buffer
  - cologne_buffer_2 (priority 1, active=True) - Secondary buffer

The system uses:
: - **Priority** (lower number = higher priority): Determines which location to use first
  - **Active** flag: Allows temporarily disabling locations for maintenance

## Example Locations

```{eval-rst}
.. list-table:: Example Data Locations
   :header-rows: 1
   :widths: 20 20 20 20 20

   * - Site
     - Name
     - LocationType
     - StorageType
     - Path/Details
   * - CCAT
     - fyst_source
     - SOURCE
     - DISK
     - telescope.ccat.cl:/data/fyst
   * - Cologne
     - cologne_buffer_1
     - BUFFER
     - DISK
     - buffer.data.uni-koeln.de:/mnt/buffer
   * - Cologne
     - cologne_lta
     - LONG_TERM_ARCHIVE
     - S3
     - bucket: ccat-archive
   * - Cologne
     - ramses_processing
     - PROCESSING
     - DISK
     - ramses.cluster:/scratch/ccat
```

## Why This Structure?

**Polymorphic Design**
: Allows different storage backends without changing core logic. The same code can work with disk, S3, or tape storage.

**Site Grouping**
: Enables geographic routing and replication strategies. Data can be replicated across multiple sites for redundancy.

**Location Type vs Storage Type**
: - `location_type` captures functional role (where in the workflow)
  - `storage_type` captures technical implementation
  - Separation allows flexibility: A BUFFER location could be DISK or S3 depending on site infrastructure

**Active/Priority Fields**
: Enable dynamic routing and failover without code changes. Locations can be disabled for maintenance or prioritized based on capacity.

## Integration with Physical Copies

Each {py:class}`~ccat_ops_db.models.PhysicalCopy` references a {py:class}`~ccat_ops_db.models.DataLocation`.
The `full_path` property combines:

- For {py:class}`~ccat_ops_db.models.DiskDataLocation`: `DataLocation.path + file.relative_path`
- For {py:class}`~ccat_ops_db.models.S3DataLocation`: `DataLocation.bucket_name + file.relative_path` (S3 key)
- For {py:class}`~ccat_ops_db.models.TapeDataLocation`: `DataLocation.mount_path + file.relative_path`

## Geographic Distribution

Storage locations currently span multiple sites:

- **CCAT Observatory (Chile)** - SOURCE and BUFFER locations at telescope
- **University of Cologne (Germany)** - BUFFER, LONG_TERM_ARCHIVE, and PROCESSING (RAMSES)
- **Cornell University (USA)** - Future archive site

**Future Expansion**: The architecture supports additional sites and multi-tiered transfer routing (e.g., Chile → Cologne → Cornell).

## Related Documentation

- Complete API reference: {doc}`../api_reference/models`
- Transfer model: {doc}`transfer_model`
- Data model: {doc}`data_model`