Location Model#
CCAT data centers are geographically distributed (Chile, Cologne, Cornell). Data must be tracked across multiple sites, each with multiple storage locations of different types.
Site#
Site groups data locations that belong to the same
physical or logical location.
Examples:
- CCAT (short_name: ccat): Cerro Chajnantor, Chile - telescope location
- Cologne (short_name: cologne): University of Cologne, Germany - primary archive
- Cornell (short_name: us): Cornell University, USA - US archive
For complete attribute details, see Site.
DataLocation#
DataLocation is the base class for all storage locations
with polymorphic storage types. It defines WHERE data can be stored within a site.
LocationType Enum: SOURCE (telescope instrument computers), BUFFER (intermediate storage), LONG_TERM_ARCHIVE (permanent storage), PROCESSING (temporary analysis areas).
StorageType Enum: DISK (traditional filesystem), S3 (object storage), TAPE (tape libraries).
For complete attribute details, see DataLocation.
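As a rough illustration of how the two enums separate workflow role from storage backend, the sketch below models a location as a plain dataclass. It is not the project's actual model: the enum string values, the `site_short_name` field, and the use of a dataclass rather than a database-backed class are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class LocationType(Enum):
    """Functional role of a location in the workflow (values are assumed)."""
    SOURCE = "source"
    BUFFER = "buffer"
    LONG_TERM_ARCHIVE = "long_term_archive"
    PROCESSING = "processing"


class StorageType(Enum):
    """Technical storage backend (values are assumed)."""
    DISK = "disk"
    S3 = "s3"
    TAPE = "tape"


@dataclass
class DataLocation:
    """Simplified stand-in for the real base class; field names are illustrative."""
    name: str
    site_short_name: str
    location_type: LocationType
    storage_type: StorageType
    active: bool = True
    priority: int = 0


# Two Cologne locations with different roles and different backends.
cologne_buffer = DataLocation(
    "cologne_buffer_1", "cologne", LocationType.BUFFER, StorageType.DISK
)
cologne_lta = DataLocation(
    "cologne_lta", "cologne", LocationType.LONG_TERM_ARCHIVE, StorageType.S3
)
```

Because the two enums are independent fields, a BUFFER location backed by S3 could be described without adding a new class or enum member.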
Polymorphic Storage Types#
The database uses polymorphic inheritance to support different storage backends:
graph TB
DL[DataLocation<br/>Base Class]
DISK[DiskDataLocation]
S3[S3DataLocation]
TAPE[TapeDataLocation]
DL -->|polymorphic| DISK
DL -->|polymorphic| S3
DL -->|polymorphic| TAPE
style DL fill:#e1f5ff
style DISK fill:#fff4e1
style S3 fill:#ffe1f5
style TAPE fill:#e1ffe1
DiskDataLocation#
DiskDataLocation represents filesystem-based storage
(local or remote). Used for local telescope storage, network-mounted buffers, and processing areas.
Example: FYST source location at “telescope.ccat.cl:/data/fyst”
For complete attribute details, see DiskDataLocation.
S3DataLocation#
S3DataLocation represents object storage for large-scale
archival. Used for long-term archives and cloud storage. Credentials are retrieved via
get_s3_credentials() method using environment variable patterns.
Example: Cologne long-term archive using Coscine S3-compatible storage
For complete attribute details, see S3DataLocation.
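The exact environment variable pattern is not spelled out here, so the following is only a sketch of the general idea of deriving variable names from a location name. The helper `lookup_s3_credentials`, the `S3_<NAME>_ACCESS_KEY` / `S3_<NAME>_SECRET_KEY` naming scheme, and the `env` parameter are all hypothetical and do not describe the real get_s3_credentials() method.

```python
import os
from typing import Mapping, Optional, Tuple


def lookup_s3_credentials(
    location_name: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[str, str]:
    """Return (access_key, secret_key) for a location.

    Hypothetical pattern: S3_<LOCATION_NAME>_ACCESS_KEY and
    S3_<LOCATION_NAME>_SECRET_KEY. The real pattern may differ.
    """
    # Allow a mapping to be injected so the function can be tested
    # without touching the process environment.
    env = os.environ if env is None else env
    prefix = f"S3_{location_name.upper()}"
    try:
        return env[f"{prefix}_ACCESS_KEY"], env[f"{prefix}_SECRET_KEY"]
    except KeyError as missing:
        raise RuntimeError(
            f"missing environment variable {missing.args[0]}"
        ) from None


# Example with an injected mapping instead of real secrets.
fake_env = {
    "S3_COLOGNE_LTA_ACCESS_KEY": "example-access",
    "S3_COLOGNE_LTA_SECRET_KEY": "example-secret",
}
credentials = lookup_s3_credentials("cologne_lta", fake_env)
```

Keeping secrets in environment variables rather than in the location record means the stored location only needs enough information to derive the variable names.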
TapeDataLocation#
TapeDataLocation represents tape library systems for
deep archival. Used for long-term cold storage with high capacity and low access frequency.
Not currently in production, but supported by the architecture.
For complete attribute details, see TapeDataLocation.
Buffer Hierarchy and Failover#
Multiple buffer locations can exist at a site, enabling failover and load distribution.
- Active flag: indicates whether a location is operational. Clearing it temporarily disables a location, for example during maintenance.
- Priority field: defines the failover order, that is, which location is used first (lower number = higher priority).
- Use case: if the primary buffer is full or offline, data-transfer can route to the secondary buffer.
Example:
- cologne_buffer_1 (priority 0, active=True) - primary buffer
- cologne_buffer_2 (priority 1, active=True) - secondary buffer
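The selection rule described above (skip inactive locations, then prefer the lowest priority number) can be sketched as follows. This is a minimal illustration, not the data-transfer implementation: the `BufferLocation` class and the `select_buffer` function are hypothetical names, and capacity checks ("buffer is full") are left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BufferLocation:
    """Hypothetical stand-in carrying only the fields needed for failover."""
    name: str
    priority: int
    active: bool = True


def select_buffer(locations: Sequence[BufferLocation]) -> Optional[BufferLocation]:
    """Pick the active location with the lowest priority number, if any."""
    candidates = [loc for loc in locations if loc.active]
    return min(candidates, key=lambda loc: loc.priority, default=None)


buffers = [
    BufferLocation("cologne_buffer_2", priority=1),
    BufferLocation("cologne_buffer_1", priority=0),
]
primary = select_buffer(buffers)

# With the primary disabled for maintenance, the secondary is selected.
during_maintenance = [
    BufferLocation("cologne_buffer_1", priority=0, active=False),
    BufferLocation("cologne_buffer_2", priority=1),
]
fallback = select_buffer(during_maintenance)
```

Returning `None` when no location is active leaves it to the caller to decide whether to queue, retry, or raise.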
Example Locations#
| Site | Name | LocationType | StorageType | Path/Details |
|---|---|---|---|---|
| CCAT | fyst_source | SOURCE | DISK | telescope.ccat.cl:/data/fyst |
| Cologne | cologne_buffer_1 | BUFFER | DISK | buffer.data.uni-koeln.de:/mnt/buffer |
| Cologne | cologne_lta | LONG_TERM_ARCHIVE | S3 | bucket: ccat-archive |
| Cologne | ramses_processing | PROCESSING | DISK | ramses.cluster:/scratch/ccat |
Why This Structure?#
- Polymorphic Design
Allows different storage backends without changing core logic. The same code can work with disk, S3, or tape storage.
- Site Grouping
Enables geographic routing and replication strategies. Data can be replicated across multiple sites for redundancy.
- Location Type vs Storage Type
location_type captures the functional role (where in the workflow); storage_type captures the technical implementation. The separation allows flexibility: a BUFFER location could be DISK or S3 depending on site infrastructure.
- Active/Priority Fields
Enable dynamic routing and failover without code changes. Locations can be disabled for maintenance or prioritized based on capacity.
Integration with Physical Copies#
Each PhysicalCopy references a DataLocation. The full_path property combines:
- For DiskDataLocation: DataLocation.path + file.relative_path
- For S3DataLocation: DataLocation.bucket_name + file.relative_path (S3 key)
- For TapeDataLocation: DataLocation.mount_path + file.relative_path
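The per-backend combination rules can be illustrated with a small class hierarchy. This is a simplified sketch rather than the actual model: here `full_path` is a method on the location that takes the relative path as an argument, whereas the text describes it as a property of PhysicalCopy. Joining with a `/` separator is also an assumption, and the file path used in the example is made up.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class DataLocation:
    """Simplified base class; subclasses decide how paths are combined."""
    name: str

    def full_path(self, relative_path: str) -> str:
        raise NotImplementedError


@dataclass
class DiskDataLocation(DataLocation):
    path: str

    def full_path(self, relative_path: str) -> str:
        # Strip a leading slash so join() does not discard the base path.
        return posixpath.join(self.path, relative_path.lstrip("/"))


@dataclass
class S3DataLocation(DataLocation):
    bucket_name: str

    def full_path(self, relative_path: str) -> str:
        # The relative path doubles as the S3 object key.
        return f"{self.bucket_name}/{relative_path.lstrip('/')}"


@dataclass
class TapeDataLocation(DataLocation):
    mount_path: str

    def full_path(self, relative_path: str) -> str:
        return posixpath.join(self.mount_path, relative_path.lstrip("/"))


relative = "2024/obs_001.fits"  # hypothetical file.relative_path
buffer = DiskDataLocation("cologne_buffer_1", path="/mnt/buffer")
archive = S3DataLocation("cologne_lta", bucket_name="ccat-archive")

# The same call works for either backend.
paths = [loc.full_path(relative) for loc in (buffer, archive)]
```

Calling code only needs `full_path`; which attribute (path, bucket_name, or mount_path) is used stays inside the subclass.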
Geographic Distribution#
Storage locations are distributed across the following sites:
- CCAT Observatory (Chile) - SOURCE and BUFFER locations at the telescope
- University of Cologne (Germany) - BUFFER, LONG_TERM_ARCHIVE, and PROCESSING (RAMSES)
- Cornell University (USA) - future archive site
Future Expansion: The architecture supports additional sites and multi-tiered transfer routing (e.g., Chile → Cologne → Cornell).