Location Model

Documentation Verified. Last checked: 2025-11-25. Reviewer: Christof Buchbender

CCAT data centers are geographically distributed (Chile, Cologne, Cornell). Data must be tracked across multiple sites, each with multiple storage locations of different types.

Site

A Site groups the data locations that belong to the same physical or logical location.

Examples:
  • CCAT (short_name: ccat): Cerro Chajnantor, Chile - telescope location

  • Cologne (short_name: cologne): University of Cologne, Germany - primary archive

  • Cornell (short_name: us): Cornell University, USA - US archive

For complete attribute details, see Site.

DataLocation

DataLocation is the base class for all storage locations; its subclasses implement the individual storage types polymorphically. It defines WHERE data can be stored within a site.

LocationType Enum (the functional role of a location):
  • SOURCE: telescope instrument computers

  • BUFFER: intermediate storage

  • LONG_TERM_ARCHIVE: permanent storage

  • PROCESSING: temporary analysis areas

StorageType Enum (the storage technology):
  • DISK: traditional filesystem

  • S3: object storage

  • TAPE: tape libraries
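A minimal Python sketch of the two enums. The member names are taken from the text above; the string values (for example "buffer") are assumptions and may differ from what the database actually stores.

```python
from enum import Enum


class LocationType(Enum):
    """Functional role of a location in the data workflow."""
    SOURCE = "source"                        # telescope instrument computers
    BUFFER = "buffer"                        # intermediate storage
    LONG_TERM_ARCHIVE = "long_term_archive"  # permanent storage
    PROCESSING = "processing"                # temporary analysis areas


class StorageType(Enum):
    """Technical backend that holds the data (values are assumed)."""
    DISK = "disk"  # traditional filesystem
    S3 = "s3"      # object storage
    TAPE = "tape"  # tape libraries


# Round-trip a value as it might be read from a database column.
print(LocationType("buffer"))   # LocationType.BUFFER
print(StorageType["S3"].value)  # s3
```

Keeping the two enums separate mirrors the design point that a location's role and its backend vary independently.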

For complete attribute details, see DataLocation.

Polymorphic Storage Types

The database uses polymorphic inheritance to support different storage backends:

graph TB
    DL[DataLocation<br/>Base Class]
    DISK[DiskDataLocation]
    S3[S3DataLocation]
    TAPE[TapeDataLocation]

    DL -->|polymorphic| DISK
    DL -->|polymorphic| S3
    DL -->|polymorphic| TAPE

    style DL fill:#e1f5ff
    style DISK fill:#fff4e1
    style S3 fill:#ffe1f5
    style TAPE fill:#e1ffe1

DiskDataLocation

DiskDataLocation represents filesystem-based storage, either local or remote. It is used for local telescope storage, network-mounted buffers, and processing areas.

Example: FYST source location at “telescope.ccat.cl:/data/fyst”

For complete attribute details, see DiskDataLocation.
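The example path uses the scp/rsync-style "host:/path" notation, which is how a remote disk location can be told apart from a local one. The sketch below assumes that interpretation; the helper parse_disk_target and the DiskTarget type are hypothetical, and the real DiskDataLocation may store host and path as separate attributes instead.

```python
from typing import NamedTuple, Optional


class DiskTarget(NamedTuple):
    host: Optional[str]  # None means the path is on the local machine
    path: str


def parse_disk_target(spec: str) -> DiskTarget:
    """Split a 'host:/path' string into host and path (hypothetical helper).

    A spec that starts with '/' or contains no ':' is treated as local.
    """
    if spec.startswith("/") or ":" not in spec:
        return DiskTarget(None, spec)
    host, path = spec.split(":", 1)
    return DiskTarget(host, path)


print(parse_disk_target("telescope.ccat.cl:/data/fyst"))
# DiskTarget(host='telescope.ccat.cl', path='/data/fyst')
print(parse_disk_target("/mnt/buffer"))
# DiskTarget(host=None, path='/mnt/buffer')
```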

S3DataLocation

S3DataLocation represents object storage for large-scale archival. It is used for long-term archives and cloud storage. Credentials are retrieved via the get_s3_credentials() method, which reads them from environment variables that follow a naming pattern.

Example: Cologne long-term archive using Coscine S3-compatible storage

For complete attribute details, see S3DataLocation.
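The text states only that credentials come from environment variables following a pattern; it does not give the pattern. The stand-alone sketch below illustrates the idea under a hypothetical naming scheme (<NAME>_S3_ACCESS_KEY_ID / <NAME>_S3_SECRET_ACCESS_KEY) and a hypothetical function name, s3_credentials_from_env; it is not the actual get_s3_credentials() implementation. Passing the mapping in explicitly keeps it testable without touching the real environment.

```python
import os
from typing import Mapping, NamedTuple, Optional


class S3Credentials(NamedTuple):
    access_key_id: str
    secret_access_key: str


def s3_credentials_from_env(location_name: str,
                            env: Optional[Mapping[str, str]] = None) -> S3Credentials:
    """Resolve credentials from variables named after the location.

    HYPOTHETICAL pattern: <NAME>_S3_ACCESS_KEY_ID and <NAME>_S3_SECRET_ACCESS_KEY,
    where <NAME> is the location name upper-cased with '-' replaced by '_'.
    """
    env = os.environ if env is None else env
    prefix = location_name.upper().replace("-", "_")
    names = (f"{prefix}_S3_ACCESS_KEY_ID", f"{prefix}_S3_SECRET_ACCESS_KEY")
    missing = [n for n in names if n not in env]
    if missing:
        raise RuntimeError(f"{location_name}: missing environment variables {missing}")
    return S3Credentials(env[names[0]], env[names[1]])


# Usage with an explicit mapping instead of the real environment:
fake_env = {
    "COLOGNE_LTA_S3_ACCESS_KEY_ID": "example-key-id",
    "COLOGNE_LTA_S3_SECRET_ACCESS_KEY": "example-secret",
}
print(s3_credentials_from_env("cologne_lta", env=fake_env).access_key_id)  # example-key-id
```

Failing loudly on a missing variable is a deliberate choice in this sketch: a silently empty credential would only surface later as an authentication error from the object store.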

TapeDataLocation

TapeDataLocation represents tape library systems for deep archival. It is intended for long-term cold storage that needs high capacity and is accessed infrequently. Tape is not currently used in production, but the architecture supports it.

For complete attribute details, see TapeDataLocation.
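The polymorphic inheritance described in this section can be sketched in plain Python, assuming a single storage_type discriminator that selects the subclass when a row is loaded (ORMs such as SQLAlchemy resolve polymorphic rows in a similar way). The subclass fields (path, bucket, library), the uri() method, and the load_location helper are illustrative assumptions, not the actual model attributes.

```python
from dataclasses import dataclass
from typing import Dict, Type


@dataclass
class DataLocation:
    """Base class: attributes shared by all backends (illustrative subset)."""
    name: str
    site: str
    storage_type: str = "base"  # discriminator that selects the subclass

    def uri(self) -> str:
        raise NotImplementedError


@dataclass
class DiskDataLocation(DataLocation):
    path: str = ""
    storage_type: str = "disk"

    def uri(self) -> str:
        return self.path


@dataclass
class S3DataLocation(DataLocation):
    bucket: str = ""
    storage_type: str = "s3"

    def uri(self) -> str:
        return f"s3://{self.bucket}"


@dataclass
class TapeDataLocation(DataLocation):
    library: str = ""
    storage_type: str = "tape"

    def uri(self) -> str:
        return f"tape://{self.library}"


# Discriminator value -> concrete class, mimicking how an ORM picks the
# subclass for each row of a polymorphic table.
SUBCLASSES: Dict[str, Type[DataLocation]] = {
    "disk": DiskDataLocation,
    "s3": S3DataLocation,
    "tape": TapeDataLocation,
}


def load_location(row: dict) -> DataLocation:
    """Build the subclass named by the row's storage_type column."""
    cls = SUBCLASSES[row["storage_type"]]
    fields = {key: value for key, value in row.items() if key != "storage_type"}
    return cls(**fields)


rows = [
    {"name": "fyst_source", "site": "ccat", "storage_type": "disk", "path": "/data/fyst"},
    {"name": "cologne_lta", "site": "cologne", "storage_type": "s3", "bucket": "ccat-archive"},
]
# The calling code is identical regardless of backend.
for loc in (load_location(r) for r in rows):
    print(type(loc).__name__, loc.uri())
# DiskDataLocation /data/fyst
# S3DataLocation s3://ccat-archive
```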

Buffer Hierarchy and Failover

Multiple buffer locations can exist at a site, enabling failover and load distribution. Two fields control how they are used:
  • Active flag: Indicates whether the location is operational; clearing it temporarily disables the location, e.g. for maintenance

  • Priority field: Defines the failover order, i.e. which location is used first (lower number = higher priority)

Use Case: If the primary buffer is full or offline, data-transfer can route data to the secondary buffer.

Example:
  • cologne_buffer_1 (priority 0, active=True) - Primary buffer

  • cologne_buffer_2 (priority 1, active=True) - Secondary buffer
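The failover rule described above (skip inactive locations, then prefer the lowest priority number) can be sketched as a filter followed by a minimum over priority. This is a minimal illustration, not the data-transfer implementation; in particular, the free_bytes field used to decide whether a buffer is "full" is a hypothetical stand-in for whatever capacity check the system performs.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BufferLocation:
    name: str
    priority: int    # lower number = higher priority
    active: bool     # False = temporarily disabled, e.g. for maintenance
    free_bytes: int  # HYPOTHETICAL capacity field used to decide "full"


def select_buffer(candidates: Iterable[BufferLocation],
                  needed_bytes: int) -> Optional[BufferLocation]:
    """Return the highest-priority active buffer with enough free space, else None."""
    usable = [b for b in candidates if b.active and b.free_bytes >= needed_bytes]
    return min(usable, key=lambda b: b.priority, default=None)


buffers = [
    BufferLocation("cologne_buffer_1", priority=0, active=True, free_bytes=10),
    BufferLocation("cologne_buffer_2", priority=1, active=True, free_bytes=5_000),
]
print(select_buffer(buffers, needed_bytes=100).name)  # cologne_buffer_2 (primary is full)
buffers[1].active = False                             # secondary taken offline
print(select_buffer(buffers, needed_bytes=100))       # None: nothing usable
```

Because routing reads the flags at selection time, disabling a buffer or swapping priorities takes effect without code changes.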

Example Locations

Example Data Locations

| Site | Name | LocationType | StorageType | Path/Details |
|---|---|---|---|---|
| CCAT | fyst_source | SOURCE | DISK | telescope.ccat.cl:/data/fyst |
| Cologne | cologne_buffer_1 | BUFFER | DISK | buffer.data.uni-koeln.de:/mnt/buffer |
| Cologne | cologne_lta | LONG_TERM_ARCHIVE | S3 | bucket: ccat-archive |
| Cologne | ramses_processing | PROCESSING | DISK | ramses.cluster:/scratch/ccat |

Why This Structure?

Polymorphic Design

Different storage backends can be supported without changing core logic: the same code works with disk, S3, or tape storage.

Site Grouping

Grouping locations by site enables geographic routing and replication strategies. Data can be replicated across multiple sites for redundancy.
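Because every location belongs to a site, a redundancy check can count distinct sites rather than distinct locations: two copies in Cologne do not protect against losing the Cologne site. The sketch assumes a hypothetical flattened view of each physical copy as a (site short_name, location name) pair and a hypothetical policy threshold of two sites.

```python
from typing import Iterable, Set, Tuple

# HYPOTHETICAL flattened view: each physical copy as (site short_name, location name).
Copy = Tuple[str, str]


def sites_holding(copies: Iterable[Copy]) -> Set[str]:
    """Distinct sites that hold at least one copy."""
    return {site for site, _location in copies}


def is_geo_redundant(copies: Iterable[Copy], min_sites: int = 2) -> bool:
    """True if copies exist at no fewer than `min_sites` different sites."""
    return len(sites_holding(copies)) >= min_sites


only_cologne = [("cologne", "cologne_buffer_1"), ("cologne", "cologne_lta")]
print(is_geo_redundant(only_cologne))                              # False: two copies, one site
print(is_geo_redundant(only_cologne + [("ccat", "fyst_source")]))  # True
```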

Location Type vs Storage Type
  • location_type captures functional role (where in the workflow)

  • storage_type captures technical implementation

  • Separation allows flexibility: A BUFFER location could be DISK or S3 depending on site infrastructure
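The separation can be illustrated with a small sketch in which routing selects a target only by its role (location_type), and the backend-specific action is chosen afterwards from storage_type. The writer functions, the S3-backed buffer "example_s3_buffer", and the file name are hypothetical; plain strings stand in for the enums to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Location:
    name: str
    location_type: str  # functional role, e.g. "BUFFER"
    storage_type: str   # technical backend, e.g. "DISK" or "S3"


# HYPOTHETICAL per-backend writers: only storage_type decides which one runs.
WRITERS: Dict[str, Callable[[Location, str], str]] = {
    "DISK": lambda loc, fname: f"copy {fname} to filesystem location {loc.name}",
    "S3": lambda loc, fname: f"upload {fname} to object store {loc.name}",
}


def store_in_buffer(site_locations: List[Location], fname: str) -> str:
    """Pick the target by role (location_type), then dispatch on storage_type."""
    buffer = next((loc for loc in site_locations if loc.location_type == "BUFFER"), None)
    if buffer is None:
        raise LookupError("no BUFFER location at this site")
    return WRITERS[buffer.storage_type](buffer, fname)


cologne = [
    Location("cologne_lta", "LONG_TERM_ARCHIVE", "S3"),
    Location("cologne_buffer_1", "BUFFER", "DISK"),
]
other_site = [Location("example_s3_buffer", "BUFFER", "S3")]  # hypothetical S3-backed buffer

print(store_in_buffer(cologne, "obs.dat"))     # copy obs.dat to filesystem location cologne_buffer_1
print(store_in_buffer(other_site, "obs.dat"))  # upload obs.dat to object store example_s3_buffer
```

The routing function never mentions disks or buckets, so a site can switch its buffer from DISK to S3 without touching it.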

Active/Priority Fields

Enable dynamic routing and failover without code changes. Locations can be disabled for maintenance or prioritized based on capacity.

Integration with Physical Copies

Each PhysicalCopy references a DataLocation. The full_path property combines:

Geographic Distribution

Storage locations currently span multiple sites:

  • CCAT Observatory (Chile) - SOURCE and BUFFER locations at telescope

  • University of Cologne (Germany) - BUFFER, LONG_TERM_ARCHIVE, and PROCESSING (RAMSES)

  • Cornell University (USA) - Future archive site

Future Expansion: The architecture supports additional sites and multi-tiered transfer routing (e.g., Chile → Cologne → Cornell).