# Overview

```{eval-rst}
.. verified:: 2025-11-25
   :reviewer: Christof Buchbender
```

## What is ops-db?

ops-db is the core PostgreSQL database schema and SQLAlchemy ORM layer that serves as the **operational brain** of the CCAT Data Center. It tracks everything from observation planning → execution → data movement → archival → publication.

The database is implemented using the SQLAlchemy ORM, providing a Python interface to all operational data. It serves as the backend for two key interfaces:

- **ops-db-api** - RESTful API for programmatic access by instruments and automated systems
- **ops-db-ui** - Web interface for browsing observations and monitoring system state

## Design Philosophy

**Single Source of Truth**
: All operational metadata lives in ops-db, avoiding duplication across systems. This ensures consistency and simplifies data governance.

**Polymorphic Models**
: Many entity types use SQLAlchemy's polymorphic inheritance to accommodate different subtypes:

  - {py:class}`~ccat_ops_db.models.Source` has {py:class}`~ccat_ops_db.models.FixedSource`, {py:class}`~ccat_ops_db.models.SolarSystemObject`, and {py:class}`~ccat_ops_db.models.ConstantElevationSource` subtypes
  - {py:class}`~ccat_ops_db.models.DataLocation` has {py:class}`~ccat_ops_db.models.DiskDataLocation`, {py:class}`~ccat_ops_db.models.S3DataLocation`, and {py:class}`~ccat_ops_db.models.TapeDataLocation` subtypes
  - {py:class}`~ccat_ops_db.models.PhysicalCopy` has {py:class}`~ccat_ops_db.models.RawDataFilePhysicalCopy`, {py:class}`~ccat_ops_db.models.RawDataPackagePhysicalCopy`, and {py:class}`~ccat_ops_db.models.DataTransferPackagePhysicalCopy` subtypes

**Physical Copy Tracking**
: The database tracks not just what data exists, but **WHERE** it physically exists, through {py:class}`~ccat_ops_db.models.PhysicalCopy` models. This enables safe deletion, staged unpacking, and complete audit trails.
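The polymorphic pattern can be sketched with a toy model. All class, table, and column names below are illustrative assumptions for the demo, not the actual `ccat_ops_db` definitions:

```python
# Toy single-table polymorphic hierarchy in the style described above.
# Names are illustrative only, NOT the real ccat_ops_db models.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class DataLocation(Base):
    __tablename__ = "data_location"

    id = Column(Integer, primary_key=True)
    name = Column(String(64))
    kind = Column(String(16))  # discriminator column

    __mapper_args__ = {
        "polymorphic_on": kind,
        "polymorphic_identity": "generic",
    }


class DiskDataLocation(DataLocation):
    __mapper_args__ = {"polymorphic_identity": "disk"}


class S3DataLocation(DataLocation):
    __mapper_args__ = {"polymorphic_identity": "s3"}


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([DiskDataLocation(name="scratch"), S3DataLocation(name="bucket")])
    session.commit()
    # Querying the base class yields correctly typed subclass instances,
    # dispatched on the discriminator column.
    locations = session.query(DataLocation).order_by(DataLocation.name).all()
    names = [(type(loc).__name__, loc.name) for loc in locations]

print(names)  # → [('S3DataLocation', 'bucket'), ('DiskDataLocation', 'scratch')]
```

One query against the base class transparently returns the right subtype for each row, which is what makes a single `DataLocation` relationship usable for disk, S3, and tape locations alike.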
**Status-Driven Workflows**
: Entities use status enums ({py:class}`~ccat_ops_db.models.Status`, {py:class}`~ccat_ops_db.models.PackageState`, {py:class}`~ccat_ops_db.models.PhysicalCopyStatus`) to track processing state, enabling automated workflows and retry logic.

**Relationship-Rich**
: Heavy use of SQLAlchemy relationships maintains referential integrity and enables efficient queries across related entities.

## Major Entity Categories

The database organizes data into several major categories:

**Observatory Infrastructure**
: The physical telescope, instruments, and modules that produce data: {py:class}`~ccat_ops_db.models.Observatory`, {py:class}`~ccat_ops_db.models.Telescope`, {py:class}`~ccat_ops_db.models.Instrument`, {py:class}`~ccat_ops_db.models.InstrumentModule`. See {doc}`observatory_hierarchy` for details.

**Scientific Planning**
: Observing programs, sub-programs, and observation units that define what to observe: {py:class}`~ccat_ops_db.models.ObservingProgram`, {py:class}`~ccat_ops_db.models.SubObservingProgram`, {py:class}`~ccat_ops_db.models.ObsUnit`, {py:class}`~ccat_ops_db.models.Source`, {py:class}`~ccat_ops_db.models.ObsMode`. See {doc}`observation_model` for details.

**Execution Tracking**
: Records of actual observations with timing, conditions, and status: {py:class}`~ccat_ops_db.models.ExecutedObsUnit`. See {doc}`observation_model` for details.

**Data Management**
: Files, packages, and physical copies across multiple storage locations: {py:class}`~ccat_ops_db.models.RawDataFile`, {py:class}`~ccat_ops_db.models.RawDataPackage`, {py:class}`~ccat_ops_db.models.DataTransferPackage`, {py:class}`~ccat_ops_db.models.PhysicalCopy`. See {doc}`data_model` for details.

**Transfer Infrastructure**
: Sites, locations, and routes that define how data moves through the system: {py:class}`~ccat_ops_db.models.Site`, {py:class}`~ccat_ops_db.models.DataLocation`, {py:class}`~ccat_ops_db.models.DataTransferRoute`. See {doc}`location_model` and {doc}`transfer_model` for details.

**Archival & Staging**
: Long-term archive transfers and staging jobs for processing: {py:class}`~ccat_ops_db.models.LongTermArchiveTransfer`, {py:class}`~ccat_ops_db.models.StagingJob`. See {doc}`transfer_model` for details.

**Access Control**
: Users, roles, and API tokens for authentication and authorization: {py:class}`~ccat_ops_db.models.User`, {py:class}`~ccat_ops_db.models.Role`, {py:class}`~ccat_ops_db.models.ApiToken`.

## How Data Flows Through ops-db

Conceptually, data flows through ops-db as follows:

1. **Planning** - Observing programs and observation units are added before observations take place
2. **Execution** - Telescope systems create {py:class}`~ccat_ops_db.models.ExecutedObsUnit` records when observations run
3. **Data Registration** - Raw data files are registered and linked to executed observations
4. **Packaging** - Files are bundled into {py:class}`~ccat_ops_db.models.RawDataPackage` records for efficient archiving and transfer
5. **Transfer** - Packages are moved between sites via {py:class}`~ccat_ops_db.models.DataTransferPackage` and {py:class}`~ccat_ops_db.models.DataTransfer` records
6. **Archive** - Packages are archived to long-term storage via {py:class}`~ccat_ops_db.models.LongTermArchiveTransfer` records
7. **Physical Copies** - {py:class}`~ccat_ops_db.models.PhysicalCopy` records track where each file and package exists at each stage

For detailed workflow documentation, see {doc}`/data-transfer/docs/index`.
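The staged flow above can be sketched as a tiny status-driven state machine. The stage names and the transition rule are illustrative assumptions for the sketch, not the actual ops-db enums or workflow code:

```python
# Illustrative sketch (stdlib only): the pipeline stages as a status-driven
# state machine. Stage names are hypothetical, not real ops-db enum values.
from enum import Enum


class Stage(Enum):
    PLANNED = 1
    EXECUTED = 2
    REGISTERED = 3
    PACKAGED = 4
    TRANSFERRED = 5
    ARCHIVED = 6


def advance(stage: Stage) -> Stage:
    """Move a record to the next pipeline stage (ARCHIVED is terminal)."""
    if stage is Stage.ARCHIVED:
        return stage
    return Stage(stage.value + 1)


# Because the current stage is persisted in the database, a worker that
# crashes mid-pipeline can simply resume from the stored state:
stage = Stage.PLANNED
history = [stage]
while stage is not Stage.ARCHIVED:
    stage = advance(stage)
    history.append(stage)

print([s.name for s in history])
```

Keeping the state in the database rather than in worker memory is what enables the automated retry logic mentioned under the design philosophy: any record whose stored stage is not terminal is eligible to be picked up again.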
## What ops-db Does NOT Contain

ops-db is a **metadata database** - it tracks information about data, not the data itself:

- **Actual data files** - Files are stored on disk/S3/tape; ops-db tracks only their metadata and locations
- **Processing results** - Processed data is likewise stored on disk/S3/tape; ops-db tracks only its metadata and locations
- **Real-time telescope telemetry** - Telemetry is tracked in the housekeeping system (InfluxDB)
- **Log files** - Logs are stored on disk; ops-db stores only references to log file paths

## Integration Points

ops-db integrates with several other CCAT components:

- **ops-db-api** - Provides RESTful endpoints for programmatic access to the database
- **ops-db-ui** - Provides a web interface for browsing and managing database records
- **data-transfer** - Reads/writes transfer and archive records and orchestrates actual file movements
- **system-integration** - Handles deployment and infrastructure setup

For details on integration, see {doc}`../integration/related_components`.

## Entity Relationships

```{eval-rst}
.. mermaid::

   graph TB
       subgraph Infrastructure["Observatory Infrastructure"]
           OBS[Observatory]
           TEL[Telescope]
           INST[Instrument]
           MOD[InstrumentModule]
           OBS --> TEL
           TEL --> INST
           INST --> MOD
       end

       subgraph Planning["Scientific Planning"]
           PROG[ObservingProgram]
           SUB[SubObservingProgram]
           OU[ObsUnit]
           SRC[Source]
           PROG --> SUB
           PROG --> OU
           SUB --> OU
           SRC --> OU
       end

       subgraph Execution["Execution"]
           EOU[ExecutedObsUnit]
           OU --> EOU
       end

       subgraph Data["Data Management"]
           RDF[RawDataFile]
           RDP[RawDataPackage]
           DTP[DataTransferPackage]
           PC[PhysicalCopy]
           EOU --> RDF
           EOU --> RDP
           RDF --> RDP
           RDP --> DTP
           RDF --> PC
           RDP --> PC
           DTP --> PC
       end

       subgraph Locations["Storage Locations"]
           SITE[Site]
           DL[DataLocation]
           SITE --> DL
           PC --> DL
       end

       MOD --> RDF
       MOD --> RDP
```

## Next Steps

Now that you understand the high-level architecture:

- **Learn the observatory hierarchy**: See {doc}`observatory_hierarchy`
- **Understand observation planning**: See {doc}`observation_model`
- **Explore data tracking**: See {doc}`data_model`
- **Learn about storage locations**: See {doc}`location_model`
- **Understand data transfer**: See {doc}`transfer_model`
- **Browse the complete API**: See {doc}`../api_reference/models`