Commit
[Schema][Utilization] Add schema tables for job utilization (#6183)
# Overview

Add two tables in the misc database for utilization data:

- oss_ci_utilization_metadata: metadata table
- oss_ci_time_series: time-series table for storing time-series data

Utilization data pipeline steps:

1. Modify the monitor script for the final data model (Done)
2. Add an S3 bucket for ready-to-insert files (Done)
3. **Add ClickHouse database schemas (this PR)**
4. Set up logic in upload_artifact to process raw log data and insert clean data into the ready-to-insert S3 bucket. Note that we will generate two files, one for the metadata table and one for the time-series table. The metadata table is a single insertion, while the time-series table is a batch operation.
5. Set up the S3 replicator generator to insert data into the tables

Design doc: https://docs.google.com/document/d/151uzLPpOTVcfdfDgFHmGqztiyWwHLI8OR-U3W9QH0lA/edit?tab=t.0

# Details

TTL (time to live)

All records are given a time to live of one year based on the created_at timestamp; this gives us the flexibility to re-insert hot data in the future. The data is backed up in S3.

Use the S3 replicator approach to insert data; see the guidance: https://github.com/pytorch/test-infra/wiki/How-to-add-a-new-custom-table-on-ClickHouse

See the data pipeline below:

![image](https://github.com/user-attachments/assets/87e1792b-6638-48d2-8613-efd7236f6426)
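
As a rough illustration of step 4 in the overview (splitting raw log data into one metadata file and one time-series file), here is a minimal Python sketch. It is not the upload_artifact implementation. The JSON-lines input format, the function names `split_utilization_log` and `to_json_lines`, and the field names `type`, `job_id`, and `cpu` are all hypothetical and chosen only for the example. The actual columns are defined by the schemas in this PR.

```python
import json
from datetime import datetime, timezone


def split_utilization_log(raw_lines, created_at=None):
    """Split raw log lines into (metadata_record, time_series_records).

    Assumption: each raw line is a JSON object, and a hypothetical
    "type" field equal to "metadata" marks the single metadata record.
    """
    # created_at is the timestamp the one-year TTL would be based on.
    created_at = created_at or datetime.now(timezone.utc).isoformat()
    metadata = None
    time_series = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed raw lines instead of failing the job
        if not isinstance(record, dict):
            continue
        if record.get("type") == "metadata":
            # Metadata is a single insertion: keep only the first record.
            if metadata is None:
                metadata = {**record, "created_at": created_at}
        else:
            # Time-series rows are collected and written as one batch.
            time_series.append({**record, "created_at": created_at})
    return metadata, time_series


def to_json_lines(records):
    """Serialize records as newline-delimited JSON, one row per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

In this sketch the caller would write `to_json_lines([metadata])` to one file and `to_json_lines(time_series)` to another before uploading both to the ready-to-insert bucket. The upload itself is left out because the bucket name and key layout are not given here.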