Available databases on ClickHouse

Huy Do edited this page Nov 19, 2024 · 11 revisions

The default database

The default database includes all GitHub events, for example workflow_run. These tables include the same information as the webhook payloads documented at https://docs.github.com/en/webhooks/webhook-events-and-payloads. The list includes:

In addition, it includes several non-GitHub tables that were migrated from Rockset. These are custom tables created to serve different use cases:

  • failed_test_runs includes the information about failed tests. It's populated by the upload_test_stats.py script.
  • job_annotation is used in HUD to manually classify a failure into a category such as INFRA_FLAKE or BROKEN_TRUNK.
  • merge_bases contains the merge base of each pull request. The information is populated by TD.
  • merges contains the information about merges from mergebot. This is used to compute the important % force merges KPI.
  • queue_times_historical stores the historical queue time for each runner type, as populated by the updateQueueTimes.mjs script.
  • rerun_disabled_tests is used by the rerun disabled tests bot to confirm whether a disabled test is still failing in trunk.
  • servicelab_torch_dynamo_perf_stats stores the internal service lab benchmark results. This should be on the benchmark database instead. Having it here is a mistake.
  • test_run_s3 keeps the test time for individual tests on, well, S3. This information is used later to build CI features that depend on test times, for example marking slow tests.
  • test_run_summary aggregates the information in test_run_s3 by test class and provides the aggregated test time per class when computing CI test shards.

The benchmark database

The benchmark database holds all benchmark and metric data and powers the HUD benchmark dashboards. Its tables are being phased out and consolidated into oss_ci_benchmark_v3.
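
To check what either database currently holds, you can list its tables or count the rows in one of them. The following is a minimal sketch that only builds the SQL text; the helper names show_tables_sql and count_rows_sql are hypothetical and not part of any existing tooling, and actually running the statements requires a ClickHouse client and credentials, which this page does not cover.

```python
import re

# Database names mentioned on this page.
KNOWN_DATABASES = ("default", "benchmark")

# Accept only plain identifiers so a name cannot smuggle extra SQL in.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_identifier(name: str) -> str:
    """Return `name` unchanged if it is a plain identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name


def show_tables_sql(database: str) -> str:
    """Build a ClickHouse statement that lists the tables in `database`."""
    return f"SHOW TABLES FROM {_check_identifier(database)}"


def count_rows_sql(database: str, table: str) -> str:
    """Build a ClickHouse query that counts the rows in `database.table`."""
    return (
        f"SELECT count() FROM "
        f"{_check_identifier(database)}.{_check_identifier(table)}"
    )


if __name__ == "__main__":
    # Print the statements; paste them into any ClickHouse client to run them.
    for db in KNOWN_DATABASES:
        print(show_tables_sql(db))
    print(count_rows_sql("default", "workflow_run"))
```

The row count uses workflow_run only because it is the example table named above; any table from the lists on this page can be substituted.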