Available databases on ClickHouse
If you need a new database, please reach out to us via https://fb.workplace.com/groups/4571909969591489 (for Metamates), or create an issue and book an office hours (OH) slot with us at https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours (for external partners).
The database named default includes all GitHub events, for example workflow_run. Its tables contain the same information as the corresponding webhook payloads (https://docs.github.com/en/webhooks/webhook-events-and-payloads). The GitHub event tables are:
- issues
- issue_comment
- pull_request
- pull_request_review
- pull_request_review_comment
- push
- workflow_job
- workflow_run
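Each table above is named after the GitHub webhook event it stores. GitHub sends that event name in the X-GitHub-Event header of every delivery. The minimal sketch below shows how a delivery could be routed to its table under that naming. The route_webhook helper and the "default.<event>" routing are illustrative assumptions. This is not the actual ingestion pipeline, which this page does not describe.

```python
import json
from typing import Optional, Tuple

# Webhook event names that have a same-named table in the default database,
# taken from the list above.
GITHUB_EVENT_TABLES = frozenset({
    "issues",
    "issue_comment",
    "pull_request",
    "pull_request_review",
    "pull_request_review_comment",
    "push",
    "workflow_job",
    "workflow_run",
})


def route_webhook(headers: dict, body: bytes) -> Optional[Tuple[str, dict]]:
    """Hypothetical helper: map a webhook delivery to (table, payload).

    Returns None when the event has no matching table in the list above.
    Assumes the header mapping preserves the exact "X-GitHub-Event" casing.
    """
    event = headers.get("X-GitHub-Event")
    if event not in GITHUB_EVENT_TABLES:
        return None
    payload = json.loads(body)  # the row content mirrors the webhook payload
    return f"default.{event}", payload
```

For example, a workflow_run delivery maps to default.workflow_run. An event without a table, such as star, is ignored.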
It also includes several custom, non-GitHub tables that were migrated from Rockset to serve different use cases:
- failed_test_runs includes information about failed tests. It is populated by the upload_test_stats.py script.
- job_annotation is used in HUD to manually annotate a failure with a category such as INFRA_FLAKE or BROKEN_TRUNK.
- merge_bases contains the merge base of each pull request. It is populated by TD.
- merges contains information about merges made by mergebot. It is used to compute the important % force merges KPI.
- queue_times_historical stores historical queue times by runner type, as populated by the updateQueueTimes.mjs script.
- rerun_disabled_tests is used by the rerun disabled tests bot to confirm whether a disabled test is still failing in trunk.
- servicelab_torch_dynamo_perf_stats stores the internal service lab benchmark results. It should be in the benchmark database instead; placing it here was a mistake made during the migration.
- test_run_s3 keeps the test time for individual tests on, well, S3. This information is later used to build CI features that depend on test times, for example marking slow tests.
- test_run_summary aggregates the information in test_run_s3 by test class and provides the aggregated test time per class when computing CI test shards.
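The sketch below illustrates the two steps behind test_run_summary. First, per-test times like those in test_run_s3 are rolled up by test class. Second, per-class totals are used to balance CI shards. The (classname, test_name, seconds) row layout and both function names are hypothetical. The greedy "longest class first, onto the least-loaded shard" heuristic is an illustrative choice, and the real sharding logic may work differently.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_by_class(test_runs: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum per-test durations into per-class totals (hypothetical row layout)."""
    totals: Dict[str, float] = defaultdict(float)
    for classname, _test_name, seconds in test_runs:
        totals[classname] += seconds
    return dict(totals)


def shard_classes(class_times: Dict[str, float], num_shards: int) -> List[List[str]]:
    """Greedily assign classes to shards, longest class first, to balance total time."""
    # Min-heap of (accumulated_seconds, shard_index); the least-loaded shard pops first.
    heap = [(0.0, i) for i in range(num_shards)]
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    # Sort by descending time, then by name so ties are deterministic.
    for classname, seconds in sorted(class_times.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        shards[idx].append(classname)
        heapq.heappush(heap, (total + seconds, idx))
    return shards
```

Placing the longest classes first keeps a single slow class from landing on an already busy shard. That class would otherwise become the critical path of the whole CI job.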
The benchmark database holds all benchmark and metric data and powers the HUD benchmark dashboards. Its tables are being consolidated into oss_ci_benchmark_v3 so that all benchmark data can be found in one place. Until that happens, the benchmark tables include:
- inductor_torch_dynamo_perf_stats stores inductor benchmark data from inductor-perf-test-nightly.yml.
- inductor_torchao_perf_stats shares the same schema but comes from torchao.yml. As the name implies, it is built for torchao.
- oss_ci_benchmark_v2 is the generic benchmark table. It will be deprecated soon and replaced by oss_ci_benchmark_v3.
- torchbench_userbenchmark keeps the TorchBench user benchmark results, which are produced by workflows like userbenchmark-a100.yml.
- aggregated_test_metrics - to be deleted
- aggregated_test_metrics_with_preproc - to be deleted
- external_contribution_stats - powers the weekly external PR count on the KPIs page of HUD
- metrics_ci_wait_time - to be deleted
- ossci_uploaded_metrics - populated by here
- queue_times_24h_stats - populated by pytorch-gha-infra lambda
- rate_limit - used in future PR (maybe)
- runner_cost - powers cost_analysis page, populated by lambda
- stable_pushes - powers historical strict lag on KPIs page
- test_file_to_oncall_mapping - to be deleted
- workflow_ids_from_test_aggregates - to be deleted
There is also a special playground database that grants developers write access from the console by default. It can be used to test database schemas and query syntax, including insert queries.