feature benchmark: repeated scenario runs #29637
Conversation
Review comment (outdated, resolved): misc/python/materialize/feature_benchmark/benchmark_versioning.py
I still need to validate and test the changes.
Force-pushed from f668f0e to 662cceb
This causes way more flakes. We need to discuss at the onsite whether we want to proceed with this change and possibly increase thresholds, or discard it.
In my opinion we should keep feature-benchmark as is so we can keep catching regressions, and improve parallel-benchmark to be the one-run-reliable benchmarking framework. First issue where parallel-benchmark was not entirely consistent: https://github.com/MaterializeInc/database-issues/issues/8571
I'll rebase this on top of #29664, maybe that helps? https://buildkite.com/materialize/nightly/builds/9644
Force-pushed from e092cff to 53b39ca
…esentative result
Force-pushed from 53b39ca to 49ddec6
Force-pushed from 49ddec6 to 25f1872
…election strategy
Force-pushed from 25f1872 to 087b7bf
I changed this PR to still always conduct three runs per scenario, but to pick the best outcome (instead of the median outcome) in 7478e73.
Good compromise.
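The selection strategy described above (three runs per scenario, best outcome instead of median) can be sketched as follows. This is a hypothetical illustration, not the actual code from `misc/python/materialize/feature_benchmark`; the function names and the assumption that "best" means the minimum wallclock time are mine.

```python
import statistics


def best_measurement(wallclock_times: list[float]) -> float:
    """New strategy sketched in the PR: run the scenario three times and
    report the best (fastest) run, on the assumption that the minimum
    reflects achievable performance while slower runs are noise."""
    assert wallclock_times, "need at least one run"
    return min(wallclock_times)


def median_measurement(wallclock_times: list[float]) -> float:
    """Previous strategy for comparison: the median of the repeated runs."""
    assert wallclock_times, "need at least one run"
    return statistics.median(wallclock_times)
```

Picking the best run makes the benchmark less flaky in the face of one-off slow runs, at the cost of potentially hiding regressions that only show up as increased variance rather than a slower minimum.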
Merged 959f401 into MaterializeInc:main
This implements MaterializeInc/database-issues#8565.
Nightly: https://buildkite.com/materialize/nightly/builds?branch=nrainer-materialize%3Afeature-benchmark%2Frepeated-scenario-runs