Performance Regression Testing Revamp Stage 1 #4602

Merged 166 commits on Apr 12, 2022

Changes shown from 35 commits.

Commits (166)
26b0efb
update readme with draft of v2
Jan 20, 2022
848f381
cache runner binary instead of storing it as an artifact
Jan 20, 2022
f765c41
rename job
Jan 20, 2022
a59baf9
add debug lines
Jan 20, 2022
45d9233
checkout source before accessing cache
Jan 20, 2022
ec83913
add temporary pull request trigger for dev
Jan 20, 2022
56dc946
fix path accessing runner
Jan 20, 2022
672e9c9
add more debug steps
Jan 20, 2022
bd281ba
add even more debug steps
Jan 20, 2022
d774fd9
move debug lines to the right place
Jan 20, 2022
d3dd754
change cache path
Jan 20, 2022
38936cf
update cache action
Jan 21, 2022
03e3419
update baseline to 1.0.latest
Jan 21, 2022
92c267d
remove debug steps
Jan 21, 2022
ea0b92b
update formatting
Jan 21, 2022
f75bc01
fix cache hashing
Jan 21, 2022
f51138a
update readme with threshold information
Jan 21, 2022
e5a7fad
runner uses 3sigma to detect regressions
Jan 21, 2022
f066d9d
fmt
Jan 21, 2022
dc565e2
move runner first
Jan 21, 2022
63d34de
minor wording fix
Feb 9, 2022
fc6e512
model output file for baseline measurements
Feb 9, 2022
4c2b4a1
Merge branch 'main' into nate/regression-testing-2
Feb 15, 2022
d86bc29
fleshed out statistics section
Feb 15, 2022
0d280dc
readme fixes
Feb 15, 2022
669903b
newline with breaks instead of spaces
Feb 15, 2022
1b21253
remove lines from svg
Feb 15, 2022
6330650
redefine ranges
Feb 15, 2022
a79c07d
test frame for compare_from
Feb 15, 2022
8a08ba9
more readme work
Feb 16, 2022
ed07f51
version tree with boxes
Feb 16, 2022
383fd1d
version tree with refs
Feb 16, 2022
8ac060a
implemented get for version tree.
Feb 16, 2022
52f3155
finished compare_from
Feb 16, 2022
68ae9e6
fmt
Feb 16, 2022
0a6cf96
update readme
Feb 18, 2022
9bfa6c3
add paragraph on reruns
Feb 21, 2022
5558e68
scaffold some of the big changes. doesn't compile
Feb 21, 2022
717bdab
followed compiler errors to fix a bunch of stuff
Feb 21, 2022
09a282e
added lots of cloning but now it compiles
Feb 21, 2022
12fdaa5
fixed some warnings
Feb 21, 2022
f91f8da
write take_samples and fix some compile errors
Feb 21, 2022
8a347b5
minor refactor
Feb 22, 2022
76a482d
fix tests, found bug
Feb 22, 2022
c4bc58b
struct opt takes Version struct as input now
Feb 22, 2022
5549f5e
renamed to more sensible names
Feb 22, 2022
0a065fd
moved type definitions to separate module
Feb 22, 2022
d2355df
fixed tests
Feb 22, 2022
e71776a
temp name metricc to perm name metric
Feb 22, 2022
309e16b
remove file writing from main
Feb 22, 2022
398cf72
make sample run work
Feb 22, 2022
cd229c0
renamed workflow
Feb 22, 2022
5481a2e
change version input
Feb 22, 2022
0cb7534
temporary new workflow for modeling before it gets put into the relea…
Feb 22, 2022
1fd9168
change name
Feb 23, 2022
a0deb8e
remove unused bits
Feb 23, 2022
1455bc4
develop the modeling action here
Feb 23, 2022
89585b1
stub version input
Feb 23, 2022
04e1503
fmt
Feb 23, 2022
e631594
expose number of runs to model command
Feb 23, 2022
6785761
add debug ls
Feb 23, 2022
c3c41c7
fmt
Feb 23, 2022
28dc0f6
merge two two model jobs into one
Feb 23, 2022
9cefe67
resolve relative paths from cli
Feb 23, 2022
b2e7776
fmt
Feb 23, 2022
d1fc2ee
rename to app
Feb 23, 2022
1838300
merge into one job
Feb 23, 2022
ff0abf5
rename job
Feb 23, 2022
b5a4d9d
remove cache retrieval step
Feb 23, 2022
892aba6
remove cache retrieval step here too
Feb 23, 2022
923bed3
require relative paths to have some prefix
Feb 23, 2022
0cfb691
fix error message
Feb 23, 2022
cefb76c
add debug line
Feb 23, 2022
5dbe170
try trailing slashes
Feb 23, 2022
9c1cb0f
try absolute paths
Feb 23, 2022
dc525c6
stop canonicalizing
Feb 23, 2022
ae18587
fmt
Feb 23, 2022
e9e9e1d
fixed warnings
Feb 23, 2022
21fedee
push baselines dir
Feb 23, 2022
d731edf
print modeling results to stdout
Feb 23, 2022
a772161
catch if a baseline is being built with no models
Feb 23, 2022
1d28405
fix bug
Feb 23, 2022
89a4822
make tmp baseline dir and do run of 20
Feb 23, 2022
80735e7
add version directory
Feb 24, 2022
c695838
reduce modeling work temporarily
Feb 24, 2022
be1d825
update readme with gha data
Feb 24, 2022
b61c35b
fmt
Feb 24, 2022
6328821
seprate jobs makes more sense
Feb 24, 2022
7bb8721
rename
Feb 24, 2022
b563f50
rename calculate error to runner error
Feb 24, 2022
2b9e6af
fixed up comment
Mar 2, 2022
dcea10f
fix caching
Mar 2, 2022
6291070
fixed tests for generated project
Mar 3, 2022
b929ec7
update project config
Mar 3, 2022
38bc1b8
add first manually executed baseline model
Mar 4, 2022
e58e155
run two samples just for now
Mar 4, 2022
0157827
does not compile. saving work.
Mar 4, 2022
8a9deee
fmt
Mar 7, 2022
dcf666f
split out runner cache/build job to pull from main
Mar 7, 2022
902b683
minor cleanup
Mar 7, 2022
05e1bd9
update test to new types
Mar 7, 2022
090fd3c
compiles now
Mar 7, 2022
d812ccb
moved sample-baseline matching up a level
Mar 7, 2022
3eccee7
add BadFilestemError to remove some unwraps
Mar 7, 2022
041a214
better comment wording
Mar 7, 2022
55bece9
forgot to get runner from cache
Mar 7, 2022
40998a5
refactored working with filestems
Mar 7, 2022
fa5a886
tmp pull runner from this branch instead
Mar 7, 2022
08e84cc
what why when comments
Mar 8, 2022
b43eda1
fmt
Mar 8, 2022
b78842d
fix warnings
Mar 8, 2022
47c7351
add debug print=
Mar 8, 2022
7a8a6d0
use absolute paths in workflow
Mar 8, 2022
47b33e2
deal with double measurements
Mar 8, 2022
4ab7a76
rename measure
Mar 8, 2022
e472b54
minor refactor
Mar 8, 2022
38373fb
add latest version function
Mar 8, 2022
83bc505
nicer refactor of latest_version_from
Mar 8, 2022
c8112e1
hook in latest version logic
Mar 8, 2022
0f64ce7
add test for version order
Mar 8, 2022
01667ec
use filename of dir not full path
Mar 8, 2022
81aee63
add nosamplescomputed error
Mar 8, 2022
dad0691
add debug lines to runner
Mar 8, 2022
c84e882
Revert "add debug lines to runner"
Mar 8, 2022
731ef1b
add todo
Mar 8, 2022
1161883
add debug lines
Mar 8, 2022
d35e591
add new exception when samples are missing from the baseline
Mar 8, 2022
fb6f9a5
debug lines
Mar 8, 2022
2ad8cfb
check if baselines are empty
Mar 8, 2022
ec54dc5
fix missing baseline bug
Mar 8, 2022
5b555ac
add sample value to final calculation
Mar 9, 2022
54d7bc2
split out file reading
Mar 9, 2022
5ae4f30
compose pure and impure operations
Mar 9, 2022
fdd23c1
remove formatting tests because they're not that helpful anyway
Mar 9, 2022
b9627e2
remove unused improt
Mar 9, 2022
dc5e946
use asref path interface instead of concrete pathbufs for params
Mar 9, 2022
d4b59dc
remove todo
Mar 9, 2022
ba01919
remove another todo
Mar 9, 2022
bb791c6
fmt
Mar 9, 2022
a414e4d
Merge branch 'main' into nate/regression-testing-2
Mar 9, 2022
2d18c15
changelog entry
Mar 9, 2022
25323f3
complete todo referencing modeling action
Mar 9, 2022
fd31552
first pass at modeling the correct version
Mar 9, 2022
9e6f54f
add version input for manual trigger
Mar 9, 2022
3c1f7b1
get version string from both trigger events properly
Mar 10, 2022
eb8e1c0
add stale info warning to README
Mar 10, 2022
40a53d4
update model comment
Mar 10, 2022
5c656e3
save cache key from runner build
Mar 10, 2022
37a0ca8
update readme
Mar 10, 2022
49388c3
add future work bullet to readme
Mar 10, 2022
f1282b8
commit logic
Mar 10, 2022
2ef524f
remove bad working directory
Mar 10, 2022
c22bc4d
fixed all bugs I could find by manually testing in a private fork
Mar 11, 2022
2475a1f
lots of work in personal fork again.
Mar 15, 2022
d65bf76
update changelog entry
Mar 16, 2022
1568528
remove actions. will be developed in another pr
Mar 16, 2022
8918836
remove baseline and add placeholder for directory
Mar 16, 2022
9b41637
revert project changes. fixed in another pr
Mar 16, 2022
142ec21
clear_dir creates any missing parent dirs as well.
Apr 12, 2022
43e9e32
hyperfine cleans up with dbt clean
Apr 12, 2022
9d0bf9a
add clearer call out for the risk of gha variation to readme
Apr 12, 2022
f0efe54
add concrete example of performance modeling work to readme
Apr 12, 2022
3984024
add paragraph about branching to investigating regressions section of…
Apr 12, 2022
60cde92
add comment about parsing versions
Apr 12, 2022
e2ff7c4
add comment to empty baseline check for clarity
Apr 12, 2022
6fef3ef
improve wording of confusing sentence
Apr 12, 2022
202 changes: 89 additions & 113 deletions .github/workflows/performance.yml
@@ -1,175 +1,151 @@
 name: Performance Regression Tests
 # Schedule triggers
 on:
+  # TODO THIS IS FOR DEV ONLY:
+  pull_request:
   # runs twice a day at 10:05am and 10:05pm
   schedule:
     - cron: "5 10,22 * * *"
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:

+env:
+  RUNNER_CACHE_PATH: performance/runner/target/release/runner
+
 jobs:
-  # checks fmt of runner code
-  # purposefully not a dependency of any other job
-  # will block merging, but not prevent developing
-  fmt:
-    name: Cargo fmt
+  latest-runner:
+    name: Build Runner or Use Cached
     runs-on: ubuntu-latest
     env:
       RUSTFLAGS: "-D warnings"
     steps:
-      - uses: actions/checkout@v2
-      - uses: actions-rs/toolchain@v1
+      - name: Checkout
+        uses: actions/checkout@v2
+
+      # attempts to access a previously cached runner
+      - uses: actions/cache@v2
+        id: cache
+        with:
+          path: ${{ env.RUNNER_CACHE_PATH }}
+          key: ${{ runner.os }}-${{ hashFiles('performance/runner/Cargo.toml')}}-${{ hashFiles('performance/runner/src/*') }}
+
+      - name: Fetch Rust Toolchain
+        if: steps.cache.outputs.cache-hit != 'true'
+        uses: actions-rs/toolchain@v1
         with:
           profile: minimal
           toolchain: stable
           override: true
-      - run: rustup component add rustfmt
-      - uses: actions-rs/cargo@v1
+
+      - name: Add fmt
+        if: steps.cache.outputs.cache-hit != 'true'
+        run: rustup component add rustfmt
+
+      - name: Cargo fmt
+        if: steps.cache.outputs.cache-hit != 'true'
+        uses: actions-rs/cargo@v1
         with:
           command: fmt
           args: --manifest-path performance/runner/Cargo.toml --all -- --check

-  # runs any tests associated with the runner
-  # these tests make sure the runner logic is correct
-  test-runner:
-    name: Test Runner
-    runs-on: ubuntu-latest
-    env:
-      # turns errors into warnings
-      RUSTFLAGS: "-D warnings"
-    steps:
-      - uses: actions/checkout@v2
-      - uses: actions-rs/toolchain@v1
-        with:
-          profile: minimal
-          toolchain: stable
-          override: true
-      - uses: actions-rs/cargo@v1
+      - name: Test
+        if: steps.cache.outputs.cache-hit != 'true'
+        uses: actions-rs/cargo@v1
         with:
           command: test
           args: --manifest-path performance/runner/Cargo.toml

-  # build an optimized binary to be used as the runner in later steps
-  build-runner:
-    needs: [test-runner]
-    name: Build Runner
-    runs-on: ubuntu-latest
-    env:
-      RUSTFLAGS: "-D warnings"
-    steps:
-      - uses: actions/checkout@v2
-      - uses: actions-rs/toolchain@v1
-        with:
-          profile: minimal
-          toolchain: stable
-          override: true
-      - uses: actions-rs/cargo@v1
+      - name: Build (optimized)
+        if: steps.cache.outputs.cache-hit != 'true'
+        uses: actions-rs/cargo@v1
         with:
           command: build
           args: --release --manifest-path performance/runner/Cargo.toml
-      - uses: actions/upload-artifact@v2
-        with:
-          name: runner
-          path: performance/runner/target/release/runner
+      # the cache action automatically caches this binary at the end of the job

   # run the performance measurements on the current or default branch
   measure-dev:
-    needs: [build-runner]
+    needs: [latest-runner]
     name: Measure Dev Branch
     runs-on: ubuntu-latest
     steps:
-      - name: checkout dev
+
+      - name: Checkout Dev Branch
         uses: actions/checkout@v2

       - name: Setup Python
         uses: actions/setup-python@v2.2.2
         with:
           python-version: "3.8"
-      - name: install dbt
+
+      - name: Install dbt
         run: pip install -r dev-requirements.txt -r editable-requirements.txt
-      - name: install hyperfine
+
+      - name: Install Hyperfine
         run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb
-      - uses: actions/download-artifact@v2
-        with:
-          name: runner
-      - name: change permissions
+
+      # runner was just accessed or built so it should always be there
+      - name: Fetch Runner From Cache
+        uses: actions/cache@v2
+        id: cache
+        with:
+          path: ${{ env.RUNNER_CACHE_PATH }}
+          key: ${{ runner.os }}-${{ hashFiles('performance/runner/Cargo.toml')}}-${{ hashFiles('performance/runner/src/*') }}
+
+      - name: Move Runner
+        run: mv performance/runner/target/release/runner ./
+
+      - name: Change Runner Permissions
         run: chmod +x ./runner
-      - name: run
+
+      # `${{ github.workspace }}` is used to pass the absolute path
+      - name: Run Measurement
         run: ./runner measure -b dev -p ${{ github.workspace }}/performance/projects/
-      - uses: actions/upload-artifact@v2
-        with:
-          name: dev-results
-          path: performance/results/
-
-  # run the performance measurements on the release branch which we use
-  # as a performance baseline. This part takes by far the longest, so
-  # we do everything we can first so the job fails fast.
-  # -----
-  # we need to checkout dbt twice in this job: once for the baseline dbt
-  # version, and once to get the latest regression testing projects,
-  # metrics, and runner code from the develop or current branch so that
-  # the calculations match for both versions of dbt we are comparing.
-  measure-baseline:
-    needs: [build-runner]
-    name: Measure Baseline Branch
-    runs-on: ubuntu-latest
-    steps:
-      - name: checkout latest
-        uses: actions/checkout@v2
-        with:
-          ref: "0.20.latest"
-      - name: Setup Python
-        uses: actions/setup-python@v2.2.2
-        with:
-          python-version: "3.8"
-      - name: move repo up a level
-        run: mkdir ${{ github.workspace }}/../baseline/ && cp -r ${{ github.workspace }} ${{ github.workspace }}/../baseline
-      - name: "[debug] ls new dbt location"
-        run: ls ${{ github.workspace }}/../baseline/dbt/
-      # installation creates egg-links so we have to preserve source
-      - name: install dbt from new location
-        run: cd ${{ github.workspace }}/../baseline/dbt/ && pip install -r dev-requirements.txt -r editable-requirements.txt
-      # checkout the current branch to get all the target projects
-      # this deletes the old checked out code which is why we had to copy before
-      - name: checkout dev
-        uses: actions/checkout@v2
-      - name: install hyperfine
-        run: wget https://github.com/sharkdp/hyperfine/releases/download/v1.11.0/hyperfine_1.11.0_amd64.deb && sudo dpkg -i hyperfine_1.11.0_amd64.deb
-      - uses: actions/download-artifact@v2
-        with:
-          name: runner
-      - name: change permissions
-        run: chmod +x ./runner
-      - name: run runner
-        run: ./runner measure -b baseline -p ${{ github.workspace }}/performance/projects/
-      - uses: actions/upload-artifact@v2
+      - name: Upload Results
+        uses: actions/upload-artifact@v2
Review comment (Contributor):
Sweet! Confirming that we'll have this data saved to / available from S3?

Reply (Contributor Author):

Not at first. The first version of this test suite is going to add a commit to the repository with a JSON file containing the full modeling of each release. The samples are printed in GitHub Actions but ultimately not saved.

If we want any of the above to be available in S3, we can track that work with a separate ticket.

         with:
-          name: baseline-results
+          name: dev-results
           path: performance/results/

   # detect regressions on the output generated from measuring
   # the two branches. Exits with non-zero code if a regression is detected.
   calculate-regressions:
-    needs: [measure-dev, measure-baseline]
+    needs: [measure-dev]
     name: Compare Results
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/download-artifact@v2
+
+      - name: Download Dev Results
+        uses: actions/download-artifact@v2
         with:
           name: dev-results
-      - uses: actions/download-artifact@v2
-        with:
-          name: baseline-results
-      - name: "[debug] ls result files"
-        run: ls
-      - uses: actions/download-artifact@v2
+
+      # runner was just accessed or built so it should always be there
+      - uses: actions/cache@v2
+        id: cache
         with:
-          name: runner
-      - name: change permissions
+          path: ${{ env.RUNNER_CACHE_PATH }}
+          key: ${{ runner.os }}-${{ hashFiles('performance/runner/Cargo.toml')}}-${{ hashFiles('performance/runner/src/*') }}
+
+      - name: Move Runner
+        run: mv performance/runner/target/release/runner ./
+
+      - name: Change Runner Permissions
         run: chmod +x ./runner
-      - name: make results directory
+
+      - name: Make Results Directory
         run: mkdir ./final-output/
-      - name: run calculation
+
+      # TODO compare against baseline somehow
+
+      - name: Run Calculation
         run: ./runner calculate -r ./ -o ./final-output/
-      # always attempt to upload the results even if there were regressions found
-      - uses: actions/upload-artifact@v2
+
+      - name: Upload Results
+        uses: actions/upload-artifact@v2
+        # makes sure the upload step runs even if a regression was found
         if: ${{ always() }}
         with:
           name: final-calculations
67 changes: 59 additions & 8 deletions performance/README.md
@@ -1,18 +1,69 @@
 # Performance Regression Testing
-This directory includes dbt project setups to test on and a test runner written in Rust which runs specific dbt commands on each of the projects. Orchestration is done via the GitHub Action workflow in `/.github/workflows/performance.yml`. The workflow is scheduled to run every night, but it can also be triggered manually.
+This test suite samples the performance characteristics of individual commits against performance models for prior releases. Performance is measured in project-command pairs, which are assumed to conform to a normal distribution. The sampling and comparison is efficient enough to run against PRs.

-The github workflow hardcodes our baseline branch for performance metrics as `0.20.latest`. As future versions become faster, this branch will be updated to hold us to those new standards.
+This directory includes dbt project setups that are known performance bottlenecks, and a runner written in Rust that runs specific dbt commands on each of the projects. Orchestration is done via the GitHub Action workflow in `/.github/workflows/performance.yml`.

-## Adding a new dbt project
+Performance baselines measured during our release process are committed to this directory via GitHub Action. (TODO make the file and name it here).
+
+## Investigating Regressions
+
+If your commit has failed one of the performance regression tests, it does not necessarily mean your commit contains a performance regression. It does mean the observed runtime was so much slower than the expected value that it is unlikely to be random noise. Any commit between the release being compared against and this failing commit could contain the cause, so start by investigating the failing commit and work your way backwards.

+## The Statistics
+
+Particle physicists need to be confident in declaring new discoveries, snack manufacturers need to be sure each snack is within the regulated margin of error for nutrition facts, and weight-rated climbing gear needs to be produced so that you can trust your life to every unit that comes off the line. All of these use cases rely on the same kind of math: sigma-based p-values. This section peels apart that math with the help of a physicist and walks through how we apply this approach to performance regression testing in this test suite.
+
+You are likely familiar with forming a hypothesis of the form "A and B are correlated," which is known as _the research hypothesis_. The complementary hypothesis, "A and B are not correlated," is known as _the null hypothesis_. When looking at data, we commonly use a _p-value_ to determine the significance of the data. Formally, a _p-value_ is the probability of obtaining data at least as extreme as the ones observed, if the null hypothesis is true. To refine this definition, the experimental particle physicist [Dr. Tommaso Dorigo](https://userswww.pd.infn.it/~dorigo/#about) has an excellent [glossary](https://www.science20.com/quantum_diaries_survivor/fundamental_glossary_higgs_broadcast-85365) of these terms that helps clarify: "'Extreme' is quite tricky instead: it depends on what is your 'alternate hypothesis' of reference, and what kind of departure it would produce on the studied statistic derived from the data. So 'extreme' will mean 'departing from the typical values expected for the null hypothesis, toward the values expected from the alternate hypothesis.'" In the context of performance regression testing, our research hypothesis is that "after commit A, the codebase includes a performance regression," which means we expect the runtime of our measured processes to be _slower_ than the expected value, not faster.
+
+Given this definition of p-value, we need to explicitly call out the common tendency to apply _probability inversion_ to our observations. To quote [Dr. Tommaso Dorigo](https://www.science20.com/quantum_diaries_survivor/fundamental_glossary_higgs_broadcast-85365) again: "If your ability on the long jump puts you in the 99.99% percentile, that does not mean that you are a kangaroo, and neither can one infer that the probability that you belong to the human race is 0.01%." Using our previously defined terms: the p-value is _not_ the probability that the null hypothesis _is true_.
+
+This brings us to calculating sigma values. Sigma refers to the standard deviation of a statistical model, and it is used to measure how far an observed value is from the expected value. When we say we have a "3 sigma result," we are saying that if the null hypothesis is true, this is a particularly unlikely observation; it is not a statement about the probability that the null hypothesis is true. Exactly how unlikely depends on the values expected under our research hypothesis. In the context of performance regression testing, if the null hypothesis is false, we expect the results to be _slower_ than the expected value, not _slower or faster_. Looking at the normal distribution below, we only care about one _half_ of the distribution: the half where the values are slower than the expected value. This means that when we calculate the p-value we do not include both sides of the normal distribution.
+
+![normal distribution](./images/normal.svg)

+Because of this, the following table describes the significance of each sigma level for our _one-sided_ hypothesis (the one-sided p-value at k sigma is 1 - Φ(k), where Φ is the standard normal CDF):
+
+| σ   | p-value        | scientific significance |
+| --- | -------------- | ----------------------- |
+| 1 σ | 1 in 6         |                         |
+| 2 σ | 1 in 44        |                         |
+| 3 σ | 1 in 741       | evidence                |
+| 4 σ | 1 in 31,574    |                         |
+| 5 σ | 1 in 3,486,914 | discovery               |
+
+When detecting performance regressions that trigger alerts, block PRs, or delay releases, we want to be conservative enough that detections are infrequently triggered by noise, but not so conservative that we miss most actual regressions. This test suite uses a 3 sigma standard, so only about 1 in every 700 runs is expected to fail the performance regression test suite due to expected variance in our measurements.
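
To make the one-sided check concrete, here is a minimal Rust sketch of modeling a baseline and applying the 3 sigma threshold. The `Baseline` type, its fields, and `is_regression` are illustrative names only, not the runner's actual API:

```rust
// Illustrative sketch only: fit a baseline from samples and apply
// the one-sided 3 sigma check described above.

struct Baseline {
    mean: f64,   // μ: mean runtime in seconds across many samples
    stddev: f64, // σ: standard deviation of those samples
}

impl Baseline {
    // Fit a baseline from measured runtimes (seconds),
    // using the sample (n - 1) variance.
    fn from_samples(samples: &[f64]) -> Baseline {
        let n = samples.len() as f64;
        let mean = samples.iter().sum::<f64>() / n;
        let variance =
            samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
        Baseline { mean, stddev: variance.sqrt() }
    }

    // One-sided check: only runtimes slower than μ + 3σ count as
    // regressions; unusually fast runs are not flagged.
    fn is_regression(&self, observed_seconds: f64) -> bool {
        observed_seconds > self.mean + 3.0 * self.stddev
    }
}

fn main() {
    // Hypothetical samples (seconds) for one project-command pair.
    let samples = [49.8, 50.1, 49.3, 50.4, 49.9];
    let fitted = Baseline::from_samples(&samples);
    println!("mean = {:.2}s, stddev = {:.4}s", fitted.mean, fitted.stddev);

    // Using the values from the concrete example below:
    let baseline = Baseline { mean: 49.82, stddev: 0.5212 };
    assert!(!baseline.is_regression(50.5)); // within expected variance
    assert!(baseline.is_regression(52.0)); // beyond μ + 3σ ≈ 51.38s
}
```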

+### Concrete Example
+
+The following example data was collected locally on a MacBook Pro using the same tools included in this repository.
Review comment (Contributor):
Wouldn't we want to collect the sample mean + stddev on the "same machine" as what will be running in CI? I know that's technically impossible, since GHA is a cloud service using VMs — but it's still possible to match the same basic architecture and memory characteristics, right? Versus a macbook pro running locally

Reply (Contributor Author):
You're totally correct. The MacBook numbers are all I have right now, so I used those to give concrete numbers to the abstract concepts. Your comment tells me I should replace them with GitHub Actions numbers once I have them, though.


+In dbt v1.0.1, we have the following mean and standard deviation when parsing a dbt project with 2000 models:
+
+μ (mean): 49.82 seconds<br/>
+σ (stddev): 0.5212 seconds<br/>
+
+The 2-sided 3 sigma range can be calculated from these two values via:
+
+x < μ - 3σ or x > μ + 3σ<br/>
+x < 49.82 - 3 * 0.5212 or x > 49.82 + 3 * 0.5212<br/>
+x < 48.26 or x > 51.38<br/>
+
+It follows that the 1-sided 3 sigma range for performance regressions is just:<br/>
+x > 51.38
+
+If we sample a single `dbt parse` of the same project with a commit slated to go into dbt v1.0.2, on the same MacBook Pro under the same conditions, and observe a 52-second parse time ((52 - 49.82) / 0.5212 ≈ 4.2 sigma above the mean), then this observation is so unlikely under the null hypothesis that we should investigate whether there is a performance regression in any of the commits between this failure and the commit where the initial distribution was measured.
+
+Observations with 3 sigma significance that are _not_ performance regressions could be due to observing unlikely values (1 in every 741 observations) or to variation in the instruments we use to take these measurements, such as GitHub Actions. At this time we do not measure instrument variation or account for it in our calculations.

+## Expanding the Tests
+
+Regression tests run pre-defined dbt commands across a set of source-committed dbt projects that are known to cause performance bottlenecks. This collection of projects and commands should expand over time, reflecting user feedback about poorly performing projects, so that future versions are protected against regressions in these scenarios.
+
+### Adding a new dbt project
+
+Just make a new directory under `performance/projects/`. It will automatically be picked up by the tests.
+
-## Adding a new dbt command
-In `runner/src/measure.rs::measure` add a metric to the `metrics` Vec. The GitHub Action will handle recompilation if you don't have the Rust toolchain installed.
+### Adding a new dbt command
+TODO
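
For context, the instructions removed above boiled down to adding an entry to the `metrics` Vec in `runner/src/measure.rs::measure`. A hypothetical sketch of what such an entry might look like; the struct name and fields are illustrative assumptions, not the runner's actual types:

```rust
// Hypothetical shape of a metric entry; the actual struct and fields in
// runner/src/measure.rs may differ.
struct Metric {
    name: String,    // label used in result files, e.g. "parse"
    prepare: String, // cleanup command run before sampling
    cmd: String,     // the dbt command that hyperfine samples
}

fn metrics() -> Vec<Metric> {
    vec![
        Metric {
            name: "parse".to_owned(),
            // this PR has hyperfine clean up with `dbt clean` between runs
            prepare: "dbt clean".to_owned(),
            cmd: "dbt parse".to_owned(),
        },
        // new project-command measurements would be added here
    ]
}

fn main() {
    for m in metrics() {
        println!("{}: `{}` (prepare: `{}`)", m.name, m.cmd, m.prepare);
    }
}
```

Keeping metrics as plain data like this is also what makes the future-work item below, reading new metrics from a file, straightforward.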

+## Future work
+
+- add more projects to test different configurations that have been known bottlenecks
+- add more dbt commands to measure
+- possibly use the uploaded json artifacts to store these results so they can be graphed over time
+- read new metrics from a file so no one has to edit rust source to add them to the suite
+- instead of building the rust binary every time, publish it and pull down the latest version
+- instead of manually setting the baseline version of dbt to test, pull down the latest stable version as the baseline
1 change: 1 addition & 0 deletions performance/images/normal.svg