Continuous Benchmarking #4687

Closed · 9 tasks · jangorecki opened this issue Aug 25, 2020 · 9 comments · Fixed by #6078 · May be fixed by #4517

Comments

@jangorecki (Member) commented Aug 25, 2020

As already mentioned in multiple issues and over email/Slack, we need automated tests that will be able to track performance regressions.
This issue is meant to define the scope.

A related, useful project is planned in conbench. Once it is working, I think we should use it. Unfortunately, that does not seem likely to happen anytime soon, or even in the more distant future.
Either way, keeping the scope minimal should make it easier to move to conbench eventually.
Another piece of related work is my old macrobenchmarking project.
There is also the recent draft PR #4517.


Scope

Dimensions by which we will track timings

  • environment (allows looking up the hardware configuration)
  • R version
  • git sha of data.table (to look up the date and version)
  • benchmark script (probably fixed to benchmark.Rraw)
  • query
  • version of a query (in case we modify an existing query for some reason)
  • description

Dimensions that I propose to leave out of scope for now

  • operating system
  • Linux kernel version
  • compiler
  • compiler version
  • R compilation flags
  • data.table compilation flags
  • metrics (memory usage, etc.)
  • datatable.optimize option
  • number of threads (?)
  • multiple runs of single query (?)

Challenges

Store timings

In the current infrastructure we do not have any process that appends artifacts (timings, in the context of CB). Each CB run has to store its results somewhere and re-use them later on.

  • store timings in a CSV for simplicity (?); a sketch of what that could look like follows below
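
A minimal sketch of what appending to such a CSV could look like, using data.table itself. The file path, column names, and the CI_COMMIT_SHA environment variable are assumptions chosen to mirror the dimensions listed above, not a fixed design:

```r
library(data.table)

# hypothetical location of the accumulated timings; in practice this would be
# an artifact the CB pipeline fetches before and pushes after each run
timings_csv = "cb-timings.csv"

# one row per measured query, columns following the dimensions proposed above
new_row = data.table(
  timestamp     = format(Sys.time(), tz = "UTC"),
  environment   = "cb-machine-01",                          # used to look up hardware config
  r_version     = as.character(getRversion()),
  git_sha       = Sys.getenv("CI_COMMIT_SHA", unset = NA),  # or however the pipeline exposes it
  script        = "benchmark.Rraw",
  query         = "q1_groupby_sum",
  query_version = 1L,
  seconds       = 1.234                                     # measured timing
)

# append to the existing file, writing the header only on the first run
fwrite(new_row, timings_csv, append = file.exists(timings_csv))
```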

Signalling a regression

  • Should we compare only to the previous timings, or to average timings over a longer period (the last month, etc.)?
  • What is the tolerance threshold? For cheap queries, 5-25% variance will be common.
  • If the threshold is exceeded, we may want to run the benchmark a few more times and take an average before signalling a regression. Then we need a second threshold for that average. A possible check along these lines is sketched below.
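
One possible shape for such a check, comparing a new timing against the mean of recent history with a relative threshold. The 25% tolerance, the 30-run window, and the idea of a second, tighter threshold for the averaged re-run are illustrative assumptions:

```r
# TRUE if `new_seconds` looks like a regression relative to `history`,
# a numeric vector of past timings for the same query on the same environment
looks_like_regression = function(new_seconds, history,
                                 threshold = 0.25,   # single-run tolerance (assumed)
                                 n_recent = 30L) {   # roughly "last month" of daily runs
  if (length(history) == 0L) return(FALSE)           # nothing to compare against yet
  baseline = mean(tail(history, n_recent))
  (new_seconds - baseline) / baseline > threshold
}

# if this fires, the pipeline could re-run the query a few times and apply a
# second, tighter threshold (say 10%) to the averaged timing before signalling
```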

Environment

To reduce the number of false regression signals we need to use private, dedicated infrastructure.
Having a dedicated machine may not be feasible, so we need a mechanism for signalling to Jenkins (or another orchestration process) that a particular machine is in use in exclusive mode.
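
A minimal sketch of one possible exclusive-use mechanism, assuming a simple directory-based lock on the benchmark machine; the lock path and the way Jenkins would react to a busy machine are assumptions:

```r
# dir.create() returns FALSE if the lock already exists, giving a cheap
# "machine is busy" check that other jobs on the box can also respect
lock_path = "/var/tmp/datatable-cb.lock"   # assumed location

acquire_lock = function(path = lock_path) dir.create(path, showWarnings = FALSE)
release_lock = function(path = lock_path) unlink(path, recursive = TRUE)

if (acquire_lock()) {
  tryCatch(
    message("running benchmark.Rraw ..."),   # placeholder for the actual CB run
    finally = release_lock()
  )
} else {
  message("machine busy, postponing CB run") # Jenkins would re-queue the job
}
```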

Pipeline

In the most likely case of not having a dedicated machine, CB may end up being queued for quite a while (up to multiple days). Therefore it makes sense to have it in a separate pipeline rather than in our data.table GLCI. Such a CB pipeline could be scheduled to run daily or weekly instead of running on each commit.

  • We could eventually move publishing artifacts (package, website, docker images) to a dedicated daily pipeline. This is not strictly related to CB, but it would make publishing CB results easier as well.

Versioning

  • Should all elements of CB be included as part of the data.table project, or live in a separate project?
    • test script inst/tests/benchmark.Rraw
    • a new function benchmark(), meant to be used like test(), and benchmark.data.table(), to be used like test.data.table() (a rough sketch follows below)
    • any extra orchestration could live in .ci/
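
Purely as an illustration of that last benchmark()/benchmark.data.table() idea, a hypothetical helper mirroring how test() wraps a single case might look like this; the signature and behaviour are assumptions, not an actual data.table API:

```r
# hypothetical helper: evaluate `expr`, measure elapsed time, and record it
# under test number `num` (benchmark.data.table() would loop over benchmark.Rraw
# the same way test.data.table() loops over the test files)
benchmark = function(num, expr, description = "") {
  elapsed = system.time(expr)[["elapsed"]]   # forcing `expr` here times the query
  # a real implementation would append to the shared timings store
  # (see the CSV sketch above) rather than just return a row
  data.frame(num = num, description = description, seconds = elapsed)
}

# usage inside inst/tests/benchmark.Rraw, roughly in the style of test():
# benchmark(1.1, DT[, sum(v1), by = id1], "grouped sum, one key")
```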

Example test cases

@surenH2oai commented:

Thanks @jangorecki. This story needs to be broken down into tasks in GitHub to plan deliverables; otherwise it is hard to track progress.
Break it down as:

  • Automated provisioning of the environment for performance testing (via Jenkins, GitLab, etc. scripts and jobs)
  • Design and development of the regression DT framework
  • Automating the regression test (first test)
  • Automating additional tests into the framework

Ideally the framework should enable anyone to add new regression tests easily, so once the framework is in place, others can add the tests they need as the need arises.

@MichaelChirico (Member) commented:

I think a simple single-threaded vs. multi-threaded test should be in scope (maybe 1 thread vs. 2, 4, or 8 threads; not sure what machine we'd be testing on).
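
For what it's worth, a comparison along those lines can be very small; setDTthreads() and getDTthreads() are existing data.table functions, while the data size, query, and thread counts below are placeholders:

```r
library(data.table)

DT = data.table(id = sample(1e5, 1e7, TRUE), v = rnorm(1e7))

time_with_threads = function(nthreads) {
  old = getDTthreads()
  setDTthreads(nthreads)
  on.exit(setDTthreads(old))                       # restore the previous setting
  system.time(DT[, sum(v), by = id])[["elapsed"]]  # example grouped query
}

sapply(c(1L, 2L, 4L, 8L), time_with_threads)
```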

jangorecki linked a pull request Oct 9, 2020 that will close this issue
jangorecki mentioned this issue Oct 9, 2020
@assignUser commented Jan 19, 2022

Hey! If you are still interested in CB, I would like to mention {touchstone}, which will run benchmarks and comment the results on the PR after every push! We are getting ready to submit to rOpenSci and CRAN.
Now is a good time to test it; we would love some feedback from new users!

@tdhock (Member) commented Oct 3, 2023

Continuous benchmarking is one goal of the NSF POSE project, which @DorisAmoakohene is working on.

@assignUser commented Oct 4, 2023

touchstone development has stalled a bit (more or less because it is working for the projects it is used in), but I have tentative plans to add an option to export results to conbench, which is active and used by high-profile projects such as Apache Arrow and Meta's Velox. The data export is done via JSON, so it would be very easy to activate once a conbench server has been set up for dt (and once I have added the export function 😁).
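
For a rough idea of what a JSON export of a timing result could look like (using jsonlite; the field names are illustrative and not the actual conbench schema):

```r
library(jsonlite)

result = list(
  run_id  = "2023-10-04-abc123",   # illustrative identifiers, not a real schema
  git_sha = "abcdef0",
  query   = "q1_groupby_sum",
  seconds = 1.234,
  machine = "cb-machine-01"
)

# pretty-printed JSON that a conbench-style server could ingest
toJSON(result, auto_unbox = TRUE, pretty = TRUE)
```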

@tdhock (Member) commented Dec 20, 2023

@Anirban166 please see animint/animint2#101 (comment) for an explanation of how to create a fine-grained token with permission to write to a particular repo in the context of a GitHub Action for performance testing (the idea is to write performance-test plots, which show up in PRs, to the gh-pages branch of Rdatatable/performance_test_PR_comment_plots).

@Anirban166 (Member) commented Feb 27, 2024

With regard to the issue at hand, I've been working on identifying performance regressions from pull requests (taking historical regressions into account) via a GitHub Action that I created.

@assignUser commented:

@Anirban166 You might already be aware of this, but a common issue with these types of workflows (do something and comment the result on the PR) is that the GITHUB_TOKEN for pull_request differs depending on the source of the PR. If the PR comes from the same repo (e.g. your test PR), the token has full read/write permissions and has access to secrets. If the PR comes from a fork (which is the case for most PRs in OSS repos like dt), the token is restricted to read access and gets no secrets.

Commenting on a PR requires the pull-requests: write permission, which means the comment won't go through on fork PRs. There is a trigger (pull_request_target) that causes a workflow to run with elevated permissions (which should always be minimized with permissions: ... on a PR), but it has to be used with great caution and should never run code from the PR directly, for security reasons.

I really wish there were a separate pull_request_comment permission to avoid this issue, but currently that doesn't exist :(

@Anirban166 (Member) commented:

(quoting @assignUser's comment above on GITHUB_TOKEN permissions for fork PRs)

Ditto!
PR contributors (speaking of contributors at large, not data.table core members who have the appropriate access permissions) won't be able to authenticate with a GITHUB_TOKEN to comment on the PR, due to the lack of write permissions.

I'm aware, and that's exactly what I've been telling @tdhock over the past few weeks as well. (In fact, I brought this up to him even before I started on the action.)

I wish that too :(

At present, I'll be running the workflows on my repository until we come up with better ideas (and since the tests are to be updated from time to time, this does require some manual configuration). One option is for the user to create another branch (apart from their fork of data.table) and run the comparison between the two, but that would also require the user to take a few extra steps and have some working knowledge of setting up the action, so it's probably not the ideal approach if we were to leave it to them.
