Update Python example #444
Conversation
CodSpeed Performance Report: Merging #444 will not alter performance.
Benchmarks breakdown
This seems wrong to me. We want to be able to catch regressions in our whole codebase, not just in 6 slow tests, right?
I think the issue is that the shorter tests have too much variability to be deterministic enough to benchmark. Alternatively, we could try a deterministic memory allocator for our benchmarks; there's a thread I link to above which outlines the issue. My sense is that if a change doesn't affect our larger benchmarks it's probably fine to pull in, as long as they are representative enough of all the code.
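For context, swapping the allocator used by the benchmarks is a small change in a Rust crate. The sketch below is illustrative only (it is not part of this PR) and uses the standard `System` allocator as a placeholder for whichever deterministic allocator we might pick.

```rust
// Hedged sketch: how a Rust binary opts into a different global allocator.
// `System` is only a stand-in here; a deterministic allocator crate would be
// substituted in its place for reproducible benchmark runs.
use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;

fn main() {
    // Any allocation made by benchmark code now goes through the allocator
    // declared above.
    let v: Vec<u64> = (0..1_000).collect();
    assert_eq!(v.len(), 1_000);
}
```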
If the large tests are representative then this is a good change; I just didn't think that they were.
Looking at the CodSpeed regressions on a different PR (#448), I'm seeing that a lot of the variance comes from the parser.
Aren't most of the regressions from short-running programs uninteresting because many are one-off costs like parsing, compilation, and bookkeeping? We care more about performance-critical components like query/apply/rebuild, which only become prominent in longer-running programs.
I think the old Python example is a good stress test for the type checker. Maybe we can still keep it but under a different name?
I think that we should increase the regression threshold back to 10%: look at this report. Even on a benchmark that takes >10ms, the
Another option is I could "ignore" all benchmarks that aren't long-running: https://docs.codspeed.io/features/ignoring-benchmarks/. That way they still get recorded and can be viewed manually, but won't influence the reports. The only downside here is that if we add additional short examples, they will have to be manually added to the ignore list in the UI.
That last option seems strictly better than the solution in this PR to me: opt-out vs. opt-in. Hopefully it would motivate us to create better benchmarks in the future.
OK, sounds good! I will update this PR to just add the Python benchmark (and keep the old one) and then just ignore the smaller benchmarks for now :)
I have quite a few nits but no need for re-review
OK, I think this is ready to merge.
In this PR we change the benchmarks to only run long-running examples instead of all of them. This is to reduce the variance caused by nondeterminism in the memory allocator, which is overly prominent in short benchmarks (oxc-project/backlog#89). It also updates the Python example to more accurately reflect the current workload, including optimizing the program and turning it into a final Python string, all with CSE (so the resulting file is smaller than before). It should output a final string with something like this: