Various README improvements #224
Conversation
```diff
@@ -673,7 +708,7 @@ Defaults are the same as the corresponding command-line flags.
   "root": "./benchmarks",
   "sampleSize": 50,
   "timeout": 3,
-  "autoSampleConditions": ["0%", "1%"],
+  "horizons": ["0%", "1%"],
```
I'm still not completely clear on this, but just to check with a possibly naive question - does this mean "0%" to the left and "1%" to the right in the statistical ordering of cases from fastest to slowest? So a singular string or an array of not just two entries wouldn't be valid options?
Great question. Adding an additional question: should this remain `autoSampleConditions`?

Edit: Covered in the other PR.
> I'm still not completely clear on this, but just to check with a possibly naive question - does this mean "0%" to the left and "1%" to the right in the statistical ordering of cases from fastest to slowest? So a singular string or an array of not just two entries wouldn't be valid options?
The horizon values are an unordered set of thresholds; order doesn't matter. You can set just one value -- in fact, the default setting is 0%. To set one value in the JSON config, you set an array with one item.
Updated this section to hopefully make that a little clearer.
But TBH I don't think anybody but me has ever grokked this system; it always seems to be confusing, so I think I need to think about a different model for it.
Most of the time the default is fine and this isn't a setting people need to think about. It means "keep sampling until you can say with high confidence which benchmark is faster".
The reason I included the ability to set thresholds other than 0% is that I imagined it would be useful, e.g. for regression testing, to be able to say "I will only accept this PR if I'm confident it introduces no more than a 1% performance penalty". That is, I don't actually care if it's strictly faster or slower (especially because often the difference is so small that it would take an inordinate amount of time to answer the question with 95% confidence) -- I just want to be sure it's not a big enough regression for me to care about (e.g. 1%).
The reason you'd set two thresholds is simply so you could answer both questions at the same time.
Feedback definitely welcome.
LGTM! 🎉
README.md (Outdated)

```md
### Stopping conditions

You can also control which statistical conditions tachometer should check for
when deciding when to stop auto-sampling by configuring **_horizons_**.
```
Ah I see this gets renamed in the horizons renaming PR. Nice nice!
README.md (Outdated)

```md
configuration, then while you'll get a slightly different confidence interval
every time, it should be the case that _95% of those confidence intervals will
contain the true value_. See
[Wikipedia](https://en.wikipedia.org/wiki/onfidence_interval#Meaning_and_interpretation)
```
Missing letter in link:

```suggestion
[Wikipedia](https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation)
```
Fixed, thanks.
README.md (Outdated)

```md
In general, we want narrower confidence intervals. Three knobs can do this:

1. Dropping the chosen confidence level. _This is not a good idea!_ We want our
```
Does tachometer expose these 3 knobs?
No, only the 3rd one. The first 2 are included as examples of what you should not, and cannot, do. But I think this is probably unnecessarily confusing, so I've just removed all but the 3rd item.
README.md (Outdated)

```md
contains the _true value_ of that parameter.

More precisely speaking, the 95% confidence level describes the _long-run
proportion of confidence intervals that will contain the true value_.
```
I'm not sure I follow that definition, but maybe that's ok 'cause it's the precise definition?
Hmm, then this paragraph probably isn't adding any value. I removed it and just kept the link to Wikipedia.
```
│ append.html │ 0.68ms - 0.79ms │ faster          │                 │
│             │                 │ 90% - 92%       │ -               │
│             │                 │ 6.49ms - 7.80ms │                 │
└─────────────┴─────────────────┴─────────────────┴─────────────────┘
```
It'd be great to also show an example where the confidence intervals overlap, so you can mention that either might have a faster true mean. I know that gets into the interpretation section, but it's kind of the key feature and you even mention "tiny differences" a few times, so it'd be good to show that.
Yeah, something with a much smaller difference would be nice here I agree. I'm trying to think of an example that's equally simple, but struggling at the moment. I'll think of something later and do a followup -- any ideas?
We could do something like the for loop example that's in the "interpreting results" section, though it's pretty contrived. It's nice because it shows faster, slower, and unsure though, as you say. That requires 3 benchmarks, though, so it's a little more complex setup to show.
There are some DOM APIs that are pretty close, but definitely different. Maybe appendChild vs insertBefore
Improved the README.

Also threw this in: `unit` vs `e2e` tests in `package.json`.

Fixes #221
Fixes #223