
Various README improvements #224

Merged
aomarks merged 10 commits into main from readme-improvements on Sep 27, 2021

Conversation

@aomarks aomarks (Member) commented Sep 25, 2021

Improved the README:

  • Added a table of contents.
  • New introduction example that shows comparing two benchmarks, instead of just one.
  • Consolidated and moved around a few sections.
  • Generally improved the wording and made it more concise.
  • Fixed one or two errors/typos.
  • Fixed the test status badge.

Also threw this in:

  • Separate unit vs e2e tests in package.json.

Fixes #221
Fixes #223

@@ -673,7 +708,7 @@ Defaults are the same as the corresponding command-line flags.
"root": "./benchmarks",
"sampleSize": 50,
"timeout": 3,
"autoSampleConditions": ["0%", "1%"],
"horizons": ["0%", "1%"],


I'm still not completely clear on this, but just to check with a possibly naive question - does this mean "0%" to the left and "1%" to the right in the statistical ordering of cases from fastest to slowest? So a singular string or an array of not just two entries wouldn't be valid options?

@AndrewJakubowicz AndrewJakubowicz (Collaborator) commented Sep 27, 2021


Great question. Adding an additional question. Should this remain autoSampleConditions?

Edit: Covered in the other PR.

@aomarks aomarks (Member, Author) commented Sep 27, 2021


> I'm still not completely clear on this, but just to check with a possibly naive question - does this mean "0%" to the left and "1%" to the right in the statistical ordering of cases from fastest to slowest? So a singular string or an array of not just two entries wouldn't be valid options?

The horizon values are an unordered set of thresholds; order doesn't matter. You can set just one value -- in fact the default setting is 0%. To set one value in the JSON config, you set an array with one item.

Updated this section to hopefully make that a little clearer.

But TBH I don't think anybody but me has ever grokked this system; it always seems to be confusing, so I think I need to think about a different model for it.

Most of the time the default is fine and this isn't a setting people need to think about. It means "keep sampling until you can say with high confidence which benchmark is faster".

The reason I included the ability to set thresholds other than 0% is that I imagined it would be useful for e.g. regression testing, to be able to say "I will only accept this PR if I'm confident it introduces no more than a 1% performance penalty". That is, I don't actually care if it's strictly faster or slower (especially because often the difference is so small that it would take an inordinate amount of time to answer the question with 95% confidence) -- I just want to be sure it's not a big enough regression for me to care about (e.g. 1%).

The reason you'd set two thresholds is simply so you could answer both questions at the same time.

Feedback definitely welcome.
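
To make that concrete, here's a sketch of just this setting in the JSON config (everything else omitted; the key is spelled horizons in this PR, with the autoSampleConditions rename handled in the other PR):

```json
{
  "horizons": ["0%"]
}
```

is equivalent to the default ("keep sampling until we can say with high confidence which benchmark is faster"), while

```json
{
  "horizons": ["0%", "1%"]
}
```

checks both thresholds in the same run: "which one is faster?" and "is the difference bigger or smaller than 1%?".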

@AndrewJakubowicz AndrewJakubowicz (Collaborator) left a comment


LGTM! 🎉

README.md Outdated
### Stopping conditions

You can also control which statistical conditions tachometer should check for
when deciding when to stop auto-sampling by configuring **_horizons_**.
Collaborator

Ah I see this gets renamed in the horizons renaming PR. Nice nice!

README.md Outdated
configuration, then while you'll get a slightly different confidence interval
every time, it should be the case that _95% of those confidence intervals will
contain the true value_. See
[Wikipedia](https://en.wikipedia.org/wiki/onfidence_interval#Meaning_and_interpretation)
Collaborator

Missing letter in link:

Suggested change
[Wikipedia](https://en.wikipedia.org/wiki/onfidence_interval#Meaning_and_interpretation)
[Wikipedia](https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation)

Member Author

Fixed, thanks.

README.md Outdated

In general, we want narrower confidence intervals. Three knobs can do this:

1. Dropping the chosen confidence level. _This is not a good idea!_ We want our
Collaborator

Does tachometer expose these 3 knobs?

Member Author

No, only the 3rd one. The first 2 are included as examples of what you should not, and cannot, do. But I think this is probably unnecessarily confusing, so I've just removed all but the 3rd item.
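
For reference, the exposed knob maps onto settings that already appear in the config snippet in this PR. A sketch, assuming the third item on that (truncated) list is the sample size; the exact semantics follow the corresponding command-line flags:

```json
{
  "sampleSize": 100,
  "timeout": 5
}
```

Larger values give tachometer more samples to work with, which is what narrows the interval.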


README.md Outdated
contains the _true value_ of that parameter.

More precisely speaking, the 95% confidence level describes the _long-run
proportion of confidence intervals that will contain the true value_.
Contributor

I'm not sure I follow that definition, but maybe that's ok 'cause it's the precise definition?

Member Author

Hmm, then this paragraph probably isn't adding any value. I removed it and just kept the link to Wikipedia.

│ append.html │ 0.68ms - 0.79ms │ faster │ │
│ │ │ 90% - 92% │ - │
│ │ │ 6.49ms - 7.80ms │ │
└─────────────┴─────────────────┴─────────────────┴─────────────────┘
Contributor

It'd be great to also show an example where the confidence intervals overlap, so you can mention that either might have a faster true mean. I know that gets into the interpretation section, but it's kind of the key feature and you even mention "tiny differences" a few times, so it'd be good to show that.

Member Author

Yeah, something with a much smaller difference would be nice here, I agree. I'm trying to think of an example that's equally simple, but struggling at the moment. I'll think of something later and do a follow-up -- any ideas?

We could do something like the for-loop example that's in the "interpreting results" section, though it's pretty contrived. It's nice because, as you say, it shows faster, slower, and unsure. That requires 3 benchmarks, though, so it's a slightly more complex setup to show.

Contributor

There are some DOM APIs that are pretty close, but definitely different. Maybe appendChild vs insertBefore.
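
If we went that route, the config stays small. A sketch with two hypothetical local benchmark pages (append-child.html and insert-before.html would each exercise the corresponding DOM call in a loop):

```json
{
  "root": "./benchmarks",
  "benchmarks": [
    { "name": "append-child", "url": "append-child.html" },
    { "name": "insert-before", "url": "insert-before.html" }
  ]
}
```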

@aomarks aomarks merged commit d110258 into main Sep 27, 2021
@aomarks aomarks deleted the readme-improvements branch September 27, 2021 21:06