Various README improvements #224
Conversation
```diff
@@ -673,7 +708,7 @@ Defaults are the same as the corresponding command-line flags.
   "root": "./benchmarks",
   "sampleSize": 50,
   "timeout": 3,
-  "autoSampleConditions": ["0%", "1%"],
+  "horizons": ["0%", "1%"],
```
I'm still not completely clear on this, but just to check with a possibly naive question - does this mean "0%" to the left and "1%" to the right in the statistical ordering of cases from fastest to slowest? So a singular string or an array of not just two entries wouldn't be valid options?
Great question. Adding an additional question: should this remain `autoSampleConditions`?

Edit: Covered in the other PR.
> I'm still not completely clear on this, but just to check with a possibly naive question - does this mean "0%" to the left and "1%" to the right in the statistical ordering of cases from fastest to slowest? So a singular string or an array of not just two entries wouldn't be valid options?
The horizon values are an unordered set of thresholds; order doesn't matter. You can set just one value -- in fact, the default setting is 0%. To set one value in the JSON config, you set an array with one item.
Updated this section to hopefully make that a little clearer.
But TBH I don't think anybody but me has ever grokked this system; it always seems to be confusing, so I think I need to think about a different model for it.
Most of the time the default is fine and this isn't a setting people need to think about. It means "keep sampling until you can say with high confidence which benchmark is faster".
The reason I included the ability to set thresholds other than 0% is that I imagined it would be useful, e.g. for regression testing, to be able to say "I will only accept this PR if I'm confident it introduces no more than a 1% performance penalty". That is, I don't actually care if it's strictly faster or slower (especially because often the difference is so small that it would take an inordinate amount of time to answer the question with 95% confidence) -- I just want to be sure it's not a big enough regression for me to care about (e.g. 1%).
The reason you'd set two thresholds is simply so you could answer both questions at the same time.
Feedback definitely welcome.
LGTM! 🎉
README.md (Outdated)

```md
### Stopping conditions

You can also control which statistical conditions tachometer should check for
when deciding when to stop auto-sampling by configuring **_horizons_**.
```
Ah I see this gets renamed in the horizons renaming PR. Nice nice!
README.md (Outdated)

```md
configuration, then while you'll get a slightly different confidence interval
every time, it should be the case that _95% of those confidence intervals will
contain the true value_. See
[Wikipedia](https://en.wikipedia.org/wiki/onfidence_interval#Meaning_and_interpretation)
```
Missing letter in link:

```suggestion
[Wikipedia](https://en.wikipedia.org/wiki/Confidence_interval#Meaning_and_interpretation)
```
Fixed, thanks.
README.md (Outdated)

```md
In general, we want narrower confidence intervals. Three knobs can do this:

1. Dropping the chosen confidence level. _This is not a good idea!_ We want our
```
Does tachometer expose these 3 knobs?
No, only the 3rd one. The first 2 are included as examples of what you should not, and cannot, do. But I think this is probably unnecessarily confusing, so I've just removed all but the 3rd item.
README.md (Outdated)

```md
contains the _true value_ of that parameter.

More precisely speaking, the 95% confidence level describes the _long-run
proportion of confidence intervals that will contain the true value_.
```
I'm not sure I follow that definition, but maybe that's ok 'cause it's the precise definition?
Hmm, then this paragraph probably isn't adding any value. I removed it and just kept the link to Wikipedia.
```
│ append.html │ 0.68ms - 0.79ms │ faster          │                 │
│             │                 │ 90% - 92%       │ -               │
│             │                 │ 6.49ms - 7.80ms │                 │
└─────────────┴─────────────────┴─────────────────┴─────────────────┘
```
It'd be great to also show an example where the confidence intervals overlap, so you can mention that either might have a faster true mean. I know that gets into the interpretation section, but it's kind of the key feature and you even mention "tiny differences" a few times, so it'd be good to show that.
Yeah, something with a much smaller difference would be nice here I agree. I'm trying to think of an example that's equally simple, but struggling at the moment. I'll think of something later and do a followup -- any ideas?
We could do something like the for loop example that's in the "interpreting results" section, though it's pretty contrived. It's nice because it shows faster, slower, and unsure though, as you say. That requires 3 benchmarks, though, so it's a little more complex setup to show.
There are some DOM APIs that are pretty close, but definitely different. Maybe appendChild vs insertBefore
Improved the README.

Also threw this in: `unit` vs `e2e` tests in `package.json`.

Fixes #221
Fixes #223