proposal: testing: add B method for adding stats #26037

josharian · 2018-06-25T02:48:47Z

Package testing has built-in support for three statistics: wall time, allocations, and memory allocated. There is discussion of adding more: #24918.

I propose that we also expose to users the ability to report their own benchmark-specific statistics. An example comes from package sort (proximally via #26031): The number of calls to Swap and Less.

Proposed API:

func (b *B) ReportStats(n int, units string)

This will cause benchmark invocations to include <n/b.N> <units>/op in the benchmark output. Then the sort of output reported in #26031 could be used as input to benchstat.

The text was updated successfully, but these errors were encountered:

josharian · 2018-06-25T17:44:13Z

On a second read, perhaps ReportMetric or just Report would be a better name. And perhaps it should accept an int64.

rsc · 2018-07-09T20:43:03Z

Not a float64?

/cc @aclements

josharian · 2018-07-16T18:15:17Z

Not a float64?

I couldn't think of any non-integral metrics, and an int has the advantage of making it clear that the intended usage is to report a total count, not count/b.N. But I'm not opposed to a float64.

We'd also want to make sure that downstream tools (like benchstat) are prepared to parse floats, on pain of ending up in a weird situation similar to testing.AllocsPerRun: AllocsPerRun returns the average number of allocations during calls to f. Although the return value has type float64, it will always be an integral value.

bradfitz · 2018-07-16T20:10:39Z

Seems fine to me and @ianlancetaylor, @aclements, @rsc?

aclements · 2018-07-20T18:31:22Z

Yes please!

Though I'm not so sure about implicitly dividing by b.N or automatically adding "/op". Many custom benchmarks report metrics that are not per-op. gcbench reports many metrics that are aggregated differently across iterations, such as quantiles and time-rates. x/benchmarks reports things like peak RSS (arguably also a quantile). benchcmd is similar.

Users can easily divide by b.N themselves, or we could provide two methods to make the distinction clear.

In this case, the type should clearly be float64, though I would argue that it should be float64 even if the framework implicitly divides by b.N. We already made this mistake with the most common metric: "time" is not an integral resource, and we have to multiply it by 10⁹ in order to pretend that it is.

aclements · 2018-07-20T18:33:10Z

This may be a dup of #16110, though the discussion there got a bit confusing.

josharian · 2018-07-20T22:21:07Z

This may be a dup of #16110, though the discussion there got a bit confusing.

I agree. I'll close that one in favor of this, since this one seems to be moving along.

Users can easily divide by b.N themselves, or we could provide two methods to make the distinction clear.

I'm inclined to add only a single method, and use examples and docs to get the rest of the way there. About out of laptop time, will update this proposal with something concrete soon.

josharian · 2018-07-22T21:16:18Z

New proposal:

// ReportMetric adds n/unit to the reported benchmark results.
// If the metric is per-iteration, the caller should divide by b.N,
// and by convention units should end in "/op".
// ReportMetric will panic if unit is the empty string
// or if unit contains any whitespace.
func (b *testing.B) ReportMetric(n float64, unit string)

Examples to include something like a swap count (sorting) and a cache hit rate (sync.Pool? other?).

BenchmarkResult should also have a new field:

Extra map[string]float64 // Extra records additional metrics, reported by ReportMetric

Open question: What should we do if ReportMetric is called multiple times with the same unit? (Overwrite? Panic? Ignore?) What if ReportMetric is called with a reserved unit like "ns/op"?

I'm inclined to answer overwrite and panic, respectively, although documenting that is annoying.

Ready for a new round of input, thanks!

aclements · 2018-07-23T14:17:28Z

SGTM.

Open question: What should we do if ReportMetric is called multiple times with the same unit? (Overwrite? Panic? Ignore?) What if ReportMetric is called with a reserved unit like "ns/op"?

I agree that it should overwrite if called multiple times with the same unit (anything else seems more annoying or needlessly constrained).

I worry more about panicking on reserved units. Is "allocs/op" always reserved or only with -benchmem? What if we add more built-in units in the future (e.g., for #21295). Since the set is somewhat dynamic, that would make the panicking behavior somewhat unpredictable. I would lean toward overwriting the built-in measure, but that's a bit messy to implement. And, arguably, if a later call to ReportMetric overwrites the earlier value, then it should just ignore reserved units, since those are being reported by the benchmark framework at the end of the benchmark and hence would "override" any user-reported values.

How does this interact with parallel benchmarks?

rsc · 2018-07-23T20:48:31Z

In the description two messages up, "n/unit" should be "n unit" I think. Otherwise this seems OK. Will leave for Austin to accept the proposal when ready.

rsc · 2018-08-06T20:41:58Z

@josharian, @aclements, any ideas about parallel benchmarks or other details needed to accept this proposal?

aclements · 2018-08-06T20:58:58Z

I think if we're taking the value and unit from the user at face value, parallel benchmarks don't present any particular complication. We'll just report the value and unit in the benchmark line, whether or not it's parallel. (I'm not sure what, if anything, I had in mind when I questioned how this would interact with parallel benchmarks.)

So, tweaking Josh's proposal, how about:

// ReportMetric adds n unit to the reported benchmark results.
// If the metric is per-iteration, the caller should divide by b.N,
// and by convention units should end in "/op".
// If ReportMetric is called multiple times for the same unit,
// later calls override earlier calls.
// If ReportMetric is called for a unit normally reported by the
// benchmark framework itself (such as "ns/op"), n will override
// the value reported by the benchmark framework.
// ReportMetric will panic if unit is the empty string
// or if unit contains any whitespace.
func (b *testing.B) ReportMetric(n float64, unit string)

plus extending BenchmarkResult with

Extra map[string]float64 // Extra records additional metrics, reported by ReportMetric

josharian · 2018-08-07T15:42:14Z

@aclements LGTM, thanks.

aclements · 2018-08-07T16:54:26Z

I started prototyping this and ran into two problems.

// If ReportMetric is called for a unit normally reported by the
// benchmark framework itself (such as "ns/op"), n will override
// the value reported by the benchmark framework.

I'm not sure how to implement this. The built-in units are computed from fields in testing.BenchmarkResult, but aren't actually represented directly in that type, so we can't just fill these fields in from user-reported metrics. This makes me think that maybe it should instead panic on these.

Also, what should B.ResetTimer do with user-reported metrics? I think it should clear them just like it resets the built-in metrics.

josharian · 2018-08-10T03:38:24Z

The built-in units are computed from fields in testing.BenchmarkResult, but aren't actually represented directly in that type, so we can't just fill these fields in from user-reported metrics.

I'm inclined to say that we do what it says on the box: Extra records additional metrics, reported by ReportMetric. So we don't touch the original fields in testing.BenchmarkResult; everything that comes in via ReportMetric goes into BenchmarkResult.Extra, full stop. And then consult Extra to do the right thing in BenchmarkResult.String, BenchmarkResult.NsPerOp, and so on, and document that in those methods.

(I was originally drawn to panicking here. You asked, though: 'Is "allocs/op" always reserved or only with -benchmem? What if we add more built-in units in the future?' I think the answer to the first question is probably "always", but I don't see any good answer to the second question.)

Also, what should B.ResetTimer do with user-reported metrics? I think it should clear them just like it resets the built-in metrics.

Agreed.

mmcloughlin · 2018-10-26T02:41:05Z

Reminds me of "User-defined counters" in Google Benchmark: https://github.com/google/benchmark#user-defined-counters

jfn7 · 2019-01-11T12:58:22Z

What is the status on this work?

josharian · 2019-01-11T16:44:52Z

There are some outstanding questions—see Austin’s last post, above.

Austin, perhaps you could post your WIP CL so someone could see whether my proposed answers above make sense in context and/or suggestion other answers. (That someone might be me, but I can’t commit to doing it in a timely way.)

jfn7 · 2019-01-11T21:56:42Z

Ok, I would love to see progress made on this topic. I might be able to spend some cycles on this.

gopherbot · 2019-01-29T12:04:02Z

Change https://golang.org/cl/160097 mentions this issue: linux/perf: API proposal

aclements · 2019-03-08T21:05:17Z

// If ReportMetric is called for a unit normally reported by the
// benchmark framework itself (such as "ns/op"), n will override
// the value reported by the benchmark framework.

I found a use case for this. I just wrote a benchmark where the only purpose is to measure STW time. The ns/op of the benchmark itself are irrelevant, so it would be nice to override this metric so the "op" is GC STW.

josharian · 2019-03-09T20:55:19Z

I found a use case for this.

Nice. Makes sense. I think that my proposed approach should be able to handle this.

I started prototyping this

Can you put up your prototype, whatever shape it is in? Maybe we can collectively get it finished.

cespare · 2019-03-09T21:19:13Z

I've been following this issue and noticing many times when it would be helpful in my own work. I am also available to work on or review this feature in the next couple of months. Is it feasible to get this in for 1.13?

aclements · 2019-03-11T15:02:32Z

One other detail question: currently, if MB/s is 0 we don't report it (which is probably just an artifact of that being the default value), but we report ns/op even if it's 0. Should we not? That will never happen in a "real" benchmark, and would give a way for users to suppress the ns/op metric if it wasn't meaningful for a particular benchmark.

gopherbot · 2019-03-11T15:08:21Z

Change https://golang.org/cl/166717 mentions this issue: testing: add B.ReportMetric for custom benchmark metrics

josharian · 2019-03-11T15:11:39Z

Should we not? That will never happen in a "real" benchmark, and would give a way for users to suppress the ns/op metric if it wasn't meaningful for a particular benchmark.

Suppressing it when 0 sounds sensible to me.

seebs · 2019-03-17T21:44:52Z

One possible flaw with suppressing a metric when zero: benchcmp or benchstat.

Consider what happens if I try to run comparisons between two benchmark result sets, and one of them actually manages to, say, eliminate allocations from a particular use case. I no longer have a reported allocations for that benchmark, which means the comparison may not work as expected.

josharian · 2019-03-17T22:55:21Z

In the discussion in the CL we (tentatively) settled zero-is-special only for ns/op.

janusdn · 2019-03-20T15:29:02Z

In which release should we expect to see this cool functionality?

aclements · 2019-03-20T15:32:06Z

It's now committed, so it should appear in the next release (Go 1.13).

josharian · 2019-03-20T15:32:13Z

Probably 1.13. (There’s always a chance it gets reverted.)

janusdn · 2019-03-20T15:42:18Z

Thanks, looking forward to using it.

gopherbot added this to the Proposal milestone Jun 25, 2018

gopherbot added the Proposal label Jun 25, 2018

josharian mentioned this issue Jul 20, 2018

testing: reporting non-standard benchmark results #16110

Closed

aclements added the Proposal-Accepted label Aug 7, 2018

josharian mentioned this issue Oct 26, 2018

proposal: testing: custom benchmark labels #28398

Closed

gopherbot closed this as completed in 834d229 Mar 19, 2019

rsc mentioned this issue Jun 27, 2019

api: audit for Go 1.13 go1.13.txt #32813

Closed

toothrot mentioned this issue Jul 29, 2019

proposal: testing: automatically scale benchmark metrics in results output #33328

Closed

golang locked and limited conversation to collaborators Mar 19, 2020

gopherbot added the FrozenDueToAge label Mar 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: testing: add B method for adding stats #26037

proposal: testing: add B method for adding stats #26037

josharian commented Jun 25, 2018

josharian commented Jun 25, 2018

rsc commented Jul 9, 2018

josharian commented Jul 16, 2018

bradfitz commented Jul 16, 2018

aclements commented Jul 20, 2018

aclements commented Jul 20, 2018

josharian commented Jul 20, 2018

josharian commented Jul 22, 2018 •

edited

Loading

aclements commented Jul 23, 2018

rsc commented Jul 23, 2018

rsc commented Aug 6, 2018

aclements commented Aug 6, 2018

josharian commented Aug 7, 2018

aclements commented Aug 7, 2018

josharian commented Aug 10, 2018

mmcloughlin commented Oct 26, 2018

jfn7 commented Jan 11, 2019

josharian commented Jan 11, 2019

jfn7 commented Jan 11, 2019

gopherbot commented Jan 29, 2019

aclements commented Mar 8, 2019

josharian commented Mar 9, 2019

cespare commented Mar 9, 2019

aclements commented Mar 11, 2019

gopherbot commented Mar 11, 2019

josharian commented Mar 11, 2019

seebs commented Mar 17, 2019

josharian commented Mar 17, 2019

janusdn commented Mar 20, 2019

aclements commented Mar 20, 2019

josharian commented Mar 20, 2019

janusdn commented Mar 20, 2019

proposal: testing: add B method for adding stats #26037

proposal: testing: add B method for adding stats #26037

Comments

josharian commented Jun 25, 2018

josharian commented Jun 25, 2018

rsc commented Jul 9, 2018

josharian commented Jul 16, 2018

bradfitz commented Jul 16, 2018

aclements commented Jul 20, 2018

aclements commented Jul 20, 2018

josharian commented Jul 20, 2018

josharian commented Jul 22, 2018 • edited Loading

aclements commented Jul 23, 2018

rsc commented Jul 23, 2018

rsc commented Aug 6, 2018

aclements commented Aug 6, 2018

josharian commented Aug 7, 2018

aclements commented Aug 7, 2018

josharian commented Aug 10, 2018

mmcloughlin commented Oct 26, 2018

jfn7 commented Jan 11, 2019

josharian commented Jan 11, 2019

jfn7 commented Jan 11, 2019

gopherbot commented Jan 29, 2019

aclements commented Mar 8, 2019

josharian commented Mar 9, 2019

cespare commented Mar 9, 2019

aclements commented Mar 11, 2019

gopherbot commented Mar 11, 2019

josharian commented Mar 11, 2019

seebs commented Mar 17, 2019

josharian commented Mar 17, 2019

janusdn commented Mar 20, 2019

aclements commented Mar 20, 2019

josharian commented Mar 20, 2019

janusdn commented Mar 20, 2019

josharian commented Jul 22, 2018 •

edited

Loading