Discussion: making the result table more compact #231

pallosp · 2025-01-28T09:24:51Z

My workflow includes running benchmarks in the VS Code terminal regularly.

Currently I can't give the tasks longer names than 20 characters, because they would make the table overflow and break its layout. Therefore I'm suggesting a couple of changes to make the metric columns narrower.

(index)

Rename the column to #. Saves 5-6 characters.

Latency average

Limit its precision. The digits after the 5th one are just noise. While this won't save more than 1-2 characters, it will make the numbers easier to read.
Optional: rename average to avg.

Latency median

Rename median to 50%.
Drop the ± part from the numbers. I don't think they give any actionable insight.
Drop the trailing zeros. Round to integer when the median has too many digits.
Return numbers, not strings.

These changes together may save 3-7 characters.

Throughput average

Optional: rename average to avg.

Throughput median

Rename median to 50%. Saves 3 characters.
Return numbers, not strings.
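
The precision rules proposed above could be sketched roughly as follows. This is only an illustration of the idea, not tinybench's implementation: the helper name `compactNumber` and its 5-digit default are assumptions taken from the wording of the proposal.

```typescript
// Hypothetical helper (not part of tinybench): keep at most `digits`
// significant digits for small values, and round to an integer once the
// integer part alone already has that many digits.
function compactNumber(value: number, digits = 5): number {
  if (!Number.isFinite(value)) return value;
  if (Math.abs(value) >= 10 ** (digits - 1)) return Math.round(value);
  // toPrecision() pads with zeros; Number() drops the trailing ones again.
  return Number(value.toPrecision(digits));
}

console.log(compactNumber(3218139.39)); // 3218139
console.log(compactNumber(12.34567)); // 12.346
console.log(compactNumber(0.5)); // 0.5
```

Returning a number rather than a string also means console.table() prints it without quotes.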

Before

Inserting 50k elements into various data structures
┌─────────┬──────────────────────┬──────────────────────┬───────────────────────┬────────────────────────────┬───────────────────────────┬─────────┐
│ (index) │ Task name            │ Latency average (ns) │ Latency median (ns)   │ Throughput average (ops/s) │ Throughput median (ops/s) │ Samples │
├─────────┼──────────────────────┼──────────────────────┼───────────────────────┼────────────────────────────┼───────────────────────────┼─────────┤
│ 0       │ 'float keyed Map'    │ '3218139.39 ± 3.66%' │ '2756229.00 ± 146.00' │ '331 ± 2.19%'              │ '363'                     │ 312     │
│ 1       │ 'int keyed Map'      │ '2211478.20 ± 1.77%' │ '2076166.00'          │ '462 ± 1.06%'              │ '482'                     │ 453     │
│ 2       │ '10 int keyed Maps'  │ '1304029.77 ± 1.12%' │ '1218542.00'          │ '782 ± 0.88%'              │ '821'                     │ 767     │
│ 3       │ 'sparse Array, 0.5'  │ '147581.90 ± 1.37%'  │ '124250.00'           │ '7543 ± 0.44%'             │ '8048'                    │ 6776    │
│ 4       │ 'sparse Array, 0.1'  │ '655357.51 ± 0.44%'  │ '654250.00'           │ '1537 ± 0.43%'             │ '1528'                    │ 1526    │
│ 5       │ 'sparse Array, 0.05' │ '1293428.82 ± 0.42%' │ '1306833.50 ± 0.50'   │ '776 ± 0.51%'              │ '765'                     │ 774     │
└─────────┴──────────────────────┴──────────────────────┴───────────────────────┴────────────────────────────┴───────────────────────────┴─────────┘

After

Inserting 50k elements into various data structures
┌───┬──────────────────────┬──────────────────────┬──────────────────┬────────────────────────────┬────────────────────────┬─────────┐
│ # │ Task name            │ Latency average (ns) │ Latency 50% (ns) │ Throughput average (ops/s) │ Throughput 50% (ops/s) │ Samples │
├───┼──────────────────────┼──────────────────────┼──────────────────┼────────────────────────────┼────────────────────────┼─────────┤
│ 0 │ 'float keyed Map'    │ '3218139 ± 3.66%'    │ 2756229          │ '331 ± 2.19%'              │ 363                    │ 312     │
│ 1 │ 'int keyed Map'      │ '2211478 ± 1.77%'    │ 2076166          │ '462 ± 1.06%'              │ 482                    │ 453     │
│ 2 │ '10 int keyed Maps'  │ '1304029 ± 1.12%'    │ 1218542          │ '782 ± 0.88%'              │ 821                    │ 767     │
│ 3 │ 'sparse Array, 0.5'  │ '147581 ± 1.37%'     │ 124250           │ '7543 ± 0.44%'             │ 8048                   │ 6776    │
│ 4 │ 'sparse Array, 0.1'  │ '655357 ± 0.44%'     │ 654250           │ '1537 ± 0.43%'             │ 1528                   │ 1526    │
│ 5 │ 'sparse Array, 0.05' │ '1293428 ± 0.42%'    │ 1306833          │ '776 ± 0.51%'              │ 765                    │ 774     │
└───┴──────────────────────┴──────────────────────┴──────────────────┴────────────────────────────┴────────────────────────┴─────────┘

If you agree, I'm happy to send a pull request.


jerome-benoit · 2025-01-28T10:13:54Z

You can use a custom convert arrow function (https://tinylibs.github.io/tinybench/classes/Bench.html#table) to format the results the way you want.
What is currently missing is example usage of it.
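
A possible shape for such a convert function is sketched below. The `TaskLike` and `StatsLike` interfaces are hypothetical stand-ins, and the field names (`latency`, `throughput`, `mean`, `p50`, `rme`, `samples`) as well as the millisecond unit are assumptions about tinybench's result object; check them against the linked API documentation before relying on them.

```typescript
// Minimal structural stand-ins for the fields used below. These are
// assumptions modeled on tinybench's task results, not its real types.
interface StatsLike {
  mean: number;
  p50?: number;
  rme: number; // relative margin of error, in percent
  samples: number[];
}
interface TaskLike {
  name: string;
  result?: { latency: StatsLike; throughput: StatsLike };
}

// Assumption: latency statistics are reported in milliseconds.
const MS_TO_NS = 1e6;

// Hypothetical convert function: one compact row per task.
function compactRow(task: TaskLike): Record<string, string | number | undefined> {
  const r = task.result;
  if (!r) return { Task: task.name };
  const { latency, throughput } = r;
  return {
    Task: task.name,
    "Lat avg (ns)": `${Math.round(latency.mean * MS_TO_NS)} ± ${latency.rme.toFixed(2)}%`,
    "Lat med (ns)": latency.p50 === undefined ? undefined : Math.round(latency.p50 * MS_TO_NS),
    "Thr avg (ops/s)": `${Math.round(throughput.mean)} ± ${throughput.rme.toFixed(2)}%`,
    "Thr med (ops/s)": throughput.p50 === undefined ? undefined : Math.round(throughput.p50),
    Samples: latency.samples.length,
  };
}

// Intended usage, assuming `bench` is a tinybench Bench that has already run:
//   console.table(bench.table(compactRow));
```

Shorter column headers and integer medians are what save the horizontal space here; the "(index)" column is added by console.table() itself and is not affected by the convert function.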

pallosp · 2025-01-28T10:58:57Z

Thanks, that's a great workaround.

I think there is still value in improving the default rendering, as it reduces both the time to write benchmarks and the time to interpret the results.

jerome-benoit · 2025-01-28T11:09:49Z

The main issue is that the same words are repeated because console.table() does not support column grouping.

You can make a PR to refine the default, but:

median is more explicit than 50% alone. To make 50% explicit, the term percentile would have to be added. med is a fine shortcut
the standard deviation is meaningful information but should only be displayed if it is above a threshold (8%)
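
The threshold idea could look something like the sketch below. The helper name `withDeviation` is hypothetical, and the 8% default simply mirrors the figure mentioned above; nothing here is tinybench's actual behavior.

```typescript
// Hypothetical formatter: append "± x%" only when the relative deviation
// exceeds the threshold; otherwise return the bare (numeric) value.
function withDeviation(
  value: number,
  deviationPct: number,
  thresholdPct = 8,
): number | string {
  const rounded = Math.round(value);
  return deviationPct > thresholdPct
    ? `${rounded} ± ${deviationPct.toFixed(2)}%`
    : rounded;
}

console.log(withDeviation(3218139.39, 3.66)); // 3218139
console.log(withDeviation(331, 12.5)); // 331 ± 12.50%
```

One trade-off of this design is that a column would then mix numbers and strings, so console.table() would quote some cells and not others.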

pallosp · 2025-01-28T16:18:40Z

It turns out that console.table() is pretty dumb. It doesn't even allow removing or renaming the index column.

Are you suggesting to rename average to avg and median to med?

The standard deviations are indeed useful. I didn't want to remove them. What I found less useful is the distance between the two middle samples when there is an even number of them.

jerome-benoit · 2025-01-28T20:32:28Z

> Are you suggesting to rename average to avg and median to med?

It was your suggestion ;-) I just think that 50% is less self-explanatory than med.

> The standard deviations are indeed useful. I didn't want to remove them. What I found less useful is the distance between the two middle samples when there is an even number of them.

It's in fact the confidence interval around the mean that is displayed.

jdmarshall · 2025-03-08T18:49:39Z

The plus or minus needs to be there but one of the first things I did making my own table was drop the quotes around the numeric values. The fact that they’re inconsistent is just weird. Why is throughput in quotes and samples not in quotes?

jerome-benoit · 2025-03-08T18:56:44Z

string vs. number handling in console.table()

jdmarshall · 2025-03-09T00:55:54Z

How about 'mean' instead of 'average'?

And are the cool kids using p50 these days instead of 50%?

I half-snoozed through confidence intervals in physics, but I believe that if you're showing ±2.4% then showing two decimal places as if they are significant figures is wrong.

jdmarshall · 2025-03-09T01:58:02Z

console.table is also responsible for that '(index)' nonsense as well. Hmmm.

But it looks like console.table exists solely in the examples, so if someone tweaks the column names coming from tinybench, any additional cleanup could be represented in the examples by importing one of the console.table replacements.

Based on my forks, it looks like I used https://github.com/ayonious/console-table-printer/ and then had trouble with it trying to put ansi color codes into non-interactive terminals.

@jerome-benoit Do you have a cli table generator you like?

(also none of the ones I can find support nested column names so that might have to be out)

jerome-benoit · 2025-03-09T12:33:39Z

> I half-snoozed through confidence intervals in physics, but I believe that if you're showing ±2.4% then showing two decimal places as if they are significant figures is wrong.

Reading the code will tell you. Unless you formally prove that the Student distribution based confidence interval computation is wrong for the mean, and MAD for the median, they are as accurate as possible given the original Student assumptions.

jdmarshall · 2025-03-09T17:20:55Z

No the code isn’t going to tell me how physicists track significant figures. Even if you copied a physics book into the code it’s a tertiary reference.

jdmarshall · 2025-03-09T17:23:27Z

> Drop the ± part from the numbers. I don't think they give any actionable insight.

I disagree with this. All profilers lie, some more than most. The numbers you don’t think are necessary are used by the people who come after you declare this code is as fast as it can get and find another 10%.

A broad interval and similar numbers to a previous run helps indicate that something interfered with this run and you should try it again. And that your change might not have accomplished anything.

jerome-benoit · 2025-03-09T17:34:55Z

> No the code isn’t going to tell me how physicists track significant figures. Even if you copied a physics book into the code it’s a tertiary reference.

It's going to tell you how the confidence interval is actually computed, which method is used, and whether it's correctly implemented, as a primary reference. Unless you can prove that:

the method used has flaws,
the implementation of the method is buggy,
the display of the method's outputs is buggy,

You are just making bold claims backed by ... nothing.

jdmarshall · 2025-03-09T18:02:22Z

You’re showing hundredths of a nanosecond on a VM that is not even accurate to the nanosecond. Two decimal places on a median value is a bold claim. I agree with @pallosp that the decimals are not helpful. In fact they cannot possibly be correct.

And if multiple runs report values that don’t overlap in their error bars - which happens all the time - then what are the numbers telling a user? It’s like the cosmological crisis. I can’t reproduce my own results let alone someone else’s. Not even on the same hardware.

jerome-benoit · 2025-03-09T18:22:45Z

> You’re showing hundredths of a nanosecond on a VM that is not even accurate to the nanosecond. Two decimal places on a median value is a bold claim. I agree with @pallosp that the decimals are not helpful. In fact they cannot possibly be correct.

You are basically saying that doing statistics over samples is incorrect, still without providing any factual proof.

Unless you are actually able to prove that:

statistics is not a science (which would probably earn you a Nobel prize)
decimals on statistical indicators are not representative
statistical indicators computed on samples from a non-reproducible experiment will not clearly expose the issues with the experimental conditions (the scientific publications of the last 100 years have proven the contrary)

There's no point in continuing with the uncertain hope that it may lead to an interesting outcome for tinybench.

jdmarshall · 2025-03-09T20:14:16Z

I don’t understand why you’re being so defensive.

Can you get a timestamp from node in hundredths of a nanosecond? No.

Can you trust the last digits of the nanosecond measure that’s a system call? Also no.

So if I’m using essentially a child’s ruler with big fat millimeter markings on it, it’s inaccurate to try to record tenths of a millimeter with that measuring device.

The median would at most ever have one decimal place, if you believe that the 1’s digit in the nanoseconds is accurate, and that decimal can only ever be .5, or not exist at all.

I’m talking about the measurements, you’re talking about the statistical analysis.
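
The claim about the median's decimals can be checked with a short sketch. It assumes every sample is an integer number of nanoseconds, which is an assumption about the timer rather than something this thread establishes, and it uses the textbook median, not tinybench's code.

```typescript
// Textbook median: the middle element for odd-length input, the mean of the
// two middle elements for even-length input.
function median(samples: number[]): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

console.log(median([124249, 124250, 124251])); // 124250
console.log(median([1306833, 1306834])); // 1306833.5
```

With integer inputs the result is either an integer or ends in .5, which is the pattern visible in the 'sparse Array, 0.05' row of the Before table.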

jerome-benoit · 2025-03-09T21:09:17Z

> I’m talking about the measurements, you’re talking about the statistical analysis.

The units of the samples used for statistical analysis are meaningless to any statistical indicator built on them, by mathematical construction. So much so that any measurement inaccuracy in the samples is trapped by the proper indicators: standard deviation, confidence interval, skew, ...

The decimals on a simple indicator that passes the common statistical significance checks, such as a mean, are representative. The proof is part of any good enough statistics book that discusses their relevance.

Building a meaningful criticism of mathematical tools requires an in-depth understanding of them in the first place.

jdmarshall · 2025-03-09T21:44:06Z

I was going to say this is something I covered in #65, but it turns out my memory is faulty, so here's an update showing how wonky hrtime() still is in Node 22.

#65 (comment)

Also you're being super passive aggressive right now and I don't appreciate it.

pallosp mentioned this issue Jan 28, 2025

feat: improve the latency and throughput formatting in the result table #233

Merged
