Improve SPRT end-of-run message to be more useful #1543

dubslow · 2023-01-15T12:26:05Z

(untested)

vdbergh · 2023-01-15T12:33:58Z

I am not sure about re-including the bounds information. This is already included in the new_run message (where it logically belongs). If you want to see the history of a run, you can click on the events link in the test view page.

dubslow · 2023-01-15T12:36:49Z

As far as I'm aware, this is the message that now shows up in the event log and in the popup notifications. This is an end-of-run event, which is quite different from the new_run message, which was days prior. At any rate, the game count is most useful, and bounds are at least more useful than LOS or Elo (Elo is at least a bit useful, LOS is really useless).


vdbergh · 2023-01-15T12:52:12Z

> As far as I'm aware, this is the message that now shows up in the event log and in the popup notifications. This is an end-of-run event, which is quite different from the new_run message, which was days prior. At any rate, the game count is most useful, and bounds are at least more useful than LOS or Elo (Elo is at least a bit useful, LOS is really useless).

I am not opposed to including the game count (and the pentanomial frequencies). However I very strongly disagree with you that game count is more useful than Elo and LOS. Game count by itself is totally useless and varies widely between identical tests. Elo and LOS (=1-p) are exactly the statistical information that can be extracted from the game outcome. You cannot beat mathematics.
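To make the Elo/LOS side of this argument concrete, here is a minimal Python sketch of how an Elo estimate, a 95% error bar and LOS can be derived from a plain win/draw/loss count. It assumes the logistic Elo model, independent games and a normal approximation; the function name `elo_and_los` is hypothetical, and this is not Fishtest's code (which also records the pentanomial frequencies mentioned above). It breaks down for a 0% or 100% score.

```python
from __future__ import annotations

import math


def elo_and_los(wins: int, draws: int, losses: int) -> tuple[float, float, float]:
    """Return (elo, 95% half-width in Elo, LOS) from W/D/L counts.

    Sketch only: logistic Elo, games treated as independent,
    normal approximation for the score.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (outcomes 1, 0.5, 0).
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se_score = math.sqrt(var / n)

    def to_elo(s: float) -> float:
        # Logistic Elo difference corresponding to an expected score s.
        return -400 * math.log10(1 / s - 1)

    elo = to_elo(score)
    # Map the 95% score interval through the Elo curve; report half its width.
    lo = to_elo(score - 1.96 * se_score)
    hi = to_elo(score + 1.96 * se_score)
    half_width = (hi - lo) / 2
    # LOS = P(true score > 0.5) = 1 - one-sided p-value (normal approximation).
    los = 0.5 * (1 + math.erf((score - 0.5) / (se_score * math.sqrt(2))))
    return elo, half_width, los
```

For example, `elo_and_los(120, 100, 80)` gives a positive Elo estimate with LOS above 0.5, while a perfectly balanced result gives Elo 0 and LOS exactly 0.5.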

What I don't want to do is put duplicate information in the event log. As I told you, the additional information concerning bounds and TC is only two clicks away. The end result looks like this:

https://tests.stockfishchess.org/actions?run_id=63bcd915a0bb4b71d3c6d921

I agree the notifications should contain more information, but this is a different issue. The way I want to handle this is to make the browser ask the server for more information when a run is finished. This is a more flexible solution than overloading the event log with duplicate information.

dubslow · 2023-01-15T12:54:32Z

If we're worried about duplicate information in the event log, then why bother having such notifications at all?

Yes, the Elo error bar is theoretically equivalent to (carries the same information as) the game count + bounds + LLR; however, it's harder for a human to infer the bounds from the error bar than vice versa: "green, <0,2>, 200k games" is a lot easier for humans to parse than "green, 1.15, 0.10, 2.25", and they're both quite short, hardly spamming anybody. (Not to mention that we fallible humans tend to read wayyyyyyyy too much into Elo bars, so psychologically they're best downplayed as much as possible. See, for example, TCEC chat every time a simplification test is merged.)

I'm targeting human practicalities here, not theory. (And, theoretically, LOS is quite a lot less info than the game count or the Elo error bar, never mind its practical worthlessness as well.)

vdbergh · 2023-01-15T13:07:45Z

> If we're worried about duplicate information in the event log, then why bother having such notifications at all?

Notifications are a different issue. They are implemented via the event log but that is an implementation detail. As I told you, they will be extended.

> Yes, the Elo error bar is theoretically equivalent to (carries the same information as) the game count + bounds + LLR; however, it's harder for a human to infer the bounds from the error bar than vice versa: "green, <0,2>, 200k games" is a lot easier for humans to parse than "green, 1.15, 0.10, 2.25", and they're both quite short, hardly spamming anybody. (Not to mention that we fallible humans tend to read wayyyyyyyy too much into Elo bars, so psychologically they're best downplayed as much as possible. See, for example, TCEC chat every time a simplification test is merged.)

> I'm targeting human practicalities here, not theory. (And, theoretically, LOS is quite a lot less info than the game count or the Elo error bar, never mind its practical worthlessness as well.)

As I told you the bounds and TC information is only two clicks away (click on the run and then on the events link).

https://tests.stockfishchess.org/actions?run_id=63bcd915a0bb4b71d3c6d921

I am not opposed to including the game count and pentanomial frequencies, although one needs statistics to be able to interpret these (which is what Elo and LOS do). You cannot outsmart statistics.

dubslow · 2023-01-15T13:11:51Z

> As I told you the bounds and TC information is only two clicks away.

I don't see why putting the human-readable information two clicks away and the human-opaque information zero clicks away is somehow a good thing. They should be reversed (such as in this PR): human-readable zero clicks away, human-opaque two clicks away.

(The pentanomial frequencies aren't immediately useful to humans, no more so than the Elo bar. What is most important is the LLR, which is the entire raison d'être of SPRT tests. LLR + game count is exactly equal to the Elo bar, except 10x more readable for humans.)
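For the LLR side, the sketch below computes an approximate SPRT log-likelihood ratio from the five pentanomial counts and compares it with Wald's stopping thresholds. It assumes bounds given in logistic Elo and a Gaussian approximation of the per-pair score; the names `sprt_llr`, `sprt_bounds` and `logistic_score` are illustrative, and Fishtest's real computation may differ. It shows that LLR, bounds and game count come from the same data as the Elo bar.

```python
from __future__ import annotations

import math


def logistic_score(elo: float) -> float:
    """Expected score for a logistic Elo difference."""
    return 1 / (1 + 10 ** (-elo / 400))


def sprt_llr(ptnml: list[int], elo0: float, elo1: float) -> float:
    """Approximate SPRT LLR from pentanomial counts.

    ptnml[k] = number of game pairs scoring k/2 points (k = 0..4).
    Sketch: Gaussian approximation, observed variance, logistic Elo bounds.
    Undefined if every pair has the same score (zero variance).
    """
    pairs = sum(ptnml)
    scores = [k / 4 for k in range(5)]  # pair score scaled to [0, 1]
    mean = sum(c * s for c, s in zip(ptnml, scores)) / pairs
    var = sum(c * (s - mean) ** 2 for c, s in zip(ptnml, scores)) / pairs
    s0, s1 = logistic_score(elo0), logistic_score(elo1)
    # Log-likelihood ratio of H1 (mean s1) vs H0 (mean s0) for a normal mean.
    return pairs * (s1 - s0) * (2 * mean - s0 - s1) / (2 * var)


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> tuple[float, float]:
    """Wald's lower/upper stopping thresholds for the LLR."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)
```

With α = β = 0.05 the thresholds come out at about ±2.94, the (-2.94,2.94) interval familiar from SPRT test pages; a test stops once the LLR leaves that interval.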

vdbergh · 2023-01-15T13:19:30Z

> As I told you the bounds and TC information is only two clicks away.

> I don't see why putting the human-readable information two clicks away and the human-opaque information zero clicks away is somehow a good thing. They should be reversed (such as in this PR).

Because pretty soon someone else will want to put something else in, and then we cannot reject it for the same reason. The end result will be an event log full of duplicate information.

"No duplicate information" is a simple, elegant rule that is standard in database design. Fishtest already provides the information you want in many different ways (including in the event log itself, two clicks away). There is zero need to clutter the event log.

vdbergh · 2023-01-15T13:20:50Z

> LLR + game count is exactly equal to the Elo bar, except 10x more readable for humans.

This is because you think you can outsmart statistics, which is an illusion.

vdbergh · 2023-01-15T13:27:19Z

Think: what do you want to know from a patch? An estimate of how much Elo the patch is worth, indeed. But an Elo estimate is useless without a confidence interval.

LOS is more or less a different way of interpreting the confidence interval. It approximately depends on the position of zero with respect to the confidence interval.
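Under a normal approximation, this relationship can be written out explicitly. Writing the Elo estimate as $\hat E$ with standard error $\sigma_E$, and the 95% confidence interval as $\hat E \pm h$ with $h = 1.96\,\sigma_E$:

```latex
\mathrm{LOS} = P(E > 0 \mid \text{data})
  \approx \Phi\!\left(\frac{\hat E}{\sigma_E}\right)
  = \Phi\!\left(\frac{1.96\,\hat E}{h}\right)
```

So LOS is 0.5 when the interval is centered on zero, and about $\Phi(1.96) \approx 0.975$ when its lower end just touches zero ($\hat E = h$). This is a sketch that ignores the non-linearity of the Elo scale.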

I do not see what there is to argue...

dubslow · 2023-01-15T14:24:04Z

Let me put it this way: what's in the notification summary should match what's in the test page summaries. Those summaries show, first and foremost, the LLR, the bounds, and the total game count (1, 2, and 3). Next comes less useful stuff like the pentanomial frequencies and the TC, while the least useful stuff, such as Elo and LOS, is hidden (for good reason, since, as I said, we flesh bags tend to wildly overinterpret error bars and p-values; seriously, there's a whole field of psychology devoted to the misuse of p-values).

I only want the end-of-run notifications to match the established format. No matter how much we disagree about the most digestible presentation of the same information, it is a simple fact that the notifications don't match the established presentation, and this PR introduces that consistency.
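As an illustration of the ordering argued for here (result, LLR with its bounds, then the game count), a hypothetical formatter could look like this. The function name, parameters and exact wording are assumptions, loosely modeled on the "LLR: ... (-2.94,2.94) <0.00,2.00>" style of the test pages rather than taken from this PR's diff.

```python
def format_end_of_run(passed: bool, llr: float,
                      llr_bounds: tuple[float, float],
                      elo_bounds: tuple[float, float],
                      games: int) -> str:
    """Hypothetical end-of-run text in test-page order:
    verdict, LLR with its stopping bounds, Elo bounds, total games."""
    verdict = "passed" if passed else "failed"
    lo, hi = llr_bounds
    elo0, elo1 = elo_bounds
    return (f"SPRT {verdict}: LLR {llr:.2f} ({lo:.2f},{hi:.2f}) "
            f"<{elo0:.2f},{elo1:.2f}>, {games} games")
```

For instance, `format_end_of_run(True, 2.95, (-2.94, 2.94), (0.0, 2.0), 200000)` yields `SPRT passed: LLR 2.95 (-2.94,2.94) <0.00,2.00>, 200000 games`, which stays about as short as the current Elo/LOS message.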

vdbergh · 2023-01-15T15:05:13Z

There is no such thing as "notification summaries". There are two things:

- The event log: this is a database. For finished_run events, the event log records the information related to that type of event (and, as I said repeatedly, I am not opposed to including the game count). Bounds and TC are related to the new_run event, and they are recorded there. No need to repeat them.
- Notifications: this is a GUI feature. Here we can of course include any information we want. Currently notifications only extract information from the finished_run events in the event log, but that is an implementation detail, which is changeable.
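The separation described above (the event log stores each fact once; the notification text is assembled at display time) can be sketched as follows. The classes `Run` and `FinishedRunEvent` and the function `notification_text` are hypothetical and do not reflect Fishtest's actual schema; the point is only that the finished_run record references the run, while bounds and TC are looked up where the new_run data already lives.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    run_id: str
    tc: str                          # recorded once, with the new_run event
    elo_bounds: tuple[float, float]  # recorded once, with the new_run event


@dataclass(frozen=True)
class FinishedRunEvent:
    run_id: str   # a reference to the run, not a copy of its bounds/TC
    result: str   # e.g. "passed" or "failed"
    games: int


def notification_text(event: FinishedRunEvent, runs: dict[str, Run]) -> str:
    """GUI-side text: joins the event with the run's parameters at display time."""
    run = runs[event.run_id]
    elo0, elo1 = run.elo_bounds
    return f"{event.result}, <{elo0:g},{elo1:g}>, {run.tc}, {event.games} games"
```

With this split, richer notification text in the "green, <0,2>, 200k games" style proposed in this thread needs no extra fields in the event log.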

dubslow · 2023-01-15T15:35:22Z

Ah, that helps. By "notification summary" I meant "the text content displayed in the GUI notification". As it stands, the text of the notifications (and the event log text) isn't very useful, other than "hey, it finished, refresh the actual page". But in the same 20 characters of text, it is possible to make that notification text much more useful.

I suppose I ought not to care as much about what's in the event log (vs. the GUI notification), but it still bothers me a bit that the 4 numbers recorded in the event log are the four most useless-to-humans numbers from the test. But I'll probably get over it if this is voted down, although adding the game count would go a long way toward helping.

Improve SPRT end-of-run message to be more useful (games and bounds more useful than elo/los) (untested)

ef263f7

dubslow force-pushed the betterResultMsg branch from da43ad5 to ef263f7 on January 15, 2023, 12:41

dubslow changed the title ~~Improve SPRT finished message to be more useful summary~~ Improve SPRT end-of-run message to be more useful Jan 15, 2023

dubslow marked this pull request as ready for review January 15, 2023 22:47
