Improve SPRT end-of-run message to be more useful #1543

Open: dubslow wants to merge 1 commit into master

dubslow (Contributor) commented Jan 15, 2023

(untested)

vdbergh (Contributor) commented Jan 15, 2023

I am not sure about re-including the bounds information. This is already included in the new_run message (where it logically belongs). If you want to see the history of a run, you can click on the events link in the test view page.

dubslow (Contributor, Author) commented Jan 15, 2023

As far as I'm aware, this is the message that now shows up in the events log and in the popup notifications. This is an end-of-run event, which is quite different from the new_run message, which was days prior. At any rate, the game count is most useful, and bounds are at least more useful than LOS or Elo (Elo is at least a bit useful; LOS is really useless).

dubslow changed the title from "Improve SPRT finished message to be more useful summary" to "Improve SPRT end-of-run message to be more useful" on Jan 15, 2023

vdbergh (Contributor) commented Jan 15, 2023

> As far as I'm aware, this is the message that now shows up in the events log and in the popup notifications. This is an end-of-run event, which is quite different from the new_run message, which was days prior. At any rate, the game count is most useful, and bounds are at least more useful than LOS or Elo (Elo is at least a bit useful; LOS is really useless).

I am not opposed to including the game count (and the pentanomial frequencies). However I very strongly disagree with you that game count is more useful than Elo and LOS. Game count by itself is totally useless and varies widely between identical tests. Elo and LOS (=1-p) are exactly the statistical information that can be extracted from the game outcome. You cannot beat mathematics.
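The point that game count varies widely between identical tests can be illustrated with a small simulation. This is a minimal sketch, not Fishtest code: it assumes a trinomial (win/draw/loss) game model with a fixed draw ratio, the logistic Elo formula, a normal approximation of the LLR, and Wald bounds with α = β = 0.05. The helper names (`elo_to_score`, `run_sprt`) and the wide <0,10> bounds are illustrative assumptions, chosen so the sketch runs quickly.

```python
import math
import random


def elo_to_score(elo):
    """Expected score for a given Elo difference (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def run_sprt(true_elo, elo0, elo1, draw_ratio, rng,
             alpha=0.05, beta=0.05, max_games=200_000):
    """Simulate one SPRT; return (accepted_h1, games_played).

    accepted_h1 is None if max_games is reached without a decision.
    Assumes a trinomial W/D/L model, not Fishtest's pentanomial one.
    """
    lower = math.log(beta / (1 - alpha))   # about -2.94 for 0.05/0.05
    upper = math.log((1 - beta) / alpha)   # about +2.94
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    # Pick win/draw probabilities so the expected score matches true_elo.
    p_win = elo_to_score(true_elo) - draw_ratio / 2
    p_draw = draw_ratio
    wins = draws = losses = 0
    for n in range(1, max_games + 1):
        r = rng.random()
        if r < p_win:
            wins += 1
        elif r < p_win + p_draw:
            draws += 1
        else:
            losses += 1
        mean = (wins + 0.5 * draws) / n
        var = (wins * (1 - mean) ** 2 + draws * (0.5 - mean) ** 2
               + losses * mean ** 2) / n
        if var == 0:
            continue  # no score variance yet, so the LLR is undefined
        # Normal-approximation LLR of H1 (score s1) against H0 (score s0).
        llr = n * (s1 - s0) * (2 * mean - s0 - s1) / (2 * var)
        if llr >= upper:
            return True, n
        if llr <= lower:
            return False, n
    return None, max_games


if __name__ == "__main__":
    rng = random.Random(2023)
    # Twenty "identical" tests of the same patch, differing only in luck.
    lengths = [run_sprt(10, 0, 10, 0.5, rng)[1] for _ in range(20)]
    print(sorted(lengths))
```

With the true Elo set equal to the upper bound, the printed stopping lengths of the twenty runs differ from each other even though every run tests the same patch; the exact numbers depend on the seed.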

What I don't want to do is put duplicate information in the event log. As I told you, the additional information concerning bounds and TC is only two clicks away. The end result looks like this:

https://tests.stockfishchess.org/actions?run_id=63bcd915a0bb4b71d3c6d921

I agree the notifications should contain more information. But this is a different issue. The way I want to handle this is to make the browser ask the server for more information when a run is finished. This is a more flexible solution than to overload the event log with duplicate information.

dubslow (Contributor, Author) commented Jan 15, 2023

If we're worried about duplicate information in the event log, then why bother having such notifications at all?

Yes, the Elo error bar is theoretically equivalent to (carries the same information as) the game count + bounds + LLR. However, it's harder for a human to infer the bounds from the error bar than vice versa: "green, <0,2>, 200k games" is a lot easier for humans to parse than "green, 1.15, 0.10, 2.25", and both are quite short, hardly spamming anybody. (Not to mention that we fallible humans tend to read way too much into Elo bars, so psychologically they're best downplayed as much as possible. See, for example, TCEC chat every time a simplification test is merged.)

I'm targeting human practicalities here, not theory. (And, theoretically, LOS carries quite a lot less information than the game count or the Elo error bar, never mind its practical worthlessness as well.)

vdbergh (Contributor) commented Jan 15, 2023

> If we're worried about duplicate information in the event log, then why bother having such notifications at all?

Notifications are a different issue. They are implemented via the event log but that is an implementation detail. As I told you, they will be extended.

> Yes, the Elo error bar is theoretically equivalent to (carries the same information as) the game count + bounds + LLR. However, it's harder for a human to infer the bounds from the error bar than vice versa: "green, <0,2>, 200k games" is a lot easier for humans to parse than "green, 1.15, 0.10, 2.25", and both are quite short, hardly spamming anybody. (Not to mention that we fallible humans tend to read way too much into Elo bars, so psychologically they're best downplayed as much as possible. See, for example, TCEC chat every time a simplification test is merged.)

> I'm targeting human practicalities here, not theory. (And, theoretically, LOS carries quite a lot less information than the game count or the Elo error bar, never mind its practical worthlessness as well.)

As I told you, the bounds and TC information is only two clicks away (click on the run and then on the events link).

https://tests.stockfishchess.org/actions?run_id=63bcd915a0bb4b71d3c6d921

I am not opposed to including the game count and pentanomial frequencies, although one needs statistics to be able to interpret these (which is what Elo and LOS do). You cannot outsmart statistics.

dubslow (Contributor, Author) commented Jan 15, 2023

> As I told you, the bounds and TC information is only two clicks away.

I don't see why putting the human-readable information two clicks away and the human-opaque information zero clicks away is somehow a good thing. They should be reversed (such as in this PR): human-readable zero clicks away, human-opaque two clicks away.

(The pentanomial frequencies aren't immediately useful to humans, no more so than the Elo bar. What is most important is the LLR, which is the entire raison d'être of SPRT tests. LLR+game count is exactly equal to the Elo bar, except 10x more readable for humans.)
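For reference, a common simplified form of the SPRT log-likelihood ratio shows how the LLR combines the bounds, the game count N, the observed mean score ŝ, and the per-game score variance σ̂². It assumes a normal approximation of per-game scores (Fishtest's own computation is pentanomial-based, so this is only an approximation); s_0 and s_1 are the expected scores implied by the lower and upper Elo bounds, and the error rates α and β are assumed to be 0.05 each:

```latex
\begin{gathered}
s_i = \frac{1}{1 + 10^{-\mathrm{Elo}_i/400}}, \quad i \in \{0, 1\} \\[4pt]
\mathrm{LLR} \approx \frac{N\,(s_1 - s_0)\,(2\hat{s} - s_0 - s_1)}{2\hat{\sigma}^2} \\[4pt]
\text{accept } H_1 \text{ if } \mathrm{LLR} \ge \ln\frac{1-\beta}{\alpha} \approx 2.94,
\qquad
\text{accept } H_0 \text{ if } \mathrm{LLR} \le \ln\frac{\beta}{1-\alpha} \approx -2.94
\end{gathered}
```

Besides N and the bounds, the estimated variance σ̂² also enters the expression.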

vdbergh (Contributor) commented Jan 15, 2023

> As I told you, the bounds and TC information is only two clicks away.

> I don't see why putting the human-readable information two clicks away and the human-opaque information zero clicks away is somehow a good thing. They should be reversed (such as in this PR).

Because pretty soon someone else will want to put something else in, and then we cannot reject it for the same reason. The end result will be an event log full of duplicate information.

"No duplicate information" is a simple, elegant rule that is standard in database design. Fishtest already provides the information you want in many different ways (including in the event log itself, two clicks away). There is zero need to clutter the event log.

vdbergh (Contributor) commented Jan 15, 2023

> LLR+game count is exactly equal to the Elo bar, except 10x more readable for humans.

This is because you think you can outsmart statistics. Which is an illusion.

vdbergh (Contributor) commented Jan 15, 2023

Think: what do you want to know from a patch? Indeed an estimate of how much Elo the patch is. But an Elo estimate is useless without a confidence interval.

LOS is more or less a different way of interpreting the confidence interval. It approximately depends on the position of zero with respect to the confidence interval.

I do not see what there is to argue...
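The relation between an Elo estimate, its confidence interval, and LOS can be sketched from raw win/draw/loss counts. This is a simplified illustration, not Fishtest's implementation: it assumes independent games (a trinomial model rather than the pentanomial one Fishtest uses), the logistic Elo formula, and a normal approximation; `summarize` and `elo_from_score` are hypothetical helper names.

```python
import math
from statistics import NormalDist


def elo_from_score(s):
    """Inverse of the logistic Elo model: mean score -> Elo difference."""
    return -400.0 * math.log10(1.0 / s - 1.0)


def summarize(wins, draws, losses, z=1.959964):
    """Return (elo, (elo_low, elo_high), los) using a normal approximation.

    Assumes both wins and losses occurred, so 0 < s < 1 and the
    interval endpoints stay inside (0, 1) for reasonable game counts.
    """
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n                      # mean score per game
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2
           + losses * s ** 2) / n                     # per-game variance
    se = math.sqrt(var / n)                           # standard error of s
    elo = elo_from_score(s)
    # 95% interval computed in score space, then mapped to Elo.
    interval = (elo_from_score(s - z * se), elo_from_score(s + z * se))
    # LOS: probability that the true score exceeds 0.5 (Elo > 0).
    los = NormalDist().cdf((s - 0.5) / se)
    return elo, interval, los


if __name__ == "__main__":
    print(summarize(100, 100, 100))
    print(summarize(1200, 2000, 1000))
```

With `summarize(100, 100, 100)` the estimate is 0 Elo with LOS 0.5. Because LOS here is Φ((ŝ − 0.5)/SE), a result whose 95% interval has its lower end exactly at 0 Elo has LOS ≈ 0.975, which is the sense in which LOS tracks where zero sits relative to the interval.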

dubslow (Contributor, Author) commented Jan 15, 2023

Let me put it this way: what's in the notification summary should match what's in the test page summaries. Those summaries show, first and foremost, the LLR, the bounds, and the total game count, in that order. Next comes less useful stuff like the pentanomial frequencies and the TC, while the least useful stuff, such as Elo and LOS, is hidden (for good reason, since as I said we flesh bags tend to wildly overinterpret error bars and p-values; seriously, there's a whole field of psychology devoted to the misuse of p-values).

I only want the end-of-run notifications to match the established format. No matter how much we disagree about the most digestible presentation of the same information, it is a simple fact that the notifications don't match the established presentation, and this PR introduces that consistency.

vdbergh (Contributor) commented Jan 15, 2023

There is no such thing as "notification summaries". There are two things:

  • The event log: this is a database. For the finished_run events, the event log records the information related to that type of event (and as I said repeatedly I am not opposed to including game count). Bounds and TC are things that are related to the new_run event and they are recorded there. No need to repeat them.

  • Notifications: this is a GUI feature. Here we can of course include any information we want. Currently notifications only extract information from the finished_run events in the event log but that is an implementation detail which is changeable.
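A minimal sketch of that separation, assuming a hypothetical in-memory list of events: the finished_run event stores only the run's result, and the notification text is assembled on the client by looking up the run's new_run event for the bounds and TC, so nothing is duplicated in the log. The `Event` class, the field names (`sprt_bounds`, `tc`, `result`, `games`, `llr`) and the message layout are illustrative assumptions, not Fishtest's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    action: str      # hypothetical action names: "new_run" or "finished_run"
    run_id: str
    data: dict = field(default_factory=dict)


def find_event(events: List[Event], run_id: str, action: str) -> Optional[Event]:
    """Return the first event with the given action for run_id, or None."""
    return next((e for e in events
                 if e.run_id == run_id and e.action == action), None)


def notification_text(events: List[Event], run_id: str) -> Optional[str]:
    """Join the two events at display time instead of copying fields."""
    new_run = find_event(events, run_id, "new_run")
    finished = find_event(events, run_id, "finished_run")
    if new_run is None or finished is None:
        return None
    elo0, elo1 = new_run.data["sprt_bounds"]
    return (f"{finished.data['result']}, <{elo0},{elo1}>, "
            f"{new_run.data['tc']}, {finished.data['games']} games, "
            f"LLR {finished.data['llr']:.2f}")


if __name__ == "__main__":
    log = [
        Event("new_run", "r1", {"sprt_bounds": (0, 2), "tc": "10+0.1"}),
        Event("finished_run", "r1",
              {"result": "green", "games": 200000, "llr": 2.95}),
    ]
    print(notification_text(log, "r1"))
    # prints: green, <0,2>, 10+0.1, 200000 games, LLR 2.95
```

The trade-off is an extra lookup at display time in exchange for keeping each fact stored exactly once.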

dubslow (Contributor, Author) commented Jan 15, 2023

Ah, that helps. By "notification summary" I meant "the text content displayed in the GUI notification". As it stands, the text of the notifications (and the event log text) isn't very useful, other than "hey, it finished, refresh the actual page". But in the same 20 characters of text it is possible to make that notification text much more useful.

I suppose I ought not to care as much about what's in the event log (vs. the GUI notification), but it still bothers me a bit that the four numbers recorded in the event log are the four most useless-to-humans numbers in the test. But I'll probably get over it if this is voted down. Although adding the game count would go a long way toward helping.

dubslow marked this pull request as ready for review on January 15, 2023 22:47