-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace hint_common_parent_position()
by backwards accumulator updates
#5870
base: master
Are you sure you want to change the base?
Conversation
clang-format 18 needs to be run on this PR. (execution 13316780261 / attempt 1) |
Congratulations! Just FYI, an LTC is not strictly required for non-functional speedups. |
well vondele told me to run LTC on discord, because of the size of the change. It's also not really a non-functional change since it changes the algorithm for computing evaluations, so it is conceivable that this would be slower at higher depth. |
Thread sanitizer seems to have failed. Maybe this has something to do with the shared root state? |
Yeah, I'm looking at it. |
e3c1547
to
ea43cd2
Compare
Congrats! Can you say a bit about why the small accumulator can't be handled the same way as the big one? |
src/position.h
Outdated
int rule50; | ||
int pliesFromNull; | ||
int16_t rule50; | ||
int16_t pliesFromNull; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these 16 bit now?
can you please squash the PR, also we'll need to understand why CI is / was failing, in particular if thread sanitizer is flagging something. |
Yes, that seems to be the issue. Or at least nodes where Incidentally, why do some root nodes not have |
Since there are much fewer smallnet evaluations (only 20% or so I think) the heuristic "computing the accumulator of an interior node has zero effective cost because we will be able to reuse it often" does not work for the small accumulator (The old code that does hints for the small accumulator is also definitively "wrong" in this sense). But the small accumulator is also 24x cheaper to compute so there's basically nothing to optimize there. |
OK I have fixed the SMP issue. Sorry for the mess. This should be ready for merging now. I ran an SMP test on 8074a0b SMP STC I have added two more commits since that test, but the first one (reverting 5aedb70) should be a strict speedup if anything and the second one is just changing the type of two ints and should be elo neutral (cf this test). should I run any tests on the final version? I'm not squashing commits yet for reference purposes. |
src/nnue/nnue_feature_transformer.h
Outdated
@@ -539,12 +529,13 @@ class FeatureTransformer { | |||
const IndexType offsetR0 = HalfDimensions * removed[0]; | |||
auto* columnR0 = reinterpret_cast<const vec_t*>(&weights[offsetR0]); | |||
|
|||
if (removed.size() == 1) | |||
if ((Forward && removed.size() == 1) | |||
|| (Backwards && added.size() == 1)) // added.size() == removed.size() == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move the code from these comments to asserts inside the ifs
src/nnue/nnue_feature_transformer.h
Outdated
{ | ||
if (FeatureSet::requires_refresh(st, Perspective) | ||
|| (!Big && (gain -= FeatureSet::update_cost(st) < 0)) || !st->previous || st->previous->next != st) | ||
goto refresh; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like this goto, as removing it would only involve a boolean flag and break
.
This https://tests.stockfishchess.org/tests/view/67abfe1ca04df5eb8dbebf0b passed. I can open a PR after this gets merged but whatever is easiest or more expedient is fine by me. |
@ces42 will you resolve the |
fyi needs a rebase |
7547498
to
740e201
Compare
Only calls to
evaluate()
now trigger NNUE accumulator updates. To make sure that we are likely to find parent positions from which to update the accumulators we perform a backwards NNUE update whenever we compute the accumulator from scratch for some position.passed STC
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 39680 W: 10474 L: 10164 D: 19042
Ptnml(0-2): 171, 4068, 11042, 4398, 161
https://tests.stockfishchess.org/tests/view/67a27f26eb183d11c65945be
passed LTC
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 337308 W: 86408 L: 85550 D: 165350
Ptnml(0-2): 276, 30551, 106126, 31441, 260
https://tests.stockfishchess.org/tests/view/67a287efeb183d11c65945ee
then simplified:
STC
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 28608 W: 7641 L: 7413 D: 13554
Ptnml(0-2): 132, 3036, 7744, 3256, 136
https://tests.stockfishchess.org/tests/view/67a4703719f522d3866d3345
LTC
LLR: 2.96 (-2.94,2.94) <-1.75,0.25>
Total: 200226 W: 51026 L: 50990 D: 98210
Ptnml(0-2): 170, 18468, 62799, 18508, 168
https://tests.stockfishchess.org/tests/view/67a4f255229c1a170cc08964
The version in this PR is a bit different from the simplified version, but it's compile-time changes only.