Update to use a more efficient power of 2 check. #274

mjp41 · 2021-01-27T11:15:10Z

@achamayou can you test whether this shows the same regression in Debug performance?

achamayou · 2021-01-27T11:25:57Z

@mjp41 this does solve the regression, and our test numbers (in the admittedly not very performance-sensitive Debug build) are back to normal:

73:         ✓ creates numeric polls (824ms)
73:         ✓ creates string polls (825ms)
73:         ✓ rejects creating polls with an existing topic (1668ms)
73:         ✓ rejects creating polls without authorization (757ms)
73:       POST /
73:         ✓ creates multiple polls (814ms)
73:         ✓ rejects creating polls with an existing topic (1722ms)
73:         ✓ rejects creating polls without authorization (726ms)
73:       PUT /{topic}
73:         ✓ stores opinions to a topic (1581ms)
73:         ✓ rejects opinions with mismatching data type (1583ms)
73:         ✓ rejects opinions for unknown topics (770ms)
73:         ✓ rejects opinions without authorization (1447ms)
73:       PUT /
73:         ✓ stores opinions to multiple topics (1581ms)
73:         ✓ rejects opinions with mismatching data type (1535ms)
73:         ✓ rejects opinions for unknown topics (802ms)
73:         ✓ rejects opinions without authorization (1454ms)
73:       GET /{topic}
73:         ✓ returns aggregated numeric poll opinions (9383ms)
73:         ✓ returns aggregated string poll opinions (9359ms)
73:         ✓ rejects returning aggregated opinions below the required opinion count threshold (8525ms)
73:         ✓ rejects returning aggregated opinions for unknown topics (793ms)
73:     /csv
73:       GET|POST /
73:         ✓ stores and returns opinions of authenticated user as CSV (1326ms)

davidchisnall

Does this actually make a difference to codegen? Most of the modified code is in static asserts, and I'm curious whether the comparisons in the dynamic checks actually compile to something different after optimisation (they're interesting compiler test cases if they do).

The modified version is much more readable than the original though, so I'm happy to see this change whether it improves codegen or not.

achamayou · 2021-01-27T11:32:59Z

@davidchisnall even without LVI mitigations, using the previous implementation (d3ecd66, between 0.5.0 and 0.5.1) caused a major slowdown in our builds. With LVI mitigations, some of the tests just time out.

This is all in Debug, so this isn't critical by any means, but it's still quite nice.

nwf

~~The runtime asserts in pointer_align_down (and friends, but especially that one) land on the hottest path, as part of {Superslab,Mediumslab}::get(), for every dealloc().~~ On second thought, maybe that's not true with inlining, since the alignments are constant. Hm.

As @davidchisnall says, tho', this is also a good improvement to readability and the perf effects are a nice bonus. :)

mjp41 · 2021-01-27T11:42:01Z

In the commit @achamayou mentioned, align_up and align_down changed from:

 SNMALLOC_ASSERT(next_pow2(alignment) == alignment);

to

 SNMALLOC_ASSERT(next_pow2_const(alignment) == alignment);

This means that the code is not using the intrinsic, and thus uses a loop to calculate the next_pow2. As the issue was in debug, I don't think the loop will be optimised away.
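To illustrate why that matters in Debug, here is a minimal sketch of a loop-based next-power-of-two helper. The names `next_pow2_loop` and `check_alignment` and their bodies are assumptions made for illustration, not snmalloc's actual `next_pow2_const`. The point is only that a constexpr-friendly loop costs nothing when evaluated at compile time, but runs in full on every call when it sits inside a runtime assert in an unoptimised build.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical loop-based helper (not snmalloc's real code): returns the
// smallest power of two >= x. Assumes x >= 1 and that the result fits in
// std::size_t.
constexpr std::size_t next_pow2_loop(std::size_t x)
{
  std::size_t p = 1;
  while (p < x)
    p <<= 1; // one iteration per doubling; not folded away without optimisation
  return p;
}

// Evaluated at compile time, so there is no runtime cost here...
static_assert(next_pow2_loop(4096) == 4096, "4096 is a power of two");
static_assert(next_pow2_loop(4097) == 8192, "rounds up to the next power");

// ...but a runtime check like this executes the loop on each call in Debug.
inline void check_alignment(std::size_t alignment)
{
  assert(next_pow2_loop(alignment) == alignment);
}
```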

The rest of the changes were from reviewing our usage of the pattern

next_pow2(???) == ???

and

??? == next_pow2(???)

I thought it was nicer to rewrite. I don't expect the majority of the changes to have any codegen impact.
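As a rough sketch of what a direct check can look like in place of the `next_pow2(x) == x` pattern: the helper names and bodies below are illustrative assumptions and may not match what this PR actually adds. The check tests whether exactly one bit is set, which needs neither a loop nor an intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when x has exactly one bit set.
// Treating 0 as "not a power of two" is a choice made for this sketch.
constexpr bool is_pow2(std::size_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

// Sketch of an align-down helper guarded by the cheaper check.
inline std::uintptr_t align_down(std::uintptr_t value, std::size_t alignment)
{
  assert(is_pow2(alignment));
  // Clear the low bits below the alignment boundary.
  return value & ~(static_cast<std::uintptr_t>(alignment) - 1);
}
```

Because the check is a subtraction, an AND, and a compare, it stays cheap even when nothing is optimised.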

mjp41 requested review from nwf and achamayou January 27, 2021 11:15

Update to use a more efficient power of 2 check.

dd63343

mjp41 force-pushed the check_power_of_2 branch from 44c03da to dd63343 Compare January 27, 2021 11:26

achamayou approved these changes Jan 27, 2021

davidchisnall approved these changes Jan 27, 2021

nwf approved these changes Jan 27, 2021

achamayou mentioned this pull request Jan 27, 2021

Update snmalloc from 0.5.0 to 0.5.2 microsoft/CCF#2114

Closed

mjp41 merged commit a3660c4 into microsoft:master Jan 27, 2021

mjp41 deleted the check_power_of_2 branch January 27, 2021 11:58
