Add __nodiscard__ and DBResidueMask() use cases and cleanup #377
TileTypeBitMask *DBResidueMask(TileType type); /* NOTE: candidate for using a const return */ Added __nodiscard__ to the function for compiler assistance. The main problem was fixed via another recent comment to remove excessive DBResidueMask() use.
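As a rough illustration of the two ideas in that note (a must-use return value and a const return), here is a minimal sketch. It assumes GCC or Clang, where `__attribute__((warn_unused_result))` exists; the macro name SKETCH_NODISCARD, the type SketchMask and both functions are hypothetical stand-ins, not the definitions used in this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro name; the PR's own __nodiscard__ definition is not shown here. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NODISCARD __attribute__((warn_unused_result))
#else
#define SKETCH_NODISCARD
#endif

#define SKETCH_NTYPES 4

/* Hypothetical stand-in for TileTypeBitMask. */
typedef struct { unsigned int words[8]; } SketchMask;

/* Table the lookup returns pointers into (zero-initialized). */
static SketchMask sketch_residues[SKETCH_NTYPES];

/* Returns a read-only pointer; ignoring the result draws a compiler warning. */
SKETCH_NODISCARD
static inline const SketchMask *sketch_residue_mask(int type)
{
    if (type < 0 || type >= SKETCH_NTYPES)
        return NULL;
    return &sketch_residues[type];
}

/* A caller that uses the result, so no warning is produced. */
static inline unsigned int sketch_first_word(int type)
{
    const SketchMask *m = sketch_residue_mask(type);
    assert(m != NULL);
    return m->words[0];
}
```

With this in place, a bare statement such as `sketch_residue_mask(1);` would be flagged by GCC or Clang, and a write through the returned pointer would be rejected because of the const qualifier.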
Reorder things in this function. Adding 'const' in key places gives the compiler the extra hint that, for the purpose of this computation, we don't change the value and the value never changes externally, even across function calls. I'm sure the compiler (given the macro implementation of the TTMaskXxxxx() calls and the visibility of the data being changed) will optimise the function in exactly the way of the reorder. This should also have the side effect of making auto-vectorization possibilities clearer to the compiler, or of allowing the loop to be replaced (by tail call or inline) with: simd_TTMaskSetMask_residues(lmask, rmask, TT_TECHDEPBASE, DBNumUserLayers); That would be a hand-optimized form, probably with an 'l_residues' layout that favours SIMD use (a second-order copy from the source of truth, just in a different data layout, such as a contiguous array of TileTypeBitMask indexed from 0, with the TileType as the index).
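To make the data-layout idea concrete, here is a minimal scalar sketch of a loop over a contiguous residue table indexed by type. Everything in it is an assumption for illustration: SkMask stands in for TileTypeBitMask, sk_residues for the contiguous copy of 'l_residues', and sk_or_residues for the loop that a routine like simd_TTMaskSetMask_residues() might replace. The exact semantics of the real function are not shown in this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_WORDS  8                 /* hypothetical: 8 x 32 bits */
#define SK_NTYPES (SK_WORDS * 32)   /* 256 tile types */

/* Hypothetical stand-in for TileTypeBitMask. */
typedef struct { uint32_t w[SK_WORDS]; } SkMask;

static inline void sk_zero(SkMask *m) { memset(m, 0, sizeof *m); }

static inline void sk_set(SkMask *m, int t)
{
    assert(t >= 0 && t < SK_NTYPES);
    m->w[t >> 5] |= (uint32_t)1 << (t & 31);
}

static inline int sk_has(const SkMask *m, int t)
{
    assert(t >= 0 && t < SK_NTYPES);
    return (int)((m->w[t >> 5] >> (t & 31)) & 1u);
}

/* OR every word of src into dst; a fixed trip count is friendly to vectorizers. */
static inline void sk_or(SkMask *dst, const SkMask *src)
{
    for (int i = 0; i < SK_WORDS; i++)
        dst->w[i] |= src->w[i];
}

/* Contiguous table indexed directly by type (assumed layout). */
static SkMask sk_residues[SK_NTYPES];

/* For each type in [first, last) present in lmask, OR its residues into rmask. */
static inline void sk_or_residues(const SkMask *lmask, SkMask *rmask,
                                  int first, int last)
{
    for (int t = first; t < last; t++)
        if (sk_has(lmask, t))
            sk_or(rmask, &sk_residues[t]);
}
```

Because the table is a flat array, each iteration reads one fixed-size block at a predictable address, which is the property a hand-written SIMD version would rely on.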
database/DBtcontact.c
Outdated
    if (type < DBNumUserLayers)
    {
        TTMaskSetMask(rmask, &li->l_residues);
        TTMaskSetMask(rmask, lmask);
This should be TTMaskCopy(), i.e. *rmask = *lmask;
Fixed in db3f272
94eeb9a to db3f272
Just some background on why I picked on this. I am looking at code-gen (the assembly we actually get from the compiler). The rearrangement streamlines the function for the compiler: it tries to give the compiler less work by making the code clearer about which transform operation is intended. Previously the code did a SetZero and then a mask OR (via TTMaskSetMask), but this is obviously just a Copy operation. The introduction of TTMaskCopy() provides semantic intention; a struct assignment would do, but the label allows the introduction of inline SIMD intrinsics that can perform the copy. So the reordering tries to help the compiler see an auto-vectorization opportunity, but if that fails it is possible to force the SIMD intrinsics because the label is present. This is just background interest and many months away from results, but it helps if the upstream tree can take a clean patch of just an ifdef rather than a re-arrangement of the function at the same time.
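The equivalence described here (zero then OR gives the same result as a plain copy) can be sketched as follows. The type CpMask and the cp_* helpers are hypothetical stand-ins for TileTypeBitMask and the TTMaskZero(), TTMaskSetMask() and TTMaskCopy() operations; the real macros are not reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CP_WORDS 8   /* hypothetical mask width: 8 x 32 bits */

/* Hypothetical stand-in for TileTypeBitMask. */
typedef struct { uint32_t w[CP_WORDS]; } CpMask;

/* Clear every bit. */
static inline void cp_zero(CpMask *m)
{
    assert(m != NULL);
    memset(m, 0, sizeof *m);
}

/* OR src into dst (the "set mask" operation). */
static inline void cp_or(CpMask *dst, const CpMask *src)
{
    for (int i = 0; i < CP_WORDS; i++)
        dst->w[i] |= src->w[i];
}

/* Named copy: a struct assignment behind a label that could later be
 * swapped for an intrinsic-based body without touching call sites. */
static inline void cp_copy(CpMask *dst, const CpMask *src)
{
    *dst = *src;
}

/* Word-wise equality. */
static inline int cp_equal(const CpMask *a, const CpMask *b)
{
    return memcmp(a->w, b->w, sizeof a->w) == 0;
}
```

Calling cp_zero() followed by cp_or() leaves the destination identical to what cp_copy() produces, whatever the destination held before, so the copy form states the intent directly and does strictly less work.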
In case you are wondering what this might look like: SSE4_2, which aligns with the new Linux system ABI x86-64-v2 (run
Which comes out like:
AVX2, which aligns with the new Linux system ABI x86-64-v3 (run /lib64/ld-linux-x86-64.so.2 --help to see if your system supports it)
VZEROUPPER is there to avoid the penalty for switching between different SIMD modes, as the target ABI is not the x86-64-v3 ABI. I believe that if it were, that instruction would not be there. Note the compiler is clever enough to elide load/store pairs (between calls to inline SIMD operations) and to re-order SIMD instructions, i.e. it can see through to the described intention and make better decisions about exact instruction ordering and scheduling to fill pipelines. This is why I may be picking on an area such as this: to see how I can make use of what is already available with current compilers.
I would appreciate a review as-is, confirming that for all intents and purposes the change-set here looks OK and that no obvious errors were introduced. Merge Status: On hold, pending a comprehensive testing framework to manage very-high-impact changes relating to manipulation operations. The purpose of the exercise so far (picking on a function and creating this PR) was to understand exactly what the testing framework needs to look like. My current requirements are:
I have some elements of this already (they have been used to validate SIMD SSE4_2/AVX2 against the standard macros), but I need to use this PR's use case to formalize them into a mechanism that can first validate. This in itself is a bunch of work, but it is the only way I can see of testing changes in data manipulation and reaching my own quality bar to consider it usable in production. There is obviously also the option of running magic twice (non-SIMD and SIMD) on complex work and checking that the final outputs generated are the same. Merge Status: merge on hold (review status in 2 months)
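One possible shape for validating a SIMD variant against a scalar reference is a differential check over pseudo-random inputs. The sketch below is only an illustration under stated assumptions: DtMask stands in for TileTypeBitMask, dt_or_ref for the standard macro behaviour, and dt_or_simd for a candidate written with SSE2 intrinsics from <emmintrin.h> (chosen here as a widely available baseline, not the SSE4_2/AVX2 code discussed in this PR). On targets without SSE2 the candidate falls back to the reference.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define DT_WORDS 8   /* hypothetical mask width: 8 x 32 bits */

/* Hypothetical stand-in for TileTypeBitMask. */
typedef struct { uint32_t w[DT_WORDS]; } DtMask;

/* Scalar reference: OR src into dst word by word. */
static inline void dt_or_ref(DtMask *dst, const DtMask *src)
{
    for (int i = 0; i < DT_WORDS; i++)
        dst->w[i] |= src->w[i];
}

/* Candidate: the same operation, 128 bits at a time where SSE2 is available. */
static inline void dt_or_simd(DtMask *dst, const DtMask *src)
{
#if defined(__SSE2__)
    for (int i = 0; i < DT_WORDS; i += 4) {
        __m128i d = _mm_loadu_si128((const __m128i *)&dst->w[i]);
        __m128i s = _mm_loadu_si128((const __m128i *)&src->w[i]);
        _mm_storeu_si128((__m128i *)&dst->w[i], _mm_or_si128(d, s));
    }
#else
    dt_or_ref(dst, src);   /* no SSE2: nothing distinct to validate */
#endif
}

/* xorshift32 generator: deterministic, so any failure is reproducible. */
static inline uint32_t dt_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

static inline void dt_fill(DtMask *m, uint32_t *state)
{
    for (int i = 0; i < DT_WORDS; i++)
        m->w[i] = dt_next(state);
}

/* Runs both variants on the same random inputs; returns the mismatch count. */
static inline int dt_compare(int rounds, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;   /* xorshift state must be nonzero */
    int mismatches = 0;

    assert(rounds > 0);
    for (int r = 0; r < rounds; r++) {
        DtMask a, b, src;
        dt_fill(&a, &state);
        b = a;
        dt_fill(&src, &state);
        dt_or_ref(&a, &src);
        dt_or_simd(&b, &src);
        if (memcmp(a.w, b.w, sizeof a.w) != 0)
            mismatches++;
    }
    return mismatches;
}
```

A harness would call dt_compare() for each operation pair and treat any nonzero count as a failure; a fixed seed keeps a failing case repeatable.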
I agree with the assessment that DBResidueMask() was probably meant to be DBFullResidueMask() to begin with, and is redundant and should be removed. Should I go ahead and make that change now?
Yes, make the change directly to the tree and I shall rebase (drop the commit) from this PR.
@dlmiles : Done; change will be mirrored to github by tomorrow.
in github pull request #377.
db3f272 to ff6bcec
Dropped 1 file change in 1 commit (as 48708c5 resolves this matter). Merge still on hold pending testing to support facilitating SIMD.
CodeQL picked this up (the "function has no side effect and its return value is not used" part)