
Add __nodiscard__ and DBResidueMask() use cases and cleanup #377

Open

dlmiles wants to merge 4 commits into master from master-upstream-20250216-nodiscard
Conversation

dlmiles (Contributor) commented Feb 18, 2025

CodeQL picked this up (the function has no side-effect and the return value is not used):

TileTypeBitMask *DBResidueMask(TileType type);
/* NOTE: candidate for using a const return */

Added __nodiscard__ to the function for compiler assistance.
The main problem was fixed via another recent commit that removes the
excessive DBResidueMask() call.
Reordered things in this function.
Added 'const' in key places to give the compiler the extra hint that,
for the purpose of this computation, we don't change the value and the
value never changes externally, even across function calls.

I'm sure the compiler (given the macro implementation of the TTMaskXxxxx() calls
and its visibility of the data being changed) already optimises the function in
exactly the way the reorder does.

This should also have the side-effect of exposing more auto-vectorization
opportunities to the compiler, or potentially allow replacing the loop
with (a tail-call or inline of):

simd_TTMaskSetMask_residues(lmask, rmask, TT_TECHDEPBASE, DBNumUserLayers);

which would be a hand-optimized form that probably has an 'l_residues'
layout favouring SIMD use (a second-order copy of the source of truth in
a different data layout, such as a contiguous array of TileTypeBitMask
indexed from 0, with the TileType as the index).

if (type < DBNumUserLayers)
{
TTMaskSetMask(rmask, &li->l_residues);
TTMaskSetMask(rmask, lmask);
dlmiles (Contributor Author) commented on the hunk above:
This should be TTMaskCopy() rmask = *lmask;

dlmiles (Contributor Author) commented:
Fixed in db3f272

@dlmiles dlmiles force-pushed the master-upstream-20250216-nodiscard branch from 94eeb9a to db3f272 Compare February 19, 2025 14:46
dlmiles commented Feb 19, 2025

Just some background on why I picked on this. I am looking at code-gen (the assembly we actually get from the compiler), both in the areas of auto-vectorization and to consider the introduction of SIMD intrinsics.

The rearrangement streamlines the function for the compiler by making the intended transform clearer (and so giving the compiler less work to do): previously the code did a SetZero and then a mask OR (via TTMaskSetMask), but this is obviously just a copy operation.

The introduction of TTMaskCopy() is to convey semantic intention; a struct assignment would do, but the label allows the introduction of inline SIMD intrinsics that can perform the same operation.

So the reordering tries to help the compiler see an auto-vectorization opportunity, and if that fails it is still possible to force the SIMD intrinsics because the label is present.

This is just background interest and many months away from results, but it helps if the upstream tree can take a clean patch of just the #ifdef rather than a re-arrangement of the function at the same time.

dlmiles commented Feb 19, 2025

In case you are wondering what this might look like:

SSE4_2, which aligns with the new Linux system ABI x86-64-v2 (run /lib64/ld-linux-x86-64.so.2 --help to see if your system supports it); this is currently the RHEL9 default ABI.

/* DEST = DEST AND (NOT MASK) */
void __simd_inline__
SIMD_FUNCNAME(simd_TTMaskClearMask)(TileTypeBitMask *dest, const TileTypeBitMask *mask)
{
    ASSERT_ALIGNED(dest);
    ASSERT_ALIGNED(mask);

    __m128i x0 = _mm_load_si128(memaddr0(dest));
    __m128i x1 = _mm_load_si128(memaddr0(mask));
    x0 = _mm_andnot_si128(x1, x0);      // first argument gets NOT applied
    _mm_store_si128(memaddr0(dest), x0);

    __m128i x2 = _mm_load_si128(memaddr1(dest));
    __m128i x3 = _mm_load_si128(memaddr1(mask));
    x2 = _mm_andnot_si128(x3, x2);         // first argument gets NOT applied
    _mm_store_si128(memaddr1(dest), x2);
}

Which comes out like:

00000000000001b0 <simd_TTMaskClearMask_func>:
 1b0:   c5 f9 6f 0e             vmovdqa (%rsi),%xmm1
 1b4:   c5 f1 df 07             vpandn (%rdi),%xmm1,%xmm0
 1b8:   c5 f9 7f 07             vmovdqa %xmm0,(%rdi)
 1bc:   c5 f9 6f 46 10          vmovdqa 0x10(%rsi),%xmm0
 1c1:   c5 f9 df 47 10          vpandn 0x10(%rdi),%xmm0,%xmm0
 1c6:   c5 f9 7f 47 10          vmovdqa %xmm0,0x10(%rdi)
 1cb:   c3                      retq   
 1cc:   0f 1f 40 00             nopl   0x0(%rax)

dlmiles commented Feb 19, 2025

AVX2, which aligns with the new Linux system ABI x86-64-v3 (run /lib64/ld-linux-x86-64.so.2 --help to see if your system supports it):

/* DEST = DEST AND (NOT MASK) */
void __simd_inline__
SIMD_FUNCNAME(simd_TTMaskClearMask)(TileTypeBitMask *dest, const TileTypeBitMask *mask)
{
    ASSERT_ALIGNED(dest);
    ASSERT_ALIGNED(mask);

    __m256i x0 = _mm256_load_si256(memaddr0(dest));
    __m256i x1 = _mm256_load_si256(memaddr0(mask));
    x0 = _mm256_andnot_si256(x1, x0);   // first argument gets NOT applied
    _mm256_store_si256(memaddr0(dest), x0);
}

Which comes out like:

0000000000000110 <simd_TTMaskClearMask_func>:
 110:   c5 fd 6f 06             vmovdqa (%rsi),%ymm0
 114:   c5 fd df 07             vpandn (%rdi),%ymm0,%ymm0
 118:   c5 fd 7f 07             vmovdqa %ymm0,(%rdi)
 11c:   c5 f8 77                vzeroupper 
 11f:   c3                      retq   

VZEROUPPER is there to reset the penalty for switching between different SIMD modes, since the target ABI is not x86-64-v3. I believe that if it were, the instruction would not be emitted.

Note the compiler is clever enough to elide loads/stores (between calls to inline SIMD operations) and to re-order SIMD instructions, i.e. it can see through to the described intention and make better decisions about exact instruction ordering and scheduling to fill pipelines.

So this is the reason why I am picking on an area such as this: to see how I can make use of what is already available with current compilers.

dlmiles commented Feb 20, 2025

I would appreciate a review of the change-set as-is, to confirm that for all intents and purposes it looks ok and no obvious errors were introduced.


Merge Status: on hold, pending a comprehensive testing framework to manage very-high-impact changes relating to manipulation operations.

The purpose of the exercise so far (picking on a function and creating this PR) was to understand exactly what the testing framework needs to look like.

My current requirements are:

  • Needs to isolate a function (or a limited group of functions), plucking them from the code base
  • Needs to create a harness around the function (codegen, test runner, managing the API call and returned data)
  • Needs to serialize input/output/argument/condition data
  • Needs to allow randomization (and therefore fuzzing), to log output, and to provide re-creatable failures (from the serialized data)
  • Needs to be able to compare arbitrary magic versions on demand (checkout, extract target function, encapsulate in tester, run test)
  • Needs to test the same compiler with multiple build options: -g -O -O3 -fomit-frame-pointer etc.
  • Needs to test different compilers: gcc-11 gcc-13 clang-17 etc.
  • Needs to allow swapping of the original macros, SIMD, and -march=target to confirm the codegen / compiler output is also good

I have some elements of this already (it has been used to validate SIMD SSE4_2/AVX2 against the standard macros), but this PR's use-case needs to be formalized into a mechanism that can validate first.
This is needed to qualify any future SIMD work as production ready.

This in itself is a bunch of work, but it is the only way I can see to test changes in data manipulation and reach my own quality bar for considering it usable in production. There is obviously also the option of running magic twice (non-SIMD and SIMD) to perform complex work and compare that the final outputs generated are the same.

Merge Status: merge on hold (review status in 2 months)
Quality: ready to merge (the code as-is is just untested but otherwise sound with no further development on it expected)
Risk: very-high-impact (data manipulation operation change)
Level of Testing: none (needs extensive test as above)

RTimothyEdwards (Owner) left a comment:

I agree with the assessment that DBResidueMask() was probably meant to be DBFullResidueMask( ) to begin with, and is redundant and should be removed. Should I go ahead and make that change now?

dlmiles commented Feb 23, 2025

> I agree with the assessment that DBResidueMask() was probably meant to be DBFullResidueMask( ) to begin with, and is redundant and should be removed. Should I go ahead and make that change now?

Yes, make the change directly to the tree and I shall rebase (drop the commit) from this PR.

RTimothyEdwards (Owner) commented:

@dlmiles : Done; change will be mirrored to github by tomorrow.

RTimothyEdwards added a commit that referenced this pull request Feb 24, 2025
@dlmiles dlmiles force-pushed the master-upstream-20250216-nodiscard branch from db3f272 to ff6bcec Compare February 24, 2025 10:04
dlmiles commented Feb 24, 2025

Dropped the 1 file change in 1 commit (as 48708c5), which resolves this matter.

Merge is still on hold pending the testing needed to support the SIMD work.
