Fix crashes caused by unaligned memory stores in stream_mem kernel #650

fairydreaming · 2024-11-26T13:41:59Z

This PR replaces the movntdq instructions in the stream_mem kernel with a combination of:

- unpcklpd instructions, which pack the low double-precision values of FPR1 and FPR2 into FPR1, and those of FPR3 and FPR4 into FPR3
- movntpd instructions, which store FPR1 and FPR3 to memory

The original kernel caused segmentation faults because movntdq requires a 16-byte-aligned memory operand, but the kernel advanced its store offsets in 8-byte steps, so some stores targeted addresses that were not 16-byte aligned.
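The alignment problem can be modelled with plain address arithmetic. This is only a sketch: the base address, the unroll factor of four and the 16-byte step of the new stores are assumptions for illustration, not values taken from the kernel file, and `misaligned_offsets` is a hypothetical helper.

```python
ALIGNMENT = 16     # bytes required by the memory operand of movntdq and movntpd
ELEMENT_SIZE = 8   # bytes per double-precision value


def misaligned_offsets(base, offsets, alignment=ALIGNMENT):
    """Return the offsets whose effective address is not a multiple of `alignment`."""
    return [off for off in offsets if (base + off) % alignment != 0]


# Hypothetical 16-byte-aligned start address of the destination stream.
base = 0x1000

# Old pattern (assumed unroll of 4): one 16-byte store per double, offsets step by 8.
old_offsets = [i * ELEMENT_SIZE for i in range(4)]   # 0, 8, 16, 24

# New pattern (assumed): two doubles packed per store, offsets step by 16.
new_offsets = [i * ALIGNMENT for i in range(2)]      # 0, 16

print(misaligned_offsets(base, old_offsets))  # [8, 24]
print(misaligned_offsets(base, new_offsets))  # []
```

In this model every second store of the old pattern lands on an address that is only 8-byte aligned, which is the condition that faults for an aligned 16-byte store. Packing two doubles per register halves the number of stores and keeps each one on a 16-byte boundary.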

Also corrected the kernel description, INSTR_LOOP and UOPS values. Please check if the numbers are correct.

Fixes #649.

TomTheBear · 2024-11-26T14:07:09Z

Thx for the PR.

The INSTR_LOOP is correct. You have 16 instructions in the loop plus the loop increment, the compare and the jump instruction. For the UOPS, I'm counting only 22. Each instruction is at least one uop. The addsd instructions load from a memory location, which adds one uop each. The loop increment is one uop, but the compare and jump instructions are fused into one uop. The movnt instructions are one uop on all architectures I checked because we store from an xmm register (AMD Zen in some cases uses 2 uops when storing from a ymm or zmm register).
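The counting rules above can be written out as a small calculation. This is a sketch only: the figure of four addsd instructions with a memory operand is inferred from the stated total of 22 and is not given explicitly in the thread, and both function names are hypothetical.

```python
def instr_loop_count(body_instructions, loop_control_instructions=3):
    """Instructions per iteration: loop body plus increment, compare and jump."""
    return body_instructions + loop_control_instructions


def uop_count(body_instructions, memory_operand_loads,
              increment_uops=1, fused_compare_jump_uops=1):
    """Uops per iteration under the rules described in the comment above.

    Every body instruction counts as one uop, each instruction that also
    loads from memory adds one more, the loop increment is one uop, and
    the compare and jump pair is fused into a single uop.
    """
    return (body_instructions + memory_operand_loads
            + increment_uops + fused_compare_jump_uops)


BODY_INSTRUCTIONS = 16   # stated in the comment
ADDSD_WITH_LOAD = 4      # assumption: inferred from 22 - 16 - 1 - 1

print(instr_loop_count(BODY_INSTRUCTIONS))            # 19
print(uop_count(BODY_INSTRUCTIONS, ADDSD_WITH_LOAD))  # 22
```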

fairydreaming · 2024-11-26T14:55:22Z

OK, I corrected it to 22.

Fixed problem with crashes caused by unaligned memory in stream_mem kernel.

aa3d056

TomTheBear mentioned this pull request Nov 26, 2024

[BUG] Segmentation Fault in likwid-bench when executing stream_mem benchmark on Epyc 9374F #649

Closed

fairydreaming force-pushed the stream-mem-alignment-fix branch from 0fd31b0 to 3805403 on November 26, 2024 14:53

Corrected INSTR_LOOP and UOPS values.

5ead1c7

fairydreaming force-pushed the stream-mem-alignment-fix branch from 3805403 to 5ead1c7 on November 26, 2024 14:54

TomTheBear merged commit 653455d into RRZE-HPC:master Nov 27, 2024
2 checks passed
