[MLIR] Fix an assert that contains a mistake in conditional operator (#95668) #6

shiltian · 2024-06-18T18:46:30Z

[MLIR] Fix an assert that contains a mistake in conditional operator (llvm#95668)

This is described in (N2) of https://pvs-studio.com/en/blog/posts/cpp/1126/
and was caught by the PVS-Studio analyzer.

Warning message -
V502 Perhaps the '?:' operator works in a different way than it was
expected. The '?:' operator has a lower priority than the '+' operator.
LoopEmitter.cpp 983
V502 Perhaps the '?:' operator works in a different way than it was
expected. The '?:' operator has a lower priority than the '+' operator.
LoopEmitter.cpp 1039

The assert should be
assert(bArgs.size() == reduc.size() + (needsUniv ? 1 : 0));
since + has higher precedence than ?:.

This can be further reduced to
assert(aArgs.size() == reduc.size() + needsUniv);
because needsUniv is a bool value, which is implicitly converted to 0 or
1.
AMDGPU: Remove .v2bf16 buffer atomic fadd intrinsics (llvm#95783)

These are redundant with the unsuffixed versions, and have a name
collision with surprising behavior when the base intrinsic is used with
v2bf16.

The global and flat variants should be removed too, but those are complicated
due to using v2i16 in place of the natural v2bf16. Those cases can soon be
completely deleted in favor of atomicrmw.

The GlobalISel codegen change is broken and substitutes handling as bf16
for handling as f16, but it's a bug that this passed the IRTranslator in the first
place.

AMDGPU: Cleanup selection patterns for buffer loads (llvm#95378)

We should just support these for all register types.

[Offload] Change HSA header search order (llvm#95769)

Summary:
The HSA headers existed previously in include/hsa.h and were moved to
include/hsa/hsa.h in a later ROCm version. The include headers here
were originally designed to favor a newer one. However, this
unintentionally prevented the dynamic HSA's hsa.h from being used if
both were present. This patch changes the order so it will be found
first.

Related to llvm#95484.

[Flang] Switch to common::visit more call sites (llvm#90018)

Switch more call sites to common::visit.

Test plan: ninja check-all

[flang] Fix comments and formatting. (NFC) (llvm#95786)

As mentioned in llvm#95462 (comment), the formatting of the comments
has been fixed. Also added comments before literal arguments.

[CI][format] Explicitly pass extensions to git-clang-format (llvm#95794)

This ensures that the CI script controls which file extensions are
considered instead of letting git-clang-format apply its own filtering
rules. In particular, this properly handles libc++ extension-less
headers which were passed to git-clang-format, but then dropped by that
tool as having an unrecognized extension.

[HWASan] [compiler-rt] support non-4k pages on Android (llvm#95069)

Reapply "[mlir][sparse] implement lowering rules for IterateOp." (llvm#95836)

Reland [mlir][Target] Improve ROCDL gpu serialization API (llvm#95813)

Reland: llvm#95456

This patch improves the ROCDL gpu serialization API by:

- Introducing the enum AMDGCNLibraries for specifying the AMD GCN
  device code libraries to use during linking.
- Removing getCommonBitcodeLibs in favor of AMDGCNLibraries.
  Previously getCommonBitcodeLibs would try to load all AMD GCN bitcode
  libraries; now it will only load the requested libraries.
- Exposing the compileToBinary method and making it virtual, allowing
  downstream users to re-use this method.
- Exposing moduleToObjectImpl; this method provides a prototype flow
  for compiling to binary, allowing downstream users to re-use this
  method.
- It also avoids constructing the control variables if no device
  libraries are being used.
- Changes the style of the error messages to be composable, i.e. no full
  stops.
- Adds an error message for when the ROCm toolkit can't be found but it
  was required.

[libc] Only include getauxval on AARCH64 targets (llvm#95844)

Summary:
Not all platforms support this function or header, but it was being
included by every test. Move it inside of the ifdef for the only user,
which is aarch64.

[libc][stdlib] Only add internal malloc in full build mode. Use the system malloc in overlay mode. (llvm#95845)

This causes an issue in overlay mode:
llvm#95736 (comment)

[scudo] Update error handling for secondary cache entry count (llvm#95595)

Initially, the scudo allocator would return an error if the user
attempted to set the cache capacity
(i.e. the number of possible entries in the cache) above the maximum
cache capacity.
Now the allocator will resort to using the maximum cache capacity in
this event.
An error will still be returned if the user attempts to set the number
of entries to a negative value.
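
The new behavior can be sketched as a clamp with a separate error path. This is only an illustration of the description above: `SketchCache`, `setCapacity`, and the limit of 32 are assumed names and values, not scudo's actual API or configuration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed maximum for illustration; not scudo's real limit.
constexpr std::int32_t kMaxCapacity = 32;

// Hypothetical cache that mimics the described policy.
struct SketchCache {
  std::int32_t Capacity = 0;

  // Returns false (an error) for negative requests and leaves the capacity
  // unchanged. Requests above the maximum are clamped instead of rejected.
  bool setCapacity(std::int32_t Requested) {
    if (Requested < 0)
      return false;
    Capacity = std::min(Requested, kMaxCapacity);
    return true;
  }
};
```

With this policy, a request for 100 entries succeeds and leaves the capacity at the maximum, while a request for -1 fails without changing the capacity.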

[mlgo] inline for size: add bypass mechanism for preserving performance (llvm#95616)

This allows shrinking the cold part of the code for size without sacrificing performance.

Revert "Reland [mlir][Target] Improve ROCDL gpu serialization API" (llvm#95847)

Reverts llvm#95813

Revert "[HWASan] [compiler-rt] support non-4k pages on Android" (llvm#95853)

Reverts llvm#95069

Broke windows bot

[mlgo] remove inlining_default - unused feature

The feature was only exposed for training and was immediately dropped on
the training side. It was bulk-copied into the test model generator, where
it had no effect (the generator always returns a constant).

In the AOT + test model case, since the test model returns a constant, all
input features are pruned by the AOT compiler, so its presence/absence
doesn't matter.

[libc][stdlib] Run freelist_heap_test only in full build mode. (llvm#95850)

[mlir][drr] Fix variadic destination emission (llvm#95855)

It's possible for handleResultPattern to emit helpers; these helpers
cannot be interleaved with pushing into the array. Emit into a separate
string so that helpers can be emitted before the vector is populated.
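
The buffering idea can be sketched outside of the DRR generator. Everything below is hypothetical (the `Element` type, `emitPopulation`, and the emitted text are made up for illustration); it only shows helper definitions and push statements being collected separately so that all helpers end up ahead of the vector population.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical element: some elements need a helper emitted for them.
struct Element {
  std::string name;
  bool needsHelper;
};

// Emits generated code as text. Push statements go into their own buffer,
// so helper definitions are never interleaved with them.
std::string emitPopulation(const std::vector<Element> &elements) {
  std::string helpers; // helper definitions, emitted first
  std::string pushes;  // statements that populate the vector
  for (const Element &e : elements) {
    if (e.needsHelper)
      helpers += "auto " + e.name + " = makeHelper();\n";
    pushes += "values.push_back(" + e.name + ");\n";
  }
  return helpers + "std::vector<int> values;\n" + pushes;
}
```

Writing both kinds of statement to a single stream would place a helper for a later element in the middle of the push statements, which is the interleaving the commit message rules out.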

Signed-off-by: Jacques Pienaar <jpienaar@google.com>

Fix comments in ValueObjectPrinter.h (NFC)

[BOLT] Drop high discrepancy profiles in matching (llvm#95156)

Summary: Functions with high discrepancy
(measured by matched function blocks)
can be ignored with an added command line
argument for better performance.

Test Plan: Added
stale-matching-min-matched-block.test

Co-authored-by: Amir Ayupov <aaupov@fb.com>

[lldb] Fix Python interpreter workaround (attempt #2)

On macOS, to make DYLD_INSERT_LIBRARIES and the Python shim work
together, we have a workaround that copies the "real" Python interpreter
into the build directory. This doesn't work when running in a virtual
environment, as the copied interpreter cannot find the packages
installed in the virtual environment relative to itself.

Address this issue by copying the Python interpreter into the virtual
environment's bin folder, rather than the build folder, when the test
suite detects that it's being run inside a virtual environment.

I'm not thrilled about this solution because it puts a file outside the
build directory. However, given virtual environments are considered
disposable, this seems reasonable.

[docs][Security Group] Update Apple representation. (llvm#95491)

Remove Kate; Apple is now represented by Oliver (and myself).

Reapply "[HWASan] [compiler-rt] support non-4k pages on Android" (llvm#95853)

Updated MapDynamicShadow callsite in asan_win.

[Clang][AMDGPU] Add a new builtin type for buffer rsrc

shiltian · 2024-06-18T18:46:46Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @shiltian and the rest of your teammates on Graphite

…104523)

Compilers and language runtimes often use helper functions that are fundamentally uninteresting when debugging anything but the compiler/runtime itself. This patch introduces a user-extensible mechanism that allows for these frames to be hidden from backtraces and automatically skipped over when navigating the stack with `up` and `down`.

This does not affect the numbering of frames, so `f <N>` will still provide access to the hidden frames. The `bt` output will also print a hint that frames have been hidden.

My primary motivation for this feature is to hide thunks in the Swift programming language, but I'm including an example recognizer for `std::function::operator()` that I wished for myself many times while debugging LLDB.

rdar://126629381

Example output. (Yes, my proof-of-concept recognizer could hide even more frames if we had a method that returned the function name without the return type or I used something that isn't based off regex, but it's really only meant as an example).

before:

```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb)
```

after:

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```

xgupta and others added 25 commits June 18, 2024 01:09

AMDGPU: Cleanup selection patterns for buffer loads (llvm#95378)

8930ac1

We should just support these for all register types.

[Flang] Switch to common::visit more call sites (llvm#90018)

77d8cfb

Switch to common::visit more call sites. Test plan: ninja check-all

[flang] Fix comments and formatting. (NFC) (llvm#95786)

7ddff3a

As mentioned in [here](llvm#95462 (comment)), the formatting of the comments has been fixed. Also added comments before literal arguments.

[HWASan] [compiler-rt] support non-4k pages on Android (llvm#95069)

5b04b6f

Reapply "[mlir][sparse] implement lowering rules for IterateOp." (llv…

d6cc35f

…m#95836)

[libc] Only include getauxval on AARCH64 targets (llvm#95844)

44ca656

Summary: Not all platforms support this function or header, but it was being included by every test. Move it inside of the `ifdef` for the only user, which is aarch64.

[libc][stdlib] Only add internal malloc in full build mode. Use the s…

a50bcc0

…ystem malloc in overlay mode. (llvm#95845) This causes an issue in overlay mode: llvm#95736 (comment)

[mlgo] inline for size: add bypass mechanism for preserving performan…

6037a69

…ce (llvm#95616) This allows shrinking the cold part of the code for size without sacrificing performance.

Revert "Reland [mlir][Target] Improve ROCDL gpu serialization API" (l…

3a2f7d8

…lvm#95847) Reverts llvm#95813

Revert "[HWASan] [compiler-rt] support non-4k pages on Android" (llvm…

1adf0fa

…#95853) Reverts llvm#95069 Broke windows bot

[libc][stdlib] Run freelist_heap_test only in full build mode. (llvm#…

ca22469

…95850)

Fix comments in ValueObjectPrinter.h (NFC)

6c17f1c

[docs][Security Group] Update Apple representation. (llvm#95491)

cd48335

Remove Kate; Apple is now represented by Oliver (and myself).

Reapply "[HWASan] [compiler-rt] support non-4k pages on Android" (llv…

c6049e6

…m#95853) Updated MapDynamicShadow callsite in asan_win.

[Clang][AMDGPU] Add a new builtin type for buffer rsrc

013a40d

This was referenced Jun 18, 2024

[Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic #7

Closed

[Clang][AMDGPU] Add builtins for intrinsic llvm.amdgcn.raw.buffer.store #8

Closed

shiltian closed this Jun 18, 2024

shiltian deleted the new-builtin-type-buffer-rsrc-t branch June 19, 2024 00:46
