Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MLIR] Fix an assert that contains a mistake in conditional operator (#95668) #6

Closed
wants to merge 25 commits into from

Conversation

shiltian
Copy link
Owner

[MLIR] Fix an assert that contains a mistake in conditional operator (llvm#95668)

This is described in (N2) https://pvs-studio.com/en/blog/posts/cpp/1126/
so caught by the PVS Studio analyzer.

Warning message -
V502 Perhaps the '?:' operator works in a different way than it was
expected. The '?:' operator has a lower priority than the '+' operator.
LoopEmitter.cpp 983
V502 Perhaps the '?:' operator works in a different way than it was
expected. The '?:' operator has a lower priority than the '+' operator.
LoopEmitter.cpp 1039

The assert should be
assert(bArgs.size() == reduc.size() + (needsUniv ? 1 : 0));
since + has higher precedence and ? has lower.

This further can be reduce to
assert(aArgs.size() == reduc.size() + needsUniv);
because needUniv is a bool value which is implicitly converted to 0 or

AMDGPU: Remove .v2bf16 buffer atomic fadd intrinsics (llvm#95783)

These are redundant with the unsuffixed versions, and have a name
collision with surprising behavior when the base intrinsic is used with
v2bf16.

The global and flat variants should be removed too, but those are complicated
due to using v2i16 in place of the natural v2bf16. Those cases can soon be
completely deleted in favor of atomicrmw.

The GlobalISel codegen change is broken and substitutes handling as bf16
for handling as f16, but it's a bug that this passed the IRTranslator in the first
place.

AMDGPU: Cleanup selection patterns for buffer loads (llvm#95378)

We should just support these for all register types.

[Offload] Change HSA header search order (llvm#95769)

Summary:
The HSA headers existed previously in include/hsa.h and were moved to
include/hsa/hsa.h in a later ROCm version. The include headers here
were originally designed to favor a newer one. However, this
unintentionally prevented the dyanmic HSA's hsa.h from being used if
both were present. This patch changes the order so it will be found
first.

Related to llvm#95484.

[Flang] Switch to common::visit more call sites (llvm#90018)

Switch to common::visit more call sites.

Test plan: ninja check-all

[flang] Fix comments and formatting. (NFC) (llvm#95786)

As mentioned in
here,
the formatting of the comments have been fixed. Also added comments
before literal arguments.

[CI][format] Explicitly pass extensions to git-clang-format (llvm#95794)

This ensures that the CI script controls which file extensions are
considered instead of letting git-clang-format apply its own filtering
rules. In particular, this properly handles libc++ extension-less
headers which were passed to git-clang-format, but then dropped by that
tool as having an unrecognized extension.

[HWASan] [compiler-rt] support non-4k pages on Android (llvm#95069)

Reapply "[mlir][sparse] implement lowering rules for IterateOp." (llvm#95836)

Reland [mlir][Target] Improve ROCDL gpu serialization API (llvm#95813)

Reland: llvm#95456

This patch improves the ROCDL gpu serialization API by:

  • Introducing the enum AMDGCNLibraries for specifying the AMD GCN
    device code libraries to use during linking.
  • Removing getCommonBitcodeLibs in favor of AMDGCNLibraries.
    Previously getCommonBitcodeLibs would try to load all AMD GCN bitcode
    librariesm now it will only load the requested libraries.
  • Exposing the compileToBinary method and making it virtual, allowing
    downstream users to re-use this method.
  • Exposing moduleToObjectImpl, this method provides a prototype flow
    for compiling to binary, allowing downstream users to re-use this
    method.
  • It also avoids constructing the control variables if no device
    libraries are being used.
  • Changes the style of the error messages to be composable, ie no full
    stops.
  • Adds an error message for when the ROCm toolkit can't be found but it
    was required.

[libc] Only include getauxval on AARCH64 targets (llvm#95844)

Summary:
Not all platforms support this function or header, but it was being
included by every test. Move it inside of the ifdef for the only user,
which is aarch64.

[libc][stdlib] Only add internal malloc in full build mode. Use the system malloc in overlay mode. (llvm#95845)

This causes an issue in overlay mode:
llvm#95736 (comment)

[scudo] Update error handling for seondary cache entry count (llvm#95595)

Initially, the scudo allocator would return an error if the user
attempted to set the cache capacity
(i.e. the number of possible entries in the cache) above the maximum
cache capacity.
Now the allocator will resort to using the maximum cache capacity in
this event.
An error will still be returned if the user attempts to set the number
of entries to a negative value.

[mlgo] inline for size: add bypass mechanism for perserving performance (llvm#95616)

This allows shrinking for size the cold part of the code, without sacrificing performance.

Revert "Reland [mlir][Target] Improve ROCDL gpu serialization API" (llvm#95847)

Reverts llvm#95813

Revert "[HWASan] [compiler-rt] support non-4k pages on Android" (llvm#95853)

Reverts llvm#95069

Broke windows bot

[mlgo] remove inlining_default - unused feature

The feature was only exposed for training and was immediately dropped on
the training side. It was bulk-copied into the test model generator, where
it had no effect (the generator always returns a constant).

In the AOT + test model case, since the test model returns a constant, all
input features are pruned by the AOT compiler, so its presence/absence
doesn't matter.

[libc][stdlib] Run freelist_heap_test only in full build mode. (llvm#95850)

[mlir][drr] Fix variadic destination emission (llvm#95855)

Its possible for handleResultPattern to emit helpers, these helpers
cannot be interleaved with pushing into the array. Emit into a separate
string to enable helpers to be emitted before the population of vector.

Signed-off-by: Jacques Pienaar jpienaar@google.com

Fix comments in ValueObjectPrinter.h (NFC)

[BOLT] Drop high discrepancy profiles in matching (llvm#95156)

Summary: Functions with high discrepancy
(measured by matched function blocks)
can be ignored with an added command line
argument for better performance.

Test Plan: Added
stale-matching-min-matched-block.test


Co-authored-by: Amir Ayupov aaupov@fb.com

[lldb] Fix Python interpreter workaround (attempt #2)

On macOS, to make DYLD_INSERT_LIBRARIES and the Python shim work
together, we have a workaroud that copies the "real" Python interpreter
into the build directory. This doesn't work when running in a virtual
environment, as the copied interpreter cannot find the packages
installed in the virtual environment relative to itself.

Address this issue by copying the Python interpreter into the virtual
environment's bin folder, rather than the build folder, when the test
suite detects that it's being run inside a virtual environment.

I'm not thrilled about this solution because it puts a file outside the
build directory. However, given virtual environments are considered
disposable, this seems reasonable.

[docs][Security Group] Update Apple representation. (llvm#95491)

Remove Kate; Apple is now represented by Oliver (and myself).

Reapply "[HWASan] [compiler-rt] support non-4k pages on Android" (llvm#95853)

Updated MapDynamicShadow callsite in asan_win.

[Clang][AMDGPU] Add a new builtin type for buffer rsrc

xgupta and others added 25 commits June 18, 2024 01:09
…lvm#95668)

This is described in (N2) https://pvs-studio.com/en/blog/posts/cpp/1126/
so caught by the PVS Studio analyzer.

Warning message -
V502 Perhaps the '?:' operator works in a different way than it was
expected. The '?:' operator has a lower priority than the '+' operator.
LoopEmitter.cpp 983
V502 Perhaps the '?:' operator works in a different way than it was
expected. The '?:' operator has a lower priority than the '+' operator.
LoopEmitter.cpp 1039

The assert should be
assert(bArgs.size() == reduc.size() + (needsUniv ? 1 : 0));
since + has higher precedence and ? has lower.

This further can be reduce to
assert(aArgs.size() == reduc.size() + needsUniv);
because needUniv is a bool value which is implicitly converted to 0 or
These are redundant with the unsuffixed versions, and have a name
collision with surprising behavior when the base intrinsic is used with
v2bf16.

The global and flat variants should be removed too, but those are complicated
due to using v2i16 in place of the natural v2bf16. Those cases can soon be
completely deleted in favor of atomicrmw.

The GlobalISel codegen change is broken and substitutes handling as bf16
for handling as f16, but it's a bug that this passed the IRTranslator in the first
place.
We should just support these for all register types.
Summary:
The HSA headers existed previously in `include/hsa.h` and were moved to
`include/hsa/hsa.h` in a later ROCm version. The include headers here
were originally designed to favor a newer one. However, this
unintentionally prevented the dyanmic HSA's `hsa.h` from being used if
both were present. This patch changes the order so it will be found
first.

Related to llvm#95484.
Switch to common::visit more call sites.

Test plan: ninja check-all
As mentioned in
[here](llvm#95462 (comment)),
the formatting of the comments have been fixed. Also added comments
before literal arguments.
This ensures that the CI script controls which file extensions are
considered instead of letting git-clang-format apply its own filtering
rules. In particular, this properly handles libc++ extension-less
headers which were passed to git-clang-format, but then dropped by that
tool as having an unrecognized extension.
Reland: llvm#95456

This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
librariesm now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.
- Changes the style of the error messages to be composable, ie no full
stops.
- Adds an error message for when the ROCm toolkit can't be found but it
was required.
Summary:
Not all platforms support this function or header, but it was being
included by every test. Move it inside of the `ifdef` for the only user,
which is aarch64.
…ystem malloc in overlay mode. (llvm#95845)

This causes an issue in overlay mode:
llvm#95736 (comment)
)

Initially, the scudo allocator would return an error if the user
attempted to set the cache capacity
(i.e. the number of possible entries in the cache) above the maximum
cache capacity.
Now the allocator will resort to using the maximum cache capacity in
this event.
An error will still be returned if the user attempts to set the number
of entries to a negative value.
…ce (llvm#95616)

This allows shrinking for size the cold part of the code, without sacrificing performance.
The feature was only exposed for training and was immediately dropped on
the training side. It was bulk-copied into the test model generator, where
it had no effect (the generator always returns a constant).

In the AOT + test model case, since the test model returns a constant, all
input features are pruned by the AOT compiler, so its presence/absence
doesn't matter.
Its possible for handleResultPattern to emit helpers, these helpers
cannot be interleaved with pushing into the array. Emit into a separate
string to enable helpers to be emitted before the population of vector.

Signed-off-by: Jacques Pienaar <jpienaar@google.com>
Summary: Functions with high discrepancy 
(measured by matched function blocks) 
can be ignored with an added command line 
argument for better performance.

Test Plan: Added 
stale-matching-min-matched-block.test

---------

Co-authored-by: Amir Ayupov <aaupov@fb.com>
On macOS, to make DYLD_INSERT_LIBRARIES and the Python shim work
together, we have a workaroud that copies the "real" Python interpreter
into the build directory. This doesn't work when running in a virtual
environment, as the copied interpreter cannot find the packages
installed in the virtual environment relative to itself.

Address this issue by copying the Python interpreter into the virtual
environment's `bin` folder, rather than the build folder, when the test
suite detects that it's being run inside a virtual environment.

I'm not thrilled about this solution because it puts a file outside the
build directory. However, given virtual environments are considered
disposable, this seems reasonable.
Remove Kate; Apple is now represented by Oliver (and myself).
1 similar comment
@shiltian shiltian closed this Jun 18, 2024
@shiltian shiltian deleted the new-builtin-type-buffer-rsrc-t branch June 19, 2024 00:46
shiltian pushed a commit that referenced this pull request Aug 26, 2024
…104523)

Compilers and language runtimes often use helper functions that are
fundamentally uninteresting when debugging anything but the
compiler/runtime itself. This patch introduces a user-extensible
mechanism that allows for these frames to be hidden from backtraces and
automatically skipped over when navigating the stack with `up` and
`down`.

This does not affect the numbering of frames, so `f <N>` will still
provide access to the hidden frames. The `bt` output will also print a
hint that frames have been hidden.

My primary motivation for this feature is to hide thunks in the Swift
programming language, but I'm including an example recognizer for
`std::function::operator()` that I wished for myself many times while
debugging LLDB.

rdar://126629381


Example output. (Yes, my proof-of-concept recognizer could hide even
more frames if we had a method that returned the function name without
the return type or I used something that isn't based off regex, but it's
really only meant as an example).

before:
```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb) 
```

after

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.