Skip to content

[SYCL] Re-use OpenCL sampler in SYCL mode #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from
Closed

Conversation

bader
Copy link
Owner

@bader bader commented May 21, 2019

This is an attempt to re-use OpenCL sampler for SYCL.

  • sampler_t type name is replaced with __ocl_sampler_t to avoid
    potential collisions with user types
  • selectively enabled a couple of OpenCL diagnostics for this type

Notes:

  • errors are reported in the SYCL headers rather than in the user's code
  • some diagnostics trigger on SYCL use cases. E.g. SYCL kernel
    initializes lambda members with the values from kernel arguments. This
    initialization code for sampler emits errors because OpenCL allows
    sampler initialization only with integer constants. Another potential
    issue is the lambda object itself - captured sampler is a member of the
    lambda object and OpenCL doesn't allow composite types with samplers.
    SPIR-V produced from SYCL should be okay as lambda object can be removed
    by standard LLVM transformation passes.

Signed-off-by: Alexey Bader alexey.bader@intel.com

Fix bug with not found alloca in case of two accessors are created for
the same memory object. The problem is that during searching for
dependencies for the first accessor list of leaf command is modified so
it doesn't contain commands that are considered as dependencies. As a
result search for dependencies for the second accessor fails.
The fix is to update list of leafs after dependencies for all accessors
are found.

Signed-off-by: Vlad Romanov <vlad.romanov@intel.com>
@bader bader force-pushed the sampler-experiment branch 2 times, most recently from 61858d8 to d93afb1 Compare May 21, 2019 16:20
When sampler was created on host with enums and passed to device we
released null sampler because in GetOrCreateSampler we set m_ReleaseSampler to
true and stored created sampler not into m_Sampler field which is released in
destructor.

Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
@bader
Copy link
Owner Author

bader commented May 21, 2019

Error reported by OpenCL diagnostics:

$ "/home/bader/build-sycl/bin/clang" "-I/home/bader/OpenCL-Headers" "-std=c++11" "-fsycl" "/home/bader/sycl/sycl/test/sub_group/shuffle.cpp" "-o" "/home/bader/build-sycl/projects/sycl/test/sub_group/Output/shuffle.cpp.tmp.out" "-lstdc++"
 "-lOpenCL" "-lsycl"
# command stderr:
In file included from /home/bader/sycl/sycl/test/sub_group/shuffle.cpp:14:
In file included from /home/bader/sycl/sycl/test/sub_group/helper.hpp:8:
In file included from /home/bader/build-sycl/lib/clang/9.0.0/include/CL/sycl.hpp:15:
In file included from /home/bader/build-sycl/lib/clang/9.0.0/include/CL/sycl/accessor.hpp:11:
In file included from /home/bader/build-sycl/lib/clang/9.0.0/include/CL/sycl/accessor2.hpp:12:
In file included from /home/bader/build-sycl/lib/clang/9.0.0/include/CL/sycl/buffer.hpp:11:
In file included from /home/bader/build-sycl/lib/clang/9.0.0/include/CL/sycl/detail/buffer_impl2.hpp:20:
In file included from /home/bader/build-sycl/lib/clang/9.0.0/include/CL/sycl/handler.hpp:10:
In file included from /home/bader/build-sycl/lib/clang/9.0.0/include/CL/sycl/handler2.hpp:28:
/home/bader/build-sycl/lib/clang/9.0.0/include/CL/sycl/sampler.hpp:63:57: error: invalid operands to binary expression ('__ocl_sampler_t' (aka 'sampler_t') and '__ocl_sampler_t')
  void __init(__ocl_sampler_t Sampler) { impl.m_Sampler = Sampler; }
                                         ~~~~~~~~~~~~~~ ^ ~~~~~~~
1 error generated.

MrSidims and others added 13 commits May 22, 2019 13:22
Following attributes/pragmas are supported:

ivdep
ivdep safelen (N)
ii(N)
max_concurrency_loop(N)
LLVM IR ->SPIRV translation (appropriately the set order):

!"llvm.loop.ivdep.enable ->
4 LoopMerge {{[0-9]+}} {{[0-9]+}} 4
!"llvm.loop.ivdep.safelen", i32 N ->
5 LoopMerge {{[0-9]+}} {{[0-9]+}} 8 N
!"llvm.loop.ii.count", i32 N ->
6 LoopMerge {{[0-9]+}} {{[0-9]+}} 0x80000000 5889 N
!"llvm.loop.max_concurrency.count", i32 N ->
6 LoopMerge {{[0-9]+}} {{[0-9]+}} 0x80000000 5890 N
Also a case of multiple bit masks set for LoopControl is
now supported.

SPEC: KhronosGroup/SPIRV-Registry#28

Signed-off-by: Dmitry Sidorov dmitry.sidorov@intel.com
All non-CMake build files were removed from the ICD Loader
project. SYCL building scripts now run cmake to generate
make build scripts.

Signed-off-by: Vladimir Lazarev <vladimir.lazarev@intel.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
The workaround is required for the GCC 5+ version.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
ifdef checks if CL_TARGET_OPENCL_VERSION is defined or not and ignores
the rest of the expression checking exact value.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
Also updated results checking using ULP due half type precision
limitations.

Signed-off-by: Mariya Podchishchaeva <mariya.podchishchaeva@intel.com>
Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
Incorrect assertion was introduced in ee585e9

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
kernel object

Using the types of the parameter variables of __init method instead

Signed-off-by: Sindhu Chittireddy <sindhu.chittireddy@intel.com>
This is an attempt to re-use OpenCL sampler for SYCL.

- `sampler_t` type name is replaced with `__ocl_sampler_t` to avoid
potential collisions with user types
- selectively enabled a couple of OpenCL diagnostics for this type

Notes:
- errors are reported in the SYCL headers rather than in the user's code
- some diagnostics trigger on SYCL use cases. E.g. SYCL kernel
initializes lambda members with the values from kernel arguments. This
initialization code for sampler emits errors because OpenCL disallows
sampler on the right hand side of the binary operators - these are
supposed to be used only by built-in functions. Another potential issue
is the lambda object itself - captured sampler is a member of the lambda
object and OpenCL doesn't allow composite types with samplers. SPIR-V
produced from SYCL should be okay as lambda object can be removed by
standard LLVM transformation passes.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
@bader bader force-pushed the sampler-experiment branch from d93afb1 to a36ebb2 Compare May 24, 2019 15:18
@bader
Copy link
Owner Author

bader commented May 24, 2019

Improved version of this patch posted here: intel#167

@bader bader closed this May 24, 2019
bader pushed a commit that referenced this pull request May 26, 2019
Looks like just a minor oversight in the parsing code.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.

Differential Revision: https://reviews.llvm.org/D60840

llvm-svn: 359855
bader pushed a commit that referenced this pull request May 26, 2019
Summary:
First part of a fix for JITed code debugging. This has been a regression from 5.0 to 6.0 and it's is still reproducible on current master: https://bugs.llvm.org/show_bug.cgi?id=36209

The address of the breakpoint site is corrupt: the 0x4 value we end up with, looks like an offset on a zero base address. When we parse the ELF section headers from the JIT descriptor, the load address for the text section we find in `header.sh_addr` is correct.

The bug manifests in `VMAddressProvider::GetVMRange(const ELFSectionHeader &)` (follow it from `ObjectFileELF::CreateSections()`). Here we think the object type was `eTypeObjectFile` and unleash some extra logic [1] which essentially overwrites the address with a zero value.

The object type is deduced from the ELF header's `e_type` in `ObjectFileELF::CalculateType()`. It never returns `eTypeJIT`, because the ELF header has no representation for it [2]. Instead the in-memory ELF object states `ET_REL`, which leads to `eTypeObjectFile`. This is what we get from `lli` at least since 3.x. (Might it be better to write `ET_EXEC` on the JIT side instead? In fact, relocations were already applied at this point, so "Relocatable" is not quite exact.)

So, this patch proposes to set `eTypeJIT` explicitly whenever we read from a JIT descriptor. In `ObjectFileELF::CreateSections()` we can then call `GetType()`, which returns the explicit value or otherwise falls back to `CalculateType()`.

LLDB then sets the breakpoint successfully. Next step: debug info.
```
Process 1056 stopped
* thread #1, name = 'lli', stop reason = breakpoint 1.2
    frame #0: 0x00007ffff7ff7000 JIT(0x3ba2030)`jitbp()
JIT(0x3ba2030)`jitbp:
->  0x7ffff7ff7000 <+0>:  pushq  %rbp
    0x7ffff7ff7001 <+1>:  movq   %rsp, %rbp
    0x7ffff7ff7004 <+4>:  movabsq $0x7ffff7ff6000, %rdi     ; imm = 0x7FFFF7FF6000
    0x7ffff7ff700e <+14>: movabsq $0x7ffff6697e80, %rcx     ; imm = 0x7FFFF6697E80
```

[1] It was first introduced with https://reviews.llvm.org/D38142#change-lF6csxV8HdlL, which has also been the original breaking change. The code has changed a lot since then.

[2] ELF object types: https://github.com/llvm/llvm-project/blob/2d2277f5/llvm/include/llvm/BinaryFormat/ELF.h#L110

Reviewers: labath, JDevlieghere, bkoropoff, clayborg, espindola, alexshap, stella.stamenova

Reviewed By: labath, clayborg

Subscribers: probinson, emaste, aprantl, arichardson, MaskRay, AlexDenisov, yurydelendik, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D61611

llvm-svn: 360354
bader pushed a commit that referenced this pull request May 26, 2019
This is the first in a set of patches I have to improve testing of
llvm-objdump. This patch targets --all-headers, --section, and
--full-contents. In the --section case, it deletes a pre-canned binary
which is only used by the one test and replaces it with yaml.

Reviewed by: grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D61941

llvm-svn: 360893
bader pushed a commit that referenced this pull request Jun 7, 2019
Looks like just a minor oversight in the parsing code.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.

Differential Revision: https://reviews.llvm.org/D60840

llvm-svn: 359855
bader pushed a commit that referenced this pull request Jun 7, 2019
Summary:
First part of a fix for JITed code debugging. This has been a regression from 5.0 to 6.0 and it's is still reproducible on current master: https://bugs.llvm.org/show_bug.cgi?id=36209

The address of the breakpoint site is corrupt: the 0x4 value we end up with, looks like an offset on a zero base address. When we parse the ELF section headers from the JIT descriptor, the load address for the text section we find in `header.sh_addr` is correct.

The bug manifests in `VMAddressProvider::GetVMRange(const ELFSectionHeader &)` (follow it from `ObjectFileELF::CreateSections()`). Here we think the object type was `eTypeObjectFile` and unleash some extra logic [1] which essentially overwrites the address with a zero value.

The object type is deduced from the ELF header's `e_type` in `ObjectFileELF::CalculateType()`. It never returns `eTypeJIT`, because the ELF header has no representation for it [2]. Instead the in-memory ELF object states `ET_REL`, which leads to `eTypeObjectFile`. This is what we get from `lli` at least since 3.x. (Might it be better to write `ET_EXEC` on the JIT side instead? In fact, relocations were already applied at this point, so "Relocatable" is not quite exact.)

So, this patch proposes to set `eTypeJIT` explicitly whenever we read from a JIT descriptor. In `ObjectFileELF::CreateSections()` we can then call `GetType()`, which returns the explicit value or otherwise falls back to `CalculateType()`.

LLDB then sets the breakpoint successfully. Next step: debug info.
```
Process 1056 stopped
* thread #1, name = 'lli', stop reason = breakpoint 1.2
    frame #0: 0x00007ffff7ff7000 JIT(0x3ba2030)`jitbp()
JIT(0x3ba2030)`jitbp:
->  0x7ffff7ff7000 <+0>:  pushq  %rbp
    0x7ffff7ff7001 <+1>:  movq   %rsp, %rbp
    0x7ffff7ff7004 <+4>:  movabsq $0x7ffff7ff6000, %rdi     ; imm = 0x7FFFF7FF6000
    0x7ffff7ff700e <+14>: movabsq $0x7ffff6697e80, %rcx     ; imm = 0x7FFFF6697E80
```

[1] It was first introduced with https://reviews.llvm.org/D38142#change-lF6csxV8HdlL, which has also been the original breaking change. The code has changed a lot since then.

[2] ELF object types: https://github.com/llvm/llvm-project/blob/2d2277f5/llvm/include/llvm/BinaryFormat/ELF.h#L110

Reviewers: labath, JDevlieghere, bkoropoff, clayborg, espindola, alexshap, stella.stamenova

Reviewed By: labath, clayborg

Subscribers: probinson, emaste, aprantl, arichardson, MaskRay, AlexDenisov, yurydelendik, lldb-commits

Tags: #lldb

Differential Revision: https://reviews.llvm.org/D61611

llvm-svn: 360354
bader pushed a commit that referenced this pull request Jun 7, 2019
This is the first in a set of patches I have to improve testing of
llvm-objdump. This patch targets --all-headers, --section, and
--full-contents. In the --section case, it deletes a pre-canned binary
which is only used by the one test and replaces it with yaml.

Reviewed by: grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D61941

llvm-svn: 360893
bader pushed a commit that referenced this pull request Oct 5, 2019
Summary:
This reverts commit r372204.

This change causes build bot failures under msan:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/35236/steps/check-llvm%20msan/logs/stdio:

```
FAIL: LLVM :: DebugInfo/AArch64/asan-stack-vars.mir (19531 of 33579)
******************** TEST 'LLVM :: DebugInfo/AArch64/asan-stack-vars.mir' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc -O0 -start-before=livedebugvalues -filetype=obj -o - /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir | /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llvm-dwarfdump -v - | /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir
--
Exit Code: 2

Command Output (stderr):
--
==62894==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0xdfcafb in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3
    #1 0xdfae8a in resolveFrameIndexReference /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1580:10
    #2 0xdfae8a in llvm::AArch64FrameLowering::getFrameIndexReference(llvm::MachineFunction const&, int, unsigned int&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1536
    #3 0x46642c1 in (anonymous namespace)::LiveDebugValues::extractSpillBaseRegAndOffset(llvm::MachineInstr const&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:582:21
    #4 0x4647cb3 in transferSpillOrRestoreInst /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:883:11
    #5 0x4647cb3 in process /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1079
    #6 0x4647cb3 in (anonymous namespace)::LiveDebugValues::ExtendRanges(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1361
    #7 0x463ac0e in (anonymous namespace)::LiveDebugValues::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1415:18
    #8 0x4854ef0 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13
    #9 0x53b0b01 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1648:27
    #10 0x53b15f6 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1685:16
    #11 0x53b298d in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1750:27
    #12 0x53b298d in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1863
    #13 0x905f21 in compileModule(char**, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:601:8
    #14 0x8fdc4e in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:355:22
    #15 0x7f67673632e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
    #16 0x882369 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x882369)

MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3 in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const
Exiting
error: -: The file was not recognized as a valid object file
FileCheck error: '-' is empty.
FileCheck command line:  /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir
```

Reviewers: bkramer

Reviewed By: bkramer

Subscribers: sdardis, aprantl, kristof.beyls, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67710

llvm-svn: 372228
bader added a commit that referenced this pull request Oct 5, 2019
  CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
bader pushed a commit that referenced this pull request Sep 1, 2021
bader pushed a commit that referenced this pull request Sep 1, 2021
  CONFLICT (content): Merge conflict in clang/tools/clang-offload-wrapper/ClangOffloadWrapper.cpp
  CONFLICT (content): Merge conflict in clang/test/Driver/clang-offload-wrapper.c
bader pushed a commit that referenced this pull request Sep 1, 2021
A libfuzzer run has discovered some inputs for which the demangler does
not terminate. When minimized, it looks like this: _Zcv1BIRT_EIS1_E

Deciphered:

_Z
cv    - conversion operator

      * result type
 1B   - "B"
 I    - template args begin
  R   - reference type              <.
   T_ - forward template reference   |  *
 E    - template args end            |  |
                                     |  |
      * parameter type               |  |
 I    - template args begin          |  |
  S1_ - substitution #1              * <'
 E    - template args end

The reason is: template-parameter refs in conversion operator result type
create forward-references, while substitutions are instantly resolved via
back-references. Together these can create a reference loop. It causes an
infinite loop in ReferenceType::collapse().

I see three possible ways to avoid these loops:

1. check if resolving a forward reference creates a loop and reject the
   invalid input (hard to traverse AST at this point)
2. check if a substitution contains a malicious forward reference and
   reject the invalid input (hard to traverse AST at this point;
   substitutions are quite common: may affect performance; hard to
   clearly detect loops at this point)
3. detect loops in ReferenceType::collapse() (cannot reject the input)

This patch implements (3) as seemingly the least-impact change. As a
side effect, such invalid input strings are not rejected and produce
garbage, however there are already similar guards in
`if (Printing) return;` checks.

Fixes https://llvm.org/PR51407

Differential Revision: https://reviews.llvm.org/D107712
bader pushed a commit that referenced this pull request Sep 1, 2021
bader pushed a commit that referenced this pull request Sep 1, 2021
bader pushed a commit that referenced this pull request Sep 1, 2021
Change 636428c enabled BlockingRegion hooks for pthread_once().
Unfortunately this seems to cause crashes on Mac OS X which uses
pthread_once() from locations that seem to result in crashes:

| ThreadSanitizer:DEADLYSIGNAL
| ==31465==ERROR: ThreadSanitizer: stack-overflow on address 0x7ffee73fffd8 (pc 0x00010807fd2a bp 0x7ffee7400050 sp 0x7ffee73fffb0 T93815)
|     #0 __tsan::MetaMap::GetSync(__tsan::ThreadState*, unsigned long, unsigned long, bool, bool) tsan_sync.cpp:195 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x78d2a)
|     #1 __tsan::MutexPreLock(__tsan::ThreadState*, unsigned long, unsigned long, unsigned int) tsan_rtl_mutex.cpp:143 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x6cefc)
|     #2 wrap_pthread_mutex_lock sanitizer_common_interceptors.inc:4240 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x3dae0)
|     #3 flockfile <null>:2 (libsystem_c.dylib:x86_64+0x38a69)
|     #4 puts <null>:2 (libsystem_c.dylib:x86_64+0x3f69b)
|     #5 wrap_puts sanitizer_common_interceptors.inc (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x34d83)
|     #6 __tsan::OnPotentiallyBlockingRegionBegin() cxa_guard_acquire.cpp:8 (foo:x86_64+0x100000e48)
|     #7 wrap_pthread_once tsan_interceptors_posix.cpp:1512 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2f6e6)

From the stack trace it can be seen that the caller is unknown, and the
resulting stack-overflow seems to indicate that whoever the caller is
does not have enough stack space or otherwise is running in a limited
environment not yet ready for full instrumentation.

Fix it by reverting behaviour on Mac OS X to not call BlockingRegion
hooks from pthread_once().

Reported-by: azharudd

Reviewed By: glider

Differential Revision: https://reviews.llvm.org/D108305
bader pushed a commit that referenced this pull request Jan 24, 2022
Segmentation fault in ompt_tsan_dependences function due to an unchecked NULL pointer dereference is as follows:

```
ThreadSanitizer:DEADLYSIGNAL
	==140865==ERROR: ThreadSanitizer: SEGV on unknown address 0x000000000050 (pc 0x7f217c2d3652 bp 0x7ffe8cfc7e00 sp 0x7ffe8cfc7d90 T140865)
	==140865==The signal is caused by a READ memory access.
	==140865==Hint: address points to the zero page.
	/usr/bin/addr2line: DWARF error: could not find variable specification at offset 1012a
	/usr/bin/addr2line: DWARF error: could not find variable specification at offset 133b5
	/usr/bin/addr2line: DWARF error: could not find variable specification at offset 1371a
	/usr/bin/addr2line: DWARF error: could not find variable specification at offset 13a58
	#0 ompt_tsan_dependences(ompt_data_t*, ompt_dependence_t const*, int) /ptmp/bhararit/llvm-project/openmp/tools/archer/ompt-tsan.cpp:1004 (libarcher.so+0x15652)
	#1 __kmpc_doacross_post /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_csupport.cpp:4280 (libomp.so+0x74d98)
	#2 .omp_outlined. for_ordered_01.c:? (for_ordered_01.exe+0x5186cb)
	#3 __kmp_invoke_microtask /ptmp/bhararit/llvm-project/openmp/runtime/src/z_Linux_asm.S:1166 (libomp.so+0x14e592)
	#4 __kmp_invoke_task_func /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_runtime.cpp:7556 (libomp.so+0x909ad)
	#5 __kmp_fork_call /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_runtime.cpp:2284 (libomp.so+0x8461a)
	#6 __kmpc_fork_call /ptmp/bhararit/llvm-project/openmp/runtime/src/kmp_csupport.cpp:308 (libomp.so+0x6db55)
	#7 main ??:? (for_ordered_01.exe+0x51828f)
	#8 __libc_start_main ??:? (libc.so.6+0x24349)
	#9 _start /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120 (for_ordered_01.exe+0x4214e9)

	ThreadSanitizer can not provide additional info.
	SUMMARY: ThreadSanitizer: SEGV /ptmp/bhararit/llvm-project/openmp/tools/archer/ompt-tsan.cpp:1004 in ompt_tsan_dependences(ompt_data_t*, ompt_dependence_t const*, int)
	==140865==ABORTING
```

	To reproduce the error, use the following openmp code snippet:

```
/* initialise  testMatrixInt Matrix, cols, r and c */
	  #pragma omp parallel private(r,c) shared(testMatrixInt)
	    {
	      #pragma omp for ordered(2)
	      for (r=1; r < rows; r++) {
	        for (c=1; c < cols; c++) {
	          #pragma omp ordered depend(sink:r-1, c+1) depend(sink:r-1,c-1)
	          testMatrixInt[r][c] = (testMatrixInt[r-1][c] + testMatrixInt[r-1][c-1]) % cols ;
	          #pragma omp ordered depend (source)
	        }
	      }
	    }
```

	Compilation:
```
clang -g -stdlib=libc++ -fsanitize=thread -fopenmp -larcher test_case.c
```

	It seems like the changes introduced by the commit https://reviews.llvm.org/D114005 causes this particular SEGV while using Archer.

Reviewed By: protze.joachim

Differential Revision: https://reviews.llvm.org/D115328
bader pushed a commit that referenced this pull request Jan 24, 2022
This reverts commit ea75be3 and
1eb5b6e.

That commit caused crashes with compilation e.g. like this
(not fixed by the follow-up commit):

$ cat sqrt.c
float a;
b() { sqrt(a); }
$ clang -target x86_64-linux-gnu -c -O2 sqrt.c
Attributes 'readnone and writeonly' are incompatible!
  %sqrtf = tail call float @sqrtf(float %0) #1
in function b
fatal error: error in backend: Broken function found, compilation aborted!
bader pushed a commit that referenced this pull request Feb 14, 2022
We experienced some deadlocks when we used multiple threads for logging
using `scan-builds` intercept-build tool when we used multiple threads by
e.g. logging `make -j16`

```
(gdb) bt
#0  0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2bb3d152e4 in ?? ()
#3  0x00007ffcc5f0cc80 in ?? ()
#4  0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f2bb3d144ee in ?? ()
#8  0x746e692f706d742f in ?? ()
#9  0x692d747065637265 in ?? ()
#10 0x2f653631326b3034 in ?? ()
#11 0x646d632e35353532 in ?? ()
#12 0x0000000000000000 in ?? ()
```

I think the gcc's exit call caused the injected `libear.so` to be unloaded
by the `ld`, which in turn called the `void on_unload() __attribute__((destructor))`.
That tried to acquire an already locked mutex which was left locked in the
`bear_report_call()` call, that probably encountered some error and
returned early when it forgot to unlock the mutex.

All of these are speculation since from the backtrace I could not verify
if frames 2 and 3 are in fact corresponding to the `libear.so` module.
But I think it's a fairly safe bet.

So, hereby I'm releasing the held mutex on *all paths*, even if some failure
happens.

PS: I would use lock_guards, but it's C.

Reviewed-by: NoQ

Differential Revision: https://reviews.llvm.org/D118439
bader pushed a commit that referenced this pull request Feb 17, 2022
llvm.insertvalue and llvm.extractvalue need LLVM primitive type
for the indexing operands. While upstreaming the TargetRewrite pass the change
was made from i32 to index without knowing this restriction. This patch reverts
back the types used for indexing in the two ops created in this pass.

the error you will receive when lowering to LLVM IR with the current code
is the following:

```
 'llvm.insertvalue' op operand #1 must be primitive LLVM type, but got 'index'
```

Reviewed By: jeanPerier, schweitz

Differential Revision: https://reviews.llvm.org/D119253
bader pushed a commit that referenced this pull request Feb 17, 2022
There is a clangd crash at `__memcmp_avx2_movbe`. Short problem description is below.

The method `HeaderIncludes::addExistingInclude` stores `Include` objects by reference at 2 places: `ExistingIncludes` (primary storage) and `IncludesByPriority` (pointer to the object's location at ExistingIncludes). `ExistingIncludes` is a map where value is a `SmallVector`. A new element is inserted by `push_back`. The operation might do resize. As result pointers stored at `IncludesByPriority` might become invalid.

Typical stack trace
```
    frame #0: 0x00007f11460dcd94 libc.so.6`__memcmp_avx2_movbe + 308
    frame #1: 0x00000000004782b8 clangd`llvm::StringRef::compareMemory(Lhs="
\"t2.h\"", Rhs="", Length=6) at StringRef.h:76:22
    frame #2: 0x0000000000701253 clangd`llvm::StringRef::compare(this=0x0000
7f10de7d8610, RHS=(Data = "", Length = 7166742329480737377)) const at String
Ref.h:206:34
  * frame #3: 0x00000000007603ab clangd`llvm::operator<(llvm::StringRef, llv
m::StringRef)(LHS=(Data = "\"t2.h\"", Length = 6), RHS=(Data = "", Length =
7166742329480737377)) at StringRef.h:907:23
    frame #4: 0x0000000002d0ad9f clangd`clang::tooling::HeaderIncludes::inse
rt(this=0x00007f10de7fb1a0, IncludeName=(Data = "t2.h\"", Length = 4), IsAng
led=false) const at HeaderIncludes.cpp:365:22
    frame #5: 0x00000000012ebfdd clangd`clang::clangd::IncludeInserter::inse
rt(this=0x00007f10de7fb148, VerbatimHeader=(Data = "\"t2.h\"", Length = 6))
const at Headers.cpp:262:70
```

A unit test test for the crash was created (`HeaderIncludesTest.RepeatedIncludes`). The proposed solution is to use std::list instead of llvm::SmallVector

Test Plan
```
./tools/clang/unittests/Tooling/ToolingTests --gtest_filter=HeaderIncludesTest.RepeatedIncludes
```

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D118755
bader pushed a commit that referenced this pull request Jun 26, 2022
… perf conversion in the client

- Add logging for when the live state of the process is refreshed
- Move error handling of the live state refreshing to Trace from TraceIntelPT. This allows refreshing to fail either at the plug-in level or at the base class level. The error is cached and it can be gotten every time RefreshLiveProcessState is invoked.
- Allow DoRefreshLiveProcessState to handle plugin-specific parameters.
- Add some encapsulation to prevent TraceIntelPT from accessing variables belonging to Trace.

Test done via logging:

```
(lldb) b main
Breakpoint 1: where = a.out`main + 20 at main.cpp:27:20, address = 0x00000000004023d9
(lldb) r
Process 2359706 launched: '/home/wallace/a.out' (x86_64)
Process 2359706 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x00000000004023d9 a.out`main at main.cpp:27:20
   24   };
   25
   26   int main() {
-> 27     std::vector<int> vvv;
   28     for (int i = 0; i < 100000; i++)
   29       vvv.push_back(i);
   30
(lldb) process trace start                                                                                        (lldb) log enable lldb target -F(lldb) n
Process 2359706 stopped
* thread #1, name = 'a.out', stop reason = step over
    frame #0: 0x00000000004023e8 a.out`main at main.cpp:28:12
   25
   26   int main() {
   27     std::vector<int> vvv;
-> 28     for (int i = 0; i < 100000; i++)
   29       vvv.push_back(i);
   30
   31     std::deque<int> dq1 = {1, 2, 3};
(lldb) thread trace dump instructions -c 2 -t                                                                     Trace.cpp:RefreshLiveProcessState                            Trace::RefreshLiveProcessState invoked
TraceIntelPT.cpp:DoRefreshLiveProcessState                   TraceIntelPT found tsc conversion information
thread #1: tid = 2359706
  a.out`std::vector<int, std::allocator<int>>::vector() + 26 at stl_vector.h:395:19
    54: [tsc=unavailable] 0x0000000000403a7c    retq
```

See the logging lines at the end of the dump. They indicate that refreshing happened and that perf conversion information was found.

Differential Revision: https://reviews.llvm.org/D125943
bader pushed a commit that referenced this pull request Jun 26, 2022
…X86 following the psABI"""

This reverts commit e1c5afa.

This introduces crashes in the JAX backend on CPU. A reproducer in LLVM is
below. Let me know if you have trouble reproducing this.

; ModuleID = '__compute_module'
source_filename = "__compute_module"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

@0 = private unnamed_addr constant [4 x i8] c"\00\00\00?"
@1 = private unnamed_addr constant [4 x i8] c"\1C}\908"
@2 = private unnamed_addr constant [4 x i8] c"?\00\\4"
@3 = private unnamed_addr constant [4 x i8] c"%ci1"
@4 = private unnamed_addr constant [4 x i8] zeroinitializer
@5 = private unnamed_addr constant [4 x i8] c"\00\00\00\C0"
@6 = private unnamed_addr constant [4 x i8] c"\00\00\00B"
@7 = private unnamed_addr constant [4 x i8] c"\94\B4\C22"
@8 = private unnamed_addr constant [4 x i8] c"^\09B6"
@9 = private unnamed_addr constant [4 x i8] c"\15\F3M?"
@10 = private unnamed_addr constant [4 x i8] c"e\CC\\;"
@11 = private unnamed_addr constant [4 x i8] c"d\BD/>"
@12 = private unnamed_addr constant [4 x i8] c"V\F4I="
@13 = private unnamed_addr constant [4 x i8] c"\10\CB,<"
@14 = private unnamed_addr constant [4 x i8] c"\AC\E3\D6:"
@15 = private unnamed_addr constant [4 x i8] c"\DC\A8E9"
@16 = private unnamed_addr constant [4 x i8] c"\C6\FA\897"
@17 = private unnamed_addr constant [4 x i8] c"%\F9\955"
@18 = private unnamed_addr constant [4 x i8] c"\B5\DB\813"
@19 = private unnamed_addr constant [4 x i8] c"\B4W_\B2"
@20 = private unnamed_addr constant [4 x i8] c"\1Cc\8F\B4"
@21 = private unnamed_addr constant [4 x i8] c"~3\94\B6"
@22 = private unnamed_addr constant [4 x i8] c"3Yq\B8"
@23 = private unnamed_addr constant [4 x i8] c"\E9\17\17\BA"
@24 = private unnamed_addr constant [4 x i8] c"\F1\B2\8D\BB"
@25 = private unnamed_addr constant [4 x i8] c"\F8t\C2\BC"
@26 = private unnamed_addr constant [4 x i8] c"\82[\C2\BD"
@27 = private unnamed_addr constant [4 x i8] c"uB-?"
@28 = private unnamed_addr constant [4 x i8] c"^\FF\9B\BE"
@29 = private unnamed_addr constant [4 x i8] c"\00\00\00A"

; Function Attrs: uwtable
define void @main.158(ptr %retval, ptr noalias %run_options, ptr noalias %params, ptr noalias %buffer_table, ptr noalias %status, ptr noalias %prof_counters) #0 {
entry:
  %fusion.invar_address.dim.1 = alloca i64, align 8
  %fusion.invar_address.dim.0 = alloca i64, align 8
  %0 = getelementptr inbounds ptr, ptr %buffer_table, i64 1
  %Arg_0.1 = load ptr, ptr %0, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  %1 = getelementptr inbounds ptr, ptr %buffer_table, i64 0
  %fusion = load ptr, ptr %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2
  store i64 0, ptr %fusion.invar_address.dim.0, align 8
  br label %fusion.loop_header.dim.0

return:                                           ; preds = %fusion.loop_exit.dim.0
  ret void

fusion.loop_header.dim.0:                         ; preds = %fusion.loop_exit.dim.1, %entry
  %fusion.indvar.dim.0 = load i64, ptr %fusion.invar_address.dim.0, align 8
  %2 = icmp uge i64 %fusion.indvar.dim.0, 3
  br i1 %2, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0

fusion.loop_body.dim.0:                           ; preds = %fusion.loop_header.dim.0
  store i64 0, ptr %fusion.invar_address.dim.1, align 8
  br label %fusion.loop_header.dim.1

fusion.loop_header.dim.1:                         ; preds = %fusion.loop_body.dim.1, %fusion.loop_body.dim.0
  %fusion.indvar.dim.1 = load i64, ptr %fusion.invar_address.dim.1, align 8
  %3 = icmp uge i64 %fusion.indvar.dim.1, 1
  br i1 %3, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1

fusion.loop_body.dim.1:                           ; preds = %fusion.loop_header.dim.1
  %4 = getelementptr inbounds [3 x [1 x half]], ptr %Arg_0.1, i64 0, i64 %fusion.indvar.dim.0, i64 0
  %5 = load half, ptr %4, align 2, !invariant.load !0, !noalias !3
  %6 = fpext half %5 to float
  %7 = call float @llvm.fabs.f32(float %6)
  %constant.121 = load float, ptr @29, align 4
  %compare.2 = fcmp ole float %7, %constant.121
  %8 = zext i1 %compare.2 to i8
  %constant.120 = load float, ptr @0, align 4
  %multiply.95 = fmul float %7, %constant.120
  %constant.119 = load float, ptr @5, align 4
  %add.82 = fadd float %multiply.95, %constant.119
  %constant.118 = load float, ptr @4, align 4
  %multiply.94 = fmul float %add.82, %constant.118
  %constant.117 = load float, ptr @19, align 4
  %add.81 = fadd float %multiply.94, %constant.117
  %multiply.92 = fmul float %add.82, %add.81
  %constant.116 = load float, ptr @18, align 4
  %add.79 = fadd float %multiply.92, %constant.116
  %multiply.91 = fmul float %add.82, %add.79
  %subtract.87 = fsub float %multiply.91, %add.81
  %constant.115 = load float, ptr @20, align 4
  %add.78 = fadd float %subtract.87, %constant.115
  %multiply.89 = fmul float %add.82, %add.78
  %subtract.86 = fsub float %multiply.89, %add.79
  %constant.114 = load float, ptr @17, align 4
  %add.76 = fadd float %subtract.86, %constant.114
  %multiply.88 = fmul float %add.82, %add.76
  %subtract.84 = fsub float %multiply.88, %add.78
  %constant.113 = load float, ptr @21, align 4
  %add.75 = fadd float %subtract.84, %constant.113
  %multiply.86 = fmul float %add.82, %add.75
  %subtract.83 = fsub float %multiply.86, %add.76
  %constant.112 = load float, ptr @16, align 4
  %add.73 = fadd float %subtract.83, %constant.112
  %multiply.85 = fmul float %add.82, %add.73
  %subtract.81 = fsub float %multiply.85, %add.75
  %constant.111 = load float, ptr @22, align 4
  %add.72 = fadd float %subtract.81, %constant.111
  %multiply.83 = fmul float %add.82, %add.72
  %subtract.80 = fsub float %multiply.83, %add.73
  %constant.110 = load float, ptr @15, align 4
  %add.70 = fadd float %subtract.80, %constant.110
  %multiply.82 = fmul float %add.82, %add.70
  %subtract.78 = fsub float %multiply.82, %add.72
  %constant.109 = load float, ptr @23, align 4
  %add.69 = fadd float %subtract.78, %constant.109
  %multiply.80 = fmul float %add.82, %add.69
  %subtract.77 = fsub float %multiply.80, %add.70
  %constant.108 = load float, ptr @14, align 4
  %add.68 = fadd float %subtract.77, %constant.108
  %multiply.79 = fmul float %add.82, %add.68
  %subtract.75 = fsub float %multiply.79, %add.69
  %constant.107 = load float, ptr @24, align 4
  %add.67 = fadd float %subtract.75, %constant.107
  %multiply.77 = fmul float %add.82, %add.67
  %subtract.74 = fsub float %multiply.77, %add.68
  %constant.106 = load float, ptr @13, align 4
  %add.66 = fadd float %subtract.74, %constant.106
  %multiply.76 = fmul float %add.82, %add.66
  %subtract.72 = fsub float %multiply.76, %add.67
  %constant.105 = load float, ptr @25, align 4
  %add.65 = fadd float %subtract.72, %constant.105
  %multiply.74 = fmul float %add.82, %add.65
  %subtract.71 = fsub float %multiply.74, %add.66
  %constant.104 = load float, ptr @12, align 4
  %add.64 = fadd float %subtract.71, %constant.104
  %multiply.73 = fmul float %add.82, %add.64
  %subtract.69 = fsub float %multiply.73, %add.65
  %constant.103 = load float, ptr @26, align 4
  %add.63 = fadd float %subtract.69, %constant.103
  %multiply.71 = fmul float %add.82, %add.63
  %subtract.67 = fsub float %multiply.71, %add.64
  %constant.102 = load float, ptr @11, align 4
  %add.62 = fadd float %subtract.67, %constant.102
  %multiply.70 = fmul float %add.82, %add.62
  %subtract.66 = fsub float %multiply.70, %add.63
  %constant.101 = load float, ptr @28, align 4
  %add.61 = fadd float %subtract.66, %constant.101
  %multiply.68 = fmul float %add.82, %add.61
  %subtract.65 = fsub float %multiply.68, %add.62
  %constant.100 = load float, ptr @27, align 4
  %add.60 = fadd float %subtract.65, %constant.100
  %subtract.64 = fsub float %add.60, %add.62
  %multiply.66 = fmul float %subtract.64, %constant.120
  %constant.99 = load float, ptr @6, align 4
  %divide.4 = fdiv float %constant.99, %7
  %add.59 = fadd float %divide.4, %constant.119
  %multiply.65 = fmul float %add.59, %constant.118
  %constant.98 = load float, ptr @3, align 4
  %add.58 = fadd float %multiply.65, %constant.98
  %multiply.64 = fmul float %add.59, %add.58
  %constant.97 = load float, ptr @7, align 4
  %add.57 = fadd float %multiply.64, %constant.97
  %multiply.63 = fmul float %add.59, %add.57
  %subtract.63 = fsub float %multiply.63, %add.58
  %constant.96 = load float, ptr @2, align 4
  %add.56 = fadd float %subtract.63, %constant.96
  %multiply.62 = fmul float %add.59, %add.56
  %subtract.62 = fsub float %multiply.62, %add.57
  %constant.95 = load float, ptr @8, align 4
  %add.55 = fadd float %subtract.62, %constant.95
  %multiply.61 = fmul float %add.59, %add.55
  %subtract.61 = fsub float %multiply.61, %add.56
  %constant.94 = load float, ptr @1, align 4
  %add.54 = fadd float %subtract.61, %constant.94
  %multiply.60 = fmul float %add.59, %add.54
  %subtract.60 = fsub float %multiply.60, %add.55
  %constant.93 = load float, ptr @10, align 4
  %add.53 = fadd float %subtract.60, %constant.93
  %multiply.59 = fmul float %add.59, %add.53
  %subtract.59 = fsub float %multiply.59, %add.54
  %constant.92 = load float, ptr @9, align 4
  %add.52 = fadd float %subtract.59, %constant.92
  %subtract.58 = fsub float %add.52, %add.54
  %multiply.58 = fmul float %subtract.58, %constant.120
  %9 = call float @llvm.sqrt.f32(float %7)
  %10 = fdiv float 1.000000e+00, %9
  %multiply.57 = fmul float %multiply.58, %10
  %11 = trunc i8 %8 to i1
  %12 = select i1 %11, float %multiply.66, float %multiply.57
  %13 = fptrunc float %12 to half
  %14 = getelementptr inbounds [3 x [1 x half]], ptr %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 0
  store half %13, ptr %14, align 2, !alias.scope !3
  %invar.inc1 = add nuw nsw i64 %fusion.indvar.dim.1, 1
  store i64 %invar.inc1, ptr %fusion.invar_address.dim.1, align 8
  br label %fusion.loop_header.dim.1

fusion.loop_exit.dim.1:                           ; preds = %fusion.loop_header.dim.1
  %invar.inc = add nuw nsw i64 %fusion.indvar.dim.0, 1
  store i64 %invar.inc, ptr %fusion.invar_address.dim.0, align 8
  br label %fusion.loop_header.dim.0

fusion.loop_exit.dim.0:                           ; preds = %fusion.loop_header.dim.0
  br label %return
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.fabs.f32(float %0) #1

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare float @llvm.sqrt.f32(float %0) #1

attributes #0 = { uwtable "denormal-fp-math"="preserve-sign" "no-frame-pointer-elim"="false" }
attributes #1 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

!0 = !{}
!1 = !{i64 6}
!2 = !{i64 8}
!3 = !{!4}
!4 = !{!"buffer: {index:0, offset:0, size:6}", !5}
!5 = !{!"XLA global AA domain"}
bader pushed a commit that referenced this pull request Jun 26, 2022
…h decoding

- Add the logic that parses all cpu context switch traces and produces blocks of continuous executions, which will be later used to assign intel pt subtraces to threads and to identify gaps. This logic can also identify if the context switch trace is malformed.
- The continuous executions blocks are able to indicate when there were some contention issues when producing the context switch trace. See the inline comments for more information.
- Update the 'dump info' command to show information and stats related to the multicore decoding flow, including timing about context switch decoding.
- Add the logic to conver nanoseconds to TSCs.
- Fix a bug when returning the context switches. Now they data returned makes sense and even empty traces can be returned from lldb-server.
- Finish the necessary bits for loading and saving a multi-core trace bundle from disk.
- Change some size_t to uint64_t for compatibility with 32 bit systems.

Tested by saving a trace session of a program that sleeps 100 times, it was able to produce the following 'dump info' text:

```
(lldb) trace load /tmp/trace3/trace.json                                                                   (lldb) thread trace dump info                                                                              Trace technology: intel-pt

thread #1: tid = 4192415
  Total number of instructions: 1

  Memory usage:
    Total approximate memory usage (excluding raw trace): 2.51 KiB
    Average memory usage per instruction (excluding raw trace): 2573.00 bytes

  Timing for this thread:

  Timing for global tasks:
    Context switch trace decoding: 0.00s

  Events:
    Number of instructions with events: 0
    Number of individual events: 0

  Multi-core decoding:
    Total number of continuous executions found: 2499
    Number of continuous executions for this thread: 102

  Errors:
    Number of TSC decoding errors: 0
```

Differential Revision: https://reviews.llvm.org/D126267
bader pushed a commit that referenced this pull request Jul 11, 2022
This fixes the combining of constant vector GEP operands in the
optimization of MVE gather/scatter addresses, when opaque pointers are
enabled. As opaque pointers reduce the number of bitcasts between geps,
more can be folded than before. This can cause problems if the index
types are now different between the two geps.

This fixes that by making sure each constant is scaled appropriately,
which has the effect of transforming the geps to have a scale of 1,
changing [r0, q0, uxtw #1] gathers to [r0, q0] with a larger q0. This
helps use a simpler instruction that doesn't need the extra uxtw.

Differential Revision: https://reviews.llvm.org/D127733
bader pushed a commit that referenced this pull request Sep 12, 2022
This diff uncovers an ASAN leak in getOrCreateJumpTable:
```
Indirect leak of 264 byte(s) in 1 object(s) allocated from:
    #1 0x4f6e48c in llvm::bolt::BinaryContext::getOrCreateJumpTable ...
```
The removal of an assertion needs to be accompanied by proper deallocation of
a `JumpTable` object for which `analyzeJumpTable` was unsuccessful.

This reverts commit 52cd00c.
bader pushed a commit that referenced this pull request May 19, 2023
…callback

The `TypeSystemMap::m_mutex` guards against concurrent modifications
of members of `TypeSystemMap`. In particular, `m_map`.

`TypeSystemMap::ForEach` iterates through the entire `m_map` calling
a user-specified callback for each entry. This is all done while
`m_mutex` is locked. However, there's nothing that guarantees that
the callback itself won't call back into `TypeSystemMap` APIs on the
same thread. This lead to double-locking `m_mutex`, which is undefined
behaviour. We've seen this cause a deadlock in the swift plugin with
following backtrace:

```

int main() {
    std::unique_ptr<int> up = std::make_unique<int>(5);

    volatile int val = *up;
    return val;
}

clang++ -std=c++2a -g -O1 main.cpp

./bin/lldb -o “br se -p return” -o run -o “v *up” -o “expr *up” -b
```

```
frame #4: std::lock_guard<std::mutex>::lock_guard
frame #5: lldb_private::TypeSystemMap::GetTypeSystemForLanguage <<<< Lock #2
frame #6: lldb_private::TypeSystemMap::GetTypeSystemForLanguage
frame #7: lldb_private::Target::GetScratchTypeSystemForLanguage
...
frame intel#26: lldb_private::SwiftASTContext::LoadLibraryUsingPaths
frame intel#27: lldb_private::SwiftASTContext::LoadModule
frame intel#30: swift::ModuleDecl::collectLinkLibraries
frame intel#31: lldb_private::SwiftASTContext::LoadModule
frame intel#34: lldb_private::SwiftASTContext::GetCompileUnitImportsImpl
frame intel#35: lldb_private::SwiftASTContext::PerformCompileUnitImports
frame intel#36: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetSwiftASTContext
frame intel#37: lldb_private::TypeSystemSwiftTypeRefForExpressions::GetPersistentExpressionState
frame intel#38: lldb_private::Target::GetPersistentSymbol
frame intel#41: lldb_private::TypeSystemMap::ForEach                 <<<< Lock #1
frame intel#42: lldb_private::Target::GetPersistentSymbol
frame intel#43: lldb_private::IRExecutionUnit::FindInUserDefinedSymbols
frame intel#44: lldb_private::IRExecutionUnit::FindSymbol
frame intel#45: lldb_private::IRExecutionUnit::MemoryManager::GetSymbolAddressAndPresence
frame intel#46: lldb_private::IRExecutionUnit::MemoryManager::findSymbol
frame intel#47: non-virtual thunk to lldb_private::IRExecutionUnit::MemoryManager::findSymbol
frame intel#48: llvm::LinkingSymbolResolver::findSymbol
frame intel#49: llvm::LegacyJITSymbolResolver::lookup
frame intel#50: llvm::RuntimeDyldImpl::resolveExternalSymbols
frame intel#51: llvm::RuntimeDyldImpl::resolveRelocations
frame intel#52: llvm::MCJIT::finalizeLoadedModules
frame intel#53: llvm::MCJIT::finalizeObject
frame intel#54: lldb_private::IRExecutionUnit::ReportAllocations
frame intel#55: lldb_private::IRExecutionUnit::GetRunnableInfo
frame intel#56: lldb_private::ClangExpressionParser::PrepareForExecution
frame intel#57: lldb_private::ClangUserExpression::TryParse
frame intel#58: lldb_private::ClangUserExpression::Parse
```

Our solution is to simply iterate over a local copy of `m_map`.

**Testing**

* Confirmed on manual reproducer (would reproduce 100% of the time
  before the patch)

Differential Revision: https://reviews.llvm.org/D149949
bader pushed a commit that referenced this pull request Jun 1, 2023
…est unittest

Need to finalize the DIBuilder to avoid leak sanitizer errors
like this:

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x55c99ea1761d in operator new(unsigned long)
    #1 0x55c9a518ae49 in operator new
    #2 0x55c9a518ae49 in llvm::MDTuple::getImpl(...)
    #3 0x55c9a4f1b1ec in getTemporary
    #4 0x55c9a4f1b1ec in llvm::DIBuilder::createFunction(...)
bader pushed a commit that referenced this pull request Jun 12, 2023
The motivation for this change is a workload generated by the XLA compiler
targeting nvidia GPUs.

This kernel has a few hundred i8 loads and stores.  Merging is critical for
performance.

The current LSV doesn't merge these well because it only considers instructions
within a block of 64 loads+stores.  This limit is necessary to contain the
O(n^2) behavior of the pass.  I'm hesitant to increase the limit, because this
pass is already one of the slowest parts of compiling an XLA program.

So we rewrite basically the whole thing to use a new algorithm.  Before, we
compared every load/store to every other to see if they're consecutive.  The
insight (from tra@) is that this is redundant.  If we know the offset from PtrA
to PtrB, then we don't need to compare PtrC to both of them in order to tell
whether C may be adjacent to A or B.

So that's what we do.  When scanning a basic block, we maintain a list of
chains, where we know the offset from every element in the chain to the first
element in the chain.  Each instruction gets compared only to the leaders of
all the chains.

In the worst case, this is still O(n^2), because all chains might be of length
1.  To prevent compile time blowup, we only consider the 64 most recently used
chains.  Thus we do no more comparisons than before, but we have the potential
to make much longer chains.

This rewrite affects many tests.  The changes to tests fall into two
categories.

1. The old code had what appears to be a bug when deciding whether a misaligned
   vectorized load is fast.  Suppose TTI reports that load <i32 x 4> align 4
   has relative speed 1, and suppose that load i32 align 4 has relative speed
   32.

   The intent of the code seems to be that we prefer the scalar load, because
   it's faster.  But the old code would choose the vectorized load.
   accessIsMisaligned would set RelativeSpeed to 0 for the scalar load (and not
   even call into TTI to get the relative speed), because the scalar load is
   aligned.

   After this patch, we will prefer the scalar load if it's faster.

2. This patch changes the logic for how we vectorize.  Usually this results in
   vectorizing more.

Explanation of changes to tests:

 - AMDGPU/adjust-alloca-alignment.ll: #1
 - AMDGPU/flat_atomic.ll: #2, we vectorize more.
 - AMDGPU/int_sideeffect.ll: #2, there are two possible locations for the call to @foo, and the pass is brittle to this.  Before, we'd vectorize in case 1 and not case 2.  Now we vectorize in case 2 and not case 1.  So we just move the call.
 - AMDGPU/adjust-alloca-alignment.ll: #2, we vectorize more
 - AMDGPU/insertion-point.ll: #2 we vectorize more
 - AMDGPU/merge-stores-private.ll: #1 (undoes changes from git rev 86f9117, which appear to have hit the bug from #1)
 - AMDGPU/multiple_tails.ll: #1
 - AMDGPU/vect-ptr-ptr-size-mismatch.ll: Fix alignment (I think related to #1 above).
 - AMDGPU CodeGen: I have difficulty commenting on these changes, but many of them look like #2, we vectorize more.
 - NVPTX/4x2xhalf.ll: Fix alignment (I think related to #1 above).
 - NVPTX/vectorize_i8.ll: We don't generate <3 x i8> vectors on NVPTX because they're not legal (and eventually get split)
 - X86/correct-order.ll: #2, we vectorize more, probably because of changes to the chain-splitting logic.
 - X86/subchain-interleaved.ll: #2, we vectorize more
 - X86/vector-scalar.ll: #2, we can now vectorize scalar float + <1 x float>
 - X86/vectorize-i8-nested-add-inseltpoison.ll: Deleted the nuw test because it was nonsensical.  It was doing `add nuw %v0, -1`, but this is equivalent to `add nuw %v0, 0xffff'ffff`, which is equivalent to asserting that %v0 == 0.
 - X86/vectorize-i8-nested-add.ll: Same as nested-add-inseltpoison.ll

Differential Revision: https://reviews.llvm.org/D149893
bader pushed a commit that referenced this pull request Dec 21, 2023
… on (#74207)

lld string tail merging interacts badly with ASAN on Windows, as is
reported in llvm/llvm-project#62078.
A similar error was found when building LLVM with
`-DLLVM_USE_SANITIZER=Address`:
```console
[2/2] Building GenVT.inc...
FAILED: include/llvm/CodeGen/GenVT.inc C:/Dev/llvm-project/Build_asan/include/llvm/CodeGen/GenVT.inc
cmd.exe /C "cd /D C:\Dev\llvm-project\Build_asan && C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe -gen-vt -I C:/Dev/llvm-project/llvm/include/llvm/CodeGen -IC:/Dev/llvm-project/Build_asan/include -IC:/Dev/llvm-project/llvm/include C:/Dev/llvm-project/llvm/include/llvm/CodeGen/ValueTypes.td --write-if-changed -o include/llvm/CodeGen/GenVT.inc -d include/llvm/CodeGen/GenVT.inc.d"       
=================================================================
==31944==ERROR: AddressSanitizer: global-buffer-overflow on address 0x7ff6cff80d20 at pc 0x7ff6cfcc7378 bp 0x00e8bcb8e990 sp 0x00e8bcb8e9d8
READ of size 1 at 0x7ff6cff80d20 thread T0
    #0 0x7ff6cfcc7377 in strlen (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1400a7377)
    #1 0x7ff6cfde50c2 in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1401c50c2)
    #2 0x7ff6cfdd75ef in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1401b75ef)
    #3 0x7ff6cfde59f9 in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1401c59f9)
    #4 0x7ff6cff03f6c in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1402e3f6c)
    #5 0x7ff6cfefbcbc in operator delete(void *, unsigned __int64) (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1402dbcbc)
    #6 0x7ffb7f247343  (C:\WINDOWS\System32\KERNEL32.DLL+0x180017343)
    #7 0x7ffb800826b0  (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800526b0)

0x7ff6cff80d20 is located 31 bytes after global variable '"#error \"ArgKind is not defined\"\n"...' defined in 'C:\Dev\llvm-project\llvm\utils\TableGen\IntrinsicEmitter.cpp' (0x7ff6cff80ce0) of size 33
  '"#error \"ArgKind is not defined\"\n"...' is ascii string '#error "ArgKind is not defined"
'
0x7ff6cff80d20 is located 0 bytes inside of global variable '""' defined in 'C:\Dev\llvm-project\llvm\utils\TableGen\IntrinsicEmitter.cpp' (0x7ff6cff80d20) of size 1
  '""' is ascii string ''
SUMMARY: AddressSanitizer: global-buffer-overflow (C:\Dev\llvm-project\Build_asan\bin\llvm-min-tblgen.exe+0x1400a7377) in strlen
Shadow bytes around the buggy address:
  0x7ff6cff80a80: 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 01 f9 f9 f9
  0x7ff6cff80b00: f9 f9 f9 f9 00 00 00 00 00 00 00 00 01 f9 f9 f9
  0x7ff6cff80b80: f9 f9 f9 f9 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9
  0x7ff6cff80c00: 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
  0x7ff6cff80c80: 00 00 00 00 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 00
=>0x7ff6cff80d00: 01 f9 f9 f9[f9]f9 f9 f9 00 00 00 00 00 00 00 00
  0x7ff6cff80d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff6cff80e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff6cff80e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff6cff80f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ff6cff80f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==31944==ABORTING
```
This is reproducible with the 17.0.3 release:
```console
$ clang-cl --version
clang version 17.0.3
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
$ cmake -S llvm -B Build -G Ninja -DLLVM_USE_SANITIZER=Address -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded -DCMAKE_BUILD_TYPE=Release
$ cd Build
$ ninja all
```
bader pushed a commit that referenced this pull request May 29, 2024
…e exception specification of a function (#90760)

[temp.deduct.general] p6 states:
> At certain points in the template argument deduction process it is
necessary to take a function type that makes use of template parameters
and replace those template parameters with the corresponding template
arguments.
This is done at the beginning of template argument deduction when any
explicitly specified template arguments are substituted into the
function type, and again at the end of template argument deduction when
any template arguments that were deduced or obtained from default
arguments are substituted.

[temp.deduct.general] p7 goes on to say:
> The _deduction substitution loci_ are
> - the function type outside of the _noexcept-specifier_,
> - the explicit-specifier,
> - the template parameter declarations, and
> - the template argument list of a partial specialization
>
> The substitution occurs in all types and expressions that are used in
the deduction substitution loci. [...]

Consider the following:
```cpp
struct A
{
    static constexpr bool x = true;
};

template<typename T, typename U>
void f(T, U) noexcept(T::x); // #1

template<typename T, typename U>
void f(T, U*) noexcept(T::y); // #2

template<>
void f<A>(A, int*) noexcept; // clang currently accepts, GCC and EDG reject
```

Currently, `Sema::SubstituteExplicitTemplateArguments` will substitute
into the _noexcept-specifier_ when deducing template arguments from a
function declaration or when deducing template arguments for taking the
address of a function template (and the substitution is treated as a
SFINAE context). In the above example, `#1` is selected as the primary
template because substitution of the explicit template arguments into
the _noexcept-specifier_ of `#2` failed, which resulted in the candidate
being ignored.

This behavior is incorrect ([temp.deduct.general] note 4 says as much), and
this patch corrects it by deferring all substitution into the
_noexcept-specifier_ until it is instantiated.

As part of the necessary changes to make this patch work, the
instantiation of the exception specification of a function template
specialization when taking the address of a function template is changed
to only occur for the function selected by overload resolution per
[except.spec] p13.1 (as opposed to being instantiated for every candidate).
bader pushed a commit that referenced this pull request May 29, 2024
…ined member functions & member function templates (#88963)

Consider the following snippet from the discussion of CWG2847 on the core reflector:
```
template<typename T>
concept C = sizeof(T) <= sizeof(long);

template<typename T>
struct A 
{
    template<typename U>
    void f(U) requires C<U>; // #1, declares a function template 

    void g() requires C<T>; // #2, declares a function

    template<>
    void f(char);  // #3, an explicit specialization of a function template that declares a function
};

template<>
template<typename U>
void A<short>::f(U) requires C<U>; // #4, an explicit specialization of a function template that declares a function template

template<>
template<>
void A<int>::f(int); // #5, an explicit specialization of a function template that declares a function

template<>
void A<long>::g(); // #6, an explicit specialization of a function that declares a function
```

A number of problems exist:
- Clang rejects `#4` because the trailing _requires-clause_ has `U`
substituted with the wrong template parameter depth when
`Sema::AreConstraintExpressionsEqual` is called to determine whether it
matches the trailing _requires-clause_ of the implicitly instantiated
function template.
- Clang rejects `#5` because the function template specialization
instantiated from `A<int>::f` has a trailing _requires-clause_, but `#5`
does not (nor can it have one as it isn't a templated function).
- Clang rejects `#6` for the same reasons it rejects `#5`.

This patch resolves these issues by making the following changes:
- To fix `#4`, `Sema::AreConstraintExpressionsEqual` is passed
`FunctionTemplateDecl`s when comparing the trailing _requires-clauses_
of `#4` and the function template instantiated from `#1`.
- To fix `#5` and `#6`, the trailing _requires-clauses_ are not compared
for explicit specializations that declare functions.

In addition to these changes, `CheckMemberSpecialization` now considers
constraint satisfaction/constraint partial ordering when determining
which member function is specialized by an explicit specialization of a
member function for an implicit instantiation of a class template (we
previously would select the first function that has the same type as the
explicit specialization). With constraints taken under consideration, we
match EDG's behavior for these declarations.
bader pushed a commit that referenced this pull request May 29, 2024
...which caused issues like

> ==42==ERROR: AddressSanitizer failed to deallocate 0x32 (50) bytes at
address 0x117e0000 (error code: 28)
> ==42==Cannot dump memory map on emscriptenAddressSanitizer: CHECK
failed: sanitizer_common.cpp:81 "((0 && "unable to unmmap")) != (0)"
(0x0, 0x0) (tid=288045824)
> #0 0x14f73b0c in __asan::CheckUnwind()+0x14f73b0c
(this.program+0x14f73b0c)
> #1 0x14f8a3c2 in __sanitizer::CheckFailed(char const*, int, char
const*, unsigned long long, unsigned long long)+0x14f8a3c2
(this.program+0x14f8a3c2)
> #2 0x14f7d6e1 in __sanitizer::ReportMunmapFailureAndDie(void*,
unsigned long, int, bool)+0x14f7d6e1 (this.program+0x14f7d6e1)
> #3 0x14f81fbd in __sanitizer::UnmapOrDie(void*, unsigned
long)+0x14f81fbd (this.program+0x14f81fbd)
> #4 0x14f875df in __sanitizer::SuppressionContext::ParseFromFile(char
const*)+0x14f875df (this.program+0x14f875df)
> #5 0x14f74eab in __asan::InitializeSuppressions()+0x14f74eab
(this.program+0x14f74eab)
> #6 0x14f73a1a in __asan::AsanInitInternal()+0x14f73a1a
(this.program+0x14f73a1a)

when trying to use an ASan suppressions file under Emscripten: Even
though it would be considered OK by SUSv4, the Emscripten runtime states
"We don't support partial munmapping" (see

<emscripten-core/emscripten@f4115eb>
"Implement MAP_ANONYMOUS on top of malloc in STANDALONE_WASM mode
(intel#16289)").

Co-authored-by: Stephan Bergmann <stephan.bergmann@allotropia.de>
bader pushed a commit that referenced this pull request May 29, 2024
…ication as used during partial ordering (#91534)

We do not deduce template arguments from the exception specification
when determining the primary template of a function template
specialization or when taking the address of a function template.
Therefore, this patch changes `isAtLeastAsSpecializedAs` such that we do
not mark template parameters in the exception specification as 'used'
during partial ordering (per [temp.deduct.partial]
p12) to prevent the following from being ambiguous:

```
template<typename T, typename U>
void f(U) noexcept(noexcept(T())); // #1

template<typename T>
void f(T*) noexcept; // #2

template<>
void f<int>(int*) noexcept; // currently ambiguous, selects #2 with this patch applied 
```

Although there is no corresponding wording in the standard (see core issue filed here
cplusplus/CWG#537), this seems
to be the intended behavior given the definition of _deduction
substitution loci_ in [temp.deduct.general] p7 (and EDG does the same thing).
bader pushed a commit that referenced this pull request Jul 11, 2024
Without it, ASAN complains as follows :
==1451==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x602000016310 in thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   16 bytes;
  size of the deallocated type: 1 bytes.
    #0 0x7fac9f23224f in operator delete(void*, unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x7fac84f60895 in igcc::driver::DriverImpl::compile() (/opt/workspace/build/lib/libigc.so.1+0xe9d895)
    #2 0x7fac8501a20c in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate(unsigned long, CIF::Builtins::Buffer<1ul>*, CIF:
:Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Buil
tins::Buffer<1ul>*, unsigned int, void*) const (/opt/workspace/build/lib/libigc.so.1+0xf5720c)
    #3 0x7fac8501c8ce in IGC::IgcOclTranslationCtx<1ul>::TranslateImpl(unsigned long, CIF::Builtins::Buffer<1ul>*, CIF::B
uiltins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, unsigned int) [clone .localalias] (/opt/
workspace/build/lib/libigc.so.1+0xf598ce)
    #4 0x7fac9f01f317  (/lib/x86_64-linux-gnu/libocloc.so+0xa3317)
    #5 0x7fac9f01f844  (/lib/x86_64-linux-gnu/libocloc.so+0xa3844)
    #6 0x7fac9f05085e  (/lib/x86_64-linux-gnu/libocloc.so+0xd485e)
    #7 0x7fac9f04dbe7  (/lib/x86_64-linux-gnu/libocloc.so+0xd1be7)
    #8 0x7fac9f011381 in oclocInvoke (/lib/x86_64-linux-gnu/libocloc.so+0x95381)
    #9 0x559f99e3b786 in main (/usr/bin/ocloc+0x786)
    #10 0x7fac9ed7cd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #11 0x7fac9ed7ce3f in __libc_start_main_impl ../csu/libc-start.c:392
    #12 0x559f99e3b7b4 in _start (/usr/bin/ocloc+0x7b4)

0x602000016310 is located 0 bytes inside of 16-byte region [0x602000016310,0x602000016320)
allocated by thread T0 here:
    #0 0x7fac9f2311e7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x7fac84f601c2 in igcc::driver::DriverImpl::compile() (/opt/workspace/build/lib/libigc.so.1+0xe9d1c2)
bader pushed a commit that referenced this pull request Sep 27, 2024
…104148)

`hasOperands` does not always execute matchers in the order they are
written. This can cause issue in code using bindings when one operand
matcher is relying on a binding set by the other. With this change, the
first matcher present in the code is always executed first and any
binding it sets are available to the second matcher.

Simple example with current version (1 match) and new version (2
matches):
```bash
> cat tmp.cpp
int a = 13;
int b = ((int) a) - a;
int c = a - ((int) a);

> clang-query tmp.cpp
clang-query> set traversal IgnoreUnlessSpelledInSource
clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d"))))))

Match #1:

tmp.cpp:1:1: note: "d" binds here
int a = 13;
^~~~~~~~~~
tmp.cpp:2:9: note: "root" binds here
int b = ((int)a) - a;
        ^~~~~~~~~~~~
1 match.

> ./build/bin/clang-query tmp.cpp
clang-query> set traversal IgnoreUnlessSpelledInSource
clang-query> m binaryOperator(hasOperands(cStyleCastExpr(has(declRefExpr(hasDeclaration(valueDecl().bind("d"))))), declRefExpr(hasDeclaration(valueDecl(equalsBoundNode("d"))))))

Match #1:

tmp.cpp:1:1: note: "d" binds here
    1 | int a = 13;
      | ^~~~~~~~~~
tmp.cpp:2:9: note: "root" binds here
    2 | int b = ((int)a) - a;
      |         ^~~~~~~~~~~~

Match #2:

tmp.cpp:1:1: note: "d" binds here
    1 | int a = 13;
      | ^~~~~~~~~~
tmp.cpp:3:9: note: "root" binds here
    3 | int c = a - ((int)a);
      |         ^~~~~~~~~~~~
2 matches.
```

If this should be documented or regression tested anywhere please let me
know where.
bader pushed a commit that referenced this pull request Sep 27, 2024
Compilers and language runtimes often use helper functions that are
fundamentally uninteresting when debugging anything but the
compiler/runtime itself. This patch introduces a user-extensible
mechanism that allows for these frames to be hidden from backtraces and
automatically skipped over when navigating the stack with `up` and
`down`.

This does not affect the numbering of frames, so `f <N>` will still
provide access to the hidden frames. The `bt` output will also print a
hint that frames have been hidden.

My primary motivation for this feature is to hide thunks in the Swift
programming language, but I'm including an example recognizer for
`std::function::operator()` that I wished for myself many times while
debugging LLDB.

rdar://126629381


Example output. (Yes, my proof-of-concept recognizer could hide even
more frames if we had a method that returned the function name without
the return type or I used something that isn't based off regex, but it's
really only meant as an example).

before:
```
(lldb) thread backtrace --filtered=false
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #3: 0x0000000100003968 a.out`std::__1::__function::__alloc_func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()[abi:se200000](this=0x000000016fdff280, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:171:12
    frame #4: 0x00000001000026bc a.out`std::__1::__function::__func<int (*)(int, int), std::__1::allocator<int (*)(int, int)>, int (int, int)>::operator()(this=0x000000016fdff278, __arg=0x000000016fdff224, __arg=0x000000016fdff220) at function.h:313:10
    frame #5: 0x0000000100003c38 a.out`std::__1::__function::__value_func<int (int, int)>::operator()[abi:se200000](this=0x000000016fdff278, __args=0x000000016fdff224, __args=0x000000016fdff220) const at function.h:430:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
(lldb) 
```

after

```
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000100001f04 a.out`foo(x=1, y=1) at main.cpp:4:10
    frame #1: 0x0000000100003a00 a.out`decltype(std::declval<int (*&)(int, int)>()(std::declval<int>(), std::declval<int>())) std::__1::__invoke[abi:se200000]<int (*&)(int, int), int, int>(__f=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:149:25
    frame #2: 0x000000010000399c a.out`int std::__1::__invoke_void_return_wrapper<int, false>::__call[abi:se200000]<int (*&)(int, int), int, int>(__args=0x000000016fdff280, __args=0x000000016fdff224, __args=0x000000016fdff220) at invoke.h:216:12
    frame #6: 0x0000000100002038 a.out`std::__1::function<int (int, int)>::operator()(this= Function = foo(int, int) , __arg=1, __arg=1) const at function.h:989:10
    frame #7: 0x0000000100001f64 a.out`main(argc=1, argv=0x000000016fdff4f8) at main.cpp:9:10
    frame #8: 0x0000000183cdf154 dyld`start + 2476
Note: Some frames were hidden by frame recognizers
```
bader pushed a commit that referenced this pull request Sep 27, 2024
`JITDylibSearchOrderResolver` local variable can be destroyed before
completion of all callbacks. Capture it together with `Deps` in
`OnEmitted` callback.

Original error:

```
==2035==ERROR: AddressSanitizer: stack-use-after-return on address 0x7bebfa155b70 at pc 0x7ff2a9a88b4a bp 0x7bec08d51980 sp 0x7bec08d51978
READ of size 8 at 0x7bebfa155b70 thread T87 (tf_xla-cpu-llvm)
    #0 0x7ff2a9a88b49 in operator() llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:58
    #1 0x7ff2a9a88b49 in __invoke<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:149:25
    #2 0x7ff2a9a88b49 in __call<(lambda at llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:55:9) &, const llvm::DenseMap<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> >, llvm::DenseMapInfo<llvm::orc::JITDylib *, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib *, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void> > > > &> libcxx/include/__type_traits/invoke.h:224:5
    #3 0x7ff2a9a88b49 in operator() libcxx/include/__functional/function.h:210:12
    #4 0x7ff2a9a88b49 in void std::__u::__function::__policy_invoker<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr,
```
bader pushed a commit that referenced this pull request Sep 27, 2024
Static destructor can race with calls to notify and trigger tsan
warning.

```
WARNING: ThreadSanitizer: data race (pid=5787)
  Write of size 1 at 0x55bec9df8de8 by thread T23:
    #0 pthread_mutex_destroy [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1344](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=1344&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b12affb) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #1 __libcpp_recursive_mutex_destroy [third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h:91](third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h?l=91&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d4e9) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #2 std::__tsan::recursive_mutex::~recursive_mutex() [third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp:52](third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp?l=52&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d4e9)
    #3 ~SmartMutex [third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h:28](third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h?l=28&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaedfe) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #4 (anonymous namespace)::PerfJITEventListener::~PerfJITEventListener() [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:65](third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=65&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaedfe)
    #5 cxa_at_exit_callback_installed_at(void*) [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:437](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=437&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b172cb9) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #6 llvm::JITEventListener::createPerfJITEventListener() [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:496](third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=496&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcad8f5) (BuildId: ff25ace8b17d9863348bb1759c47246c)
```
```
Previous atomic read of size 1 at 0x55bec9df8de8 by thread T192 (mutexes: write M0, write M1):
    #0 pthread_mutex_unlock [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1387](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=1387&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b12b6bb) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #1 __libcpp_recursive_mutex_unlock [third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h:87](third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h?l=87&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d589) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #2 std::__tsan::recursive_mutex::unlock() [third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp:64](third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp?l=64&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d589)
    #3 unlock [third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h:47](third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h?l=47&cl=669089572):16 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968) (BuildId: ff25ace8b17d9863348bb1759c47246c)
    #4 ~lock_guard [third_party/crosstool/v18/stable/src/libcxx/include/__mutex/lock_guard.h:39](third_party/crosstool/v18/stable/src/libcxx/include/__mutex/lock_guard.h?l=39&cl=669089572):101 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968)
    #5 (anonymous namespace)::PerfJITEventListener::notifyObjectLoaded(unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&) [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:290](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=290&cl=669089572):1 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968)
    #6 llvm::orc::RTDyldObjectLinkingLayer::onObjEmit(llvm::orc::MaterializationResponsibility&, llvm::object::OwningBinary<llvm::object::ObjectFile>, std::__tsan::unique_ptr<llvm::RuntimeDyld::MemoryManager, std::__tsan::default_delete<llvm::RuntimeDyld::MemoryManager>>, std::__tsan::unique_ptr<llvm::RuntimeDyld::LoadedObjectInfo, std::__tsan::default_delete<llvm::RuntimeDyld::LoadedObjectInfo>>, std::__tsan::unique_ptr<llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>>, std::__tsan::default_delete<llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>>>>, llvm::Error) [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:386](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp?l=386&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bc404a8) (BuildId: ff25ace8b17d9863348bb1759c47246c)
```
bader pushed a commit that referenced this pull request Jan 14, 2025
According to the documentation described at
https://github.com/loongson/la-abi-specs/blob/release/ladwarf.adoc, the
dwarf numbers for floating-point registers range from 32 to 63.

An incorrect dwarf number will prevent the register values from being
properly restored during unwinding.

This test reflects this problem:

```
loongson@linux:~$ cat test.c

void foo() {
  asm volatile ("movgr2fr.d $fs2, $ra":::"$fs2");
}
int main() {
  asm volatile ("movgr2fr.d $fs2, $sp":::"$fs2");
  foo();
  return 0;
}

loongson@linux:~$ clang -g test.c  -o test

```
Without this patch:
```
loongson@linux:~$ ./_build/bin/lldb ./t
(lldb) target create "./t"
Current executable set to
'/home/loongson/llvm-project/_build_lldb/t' (loongarch64).
(lldb) b foo
Breakpoint 1: where = t`foo + 20 at test.c:4:1, address =
0x0000000000000714
(lldb) r
Process 2455626 launched: '/home/loongson/llvm-project/_build_lldb/t' (loongarch64)
Process 2455626 stopped
* thread #1, name = 't', stop reason = breakpoint 1.1
    frame #0: 0x0000555555554714 t`foo at test.c:4:1
   1    #include <stdio.h>
   2
   3    void foo() {
-> 4    asm volatile ("movgr2fr.d $fs2, $ra":::"$fs2");
   5    }
   6    int main() {
   7    asm volatile ("movgr2fr.d $fs2, $sp":::"$fs2");
(lldb) si
Process 2455626 stopped
* thread #1, name = 't', stop reason = instruction step into
    frame #0: 0x0000555555554718 t`foo at test.c:4:1
   1    #include <stdio.h>
   2
   3    void foo() {
-> 4    asm volatile ("movgr2fr.d $fs2, $ra":::"$fs2");
   5    }
   6    int main() {
   7    asm volatile ("movgr2fr.d $fs2, $sp":::"$fs2");
(lldb) f 1
frame #1: 0x0000555555554768 t`main at test.c:8:1
   5    }
   6    int main() {
   7    asm volatile ("movgr2fr.d $fs2, $sp":::"$fs2");
-> 8    foo();
   9    return 0;
   10   }
(lldb) register read -a
General Purpose Registers:
        r1 = 0x0000555555554768  t`main + 40 at test.c:8:1
        r3 = 0x00007ffffffef780
       r22 = 0x00007ffffffef7b0
       r23 = 0x00007ffffffef918
       r24 = 0x0000000000000001
       r25 = 0x0000000000000000
       r26 = 0x000055555555be08  t`__do_global_dtors_aux_fini_array_entry
       r27 = 0x0000555555554740  t`main at test.c:6
       r28 = 0x00007ffffffef928
       r29 = 0x00007ffff7febc88  ld-linux-loongarch-lp64d.so.1`_rtld_global_ro
       r30 = 0x000055555555be08  t`__do_global_dtors_aux_fini_array_entry
        pc = 0x0000555555554768  t`main + 40 at test.c:8:1
33 registers were unavailable.

Floating Point Registers:
       f13 = 0x00007ffffffef780 !!!!! wrong register
       f24 = 0xffffffffffffffff
       f25 = 0xffffffffffffffff
       f26 = 0x0000555555554768  t`main + 40 at test.c:8:1
       f27 = 0xffffffffffffffff
       f28 = 0xffffffffffffffff
       f29 = 0xffffffffffffffff
       f30 = 0xffffffffffffffff
       f31 = 0xffffffffffffffff
32 registers were unavailable.
```
With this patch:
```
The previous operations are the same.
(lldb) register read -a
General Purpose Registers:
        r1 = 0x0000555555554768  t`main + 40 at test.c:8:1
        r3 = 0x00007ffffffef780
       r22 = 0x00007ffffffef7b0
       r23 = 0x00007ffffffef918
       r24 = 0x0000000000000001
       r25 = 0x0000000000000000
       r26 = 0x000055555555be08  t`__do_global_dtors_aux_fini_array_entry
       r27 = 0x0000555555554740  t`main at test.c:6
       r28 = 0x00007ffffffef928
       r29 = 0x00007ffff7febc88  ld-linux-loongarch-lp64d.so.1`_rtld_global_ro
       r30 = 0x000055555555be08  t`__do_global_dtors_aux_fini_array_entry
        pc = 0x0000555555554768  t`main + 40 at test.c:8:1
33 registers were unavailable.

Floating Point Registers:
       f24 = 0xffffffffffffffff
       f25 = 0xffffffffffffffff
       f26 = 0x00007ffffffef780
       f27 = 0xffffffffffffffff
       f28 = 0xffffffffffffffff
       f29 = 0xffffffffffffffff
       f30 = 0xffffffffffffffff
       f31 = 0xffffffffffffffff
33 registers were unavailable.
```

Reviewed By: SixWeining

Pull Request: llvm/llvm-project#120391
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants