Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure KOKKOS_FUNCTION on functions called in a parallel context #3

Closed
calewis opened this issue Jun 11, 2020 · 3 comments
Closed

Ensure KOKKOS_FUNCTION on functions called in a parallel context #3

calewis opened this issue Jun 11, 2020 · 3 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@calewis
Copy link

calewis commented Jun 11, 2020

This check will ensure that functions called inside Kokkos::parallel_* functors and functions annotated with KOKKOS_FUNCTION are annotated with KOKKOS_FUNCTION.

At a minimum should detect the calls to oops in both not_oops and the KOKKOS_LAMBDA, this check will be made significantly easier if new versions of Kokkos are required due to an annotation attribute added to KOKKOS_FUNCTION.

int oops(int x){ return x;}

KOKKOS_FUNCTION
int not_oops(int x){return oops(x);}

Kokkos::parallel_for(7, KOKKOS_LAMBA(int x){printf("%d", oops(x));});

Also we need to be sure not to flag functions such as printf

@calewis calewis added the enhancement New feature or request label Jun 11, 2020
@calewis calewis added this to the Intial Checks milestone Jun 11, 2020
@calewis calewis self-assigned this Jun 11, 2020
@calewis
Copy link
Author

calewis commented Jun 24, 2020

Need to make sure that nested lambdas don't trigger this warning since they don't need annotations

Example that currently triggers a warning when it shouldn't.

Kokkos::parallel_for(7, KOKKOS_LAMBA(int x){
     [&]{x;}()
});

@calewis
Copy link
Author

calewis commented Jun 25, 2020

It turns out that the issue wasn't the previous post, it was that Kokkos::parallel_for internally was generating a CallExpr to our lambda operator(). I fixed this by only inspecting FunctionDecls that are in user space.

@calewis
Copy link
Author

calewis commented Aug 26, 2020

The check is being PRd right now, new problems should just be filed as new bugs.

@calewis calewis closed this as completed Aug 26, 2020
crtrott pushed a commit that referenced this issue Jul 13, 2023
…tput

The crash happens in clang::driver::tools::SplitDebugName when Output is
InputInfo::Nothing. It doesn't happen with standalone clang driver because
output is created in Driver::BuildJobsForActionNoCache.

Example backtrace:
```
* thread #1, name = 'clangd', stop reason = hit program assert
  * frame #0: 0x00007ffff5c4eacf libc.so.6`raise + 271
    frame #1: 0x00007ffff5c21ea5 libc.so.6`abort + 295
    frame #2: 0x00007ffff5c21d79 libc.so.6`__assert_fail_base.cold.0 + 15
    frame #3: 0x00007ffff5c47426 libc.so.6`__assert_fail + 70
    frame #4: 0x000055555dc0923c clangd`clang::driver::InputInfo::getFilename(this=0x00007fffffff9398) const at InputInfo.h:84:5
    frame #5: 0x000055555dcd0d8d clangd`clang::driver::tools::SplitDebugName(JA=0x000055555f6c6a50, Args=0x000055555f6d0b80, Input=0x00007fffffff9678, Output=0x00007fffffff9398) at CommonArgs.cpp:1275:40
    frame #6: 0x000055555dc955a5 clangd`clang::driver::tools::Clang::ConstructJob(this=0x000055555f6c69d0, C=0x000055555f6c64a0, JA=0x000055555f6c6a50, Output=0x00007fffffff9398, Inputs=0x00007fffffff9668, Args=0x000055555f6d0b80, LinkingOutput=0x0000000000000000) const at Clang.cpp:5690:33
    frame #7: 0x000055555dbf6b54 clangd`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5618:10
    frame #8: 0x000055555dbf4ef0 clangd`clang::driver::Driver::BuildJobsForAction(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5306:26
    frame #9: 0x000055555dbeb590 clangd`clang::driver::Driver::BuildJobs(this=0x00007fffffffb5e0, C=0x000055555f6c64a0) const at Driver.cpp:4844:5
    frame #10: 0x000055555dbe6b0f clangd`clang::driver::Driver::BuildCompilation(this=0x00007fffffffb5e0, ArgList=ArrayRef<const char *> @ 0x00007fffffffb268) at Driver.cpp:1496:3
    frame #11: 0x000055555b0cc0d9 clangd`clang::createInvocation(ArgList=ArrayRef<const char *> @ 0x00007fffffffbb38, Opts=CreateInvocationOptions @ 0x00007fffffffbb90) at CreateInvocationFromCommandLine.cpp:53:52
    frame #12: 0x000055555b378e7b clangd`clang::clangd::buildCompilerInvocation(Inputs=0x00007fffffffca58, D=0x00007fffffffc158, CC1Args=size=0) at Compiler.cpp:116:44
    frame #13: 0x000055555895a6c8 clangd`clang::clangd::(anonymous namespace)::Checker::buildInvocation(this=0x00007fffffffc760, TFS=0x00007fffffffe570, Contents= Has Value=false ) at Check.cpp:212:9
    frame #14: 0x0000555558959cec clangd`clang::clangd::check(File=(Data = "build/test.cpp", Length = 64), TFS=0x00007fffffffe570, Opts=0x00007fffffffe600) at Check.cpp:486:34
    frame #15: 0x000055555892164a clangd`main(argc=4, argv=0x00007fffffffecd8) at ClangdMain.cpp:993:12
    frame #16: 0x00007ffff5c3ad85 libc.so.6`__libc_start_main + 229
    frame #17: 0x00005555585bbe9e clangd`_start + 46
```

Test Plan: ninja ClangDriverTests && tools/clang/unittests/Driver/ClangDriverTests

Differential Revision: https://reviews.llvm.org/D154602
crtrott pushed a commit that referenced this issue Aug 8, 2023
TSan reports the following data race:

  Write of size 4 at 0x000109e0b160 by thread T2 (mutexes: write M0, write M1):
    #0 NativeFile::Close() File.cpp:329
    #1 ConnectionFileDescriptor::Disconnect(lldb_private::Status*) ConnectionFileDescriptorPosix.cpp:232
    #2 Communication::Disconnect(lldb_private::Status*) Communication.cpp:61
    #3 process_gdb_remote::ProcessGDBRemote::DidExit() ProcessGDBRemote.cpp:1164
    #4 Process::SetExitStatus(int, char const*) Process.cpp:1097
    #5 process_gdb_remote::ProcessGDBRemote::MonitorDebugserverProcess(...) ProcessGDBRemote.cpp:3387

  Previous read of size 4 at 0x000109e0b160 by main thread (mutexes: write M2):
    #0 NativeFile::IsValid() const File.h:393
    #1 ConnectionFileDescriptor::IsConnected() const ConnectionFileDescriptorPosix.cpp:121
    #2 Communication::IsConnected() const Communication.cpp:79
    #3 process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(...) GDBRemoteCommunication.cpp:256
    #4 process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(...l) GDBRemoteCommunication.cpp:244
    #5 process_gdb_remote::GDBRemoteClientBase::SendPacketAndWaitForResponseNoLock(llvm::StringRef, StringExtractorGDBRemote&) GDBRemoteClientBase.cpp:246

The problem is that in WaitForPacketNoLock's run loop, it checks that
the connection is still connected. This races with the
ConnectionFileDescriptor disconnecting. Most (but not all) access to the
IOObject in ConnectionFileDescriptorPosix is already gated by the mutex.
This patch just protects IsConnected in the same way.

Differential revision: https://reviews.llvm.org/D157347
crtrott pushed a commit that referenced this issue Aug 16, 2023
TSan reports the following race:

  Write of size 8 at 0x000107707ee8 by main thread:
    #0 lldb_private::ThreadedCommunication::StartReadThread(...) ThreadedCommunication.cpp:175
    #1 lldb_private::Process::SetSTDIOFileDescriptor(...) Process.cpp:4533
    #2 lldb_private::Platform::DebugProcess(...) Platform.cpp:1121
    #3 lldb_private::PlatformDarwin::DebugProcess(...) PlatformDarwin.cpp:711
    #4 lldb_private::Target::Launch(...) Target.cpp:3235
    #5 CommandObjectProcessLaunch::DoExecute(...) CommandObjectProcess.cpp:256
    #6 lldb_private::CommandObjectParsed::Execute(...) CommandObject.cpp:751
    #7 lldb_private::CommandInterpreter::HandleCommand(...) CommandInterpreter.cpp:2054

  Previous read of size 8 at 0x000107707ee8 by thread T5:
    #0 lldb_private::HostThread::IsJoinable(...) const HostThread.cpp:30
    #1 lldb_private::ThreadedCommunication::StopReadThread(...) ThreadedCommunication.cpp:192
    #2 lldb_private::Process::ShouldBroadcastEvent(...) Process.cpp:3420
    #3 lldb_private::Process::HandlePrivateEvent(...) Process.cpp:3728
    #4 lldb_private::Process::RunPrivateStateThread(...) Process.cpp:3914
    #5 std::__1::__function::__func<lldb_private::Process::StartPrivateStateThread(...) function.h:356
    #6 lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(...) HostNativeThreadBase.cpp:62
    #7 lldb_private::HostThreadMacOSX::ThreadCreateTrampoline(...) HostThreadMacOSX.mm:18

The problem is the lack of synchronization between starting and stopping
the read thread. This patch fixes that by protecting those operations
with a mutex.

Differential revision: https://reviews.llvm.org/D157361
crtrott pushed a commit that referenced this issue Aug 16, 2023
TSan reports the following data race:

  Write of size 4 at 0x000109e0b160 by thread T2 (...):
    #0 lldb_private::NativeFile::Close() File.cpp:329
    #1 lldb_private::ConnectionFileDescriptor::Disconnect(...) ConnectionFileDescriptorPosix.cpp:232
    #2 lldb_private::Communication::Disconnect(...) Communication.cpp:61
    #3 lldb_private::process_gdb_remote::ProcessGDBRemote::DidExit() ProcessGDBRemote.cpp:1164
    #4 lldb_private::Process::SetExitStatus(...) Process.cpp:1097
    #5 lldb_private::process_gdb_remote::ProcessGDBRemote::MonitorDebugserverProcess(...) ProcessGDBRemote.cpp:3387

  Previous read of size 4 at 0x000109e0b160 by main thread (...):
    #0 lldb_private::NativeFile::IsValid() const File.h:393
    #1 lldb_private::ConnectionFileDescriptor::IsConnected() const ConnectionFileDescriptorPosix.cpp:121
    #2 lldb_private::Communication::IsConnected() const Communication.cpp:79
    #3 lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(...) GDBRemoteCommunication.cpp:256
    #4 lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(...) GDBRemoteCommunication.cpp:244
    #5 lldb_private::process_gdb_remote::GDBRemoteClientBase::SendPacketAndWaitForResponseNoLock(...) GDBRemoteClientBase.cpp:246

I originally tried fixing the problem at the ConnectionFileDescriptor
level, but that operates on an IOObject which can have different thread
safety guarantees depending on its implementation.

For this particular issue, the problem is specific to NativeFile.
NativeFile can hold a file descriptor and/or a file stream. Throughout
its implementation, it checks if the descriptor or stream is valid and
do some operation on it if it is. While that works in a single threaded
environment, nothing prevents another thread from modifying the
descriptor or stream between the IsValid check and when it's actually
being used.

This patch prevents such issues by returning a ValueGuard RAII object.
As long as the object is in scope, the value is guaranteed by a lock.

Differential revision: https://reviews.llvm.org/D157347
crtrott pushed a commit that referenced this issue Aug 16, 2023
Thread sanitizer reports the following data race:

```
WARNING: ThreadSanitizer: data race (pid=43201)
  Write of size 4 at 0x00010520c474 by thread T1 (mutexes: write M0, write M1):
    #0 lldb_private::PipePosix::CloseWriteFileDescriptor() PipePosix.cpp:242 (liblldb.18.0.0git.dylib:arm64+0x414700) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
    #1 lldb_private::PipePosix::Close() PipePosix.cpp:217 (liblldb.18.0.0git.dylib:arm64+0x4144e8) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
    #2 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) ConnectionFileDescriptorPosix.cpp:239 (liblldb.18.0.0git.dylib:arm64+0x40a620) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
    #3 lldb_private::Communication::Disconnect(lldb_private::Status*) Communication.cpp:61 (liblldb.18.0.0git.dylib:arm64+0x2a9318) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
    #4 lldb_private::process_gdb_remote::ProcessGDBRemote::DidExit() ProcessGDBRemote.cpp:1167 (liblldb.18.0.0git.dylib:arm64+0x8ed984) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)

  Previous read of size 4 at 0x00010520c474 by main thread (mutexes: write M2, write M3):
    #0 lldb_private::PipePosix::CanWrite() const PipePosix.cpp:229 (liblldb.18.0.0git.dylib:arm64+0x4145e4) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
    #1 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) ConnectionFileDescriptorPosix.cpp:212 (liblldb.18.0.0git.dylib:arm64+0x40a4a8) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
    #2 lldb_private::Communication::Disconnect(lldb_private::Status*) Communication.cpp:61 (liblldb.18.0.0git.dylib:arm64+0x2a9318) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
    #3 lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(StringExtractorGDBRemote&, lldb_private::Timeout<std::__1::ratio<1l, 1000000l>>, bool) GDBRemoteCommunication.cpp:373 (liblldb.18.0.0git.dylib:arm64+0x8b9c48) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
    #4 lldb_private::process_gdb_remote::GDBRemoteCommunication::WaitForPacketNoLock(StringExtractorGDBRemote&, lldb_private::Timeout<std::__1::ratio<1l, 1000000l>>, bool) GDBRemoteCommunication.cpp:243 (liblldb.18.0.0git.dylib:arm64+0x8b9904) (BuildId: 2983976beb2637b5943bff32fd12eb8932000000200000000100000000000e00)
```

Fix this by adding a mutex to PipePosix.

Differential Revision: https://reviews.llvm.org/D157654
crtrott pushed a commit that referenced this issue Aug 31, 2023
This reverts commit 0e63f1a.

clang-format started to crash with contents like:
a.h:
```
```
$ clang-format a.h
```
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: ../llvm/build/bin/clang-format a.h
 #0 0x0000560b689fe177 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x0000560b689fbfbe llvm::sys::RunSignalHandlers() /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Signals.cpp:106:18
 #2 0x0000560b689feaca SignalHandler(int) /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007f030405a540 (/lib/x86_64-linux-gnu/libc.so.6+0x3c540)
 #4 0x0000560b68a9a980 is /usr/local/google/home/kadircet/repos/llvm/clang/include/clang/Lex/Token.h:98:44
 #5 0x0000560b68a9a980 is /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:562:51
 #6 0x0000560b68a9a980 startsSequenceInternal<clang::tok::TokenKind, clang::tok::TokenKind> /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:831:9
 #7 0x0000560b68a9a980 startsSequence<clang::tok::TokenKind, clang::tok::TokenKind> /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/FormatToken.h:600:12
 #8 0x0000560b68a9a980 getFunctionName /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:3131:17
 #9 0x0000560b68a9a980 clang::format::TokenAnnotator::annotate(clang::format::AnnotatedLine&) /usr/local/google/home/kadircet/repos/llvm/clang/lib/Format/TokenAnnotator.cpp:3191:17
Segmentation fault
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant