ThreadSanitizer reports #52

Closed · Pagghiu opened this issue Jan 17, 2020 · 9 comments

Pagghiu commented Jan 17, 2020

I have been testing the latest master (4f9941b).
ThreadSanitizer, enabled under Xcode 11.0, reports some data races when running the unmodified samples.

As an example, here is the output of one data race report for the ParallelSum sample.

==================
WARNING: ThreadSanitizer: data race (pid=56086)
  Read of size 4 at 0x7ffeefbff49c by thread T4:
    #0 enki::TaskScheduler::TryRunTask(unsigned int, unsigned int, unsigned int&) TaskScheduler.cpp:412 (ParallelSum:x86_64+0x100007204)
    #1 enki::TaskScheduler::TryRunTask(unsigned int, unsigned int&) TaskScheduler.cpp:377 (ParallelSum:x86_64+0x1000050d0)
    #2 enki::TaskScheduler::TaskingThreadFunction(enki::ThreadArgs const&) TaskScheduler.cpp:236 (ParallelSum:x86_64+0x100004e04)
    #3 decltype(std::__1::forward<void (*)(enki::ThreadArgs const&)>(fp)(std::__1::forward<enki::ThreadArgs>(fp0))) std::__1::__invoke<void (*)(enki::ThreadArgs const&), enki::ThreadArgs>(void (*&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) type_traits:4361 (ParallelSum:x86_64+0x10000d06d)
    #4 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs>&, std::__1::__tuple_indices<2ul>) thread:342 (ParallelSum:x86_64+0x10000ceb1)
    #5 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(enki::ThreadArgs const&), enki::ThreadArgs> >(void*) thread:352 (ParallelSum:x86_64+0x10000bf09)

  Previous write of size 4 at 0x7ffeefbff49c by main thread:
    #0 enki::ITaskSet::ITaskSet() TaskScheduler.h:122 (ParallelSum:x86_64+0x100003288)
    #1 ParallelReductionSumTaskSet::ParallelReductionSumTaskSet(unsigned int) ParallelSum.cpp:81 (ParallelSum:x86_64+0x100003ba8)
    #2 ParallelReductionSumTaskSet::ParallelReductionSumTaskSet(unsigned int) ParallelSum.cpp:82 (ParallelSum:x86_64+0x100002e04)
    #3 main ParallelSum.cpp:146 (ParallelSum:x86_64+0x100002390)

  Location is stack of main thread.

  Thread T4 (tid=3398714, running) created by main thread at:
    #0 pthread_create <null>:2673040 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x2aa2d)
    #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:328 (ParallelSum:x86_64+0x10000be4e)
    #2 std::__1::thread::thread<void (&)(enki::ThreadArgs const&), enki::ThreadArgs, void>(void (&&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) thread:368 (ParallelSum:x86_64+0x10000ba71)
    #3 std::__1::thread::thread<void (&)(enki::ThreadArgs const&), enki::ThreadArgs, void>(void (&&&)(enki::ThreadArgs const&), enki::ThreadArgs&&) thread:360 (ParallelSum:x86_64+0x100006238)
    #4 enki::TaskScheduler::StartThreads() TaskScheduler.cpp:298 (ParallelSum:x86_64+0x100005901)
    #5 enki::TaskScheduler::Initialize(unsigned int) TaskScheduler.cpp:924 (ParallelSum:x86_64+0x10000a687)
    #6 main ParallelSum.cpp:136 (ParallelSum:x86_64+0x10000231a)

SUMMARY: ThreadSanitizer: data race TaskScheduler.cpp:412 in enki::TaskScheduler::TryRunTask(unsigned int, unsigned int, unsigned int&)
==================
ThreadSanitizer report breakpoint hit. Use 'thread info -s' to get extended information about the report.

This reports that reading subTask.pTask->m_RangeToRun is a data race:

bool TaskScheduler::TryRunTask( uint32_t threadNum_, uint32_t priority_, uint32_t& hintPipeToCheck_io_ )
{
    // ...

    if( subTask.pTask->m_RangeToRun < partitionSize )
    {
        SubTaskSet taskToRun = SplitTask( subTask, subTask.pTask->m_RangeToRun );
    }

    // ...
}

Declaring ParallelSumTaskSet m_ParallelSumTaskSet; inside struct ParallelReductionSumTaskSet

struct ParallelReductionSumTaskSet : ITaskSet
{
    ParallelSumTaskSet m_ParallelSumTaskSet;
    uint64_t           m_FinalSum;

    ParallelReductionSumTaskSet( uint32_t size_ ) : m_ParallelSumTaskSet( size_ ), m_FinalSum(0)
    {
        m_ParallelSumTaskSet.Init( g_TS.GetNumTaskThreads() );
    }

    virtual void ExecuteRange( TaskSetPartition range_, uint32_t threadnum_ )
    {
        g_TS.AddTaskSetToPipe( &m_ParallelSumTaskSet );
        g_TS.WaitforTask( &m_ParallelSumTaskSet );

        for( uint32_t i = 0; i < m_ParallelSumTaskSet.m_NumPartialSums; ++i )
        {
            m_FinalSum += m_ParallelSumTaskSet.m_pPartialSums[i].count;
        }
    }
};

will initialize the inherited m_RangeToRun member in the ITaskSet constructor:

    class ITaskSet : public ICompletable
    {
    public:
        ITaskSet()
            : m_SetSize(1)
            , m_MinRange(1)
            , m_RangeToRun(1)
        {}
    };

I am not an expert in this field, but it looks like a potential false positive, because TryRunTask is executed only after AddTaskSetToPipe.

I try to keep our software clean of all sanitizer reports, so that I can catch real bugs ;)
For this reason, if this or other reports turn out to be safe, I suggest adding annotations that disable TSAN where appropriate (using no_sanitize("thread")).

Does it make sense for me to report all data races found in the TSAN output?

Oh, and thanks for your excellent work on the library :)

dougbinks self-assigned this Jan 17, 2020
dougbinks (Owner) commented:

As you note, this is a false positive, as the constructor for ITaskSet is called before the task is launched via AddTaskSetToPipe and then TryRunTask.

This might be a bug in ThreadSanitizer (it's still in beta); I'm wondering if looping over the test causes it to see a read from the previous iteration before a write from the next. Would you consider posting this to https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual as a potential issue?

Marking enkiTS functions with __attribute__((no_sanitize("thread"))) in a cross-compiler compatible way is potentially a fair amount of work, and it might lower the readability of the code, as I may need to mark up a fair few functions with a macro. I'll look into it when I get some time.
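
For illustration, a minimal sketch of what such a macro might look like (the ENKI_NO_TSAN name is hypothetical, not part of enkiTS; Clang exposes __has_feature(thread_sanitizer), while GCC defines __SANITIZE_THREAD__ under -fsanitize=thread):

    // Hypothetical cross-compiler annotation macro, shown for illustration only.
    #if defined(__has_feature)
        #if __has_feature(thread_sanitizer)
            #define ENKI_NO_TSAN __attribute__((no_sanitize("thread")))
        #endif
    #elif defined(__SANITIZE_THREAD__)
        #define ENKI_NO_TSAN __attribute__((no_sanitize("thread")))
    #endif
    #ifndef ENKI_NO_TSAN
        #define ENKI_NO_TSAN // expands to nothing when TSAN is off or unsupported
    #endif

Each flagged function would then need the macro in its declaration, which is the readability cost mentioned above.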

If you do get further reports, do let me know - I'll also look into adding ThreadSanitizer to my internal testing.

dougbinks (Owner) commented:

Thanks for the report by the way!

Pagghiu (Author) commented Jan 20, 2020

I have been analyzing the reports some more, and I've found that there is only one synchronization point that ThreadSanitizer complains about.
TSAN complains about a data race on m_Flags during the CAS loop inside ReaderTryReadBack.
According to TSAN, it is possible for ReaderTryReadBack to access a value in m_Buffer that has not yet been flagged with FLAG_CAN_READ by WriterTryWriteFront.
Adding acquire and release semantics when storing / loading m_Flags fixes it.
Sadly I don't yet have enough knowledge to prove that this is a correct modification to make.
I am relying on the intuition I have built up around these lock-free techniques while studying them, using TSAN as a sort of "validator".

In detail, these are the changes, in case you're interested:

    template<uint8_t cSizeLog2, typename T> inline
    bool LockLessMultiReadPipe<cSizeLog2,T>::ReaderTryReadBack( T* pOut )
    {
        // ...

        while( true )
        {
            // ...
            uint32_t previous = FLAG_CAN_READ;
            bool bSuccess = m_Flags[ actualReadIndex ].compare_exchange_strong( previous, FLAG_INVALID, std::memory_order_acquire );
            if( bSuccess )
            {
                break;
            }
            // ...
        }

        // ...
        return true;
    }

and

    template<uint8_t cSizeLog2, typename T> inline
    bool LockLessMultiReadPipe<cSizeLog2,T>::WriterTryWriteFront( const T& in )
    {
        // ...
        m_Buffer[ actualWriteIndex ] = in;
        m_Flags[  actualWriteIndex ].store( FLAG_CAN_READ, std::memory_order_release );
        // ...

        return true;
    }

dougbinks (Owner) commented:

Thanks - I'll take a look at that.

dougbinks (Owner) commented:

I think this is a false positive, as I use an acquire / release thread fence, but if it silences the warning it may be worth implementing even if it is not an issue - I'll check that it doesn't adversely affect performance first.
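
For reference, a hedged sketch of the fence-based pattern being described (illustrative only, not the exact enkiTS code). The standalone fences make the relaxed flag operations sufficient for correct publication in C++, but ThreadSanitizer does not model std::atomic_thread_fence, so it still reports a race:

    #include <atomic>
    #include <cstdint>

    static uint32_t              g_data;
    static std::atomic<uint32_t> g_ready{ 0 };

    void PublishWithFence( uint32_t value )
    {
        g_data = value;
        std::atomic_thread_fence( std::memory_order_release ); // orders g_data before the flag store
        g_ready.store( 1, std::memory_order_relaxed );
    }

    bool ConsumeWithFence( uint32_t* pOut )
    {
        if( g_ready.load( std::memory_order_relaxed ) == 1 )
        {
            std::atomic_thread_fence( std::memory_order_acquire ); // orders the flag load before g_data
            *pOut = g_data;
            return true;
        }
        return false;
    }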

Pagghiu (Author) commented Jan 21, 2020

I can confirm that with the latest changes you've pushed (commits 71cd321 through 49d2de8 on the dev branch), TSAN no longer complains about data races in my tests 😄.

If you plan to land these changes on master, then we could also close this issue.

dougbinks (Owner) commented:

I'll be landing these on master after some testing, and will close this issue once I do - thanks for the reports!

dougbinks (Owner) commented:

I've completed testing and am merging to main & closing. Once again, thanks for the reports!

dougbinks (Owner) commented:

Just wanted to add an additional comment on ThreadSanitizer (TSAN) for future reference:

It appears the issue of TSAN not recognising thread fences is known - see about 3/4 of the way down this thread about TBB and TSAN.
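
A common workaround in this situation (a hedged sketch under the assumption that per-operation orderings are acceptable; none of these names are enkiTS code) is to detect TSAN at compile time and substitute acquire/release orderings, which TSAN does model, for the standalone fences it cannot see:

    #include <atomic>

    // Hypothetical compile-time switch: under TSAN, skip the standalone fences
    // and rely on per-operation acquire/release ordering instead.
    #if defined(__has_feature)
        #if __has_feature(thread_sanitizer)
            #define TSAN_ENABLED 1
        #endif
    #endif

    #ifdef TSAN_ENABLED
        constexpr std::memory_order kStoreOrder = std::memory_order_release;
        constexpr std::memory_order kLoadOrder  = std::memory_order_acquire;
        inline void ReleaseFence() {}
        inline void AcquireFence() {}
    #else
        constexpr std::memory_order kStoreOrder = std::memory_order_relaxed;
        constexpr std::memory_order kLoadOrder  = std::memory_order_relaxed;
        inline void ReleaseFence() { std::atomic_thread_fence( std::memory_order_release ); }
        inline void AcquireFence() { std::atomic_thread_fence( std::memory_order_acquire ); }
    #endif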
