
Lots of non-deterministic failures in univariate_statistics_test.cpp #585

Closed

jzmaddock opened this issue Mar 28, 2021 · 12 comments

@jzmaddock
Collaborator

These seem to be related to the gini calculation. I changed all the BOOST_TEST calls to BOOST_TEST_LT so we could see how close the failures were; here's a typical run with msvc-14.2:

D:\data\boost\boost\libs\math\test\univariate_statistics_test.cpp(694): test 'abs(gini - expected) < Real(0.03)' ('0.0487317' < '0.03') failed in function 'void __cdecl test_gini_coefficient<float,const class std::execution::parallel_policy&>(const class std::execution::parallel_policy &)'
D:\data\boost\boost\libs\math\test\univariate_statistics_test.cpp(694): test 'abs(gini - expected) < Real(0.03)' ('0.0593992' < '0.03') failed in function 'void __cdecl test_gini_coefficient<long double,const class std::execution::parallel_policy&>(const class std::execution::parallel_policy &)'
D:\data\boost\boost\libs\math\test\univariate_statistics_test.cpp(659): test 'abs(gini - expected) < tol' ('0.666667' < '5.34553e-50') failed in function 'void __cdecl test_gini_coefficient<class boost::multiprecision::number<class boost::multiprecision::backends::cpp_bin_float<50,10,void,int,0,0>,0>,const class std::execution::parallel_policy&>(const class std::execution::parallel_policy &)'
D:\data\boost\boost\libs\math\test\univariate_statistics_test.cpp(682): test 'abs(gini) < tol' ('0.222222' < '5.34553e-50') failed in function 'void __cdecl test_gini_coefficient<class boost::multiprecision::number<class boost::multiprecision::backends::cpp_bin_float<50,10,void,int,0,0>,0>,const class std::execution::parallel_policy&>(const class std::execution::parallel_policy &)'
D:\data\boost\boost\libs\math\test\univariate_statistics_test.cpp(694): test 'abs(gini - expected) < Real(0.03)' ('0.144181' < '0.03') failed in function 'void __cdecl test_gini_coefficient<class boost::multiprecision::number<class boost::multiprecision::backends::cpp_bin_float<50,10,void,int,0,0>,0>,const class std::execution::parallel_policy&>(const class std::execution::parallel_policy &)'
D:\data\boost\boost\libs\math\test\univariate_statistics_test.cpp(629): test 'abs(gini - 1) < tol' ('1' < '5.34553e-50') failed in function 'void __cdecl test_sample_gini_coefficient<class boost::multiprecision::number<class boost::multiprecision::backends::cpp_bin_float<50,10,void,int,0,0>,0>,const class std::execution::parallel_policy&>(const class std::execution::parallel_policy &)'
6 errors detected.

However, the output is different every time! Also note how large some of the errors are: something grievous is going on here, but I don't see it at present.

@mborland
Member

I remember seeing similar egregious failures with GCC 9, which used TBB instead of its own implementation of the parallelism TS. TBB fails tsan horribly, so I wonder if something similar is going on with MSVC?

@jzmaddock
Collaborator Author

I don't know, but I went back through the history and the issue is still present at 1eb3c71, which is the original #434 "Implement Policies in Statistics" commit.

mborland added a commit to mborland/math that referenced this issue Mar 28, 2021
@mborland
Member

Can you please give PR #586 a try?

@jzmaddock
Collaborator Author

Looking at:

template<typename ReturnType, typename ExecutionPolicy, typename RandomAccessIterator>
ReturnType gini_coefficient_parallel_impl(ExecutionPolicy&& exec, RandomAccessIterator first, RandomAccessIterator last)
{
    using Real = typename std::iterator_traits<RandomAccessIterator>::value_type;
    
    ReturnType i = 1;
    ReturnType num = 0;
    ReturnType denom = 0;
    
    std::for_each(exec, first, last, [&i, &num, &denom](const Real& val)
    {
        num = num + val * i;
        denom = denom + val;
        i = i + 1;
    });

    if(denom == 0)
    {
        return ReturnType(0);
    }

    return ((2*num)/denom - i)/(i-1);
}
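
For reference, after the loop i == n + 1, so the return statement computes the textbook sample Gini coefficient for sorted values $x_{(1)} \le \dots \le x_{(n)}$:

$$G = \frac{2\sum_{i=1}^{n} i\,x_{(i)}}{n\sum_{i=1}^{n} x_{(i)}} - \frac{n+1}{n}$$

a formula that is only valid when the values are in ascending order.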

I have 2 comments:

  1. How is there not a race condition here accessing num, denom, and i? (See the minimal repro after this list.)
  2. Does the order of the elements matter for the result? I ask because the parallel for_each can call the functor in any order, so i * val will be different for each run.
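
For illustration, here is a minimal standalone program (not from the library, just the same shared-accumulator pattern): the total it prints typically varies from run to run, exactly like the test failures above.

#include <algorithm>
#include <execution>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v(1000000, 1);
    long sum = 0;
    // Data race: unsynchronized read-modify-write of `sum` from many threads.
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [&sum](int x) { sum += x; });
    std::cout << sum << '\n'; // frequently prints less than 1000000
}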

@jzmaddock
Collaborator Author

OK, looking at this some more I think we have several issues:

  • Yes, there really is a race condition on modifying the shared variables.
  • The input data must be sorted, but parallel std::for_each can call the functor in any order, which effectively "unsorts" it.
  • I'm not convinced that gcc/libstdc++ is actually running the tests in parallel: there's some complicated logic going on, but on mingw at least the test code ends up in a sequential/serial for_each, which is why the tests pass there. More investigation is needed.

I fear the only correct way to implement this as a parallel algorithm is to divide the input range into N segments and create a future for each one, with each future returning a pair of partial sums that can then be accumulated at the end. Any other thoughts?
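
Something along these lines, purely as an untested sketch (the function name and the chunking strategy are mine, made up for illustration):

#include <algorithm>
#include <cstddef>
#include <future>
#include <iterator>
#include <thread>
#include <utility>
#include <vector>

template <typename ReturnType, typename RandomAccessIterator>
ReturnType gini_coefficient_segmented_impl(RandomAccessIterator first, RandomAccessIterator last)
{
    const std::size_t n = std::distance(first, last);
    const std::size_t threads = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t segment = std::max<std::size_t>(1, n / threads);

    std::vector<std::future<std::pair<ReturnType, ReturnType>>> futures;
    for (std::size_t offset = 0; offset < n; offset += segment)
    {
        const std::size_t count = std::min(segment, n - offset);
        // Each task carries its absolute starting index, so the sorted order
        // of the input is respected no matter how the tasks are scheduled.
        futures.push_back(std::async(std::launch::async,
            [seg = first + offset, count, offset]()
            {
                ReturnType partial_num = 0;
                ReturnType partial_denom = 0;
                ReturnType i = ReturnType(offset + 1);
                for (std::size_t k = 0; k < count; ++k)
                {
                    partial_num += seg[k] * i;
                    partial_denom += seg[k];
                    i += 1;
                }
                return std::make_pair(partial_num, partial_denom);
            }));
    }

    // Accumulate the partial sums after each future completes, so there is
    // no shared mutable state between tasks.
    ReturnType num = 0;
    ReturnType denom = 0;
    for (auto& f : futures)
    {
        const auto partials = f.get();
        num += partials.first;
        denom += partials.second;
    }

    if (denom == 0)
    {
        return ReturnType(0);
    }

    const ReturnType i = ReturnType(n + 1); // the value i reaches in the serial version
    return ((2 * num) / denom - i) / (i - 1);
}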

We should probably review the other parallel algorithms for similar issues too.

@jzmaddock
Collaborator Author

A quick eyeball over the code found nothing else obvious. @mborland, has everything (regression tests and performance tests) been run through clang's thread sanitizer, just to be on the safe side?

@NAThompson
Collaborator

Do we have a threadsanitizer build in CI?

@jzmaddock
Collaborator Author

> Do we have a threadsanitizer build in CI?

No. I know I added some sanitizer runs to the multiprecision CI tests, but we have nothing here yet, I think?

In general they're quite hard to run as CI jobs: either the VM rejects them or there are false positives. But yes, we should try to set something up.

@mborland
Member

> I fear the only correct way to implement this as a parallel algorithm is to divide the input range into N segments and create a future for each one, with each future returning a pair of partial sums that can then be accumulated at the end. Any other thoughts?
>
> We should probably review the other parallel algorithms for similar issues too.

The bulk of the other algorithms use exactly what you are describing. They would serve as an effective starting point.

> A quick eyeball over the code found nothing else obvious. @mborland, has everything (regression tests and performance tests) been run through clang's thread sanitizer, just to be on the safe side?

In the original PR I ran everything through GCC's asan and tsan. Have not tried the Clang equivalent.

> Do we have a threadsanitizer build in CI?
>
> No. I know I added some sanitizer runs to the multiprecision CI tests, but we have nothing here yet, I think?

There is currently nothing in the CI to run asan/tsan/ubsan. Perhaps we could add a CircleCI config to run only these? The GHA runs already take many times longer now that most of Boost has moved to it, so I would recommend against adding more to GHA. GCC 11 and Clang 12 should be mainstream pretty soon here too...

@jzmaddock
Collaborator Author

Tentative CircleCI run added here: #592

This also adds doc building and an inspection report run.

@jzmaddock
Collaborator Author

Haha, well, CircleCI might not have been the best choice: apparently I have -24K free credits left!

jzmaddock added a commit that referenced this issue Mar 30, 2021
mborland added a commit to mborland/math that referenced this issue Mar 31, 2021
jzmaddock added a commit that referenced this issue Apr 6, 2021
@mborland
Member

mborland commented May 3, 2021

@jzmaddock This issue can be closed.
