Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suppress automatic GIL acquire to avoid deadlock #3037

Merged
merged 2 commits into from
Oct 26, 2020

Conversation

wphicks
Copy link
Contributor

@wphicks wphicks commented Oct 21, 2020

Explicitly disable Cython GIL acquire on entry into logging callbacks to avoid deadlock in multi-threaded C code which calls callback

Close #3027

Explicitly disable Cython GIL acquire on entry into logging callbacks to
avoid deadlock in multi-threaded C code which calls callback
@wphicks wphicks requested a review from a team as a code owner October 21, 2020 16:43
@GPUtester
Copy link
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@wphicks
Copy link
Contributor Author

wphicks commented Oct 21, 2020

The one thing this PR is missing is a good unit test for the problem being fixed. It's tricky because it would require us to set up a multi-threaded C function that calls the logging, compile that, and then call that function from Python. Any thoughts on a clean way to test this would be very welcome.

Copy link
Member

@dantegd dantegd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change lgtm

@dantegd
Copy link
Member

dantegd commented Oct 21, 2020

rerun tests

1 similar comment
@wphicks
Copy link
Contributor Author

wphicks commented Oct 21, 2020

rerun tests

@codecov-io
Copy link

codecov-io commented Oct 21, 2020

Codecov Report

Merging #3037 into branch-0.17 will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##           branch-0.17    #3037   +/-   ##
============================================
  Coverage        59.26%   59.26%           
============================================
  Files              142      142           
  Lines             8962     8962           
============================================
  Hits              5311     5311           
  Misses            3651     3651           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3647695...8ade1c6. Read the comment docs.

@wphicks
Copy link
Contributor Author

wphicks commented Oct 21, 2020

Affected by same issue as 3038. Looks like the extra xgboost package is gone, so I'll rerun this one more time.

@wphicks
Copy link
Contributor Author

wphicks commented Oct 21, 2020

rerun tests

@vinaydes
Copy link
Contributor

The one thing this PR is missing is a good unit test for the problem being fixed. It's tricky because it would require us to set up a multi-threaded C function that calls the logging, compile that, and then call that function from Python. Any thoughts on a clean way to test this would be very welcome.

As you said, it is difficult to write test for this scenario. However I think we should add multi-threaded logging statements to logger unit test file. Something like:

#pragma omp parallel for n_threads(5)
for(int i = 0; i < 5; i++) {
  CUML_LOG_INFO("Testing multi-threaded logging from thread: %d", omp_get_thread_num());
}

This will ensure that we capture hang in logging if there is one.

@wphicks
Copy link
Contributor Author

wphicks commented Oct 22, 2020

@vinaydes I don't think that's a bad idea, but unfortunately it wouldn't have caught this particular bug. CUML_LOG_INFO and its brethren are a relatively thin wrapper on spdlog. Is there enough complexity in the logic of those wrappers that this would test something non-trivial beyond what spdlog already guarantees with its own unit tests? Certainly not trying to argue against it, just making sure I understand the value-add.

Either way, if it's all right by you, I think let's add that in a separate PR, since it's not a test of this particular change.

@vinaydes
Copy link
Contributor

@wphicks Yeah, you are right. It would only work if we also write a Python test which calls the C++ test code. We can leave it for another PR.

@wphicks wphicks added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Oct 22, 2020
@JohnZed JohnZed merged commit ae223f1 into rapidsai:branch-0.17 Oct 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] cuml logger wrapper doesn't seem to work well in multi-threaded environments
6 participants