Use one cusparse handle per thread to avoid race condition on cuspars… #544
Walkthrough
Sets cuBLAS/cuSPARSE pointer modes to DEVICE in the cusparse_view constructor, changes concurrent-solver barrier handle creation to use a raft::handle_t constructed from …
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/linear_programming/solve.cu (1)
676-682: Race condition confirmed: both threads share a single cuSPARSE/cuBLAS handle via shallow copy; separate handles required
The issue is real. user_problem_t stores raft::handle_t const* handle_ptr as a member. When barrier_problem = dual_simplex_problem (line 684) executes, the shallow copy leaves both problems pointing to the same barrier_handle. Both run_dual_simplex_thread (via dual_simplex_problem) and run_barrier_thread (via barrier_problem) then use this shared handle concurrently, creating a race on cuSPARSE/cuBLAS state. Additionally, only barrier_handle.sync_stream() is called before joining threads; the dual simplex thread's stream is never synced.
Create separate raft::handle_t instances for each thread and sync both streams before consuming results, as suggested. This eliminates the race without requiring per-thread default stream initialization.
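The ownership problem described above can be sketched without any GPU code. The snippet below is a minimal illustration, not the cuOpt implementation: fake_handle_t and fake_problem_t are hypothetical stand-ins for raft::handle_t and user_problem_t, and the solver body is a placeholder. It shows that copying a problem also copies its handle pointer, and that rebinding the copy to a second handle gives each thread its own state before both handles are synced.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for raft::handle_t: holds mutable, unsynchronized
// library state (think cuSPARSE/cuBLAS pointer mode and stream binding).
struct fake_handle_t {
  int pointer_mode = 0;
  bool synced      = false;
  void sync_stream() { synced = true; }
};

// Hypothetical stand-in for user_problem_t: stores a non-owning handle pointer,
// so copying the problem copies the pointer, not the handle.
struct fake_problem_t {
  fake_handle_t* handle_ptr;
};

// True when both problems would drive the same handle.
bool shares_handle(const fake_problem_t& a, const fake_problem_t& b)
{
  return a.handle_ptr == b.handle_ptr;
}

// Placeholder solver: mutates only the handle its problem points at.
void run_solver(fake_problem_t* problem, int mode)
{
  problem->handle_ptr->pointer_mode = mode;
}

bool solve_concurrently_with_separate_handles()
{
  fake_handle_t dual_simplex_handle;
  fake_handle_t barrier_handle;

  fake_problem_t dual_simplex_problem{&dual_simplex_handle};
  fake_problem_t barrier_problem = dual_simplex_problem;  // shallow copy
  const bool shared_after_copy = shares_handle(dual_simplex_problem, barrier_problem);

  // Rebind the copy so each thread owns a distinct handle.
  barrier_problem.handle_ptr = &barrier_handle;

  std::thread dual_simplex_thread(run_solver, &dual_simplex_problem, 1);
  std::thread barrier_thread(run_solver, &barrier_problem, 2);
  dual_simplex_thread.join();
  barrier_thread.join();

  // Sync both handles before consuming results, not only the barrier one.
  dual_simplex_handle.sync_stream();
  barrier_handle.sync_stream();

  return shared_after_copy && !shares_handle(dual_simplex_problem, barrier_problem) &&
         dual_simplex_handle.pointer_mode == 1 && barrier_handle.pointer_mode == 2 &&
         dual_simplex_handle.synced && barrier_handle.synced;
}
```

Because neither thread writes to the other's handle, the two writes to pointer_mode never touch the same object, which is the property the review asks for. In the real code the equivalent step would be giving each problem its own raft::handle_t rather than rebinding a raw pointer on a stand-in type.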
🧹 Nitpick comments (1)
cpp/src/dual_simplex/cusparse_view.cu (1)
141-144: Verified: pointer mode setup is correct, consider centralizing across codebase
Verification confirms all cuBLAS/cuSPARSE calls in cpp/src/dual_simplex/cusparse_view.cu use device scalars (d_one_, d_zero_, d_minus_one_), consistent with the DEVICE pointer mode set in lines 141-144. No HOST pointer mode usages detected in this file.
The centralization suggestion remains valid: pointer mode initialization appears in multiple files (mip/solve.cu, linear_programming/solve.cu, mip/problem/problem.cu, mip/solution/solution.cu, mip/solver.cu) and could be consolidated to reduce redundant state changes per handle. This is optional but worth considering in a future refactoring pass.
📜 Review details
📒 Files selected for processing (2)
- cpp/src/dual_simplex/cusparse_view.cu (1 hunks)
- cpp/src/linear_programming/solve.cu (1 hunks)
a2c2ae7 to ef84855
/merge