Parallel Branch-and-Bound #412

nguidotti · 2025-09-23T14:01:05Z

Description

This PR implements a parallel branch-and-bound procedure, which is split into two phases. In the first phase, the algorithm greedily expands the search tree down to a certain depth and then adds the bottom nodes to a global heap. The parallel expansion is implemented using omp task.
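As an illustration of the first phase, here is a minimal sketch of a task-based ramp-up. It is not the code from this PR: node_t, ramp_up and build_frontier are hypothetical names, the "branching" just doubles a node id instead of solving an LP relaxation, and a std::vector stands in for the global heap.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node type: only the depth and an id matter for this sketch.
struct node_t {
  int depth;
  long id;
};

// Recursively expands `node` until `max_depth` is reached. Nodes at the bottom
// are appended to a shared container that stands in for the global heap.
void ramp_up(node_t node, int max_depth, std::vector<node_t>& frontier)
{
  if (node.depth == max_depth) {
    // The critical section serializes writes to the shared container.
#pragma omp critical(frontier_push)
    frontier.push_back(node);
    return;
  }
  node_t down{node.depth + 1, 2 * node.id};
  node_t up{node.depth + 1, 2 * node.id + 1};
  // Each child becomes an independent task that any idle thread may pick up.
  // `frontier` is a reference, so it is marked shared explicitly.
#pragma omp task shared(frontier) firstprivate(down)
  ramp_up(down, max_depth, frontier);
#pragma omp task shared(frontier) firstprivate(up)
  ramp_up(up, max_depth, frontier);
}

std::vector<node_t> build_frontier(int max_depth)
{
  assert(max_depth >= 0);
  std::vector<node_t> frontier;
#pragma omp parallel
  {
    // One thread seeds the task tree; the other threads execute generated tasks.
#pragma omp single
    ramp_up(node_t{0, 1}, max_depth, frontier);
  }  // all tasks are complete once the parallel region ends
  return frontier;
}
```

Calling build_frontier(3) yields the 8 nodes at depth 3 in an unspecified order. Compiled without OpenMP, the pragmas are ignored and the same code runs serially.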

In the second phase, some threads explore the tree using best-first search with plunging, i.e., they take the first node from the global heap and then explore the entire branch that starts at this node. Any unexplored nodes are inserted into the heap. The remaining threads perform deep dives in order to find feasible solutions. The solver keeps a small heap containing the most promising nodes for diving, which is kept in sync with the global heap.
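To make "best-first search with plunging" concrete, here is a serial sketch under simplifying assumptions. The names bb_node_t, global_heap_t and plunge_once are hypothetical, the child bounds are made up (+1 and +2), and reaching max_depth stands in for fathoming. The PR itself keeps a local stack per plunge; this sketch pushes the sibling straight back to the heap.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical node: a lower bound plus a depth; real nodes carry LP data.
struct bb_node_t {
  double lower_bound;
  int depth;
};

// Orders the priority queue as a min-heap on the lower bound.
struct worse_bound {
  bool operator()(const bb_node_t& a, const bb_node_t& b) const
  {
    return a.lower_bound > b.lower_bound;
  }
};

using global_heap_t = std::priority_queue<bb_node_t, std::vector<bb_node_t>, worse_bound>;

// Pops the node with the smallest lower bound and plunges from it: one child is
// followed depth-first while its sibling goes back to the heap for later.
// Returns the number of nodes visited during the plunge.
int plunge_once(global_heap_t& heap, int max_depth)
{
  assert(!heap.empty());
  bb_node_t node = heap.top();
  heap.pop();
  int visited = 0;
  while (true) {
    ++visited;
    if (node.depth == max_depth) { break; }  // stand-in for a fathomed node
    bb_node_t first{node.lower_bound + 1.0, node.depth + 1};
    bb_node_t second{node.lower_bound + 2.0, node.depth + 1};
    heap.push(second);  // the unexplored sibling returns to the heap
    node = first;       // keep going down the same branch
  }
  return visited;
}
```

Starting from a single root with max_depth = 3, one plunge visits four nodes and leaves three siblings in the heap, one for each level that was branched on.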

This PR also:

- Replaces the std::thread-based parallelization in strong branching with OpenMP in order to use dynamic scheduling. This ensures that all threads have a similar amount of work and improves parallel performance.
- Fixes an invalid memory access when trying to access the status of a fathomed node.
- Replaces std::mutex with omp atomic wherever applicable.
- Adds dedicated classes dive_queue_t and search_tree_t to store the diving heap and the search tree, respectively.
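For reference, a minimal sketch of what dynamic scheduling looks like with OpenMP. The functions evaluate_candidate and score_candidates are hypothetical stand-ins for the strong-branching loop, not the code in pseudo_costs.cpp; the per-candidate cost is made uneven on purpose.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one strong-branching evaluation. The cost grows
// with j, so a static split of the loop would be unbalanced.
double evaluate_candidate(int j)
{
  double score = 0.0;
  for (int k = 0; k <= j; ++k) { score += 1.0; }
  return score;
}

std::vector<double> score_candidates(int num_candidates)
{
  assert(num_candidates >= 0);
  std::vector<double> scores(num_candidates, 0.0);
  // schedule(dynamic) hands out iterations one at a time as threads become
  // free, instead of fixing the assignment up front.
#pragma omp parallel for schedule(dynamic)
  for (int j = 0; j < num_candidates; ++j) {
    scores[j] = evaluate_candidate(j);  // each iteration writes its own slot
  }
  return scores;
}
```

Each iteration writes to a distinct element, so no atomic or lock is needed in this loop.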

This is an extension of #305.
Closes #320.
Closes #417.

Benchmark results (MIPLIB2017):

master branch (53d6e74)

Average Gap: 0.2174861712

This PR:

Average Gap: 0.1989485546

i.e., the average gap drops by about 0.0185, a 1.8% improvement in absolute terms (about 8.5% relative to master). In terms of the geomean of the gap ratio, this is equal to 1.62x.

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

copy-pr-bot · 2025-09-23T14:01:16Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

nguidotti · 2025-09-23T15:46:11Z

/ok to test 0e14fe2

Signed-off-by: nicolas <nguidotti@nvidia.com>

aliceb-nv

Thanks a lot for the great work and results Nicolas! Just a few minor nitpicks, otherwise looks good :) I'll let Chris review the algorithm side

cpp/src/dual_simplex/branch_and_bound.cpp

cpp/src/dual_simplex/branch_and_bound.hpp

cpp/src/dual_simplex/mip_node.hpp

aliceb-nv · 2025-09-24T09:56:51Z

cpp/src/dual_simplex/simplex_solver_settings.hpp

-                    : std::thread::hardware_concurrency()),
+      num_threads(std::thread::hardware_concurrency() - 1),
+      num_bfs_threads(std::min(num_threads, 4)),
+      num_diving_threads(num_threads - num_bfs_threads),


Should we guarantee one diving thread at least maybe? The following code might make this assumption (and diving even with thread contention might be better than none at all)

True. Also, if num_threads <= 4, there will be no diving threads. Instead, we can set something like this:

num_bfs_threads = std::max(1, num_threads / 4);
num_diving_threads = std::max(1, num_threads - num_bfs_threads);

I will experiment a bit with the different configurations.
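A small sketch of the split discussed above, assuming the goal is a floor of one thread per role (hence std::max rather than std::min). thread_split_t and split_threads are hypothetical names, and the 1/4 ratio is only the value floated in this thread, not a tuned setting.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical result type: how many threads go to each role.
struct thread_split_t {
  int bfs;
  int diving;
};

// Gives roughly a quarter of the threads to best-first search and the rest to
// diving, with at least one thread for each role.
thread_split_t split_threads(int num_threads)
{
  assert(num_threads >= 1);
  int bfs    = std::max(1, num_threads / 4);
  int diving = std::max(1, num_threads - bfs);
  return {bfs, diving};
}
```

With num_threads = 1 this returns one thread for each role, i.e. it accepts contention rather than disabling diving. Note also that std::thread::hardware_concurrency() returns an unsigned value and may return 0, so subtracting 1 from it directly can wrap around.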

aliceb-nv · 2025-09-24T10:13:44Z

cpp/src/mip/diversity/recombiners/sub_mip.cuh

      branch_and_bound_settings.absolute_mip_gap_tol = context.settings.tolerances.absolute_mip_gap;
      branch_and_bound_settings.relative_mip_gap_tol = context.settings.tolerances.relative_mip_gap;
      branch_and_bound_settings.integer_tol = context.settings.tolerances.integrality_tolerance;
+      branch_and_bound_settings.num_threads = 1;


Have we experimented with this parameter? Submip tends to yield good solutions, it might perform better with a few more threads allocated

Not yet. It was set to 1 to avoid oversubscription of the threads. I can allocate a few more threads to submip and see how it behaves.

For now, I set it to 1 best-first thread and 1 diving thread. Later, we can experiment with allocating more threads to submip. @akifcorduk

cpp/src/dual_simplex/branch_and_bound.hpp

cpp/src/dual_simplex/mip_node.hpp

cpp/src/dual_simplex/branch_and_bound.cpp

chris-maes · 2025-10-02T13:55:39Z

cpp/src/dual_simplex/branch_and_bound.cpp

+          stack.push_front(second);
+          stack.push_front(first);
+
+          if (dive_queue_.size() < min_diving_queue_size_) {


It would be good to have a comment here about why you are adding to the dive_queue

Is this the only place we add to the dive queue? I thought the BFS threads would also add to the dive queue

The BFS threads also add to the diving queue. This happens around line 856 in the explore_subtree routine.
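To illustrate the size check in the snippet above, here is a hedged sketch of a diving queue that only accepts nodes while it is below a target size. toy_dive_queue_t, offer and pop_best are hypothetical and do not reflect the interface of dive_queue_t in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

struct dive_candidate_t {
  double lower_bound;
};

// Orders the priority queue as a min-heap on the lower bound.
struct higher_bound_last {
  bool operator()(const dive_candidate_t& a, const dive_candidate_t& b) const
  {
    return a.lower_bound > b.lower_bound;
  }
};

class toy_dive_queue_t {
 public:
  explicit toy_dive_queue_t(std::size_t min_size) : min_size_(min_size) { assert(min_size > 0); }

  // Stores the node only while the queue is below its target size, so exploring
  // threads top it up without flooding it. Returns true if the node was stored.
  bool offer(const dive_candidate_t& node)
  {
    bool accepted = false;
#pragma omp critical(toy_dive_queue)
    {
      if (heap_.size() < min_size_) {
        heap_.push(node);
        accepted = true;
      }
    }
    return accepted;
  }

  // Hands the most promising node (smallest lower bound) to a diving thread.
  bool pop_best(dive_candidate_t& out)
  {
    bool found = false;
#pragma omp critical(toy_dive_queue)
    {
      if (!heap_.empty()) {
        out = heap_.top();
        heap_.pop();
        found = true;
      }
    }
    return found;
  }

 private:
  std::size_t min_size_;
  std::priority_queue<dive_candidate_t, std::vector<dive_candidate_t>, higher_bound_last> heap_;
};
```

Results are passed out through a local flag because a return statement is not allowed inside an omp critical block. A named critical section is shared by all instances, which is fine for a sketch with a single queue.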

cpp/src/dual_simplex/branch_and_bound.cpp

chris-maes · 2025-10-02T14:00:25Z

cpp/src/dual_simplex/branch_and_bound.hpp

+  UNSET      = 6,  // The status is not set
+};
+
+enum class mip_exploration_status_t {


Thanks for splitting this apart. I'm still a little confused why you need the status TIME_LIMIT, NODE_LIMIT, and NUMERICAL here. Why can't this status just be RUNNING or COMPLETED?

The solver checks the (internal) status to determine when to stop (basically, when it is no longer RUNNING). The other statuses keep track of why we stopped the solver (they are translated later to mip_status_t).

But if we hit a TIME_LIMIT or NODE_LIMIT shouldn't any part of the code be able to tell that just by checking the time and the number of nodes?

I think there is probably an opportunity to simplify the code a bit here and not mix the return status with whether branch and bound is running or not.
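One possible shape for that simplification, as a sketch only: keep a plain running flag for the threads to poll, and derive the stop reason from the elapsed time and node count. All names here (stop_reason_t, limits_t, exploration_state_t, check_limits, record_node) are hypothetical and not part of this PR.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reasons for stopping, kept separate from the running flag.
enum class stop_reason_t { NONE, TIME_LIMIT, NODE_LIMIT };

struct limits_t {
  double time_limit;  // seconds
  long node_limit;
};

// Shared state: the threads only need to poll `running`.
struct exploration_state_t {
  std::atomic<bool> running{true};
  std::atomic<long> nodes_explored{0};
};

// Any thread can recompute the reason from the clock and the node counter.
stop_reason_t check_limits(double elapsed, long nodes_explored, const limits_t& limits)
{
  if (elapsed >= limits.time_limit) { return stop_reason_t::TIME_LIMIT; }
  if (nodes_explored >= limits.node_limit) { return stop_reason_t::NODE_LIMIT; }
  return stop_reason_t::NONE;
}

// Called after a node is processed: bumps the counter and clears the running
// flag once a limit is hit.
stop_reason_t record_node(exploration_state_t& state, double elapsed, const limits_t& limits)
{
  assert(limits.node_limit > 0);
  long nodes           = state.nodes_explored.fetch_add(1) + 1;
  stop_reason_t reason = check_limits(elapsed, nodes, limits);
  if (reason != stop_reason_t::NONE) { state.running.store(false); }
  return reason;
}
```

A numerical failure cannot be recomputed from counters in the same way, so it would still need to be recorded somewhere when it happens.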

chris-maes · 2025-10-02T14:02:40Z

cpp/src/dual_simplex/mip_node.hpp

+  FATHOMED         = 3,  // Node objective is greater than the upper bound
+  HAS_CHILDREN     = 4,  // Node has children to explore
+  NUMERICAL        = 5,  // Encountered numerical issue when solving the LP relaxation
+  TIME_LIMIT       = 6   // Time out during the LP relaxation


Why is TIME_LIMIT needed here?

cpp/src/dual_simplex/pseudo_costs.cpp

cpp/src/mip/local_search/rounding/constraint_prop.cu

cpp/src/mip/local_search/rounding/lb_constraint_prop.cu

cpp/src/dual_simplex/branch_and_bound.cpp

chris-maes

I'm removing the request for changes. But I'm still a bit fuzzy on how you handle nodes with numerical errors, in particular in the case where they are encountered by the diving thread.

I'm also a bit fuzzy on how the lower bound is handled in the ramp-up phase versus the normal BFS threads.

chris-maes

Trying to remove request changes

rg20 · 2025-10-03T19:00:54Z

/merge

nguidotti added 7 commits September 19, 2025 22:23

parallel mip solver with support for best first search with plunging …

a123da2

…and diving threads.

removed unused/duplicated includes

02d4b60

added a separated class for the search tree. code cleanup.

bd53b94

fixed race condition

3e6c21c

added ramp up phase

3c9b192

fixed invalid memory access

189b023

re-enabled ramp up phase

065f112

nguidotti requested a review from a team as a code owner September 23, 2025 14:01

nguidotti requested review from Kh4ster and aliceb-nv September 23, 2025 14:01

nguidotti marked this pull request as draft September 23, 2025 14:01

nguidotti added the non-breaking (Introduces a non-breaking change) and improvement (Improves an existing functionality) labels Sep 23, 2025

fixed initialization order

0e14fe2

nguidotti marked this pull request as ready for review September 23, 2025 19:18

fixed incorrect lower bounds

c41af70

Signed-off-by: nicolas <nguidotti@nvidia.com>

nguidotti changed the title ~~[Draft] Multithreaded Branch-and-Bound~~ Parallel Branch-and-Bound Sep 24, 2025

Merge branch 'branch-25.10' into parallel-mip

62cd233

aliceb-nv reviewed Sep 24, 2025

nguidotti requested a review from chris-maes September 24, 2025 13:02

rg20 reviewed Sep 24, 2025

anandhkb added this to the 25.10 milestone Sep 24, 2025

rg20 approved these changes Sep 25, 2025

nguidotti added 5 commits September 25, 2025 16:41

added a wrapper class for omp_lock and omp_atomic.

d83148e

fixed incorrect convergence check (NVIDIA#417)

bbdee5e

set the default number of threads

7f59c73

fixed log spacing

424331b

Merge branch 'branch-25.10' into parallel-mip

3ba3414