@nguidotti (Contributor) commented Sep 23, 2025

Description

This PR implements a parallel branch-and-bound procedure that is split into two phases. In the first phase, the algorithm greedily expands the search tree down to a certain depth and then adds the nodes at the bottom of the tree to a global heap. The parallel expansion is implemented with OpenMP tasks (omp task).
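To make the first (ramp-up) phase concrete, here is a toy sketch. It is not the PR's code: `node_t`, `branch`, `ramp_up` and `run_ramp_up` are hypothetical names, and `branch` is a placeholder for the LP solve. Each child is expanded in its own OpenMP task, and nodes that reach `max_depth` are pushed into a shared best-first heap under a named critical section. Without `-fopenmp` the pragmas are ignored and the same code runs serially.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical node: the real solver stores an LP solution, variable bounds, etc.
struct node_t {
  double lower_bound;
  int depth;
};

// Comparator that puts the node with the smallest lower bound on top (best-first order).
struct worse_bound {
  bool operator()(const node_t& a, const node_t& b) const { return a.lower_bound > b.lower_bound; }
};

using node_heap_t = std::priority_queue<node_t, std::vector<node_t>, worse_bound>;

// Placeholder for "solve the LP relaxation and branch": two children with larger bounds.
static void branch(const node_t& n, node_t& down, node_t& up)
{
  down = node_t{n.lower_bound + 1.0, n.depth + 1};
  up   = node_t{n.lower_bound + 2.0, n.depth + 1};
}

// Phase 1: expand greedily, one OpenMP task per child, and push every node that
// reaches `max_depth` into the shared global heap.
static void ramp_up(node_t n, int max_depth, node_heap_t* heap)
{
  if (n.depth >= max_depth) {
#pragma omp critical(global_heap)
    heap->push(n);
    return;
  }
  node_t down, up;
  branch(n, down, up);
#pragma omp task firstprivate(down, max_depth, heap)
  ramp_up(down, max_depth, heap);
#pragma omp task firstprivate(up, max_depth, heap)
  ramp_up(up, max_depth, heap);
#pragma omp taskwait
}

node_heap_t run_ramp_up(int max_depth)
{
  node_heap_t heap;
  // One thread seeds the root task; the rest of the team executes the spawned tasks.
#pragma omp parallel
#pragma omp single
  ramp_up(node_t{0.0, 0}, max_depth, &heap);
  return heap;
}
```

For example, `run_ramp_up(3)` leaves the 8 depth-3 nodes of this toy tree in the heap, with the all-"down" node (bound 3.0) on top.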

In the second phase, some threads explore the tree using best-first search with plunging, i.e., each of them takes the first node from the global heap and then explores the entire branch that starts at this node. Any unexplored nodes are inserted into the heap. The remaining threads perform deep dives in order to find feasible solutions. The solver keeps a small heap containing the most promising nodes for diving, which is kept in sync with the global heap.
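Best-first search with plunging can be sketched single-threaded on the same kind of toy tree. This is an illustration under assumptions, not the PR's implementation: the names are hypothetical, there is no locking, and nodes at `max_depth` are assumed integer feasible with objective equal to their lower bound. Each outer iteration pops the best node; the inner loop plunges down the "down" child while the sibling goes back into the heap, and anything whose bound cannot beat the incumbent is fathomed.

```cpp
#include <cassert>
#include <limits>
#include <queue>
#include <vector>

struct node_t {
  double lower_bound;
  int depth;
};

struct worse_bound {
  bool operator()(const node_t& a, const node_t& b) const { return a.lower_bound > b.lower_bound; }
};

using node_heap_t = std::priority_queue<node_t, std::vector<node_t>, worse_bound>;

struct search_result_t {
  int explored;
  double incumbent;
};

// Single-threaded best-first search with plunging on a toy tree (hypothetical rule:
// every node at `max_depth` is integer feasible with objective equal to its bound).
search_result_t best_first_with_plunging(node_heap_t& heap, int max_depth)
{
  search_result_t r{0, std::numeric_limits<double>::infinity()};
  while (!heap.empty()) {
    node_t current = heap.top();  // best-first: smallest lower bound
    heap.pop();
    // Plunge: follow one child at a time until the branch is fathomed or hits a leaf.
    while (current.lower_bound < r.incumbent) {
      ++r.explored;
      if (current.depth >= max_depth) {
        r.incumbent = current.lower_bound;  // strictly better, by the loop condition
        break;
      }
      node_t down{current.lower_bound + 1.0, current.depth + 1};
      node_t up{current.lower_bound + 2.0, current.depth + 1};
      heap.push(up);   // the unexplored sibling goes back into the global heap
      current = down;  // continue the plunge with the more promising child
    }
  }
  return r;
}
```

Starting from a heap holding only the root and `max_depth = 2`, the first plunge finds an incumbent of 2.0 after 3 nodes, and both siblings left in the heap are then fathomed without being explored.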

This PR also

  • Replaced the std::thread-based parallelization in strong branching with OpenMP in order to use dynamic scheduling. This ensures that all threads have a similar amount of work and improves parallel performance.
  • Fixed invalid memory access when trying to access the status of a fathomed node.
  • Replaced std::mutex with omp atomic wherever applicable.
  • Added dedicated classes dive_queue_t and search_tree_t to store the diving heap and the search tree, respectively.
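The strong-branching and omp atomic items can be illustrated with a toy loop. This is a sketch, not the PR's code: `fake_strong_branch` is a hypothetical placeholder for the two LP solves per candidate, with a pretend cost that varies by candidate. `schedule(dynamic, 1)` hands out one candidate at a time, each iteration writes its own result slot, and the shared iteration counter is updated with `#pragma omp atomic` rather than under a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for the two LP solves of strong branching on one candidate.
// The pretend cost varies per candidate, which is what makes dynamic scheduling useful.
static double fake_strong_branch(int candidate, long* lp_iterations)
{
  *lp_iterations = 100L * (candidate % 7 + 1);
  return 1.0 / (candidate + 1);  // pretend score
}

// schedule(dynamic, 1): a thread that finishes a cheap candidate immediately
// picks up the next one instead of idling behind a fixed static chunk.
std::vector<double> strong_branching(int num_candidates, long* total_lp_iterations)
{
  std::vector<double> scores(static_cast<std::size_t>(num_candidates), 0.0);
  long total = 0;
#pragma omp parallel for schedule(dynamic, 1)
  for (int j = 0; j < num_candidates; ++j) {
    long iters = 0;
    scores[j] = fake_strong_branch(j, &iters);  // each iteration writes its own slot: no lock
#pragma omp atomic
    total += iters;  // shared counter: an atomic update instead of a std::mutex
  }
  *total_lp_iterations = total;
  return scores;
}
```

The atomic works here because the shared state is a single scalar; structures such as the node heaps still need a lock or a critical section.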

This is an extension of #305.
Closes #320.
Closes #417.

Benchmark results (MIPLIB2017):

master branch (53d6e74)

Average Gap: 0.2174861712

This PR:

Average Gap: 0.1989485546

i.e., an absolute reduction of about 1.85 percentage points in the average gap. In terms of the geomean of the gap ratio, the improvement is 1.62x.
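For reference, the two average-gap values translate into an absolute and a relative reduction as follows. The 1.62x geomean figure is computed from per-instance gap ratios, which are not listed here, so it cannot be re-derived from these averages.

$$
\Delta_{\text{abs}} = 0.2174861712 - 0.1989485546 = 0.0185376166 \approx 1.85 \text{ percentage points}
$$

$$
\Delta_{\text{rel}} = \frac{0.0185376166}{0.2174861712} \approx 0.0852 \approx 8.5\%
$$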

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@nguidotti nguidotti requested a review from a team as a code owner September 23, 2025 14:01
@nguidotti nguidotti marked this pull request as draft September 23, 2025 14:01
copy-pr-bot bot commented Sep 23, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@nguidotti nguidotti added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Sep 23, 2025
@nguidotti (Contributor Author)

/ok to test 0e14fe2

@nguidotti nguidotti marked this pull request as ready for review September 23, 2025 19:18
Signed-off-by: nicolas <nguidotti@nvidia.com>
@nguidotti nguidotti changed the title from [Draft] Multithreaded Branch-and-Bound to Parallel Branch-and-Bound Sep 24, 2025
@aliceb-nv (Contributor) left a comment

Thanks a lot for the great work and results, Nicolas! Just a few minor nitpicks; otherwise it looks good :) I'll let Chris review the algorithm side.

```cpp
: std::thread::hardware_concurrency()),
num_threads(std::thread::hardware_concurrency() - 1),
num_bfs_threads(std::min(num_threads, 4)),
num_diving_threads(num_threads - num_bfs_threads),
```
Contributor

Should we perhaps guarantee at least one diving thread? The following code might rely on that assumption (and diving, even with thread contention, might be better than no diving at all).

Contributor Author

True. Also, if num_threads <= 4, there will be no diving threads. Instead, we can set something like this:

```cpp
num_bfs_threads    = std::max(1, num_threads / 4);
num_diving_threads = std::max(1, num_threads - num_bfs_threads);
```
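A small sketch of this split shows what it yields for a few thread counts. `split_threads` and `thread_split_t` are hypothetical names used only for illustration. Note that `std::max` (not `std::min`) is what enforces "at least one"; `std::min(1, x)` would cap each count at one instead.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper illustrating the proposed split; not the PR's code.
struct thread_split_t {
  int num_bfs_threads;
  int num_diving_threads;
};

thread_split_t split_threads(int num_threads)
{
  assert(num_threads >= 1);
  thread_split_t s{};
  // About a quarter of the threads run best-first search, but at least one.
  s.num_bfs_threads = std::max(1, num_threads / 4);
  // The remaining threads dive; keep at least one diver even if that oversubscribes the cores.
  s.num_diving_threads = std::max(1, num_threads - s.num_bfs_threads);
  return s;
}
```

With 16 threads this gives 4 BFS and 12 diving threads. With 1 thread it gives 1 + 1, so the total can exceed `num_threads` by one, which is the trade-off suggested above (some contention rather than no diving).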

Contributor Author

I will experiment a bit with the different configurations

```cpp
branch_and_bound_settings.absolute_mip_gap_tol = context.settings.tolerances.absolute_mip_gap;
branch_and_bound_settings.relative_mip_gap_tol = context.settings.tolerances.relative_mip_gap;
branch_and_bound_settings.integer_tol = context.settings.tolerances.integrality_tolerance;
branch_and_bound_settings.num_threads = 1;
```
Contributor

Have we experimented with this parameter? Submip tends to yield good solutions, so it might perform better with a few more threads allocated to it.

Contributor Author

Not yet. It was set to 1 to avoid oversubscription of the threads. I can allocate a few more threads to submip and see how it behaves.

@nguidotti (Contributor Author), Sep 25, 2025

For now, I set it to 1 best-first thread and 1 diving thread. Later, we can experiment with allocating more threads to submip. @akifcorduk

@anandhkb anandhkb added this to the 25.10 milestone Sep 24, 2025

```cpp
stack.push_front(second);
stack.push_front(first);

if (dive_queue_.size() < min_diving_queue_size_) {
```
Contributor

It would be good to have a comment here about why you are adding to the dive_queue

Contributor

Is this the only place we add to the dive queue? I thought the BFS threads would also add to the dive queue

Contributor Author

The BFS threads also add to the diving queue, around line 856 in the explore_subtree routine.
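One way to read the `dive_queue_.size() < min_diving_queue_size_` check, assuming its purpose is to keep the diving threads supplied with promising starting nodes (the small diving heap described in the PR summary), is sketched below. `toy_dive_queue_t` is a hypothetical, simplified class, not the PR's `dive_queue_t`: a best-first thread offers a node only while the queue is below its minimum size, and diving threads pop the node with the smallest lower bound.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <vector>

struct node_t {
  double lower_bound;
  int depth;
};

struct worse_bound {
  bool operator()(const node_t& a, const node_t& b) const { return a.lower_bound > b.lower_bound; }
};

// Hypothetical, simplified dive queue: a small heap of starting nodes for the
// diving threads, refilled only while it holds fewer than `min_size` nodes.
class toy_dive_queue_t {
 public:
  explicit toy_dive_queue_t(std::size_t min_size) : min_size_(min_size) {}

  // Called by a best-first thread after branching. Divers are fed only when they
  // run low; a rejected node stays with the caller (e.g., in the global heap).
  bool offer(const node_t& n)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (heap_.size() >= min_size_) { return false; }
    heap_.push(n);
    return true;
  }

  // Called by a diving thread: take the most promising starting node, if any.
  bool pop(node_t& out)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (heap_.empty()) { return false; }
    out = heap_.top();
    heap_.pop();
    return true;
  }

 private:
  std::size_t min_size_;
  std::mutex mutex_;
  std::priority_queue<node_t, std::vector<node_t>, worse_bound> heap_;
};
```

In this sketch a full queue rejects even a better node; a variant could evict the worst entry instead, at the cost of a bounded-heap data structure.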

```cpp
UNSET = 6, // The status is not set
};

enum class mip_exploration_status_t {
```
Contributor

Thanks for splitting this apart. I'm still a little confused why you need the status TIME_LIMIT, NODE_LIMIT, and NUMERICAL here. Why can't this status just be RUNNING or COMPLETED?

Contributor Author

The solver checks the (internal) status to determine when to stop (basically, when it is not RUNNING). The other statuses keep track of why the solver stopped (they are translated later into mip_status_t).

Contributor

But if we hit a TIME_LIMIT or NODE_LIMIT shouldn't any part of the code be able to tell that just by checking the time and the number of nodes?

I think there is probably an opportunity to simplify the code a bit here and not mix the return status with whether branch and bound is running or not.
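A possible shape for that separation is sketched below; it is a hypothetical illustration, not code from the PR. Worker loops only ask `running()`, while the reason for stopping is recorded exactly once (the first request wins) and read after the threads join to build the user-facing status. Both live in one `std::atomic` here purely to keep the sketch race-free.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names; the reasons mirror the statuses discussed above.
enum class stop_reason_t { NONE, COMPLETED, TIME_LIMIT, NODE_LIMIT, NUMERICAL };

class search_control_t {
 public:
  // What the worker threads check in their loops.
  bool running() const { return reason_.load() == stop_reason_t::NONE; }

  // The first caller wins; later requests (e.g., another thread hitting the node limit)
  // do not overwrite the recorded reason. Returns true if this call stopped the search.
  bool request_stop(stop_reason_t reason)
  {
    stop_reason_t expected = stop_reason_t::NONE;
    return reason_.compare_exchange_strong(expected, reason);
  }

  // Read once after all threads have joined, to build the final MIP status.
  stop_reason_t reason() const { return reason_.load(); }

 private:
  std::atomic<stop_reason_t> reason_{stop_reason_t::NONE};
};
```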

```cpp
FATHOMED = 3, // Node objective is greater than the upper bound
HAS_CHILDREN = 4, // Node has children to explore
NUMERICAL = 5, // Encountered numerical issue when solving the LP relaxation
TIME_LIMIT = 6 // Time out during the LP relaxation
```
Contributor

Why is TIME_LIMIT needed here?

@chris-maes (Contributor) left a comment

I'm removing the request for changes. But I'm still a bit fuzzy on how you handle nodes with numerical errors, in particular in the case where they are encountered by the diving thread.

I'm also a bit fuzzy on how the lower bound is handled in the ramp-up phase versus in the normal BFS threads.

@chris-maes (Contributor) left a comment

Trying to remove request changes

@rg20 (Contributor) commented Oct 3, 2025

/merge

@rapids-bot rapids-bot bot merged commit 1e208da into NVIDIA:branch-25.10 Oct 3, 2025
173 of 174 checks passed
@nguidotti nguidotti deleted the parallel-mip branch October 9, 2025 09:56

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

6 participants