Performance regression in main after an optimal solution already found #4268
Comments
I just ran it under Async Profiler and have some more results. During the first gap (between the optimal solve and
During the first and second gaps, a different thread (the
The possible thread identities are just based on the time they stop running, with the first thread stopping as soon as
No other threads are active after the optimal solution is found. I can provide a model that causes this, but I think it's pretty widespread. |
Can you send us the model? |
Here's the model I used: model.zip. I did some more tests that might be a clue to what's happening:
I ran a t-test between two measures, reported walltime vs. the time of the first known optimal solution (empty objective bounds), over a set of 1000 runs before and after the changes. Before the changes, the difference averaged 0.14s, with both measures having a standard deviation of ~1.8s. After the changes, it averaged 13.2s (!!!), with reported walltime having a standard deviation of 0.37s and first-known-optimal-solution time having a standard deviation of 2.33s. Finally, the difference in first-known-optimal-solution time between before and after was +1.25s, but this isn't as big of an issue. |
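As a sketch of the comparison described above: Welch's t-statistic between two sets of measured times, computed the way the reported means and standard deviations suggest. All names and values here are illustrative, not taken from the issue.

```java
// Minimal sketch: compare reported walltime against first-optimal-solution time
// over repeated runs using Welch's t-statistic. The arrays are assumed to be
// filled from parsed solver logs; the values below are placeholders.
public final class TimingComparison {
  static double mean(double[] xs) {
    double sum = 0;
    for (double x : xs) sum += x;
    return sum / xs.length;
  }

  static double sampleVariance(double[] xs, double mean) {
    double sum = 0;
    for (double x : xs) sum += (x - mean) * (x - mean);
    return sum / (xs.length - 1);
  }

  /** Welch's t-statistic for two independent samples with unequal variances. */
  static double welchT(double[] a, double[] b) {
    double ma = mean(a), mb = mean(b);
    double va = sampleVariance(a, ma), vb = sampleVariance(b, mb);
    return (ma - mb) / Math.sqrt(va / a.length + vb / b.length);
  }

  public static void main(String[] args) {
    double[] walltime = {7.5, 7.8, 7.3};       // placeholder measurements
    double[] firstOptimal = {7.4, 7.6, 7.2};   // placeholder measurements
    System.out.printf("t = %.3f%n", welchT(walltime, firstOptimal));
  }
}
```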
I ran it on main
and the final stats
|
That result makes it seem like there is a subsolver that I'm excluding that is causing the difference. I'm only using reduced_costs, max_lp, and LNS workers because those are the only ones that appear "productive" during the solve, and prior tests showed adding any other single subsolver didn't improve times. Is there a subsolver that sticks out as the culprit? One that would perform whatever work is delaying the solve completion during the solve itself?
I can retest with quick_restart_no_lp, pseudo_costs, and objective_shaving tomorrow since those are at least present in your log. Maybe including one of those prevents a "backlog" of stuff to do at the end of the solve.
EDIT: Full 13-minute log that had a solution in 22 seconds:
|
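For context, here is one way the restricted subsolver setup described above could be expressed through the Java API. This is a sketch, not the reporter's actual test harness; it assumes the generated SatParameters.Builder exposes num_workers and the repeated subsolvers field.

```java
import com.google.ortools.Loader;
import com.google.ortools.sat.CpModel;
import com.google.ortools.sat.CpSolver;
import com.google.ortools.sat.CpSolverStatus;

public class RestrictedSubsolversSketch {
  public static void main(String[] args) {
    Loader.loadNativeLibraries();

    CpModel model = new CpModel();
    // ... model construction (or loading of the attached model.zip) elided ...

    CpSolver solver = new CpSolver();
    // Assumed parameter names; on older releases the worker count field is
    // num_search_workers instead of num_workers.
    solver.getParameters().setNumWorkers(20);
    solver.getParameters().addSubsolvers("reduced_costs");
    solver.getParameters().addSubsolvers("max_lp");
    solver.getParameters().setLogSearchProgress(true);

    CpSolverStatus status = solver.solve(model);
    System.out.println("status: " + status);
    System.out.println("walltime: " + solver.wallTime());
  }
}
```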
I just pushed the code I ran the test with.
|
I have a revelation to bring you: the entire issue disappears when I don't register a
With default settings, 20 threads, |
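The comment above is cut off before naming what was registered. Purely as an illustration of one kind of callback in the Java CP-SAT API (my assumption, not necessarily what was used here), a solution callback looks roughly like this:

```java
import com.google.ortools.sat.CpSolverSolutionCallback;

// Hypothetical example of a registered callback; the issue does not say which
// callback type was actually involved.
class SolutionLogger extends CpSolverSolutionCallback {
  @Override
  public void onSolutionCallback() {
    System.out.printf("objective=%.3f wall=%.2fs%n", objectiveValue(), wallTime());
  }
}
// Typical registration (the method name may differ by version):
//   solver.solve(model, new SolutionLogger());
```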
I fixed it. It is a duplicate of #4376. |
What version of OR-Tools and what language are you using?
Version: main
Language: Java
Which solver are you using (e.g. CP-SAT, Routing Solver, GLOP, BOP, Gurobi)?
CP-SAT
What operating system (Linux, Windows, ...) and version?
Linux (kernel 6.8.12)
I noticed a recent performance regression and bisected it to this commit: 766ada1
It only occurs between the end of the solve and the final result being printed. Example:
I checked out the commit and added additional logging before and after the calls to
subsolvers[i].reset();
and
LogFinalStatistics(shared);
which indicates they complete quickly and are not the cause. Setting log_search_progress = false and log_subsolver_statistics = false doesn't appear to do anything either. These gaps end up doubling the solve time after the optimal solution has been printed, which leads me to believe this isn't a minor change in the search algorithm randomization 😅.
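For reference, the two logging parameters mentioned above can be toggled from Java roughly as shown below; the timing wrapper measures the reported gap from outside the solver. This is a sketch using parameter setters I expect the generated SatParameters.Builder to provide, not code from the issue.

```java
import com.google.ortools.Loader;
import com.google.ortools.sat.CpModel;
import com.google.ortools.sat.CpSolver;

public class GapMeasurementSketch {
  public static void main(String[] args) {
    Loader.loadNativeLibraries();

    CpModel model = new CpModel();
    // ... model construction elided ...

    CpSolver solver = new CpSolver();
    // The two parameters the report says make no difference to the gap.
    solver.getParameters().setLogSearchProgress(false);
    solver.getParameters().setLogSubsolverStatistics(false);

    long start = System.nanoTime();
    solver.solve(model);
    double outside = (System.nanoTime() - start) / 1e9;
    // Comparing the external measurement with the solver-reported walltime
    // exposes any time spent after the search itself has finished.
    System.out.printf("outside: %.2fs, solver walltime: %.2fs%n",
        outside, solver.wallTime());
  }
}
```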
A full log showing a 7.5s solve turning into a 19.1s solve