-
Notifications
You must be signed in to change notification settings - Fork 112
Fix out-of-bound access in compute_max_variable_violation.
#146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When [there are free variables](https://github.com/NVIDIA/cuopt/blob/63fbb6b22c8949798fffe8cb34ace85ad203f2bb/cpp/src/mip/problem/problem.cu#L1234), the lower/upper bounds and assignment arrays are not necessarily the same size as that the original problem. In that case, when computing max violation we're accessing out-of-bounds data. This is already covered by `bound_standardization_test`.
Actually I'm seeing that same pattern of OOB accesses in other places elsewhere, so I'm suspecting that there is a larger pattern of broken invariants with the model. I'm opening a bug to discuss that. |
akifcorduk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happenning here is actually the opposite. assignment is resized to original problem before we return to the user, but the internal problem_ptr still has the sizes of the modified problem.
cpp/src/mip/solution/solution.cu
Outdated
| f_t solution_t<i_t, f_t>::compute_max_variable_violation() | ||
| { | ||
| const auto num_variables = view().assignment.size(); | ||
| assert(problem_ptr->variable_lower_bounds.size() >= num_variables); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standard is to use cuopt_assert to convey the error message. There, we also have the control to enable and disable asserts easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think it is not possible to infer any asserts here. The size might be greater or lower. The reason is that, we might be eliminating some vars in presolve, or we might be adding some vars because of free vars.
|
/ok to test 4cb0d22 |
|
/ok to test 11eadc5 |
|
Looks like this is break on assert, perhaps @akifcorduk comment still holds after trivial presolve updated 233 constraints 2009 variables. Objective offset 0.000000 |
|
@akifcorduk what's the next course of action here ? |
|
Moving to 25.10 milestone |
|
I couldn't reproduce this issue. It might have been fixed by one of the PRs or this is specific to the custom environment that the OP is using. |
|
@akifcorduk Could this have already been fixed by one of the merged PRs? |
I just checked after updating (at f298994), the invalid memory accesses are still happening. |
|
What's expected of me here ? |
Nah, I just assigned it to you since you are the owner and added awaiting response for @akifcorduk to get back to your question. |
Ah ok, thanks. |
|
@legrosbuffle could you give me the instructions to reproduce this issue? Data, machine, compiler, settings etc. |
I thought you had a repro here: https://github.com/NVIDIA/cuopt/actions/runs/16776602702/job/47507156031?pr=258#step:10:2580 ? For the data, this simply happens in several of the existing unit tests (see the bug for details: #150. Compiler & settings: Unfortunately we're using a custom toolchain (based on clang clang + libc++) and the only machines with GPUs I have access to require using that toolchain, so I can't give you a usable command-line for repro. But I can reproduce in two different ways: with asserts on (which triggers bound cheking errors in the |
|
/ok to test 35450be |
|
@akifcorduk @legrosbuffle I fixed a merge conflict in the last commit, please revert if there any mistakes. |
|
I can't reproduce the issue in 25.10. Instead the code is failing later on different OOB. The bug in the latter is more obvious and the fix is here: 346 |
When there are free variables, the lower/upper bounds and assignment arrays are not necessarily the same size as that the original problem.
In that case, when computing max violation we're accessing out-of-bounds data.
This is already covered by
bound_standardization_test.I'm not quite sure about whether
assert(problem_ptr->variable_lower_bounds.size() >= num_variables);should actually be an equality (i.e., how/if the mapping from variables to bounds is handled). But at least this is strictly better than out-of-bound accesses.