@legrosbuffle
Contributor

When [there are free variables](https://github.com/NVIDIA/cuopt/blob/63fbb6b22c8949798fffe8cb34ace85ad203f2bb/cpp/src/mip/problem/problem.cu#L1234), the lower/upper bounds and assignment arrays are not necessarily the same size as in the original problem.

In that case, when computing the max violation we're accessing out-of-bounds data.

This is already covered by `bound_standardization_test`.

I'm not quite sure whether `assert(problem_ptr->variable_lower_bounds.size() >= num_variables);` should actually be an equality (i.e., how/if the mapping from variables to bounds is handled). But at least this is strictly better than out-of-bounds accesses.
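
To make the failure mode concrete, here is a minimal host-side sketch of a max-violation computation that never indexes past the shortest of the three arrays. It is not the cuOpt implementation: `max_variable_violation` and its `std::vector` parameters are hypothetical stand-ins for the device code in `solution.cu`. Clamping the loop only avoids the out-of-bounds read; it does not establish that entry `i` of the assignment actually corresponds to entry `i` of the bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for the computation discussed in this PR.
// Returns the largest amount by which any assignment value lies outside
// [lower[i], upper[i]], reading only indices that exist in all three arrays.
inline double max_variable_violation(const std::vector<double>& assignment,
                                     const std::vector<double>& lower,
                                     const std::vector<double>& upper)
{
  // Assumption: when the sizes differ, only the common prefix is comparable.
  const std::size_t n = std::min({assignment.size(), lower.size(), upper.size()});
  double max_violation = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double below = lower[i] - assignment[i];  // positive if under the lower bound
    const double above = assignment[i] - upper[i];  // positive if over the upper bound
    max_violation = std::max({max_violation, below, above});
  }
  return max_violation;
}
```

With an assignment of three values and bounds arrays of two entries, the third assignment value is simply ignored instead of triggering an invalid read.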
@legrosbuffle legrosbuffle requested a review from a team as a code owner June 26, 2025 12:28
@copy-pr-bot

copy-pr-bot bot commented Jun 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@legrosbuffle
Contributor Author

> I'm not quite sure whether `assert(problem_ptr->variable_lower_bounds.size() >= num_variables);` should actually be an equality (i.e., how/if the mapping from variables to bounds is handled). But at least this is strictly better than out-of-bounds accesses.

Actually, I'm seeing the same pattern of OOB accesses in other places, so I suspect there is a larger pattern of broken invariants in the model. I'm opening a bug to discuss that.

@anandhkb anandhkb added this to the 25.08 milestone Jul 1, 2025
Contributor

@akifcorduk akifcorduk left a comment

What's happening here is actually the opposite: `assignment` is resized to the original problem before we return to the user, but the internal `problem_ptr` still has the sizes of the modified problem.

f_t solution_t<i_t, f_t>::compute_max_variable_violation()
{
  const auto num_variables = view().assignment.size();
  assert(problem_ptr->variable_lower_bounds.size() >= num_variables);
Contributor

The standard is to use `cuopt_assert` to convey the error message. It also gives us easier control over enabling and disabling asserts.
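
For readers unfamiliar with this style of assertion, the sketch below shows the general idea of an assert that carries a message and can be compiled out with a single switch. `SKETCH_ASSERT` and `SKETCH_ASSERTS_ENABLED` are hypothetical names invented for this illustration; this is not the definition of `cuopt_assert`, whose actual signature and toggle should be checked in the cuOpt sources.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical switch: define SKETCH_ASSERTS_ENABLED as 0 to compile checks out.
#ifndef SKETCH_ASSERTS_ENABLED
#define SKETCH_ASSERTS_ENABLED 1
#endif

#if SKETCH_ASSERTS_ENABLED
// Hypothetical macro (NOT the real cuopt_assert): prints the failing
// condition together with a human-readable message, then aborts.
#define SKETCH_ASSERT(cond, msg)                                     \
  do {                                                               \
    if (!(cond)) {                                                   \
      std::fprintf(stderr, "%s:%d: assertion `%s` failed: %s\n",     \
                   __FILE__, __LINE__, #cond, (msg));                \
      std::abort();                                                  \
    }                                                                \
  } while (0)
#else
// When disabled, the condition is not evaluated at all.
#define SKETCH_ASSERT(cond, msg) ((void)0)
#endif
```

A call would then look like `SKETCH_ASSERT(lower.size() == upper.size(), "bounds arrays must match");`, which documents the expected invariant at the call site.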

Contributor

I also think it is not possible to infer any asserts here. The size might be greater or smaller, because we might be eliminating some vars in presolve, or we might be adding some vars because of free vars.
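
The point that the size can move in either direction can be illustrated with a toy model. The sketch below assumes two textbook transformations: dropping variables whose bounds fix them to a single value, and splitting a free variable `x` into `x_plus - x_minus` with both parts non-negative. `Bounds`, `drop_fixed`, and `split_free` are hypothetical names, and cuOpt's actual presolve and free-variable handling may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-variable bounds record used only for this illustration.
struct Bounds {
  double lower;
  double upper;
};

// Toy presolve step: remove variables that are fixed (lower == upper).
inline std::vector<Bounds> drop_fixed(const std::vector<Bounds>& vars)
{
  std::vector<Bounds> out;
  for (const Bounds& v : vars) {
    if (v.lower != v.upper) { out.push_back(v); }
  }
  return out;
}

// Toy standardization step: replace each free variable (unbounded on both
// sides) with two non-negative variables, so x = x_plus - x_minus.
inline std::vector<Bounds> split_free(const std::vector<Bounds>& vars)
{
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<Bounds> out;
  for (const Bounds& v : vars) {
    const bool is_free = std::isinf(v.lower) && v.lower < 0.0 &&
                         std::isinf(v.upper) && v.upper > 0.0;
    if (is_free) {
      out.push_back({0.0, inf});  // x_plus
      out.push_back({0.0, inf});  // x_minus
    } else {
      out.push_back(v);
    }
  }
  return out;
}
```

Under these assumptions, a problem with one fixed variable shrinks and a problem with one free variable grows, so neither `>=` nor `<=` between the two sizes holds in general.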

@akifcorduk akifcorduk added the bug (Something isn't working) and non-breaking (Introduces a non-breaking change) labels Jul 4, 2025
@rgsl888prabhu
Collaborator

/ok to test 4cb0d22

@rgsl888prabhu
Collaborator

/ok to test 11eadc5

@tmckayus
Contributor

Looks like this is breaking on the assert; perhaps @akifcorduk's comment still holds.
conda_cpp_tests is breaking consistently with this:

after trivial presolve updated 233 constraints 2009 variables. Objective offset 0.000000
ELIM_VAR_REMAP_TEST: /tmp/conda-bld-output/bld/rattler-build_libmps-parser/work/cpp/src/mip/solution/solution.cu:545: f_t cuopt::linear_programming::detail::solution_t<i_t, f_t>::compute_max_variable_violation() [with i_t = int; f_t = double]: Assertion `problem_ptr->variable_lower_bounds.size() >= num_variables' failed.

@rgsl888prabhu
Collaborator

@akifcorduk what's the next course of action here?

@tmckayus
Contributor

Moving to 25.10 milestone

@tmckayus tmckayus modified the milestones: 25.08, 25.10 Jul 31, 2025
@akifcorduk
Contributor

I couldn't reproduce this issue. It might have been fixed by one of the PRs or this is specific to the custom environment that the OP is using.

@anandhkb
Contributor

anandhkb commented Aug 8, 2025

@akifcorduk Could this have already been fixed by one of the merged PRs?

@legrosbuffle
Contributor Author

> @akifcorduk Could this have already been fixed by one of the merged PRs?

I just checked after updating (at f298994); the invalid memory accesses are still happening.

@rgsl888prabhu rgsl888prabhu added the awaiting response label (This expects a response from maintainer or contributor depending on who requested in last comment.) Aug 18, 2025
@legrosbuffle
Contributor Author

What's expected of me here?

@legrosbuffle legrosbuffle removed their assignment Aug 18, 2025
@rgsl888prabhu
Collaborator

> What's expected of me here?

Nah, I just assigned it to you since you are the owner and added awaiting response for @akifcorduk to get back to your question.

@legrosbuffle
Contributor Author

> > What's expected of me here?
>
> Nah, I just assigned it to you since you are the owner and added awaiting response for @akifcorduk to get back to your question.

Ah ok, thanks.

@akifcorduk
Contributor

@legrosbuffle could you give me the instructions to reproduce this issue? Data, machine, compiler, settings etc.

@legrosbuffle
Contributor Author

> @legrosbuffle could you give me the instructions to reproduce this issue? Data, machine, compiler, settings etc.

I thought you had a repro here: https://github.com/NVIDIA/cuopt/actions/runs/16776602702/job/47507156031?pr=258#step:10:2580 ?

For the data, this simply happens in several of the existing unit tests (see the bug for details: #150).

Compiler & settings: Unfortunately we're using a custom toolchain (based on clang + libc++), and the only machines with GPUs I have access to require using that toolchain, so I can't give you a usable command line for a repro. But I can reproduce in two different ways: with asserts on (which triggers bounds-checking errors in the span), or when running with address sanitizer on. Note that this is not the first time that I'm triggering asserts that you don't seem to be able to reproduce (rapidsai/raft#2732 (comment)).

@rgsl888prabhu rgsl888prabhu changed the base branch from branch-25.08 to branch-25.10 August 22, 2025 15:09
@rgsl888prabhu
Collaborator

/ok to test 35450be

@rgsl888prabhu
Collaborator

@akifcorduk @legrosbuffle I fixed a merge conflict in the last commit; please revert if there are any mistakes.

@legrosbuffle
Contributor Author

I can't reproduce the issue in 25.10. Instead, the code is failing later on a different OOB access. The bug in the latter is more obvious and the fix is here: 346
