-
-
Notifications
You must be signed in to change notification settings - Fork 398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tools to test and debug JuMP models #3664
Comments
I'm going to point to jump-dev/HiGHS.jl#223 as well. I think we need some way to generically assess solution quality, because solvers lie to us. cc @sstroemer |
I've updated the description of this issue with my notes about a JuMP linter. |
cc @Robbybp: this is the issue I was talking about this morning |
Thanks for the great summary - just wanted to say that I've some (the ones that relate to LPs) of those points partially implemented - but only based on the LP matrix data struct. If this isn't already done anyways (since you're always so fast... 😅) when I return in Oct I'll clean it up and link it here! |
A long time ago, in a galaxy far, far away I started sketching this:https://github.com/joaquimg/FeasibilityOptInterface.jl/tree/jg/first/src The main feedback is to try using Dualization.jl to check dual stuff. Might be nice checking complementarity constraints as well |
I don't know how this would be done, but we can at least have a manual/tutorial section telling the user about: Clearing the solver after a failed (sometimes infeasible) optimization. |
Obligatory link to where I automatically try to reset the optimizer in SDDP |
I'll also leave a link to this section of the TimeFold docs, which discuss testing: |
This tooling should also have some way of tracking/profiling where JuMP spends its time: #3910 Pyomo has a built in profiler: see pg 129 of https://www.osti.gov/servlets/purl/1771935 The other place to look for inspiration is cvxpy, which prints a log of "compiling to x" and the time, etc. |
Despite making it easy for users to formulate and solve a wide range of optimization problems, JuMP provides little support for the users who make mistakes, or tools for advanced users to debug problematic models. Moreover, in our experience the majority of (expensive) human programmer time is spent, not in formulating or solving a model, but in the debugging and testing stage of development.
Common questions that developers ask us during the debugging and testing stage of development include:
The goal of this project is to develop an automated analysis framework that can identify common problems to reduce the time spent by developers in the debugging and testing phase. There are a range of problems which are trivial to identify and fix. We suggest that the correct approach is not to fix the problems automatically (as the AMPL presolve does), but to educate the user on how to write better models.
Ideas
The deliverable for this issue is a Julia package that exposes an API for linting JuMP and MathOptInterface models. See below for a set of initial ideas.
Summary statistics
Print summary statistics of the model such as the number of variables and constraints, whether bridging is used, and a hash of the problem data.
Motivation: Simple statistics are often helpful to give an idea of the model. If the number of variables or constraints is very large, it may be difficult to solve, and if the number of variables or constraints does not match what the user expected, they may have made a mistake in the formulation.
Prior art: Most solvers provide this information. It is trivial to implement.
Coefficient analysis
For each constraint type, print statistics such as the number of constraints, the range of left- and right-hand side coefficients, and the density of the coefficients (number of terms / number of variables in the model), and whether the function is convex or non-convex.
Motivation: Poorly scaled problems have coefficients that vary by many orders of magnitude. Poor problem scaling is the most common cause of numerical problems as many common solution algorithms include a decomposition of the constraint matrix whose stability depends on the condition number of that matrix. Poorly scaled problems are usually an indication of user-error, for example, using the wrong units so a coefficient is off by a factor of 10^N. Once problematic coefficients are identified, they can usually be fixed by scaling the variables and input coefficients.
Prior art: Some solvers provide this information. Oscar has implemented it before in SDDP.jl: https://github.com/odow/SDDP.jl/blob/9e88c6ba2a7dacc0f842b9be3f8073acfd55073e/src/print.jl#L191-L384
Degeneracy
Identify constraints in the problem that are linearly dependent.
Motivation: Most solution algorithms require a linearly independent set of constraints. Because linearly dependent constraints are commonly provided by users, most solvers have detection routines for them. However, they are usually a sign that a model’s formulation can be improved.
Prior art: Alex Dowling has a paper and implementation https://github.com/adowling2/DegeneracyHunter.jl
Incidence analysis
Identify nonlinear models in which the Jacobian is not full rank. This problem is closely related to the degeneracy problem.
Motivation: Under- or over-constrained systems of nonlinear equations cause solvers trouble, and they are typically a sign of user-error since physical systems are typically not over- or under-constrained.
Prior art: Robby Parker has an implementation of this for Pyomo https://pyomo.readthedocs.io/en/stable/contributed_packages/incidence/index.html, and an in-development version of the code for JuMP.
Bounds given as constraints
Identify constraints of the form
@constraint(model, l <= x <= u)
. These should instead be specified as@variable(model, l <= x <= u)
.Motivation: The former adds a new linear constraint row to the problem. The latter sets a variable’s bounds. Solvers typically have specialised support for variable bounds. If the solver doesn’t have specialised support for variable bounds, JuMP will reformulate into the constraint version.
Prior art: Most solvers have a presolve that does this, but it’s a sign of user-error that could be improved.
Variables that do not appear in constraints
Variables that do not appear in constraints are generally a sign of user-error. They might have forgotten to include the variable, or they could make the model more performant by omitting the variable in the first place.
Starting point analysis
Starting points which violate domain restrictions like
log(x)
whenx = 0
to start are common. Therefore, we should analyse the feasibility of a given starting point, and potentially that of the first- and second-derivatives as well.Domain analysis
Nonlinear programs often include terms like
log(x)
wherex >= 0
. This is a common cause of domain violations because the solver may tryx = 0
. It’s usually better to add an explicit lower bound onx
ofx >= 0.01
.Convexity analysis
Proving convexity is difficult. But we could test feasible trial points for a proof of nonconvexity. A full disciplined convex programming package is out-of-scope for this project.
Prior art: CVXPY has an implementation: https://github.com/cvxpy/cvxpyanalyzer/blob/master/analyzer/convexity_checker.py
Irreducible infeasible subsystem
If the problem is infeasible we could try and find a reduced subsystem that is still infeasible. This does not need to be the minimal subsystem (a notoriously difficult problem to solve), even a smaller system can be helpful.
Prior art: Python-MIP includes an implementation:
https://github.com/coin-or/python-mip/blob/6044fc8f0414d71430e94b8a08573b695dc35b5a/mip/conflict.py#L15
Dual feasibility report
We have
primal_feasibility_report
, but we need a way to verify the feasibility of dual solutionsThis could also include a way to check whether the primal and dual objectives match, and whether
objective_value(model)
matchesvalue(objective_function(model))
. See jump-dev/HiGHS.jl#223Hessian analysis
See jump-dev/MathOptInterface.jl#2527
We should provide a tool that determines the density of the Hessian, and whether user-level reformulations could improve performance.
MIP solution refiner
See https://youtu.be/rKcdF4Fgl-g?feature=shared&t=2535
The text was updated successfully, but these errors were encountered: