What is the status of the optimization stuff in the unstable branch? #211

Closed
asmeurer opened this issue Sep 4, 2015 · 14 comments

asmeurer commented Sep 4, 2015

Sorry for opening an issue, but I couldn't find any other way to contact the development team (there is a "contact us" link at http://research.microsoft.com/en-us/um/redmond/projects/z3/, but it just downloads some document about Windows Small Business Server 2003 Service Pack 1).

What is the status on the optimization stuff in the unstable branch? Is there a timeline on when it will make it to a stable release?

If it won't be there soon, is there a way to do optimization in the latest stable release with reasonable performance? (I know that at worst I can set up a SAT problem and bisect it.)

The kind of problem that I wish to solve is a pseudo-boolean problem. I have several SAT clauses and a linear expression with integer coefficients on some of the SAT variables which I wish to minimize.

Specifically, I'm investigating Z3 for use in conda for dependency resolution. Conda already uses a SAT solver (picosat) to do dependency resolution, but the pseudo-boolean constraints are converted to SAT using BDDs (see "Translating Pseudo-Boolean Constraints into SAT", Eén and Sörensson 2006). The SAT clauses represent things like "package A depends on package B", and the pseudo-boolean formula that is minimized lets the solver find a solution that has maximal versions installed. The formula is then minimized using bisection.

The problem with conda's approach is that it is too slow for certain problems (it can take minutes to get a solution). We've looked at using an ILP solver, but there don't seem to be any good free ones for Python. I tried using Z3's optimization stuff (see https://github.com/asmeurer/conda/blob/clauses/Z3Opt.ipynb) on a problem that takes conda's current solver about 15 minutes, and Z3 solved it in a few seconds, which is plenty fast, and I haven't even made some optimizations yet (e.g., there are some cardinality constraints that are represented in CNF, but could probably be handled by Z3 more efficiently as actual cardinality constraints). (Side note: why does it take 14 seconds to convert the clauses to Z3 Python objects?)

conda's current solver is a hassle to users when it hits a hard problem, and is preventing us from doing some things that we'd like to do with the solver. Z3 seems like it's a good fit, since it has Python bindings, works in Python 2 and 3 and all platforms, and is MIT open source.

@NikolajBjorner

Could you possibly use the unstable branch?
We will update the master branch at some point, in fact more frequently than in the past, but we have to balance the pace of fixes for issues that get discovered against feature additions.

Did you try to start Python with the -O option? By default it uses debug = True, which triggers type checks that add some overhead.


asmeurer commented Sep 4, 2015

We could. I'm just worried about how, well, stable it is. Stability is a pretty high priority for conda, since if it breaks (say, crashes due to some bug in the solver), there is no way for users to use conda to update itself to a newer version with a fix: they would have to completely reinstall everything.

API stability is also somewhat important, although it's usually pretty easy to work around API changes.

I didn't use python -O or do any profiling yet. As I said, I already know that there are optimizations I can make (like cardinality constraints). In that notebook, I was just checking a base level of Z3's performance, to see if it was even worth looking at. I haven't had a chance to play with it beyond that yet. Maybe I should see how fast a naive bisection using the stable release of Z3 would be.

@NikolajBjorner

> Maybe I should see how fast a naive bisection using the stable release of Z3 would be.

It all depends on how your constraints end up being solved. With the optimization features, I added several transformations and features to support PB constraints. There is both pre-processing and a dedicated PB theory. The PB theory compiles cardinality constraints to circuits, but only when the constraints from those inequalities are used in several conflicts. It contains some very modest cutting-plane reasoning as well. But many constraints can also be converted directly into SAT without overhead, and the pre-processor makes some effort to detect when this is the case.
Finally, if the optimization objective is over 0-1 variables or over Booleans, then the back-end engine is based on max-sat.
You can run Z3 with "verbose" mode set on (from Python you can call set_param(verbose=10)) to have Z3 display some selected information about what it is doing. You can also extract statistics after the satisfiability call to get a profile of which steps it took. It requires some idea of the code modules to deduce anything from it: e.g., if it says it is using anything with "arith", then you are using the simplex solver; on the other hand, if it mentions ternary clauses, then it is using pure SAT solving (and max-sat).


asmeurer commented Sep 8, 2015

Thanks for the info. I've added a statistics call to the notebook (I don't see "arith" anywhere). I also captured the verbose output. My cardinality constraints are still in CNF (they are all "zero or one" constraints, i.e., "no more than one version for each package should be installed"). I also have a cardinality condition that I want to optimize lexicographically ("after maximizing versions, minimize the number of packages that are installed"), which I haven't played with here yet either. I guess I'll report back if either of those end up changing what is in the statistics.

I guess you mentioned max-sat because it's in the stable z3. Is there some way to efficiently represent pseudo-boolean optimization as a max-sat problem?

@NikolajBjorner

> Is there some way to efficiently represent pseudo-boolean optimization as a max-sat problem?

In Z3, the objective (maximize (+ (* 2 p) (* 3 q))), where p and q are 0-1 variables, gets translated into a weighted maxsat problem where p and q are soft constraints with weights 2 and 3, respectively.
This is happening for your problem. I can see this from the statistics:
The line:

  :maxres-cores               75

means that the weighted maxsat solver got invoked and used 75 cores
before it found an optimum.
The maxsat code in Z3 uses some elements from the following paper:
http://ijcai.org/papers15/Papers/IJCAI15-041.pdf
(in your case it seems to mostly just use cores).

I suspect any simple algorithm will not be as efficient, and this could also explain why you see better performance on your problem than with other approaches that don't use up-to-date maxsat algorithms.


asmeurer commented Sep 8, 2015

Is weighted max-sat only in the unstable branch?

@NikolajBjorner

So far, yes. We will work on preparing a master update if we can stabilize regressions in the coming weeks.

@asmeurer

OK, now I'm hitting a choke point. I tried adding a cardinality condition to minimize

z3_pkg_eq = sum(IntSort().cast(vars[w[i]]) for i in w)

(there are about 2000 variables). I want to minimize this after the other equation is minimized, so I used

o = Optimize()
for z3_clause in z3_clauses:
    o.add(z3_clause)
o.minimize(z3_ver_eq)
o.minimize(z3_pkg_eq)
o.check()
m = o.model()

If I understand the docs correctly, lexicographic is the default.

This causes the solver to be very slow. For my particular problem, it hasn't finished. I ran a simpler problem, and it took 30 minutes (if I remember correctly); this is a problem that my existing solver can do in a few seconds. Any suggestions on how to fix this? I can get better performance by having Z3 minimize the first equation and minimizing the second using my current algorithm (convert the optimal value of the first equation to SAT using BDDs, then iterate over all solutions from the SAT solver with a limit of 1000, falling back to pseudo-boolean constraints compiled to SAT using sorting networks to minimize the second), but this is not ideal, because that would still be slower than what I would like to achieve.

How does Z3 handle multiple objectives? Does it combine them into one equation, or are they solved one by one?

I ran this on the latest unstable branch (646dcfb).

@NikolajBjorner

NB. I need to revise this answer. It is not accurate at all.
I will soon delete this answer, after adding the correction below.

Z3 uniformly uses a less efficient SAT solver for the multi-objective case. It also creates a Pseudo-Boolean constraint after solving the first objective. So this could explain the big difference. This constraint also influences solving time for the second problem. I used to implement a much weaker version of lexicographic composition which simply hardened the soft constraints from the first objective. This approach would be faster, but not produce optimal values for the second objective.
So the semantics are different: the output depends heavily on which solution is found in the first round.

There are some better approaches (see http://link.springer.com/article/10.1007%2Fs10472-011-9233-2). I have not yet examined support for these, but good use cases could help drive adopting better solutions.

You could also try to combine the two objectives into a single one by multiplying the coefficients of the first objective by the sum of the coefficients of the second.

@NikolajBjorner

For lex combinations of MaxSAT, Z3 solves one problem at a time. It commits the last state from each round, which forces any solution to later-round maxsat problems to have the same values for the previous rounds. This approach amounts to the one that shows better performance in http://link.springer.com/article/10.1007%2Fs10472-011-9233-2.

I have now tuned the code for the special case of multiple maxsat problems, so performance may have improved (the default code path introduces push/pop operations, which increases clause sizes uniformly). Otherwise, a sanity check in the form of a trace from your scenario may provide additional clues about how to tune.

@asmeurer

That seems to have helped. My test problem finished in 9 minutes (as opposed to several hours before; actually, the earlier run is still going). I'm still hoping Z3 can do better. I'll try combining the objectives into a single objective. I also haven't yet factored out my cardinality constraints (they're still implemented as direct CNF SAT clauses). I'll report back if I can make things significantly faster. The results do seem to be correct (at least at a glance), so that's good.

@asmeurer

Is there a high-level API for iterating all models? Stack Overflow recommends doing it manually, which is a bit complicated, even more so in this case, as you have to either check if the objective has changed or add a constraint for it not to (I'm not even sure what would be the most efficient there).

@NikolajBjorner

There isn't any direct support for extracting all models. There are specialized all-sat solvers, but Z3 does not incorporate any of these methods.

If your formula is ground (i.e., the set of atomic formulas is fixed), then extracting a disjunction of satisfying assignments as subsets of atomic formulas corresponds roughly to extracting a set of minimal unsatisfiable cores of the negated formula (assume the negated formula is written in negation normal form, so that the only literals you consider are those in the NNF; you don't consider negations of literals in the NNF unless they are already there).
The MARCO tool and its successors (see http://sun.iwu.edu/~mliffito/marco/ and links) use nice incremental algorithms for extracting cores. There are some follow-on papers by Alessandro Previti and Joao Marques-Silva that contain additional insights into using these ideas.
The essence of MARCO makes for a nice example. It is listed below.
To use it for finding all models, you would take F in negation normal form and extract its literals l1, l2, ..., lk. Then use this procedure to find all minimal cores of !F, l1, l2, ..., lk that include !F (the cores that only contain the literals l1, ..., lk are not interesting, because the conjunction of those literals is inconsistent).

from z3 import *


def main():
    x, y = Reals('x y')
    constraints = [x > 2, x < 1, x < 0, Or(x + y > 0, y < 0), Or(y >= 0, x >= 0), Or(y < 0, x < 0), Or(y > 0, x < 0)]
    csolver = SubsetSolver(constraints)
    msolver = MapSolver(n=csolver.n)
    for orig, lits in enumerate_sets(csolver, msolver):
        output = "%s %s" % (orig, lits)
        print(output)


def get_id(x):
    return Z3_get_ast_id(x.ctx.ref(),x.as_ast())


def MkOr(clause):
    # the empty disjunction is false
    if clause == []:
       return BoolVal(False)
    else:
       return Or(clause)

class SubsetSolver:
   def __init__(self, constraints):
       self.constraints = constraints
       self.n = len(constraints)
       self.s = Solver()        # per-instance solver
       self.varcache = {}       # constraint index -> indicator literal
       self.idcache = {}        # ast id -> constraint index
       for i in range(self.n):
           self.s.add(Implies(self.c_var(i), constraints[i]))

   def c_var(self, i):
       if i not in self.varcache:
          v = Bool(str(self.constraints[abs(i)]))
          self.idcache[get_id(v)] = abs(i)
          if i >= 0:
             self.varcache[i] = v
          else:
             self.varcache[i] = Not(v)
       return self.varcache[i]

   def check_subset(self, seed):
       assumptions = self.to_c_lits(seed)
       return (self.s.check(assumptions) == sat)

   def to_c_lits(self, seed):
       return [self.c_var(i) for i in seed]

   def complement(self, aset):
       return set(range(self.n)).difference(aset)

   def seed_from_core(self):
       core = self.s.unsat_core()
       return [self.idcache[get_id(x)] for x in core]

   def shrink(self, seed):
       current = set(seed)
       for i in seed:
          if i not in current:
             continue
          current.remove(i)
          if not self.check_subset(current):
             current = set(self.seed_from_core())
          else:
             current.add(i)
       return current

   def grow(self, seed):
       current = seed
       for i in self.complement(current):
           current.append(i)
           if not self.check_subset(current):
              current.pop()
       return current



class MapSolver:
   def __init__(self, n):
       """Initialization.
             Args:
            n: The number of constraints to map.
       """
       self.solver = Solver()
       self.n = n
       self.all_n = set(range(n))  # used in complement fairly frequently

   def next_seed(self):
       """Get the seed from the current model, if there is one.

            Returns:
            A seed as an array of 0-based constraint indexes.
       """
       if self.solver.check() == unsat:
            return None
       seed = self.all_n.copy()  # default to all True for "high bias"
       model = self.solver.model()
       for x in model:
           if is_false(model[x]):
              seed.remove(int(x.name()))
       return list(seed)

   def complement(self, aset):
       """Return the complement of a given set w.r.t. the set of mapped constraints."""
       return self.all_n.difference(aset)

   def block_down(self, frompoint):
       """Block down from a given set."""
       comp = self.complement(frompoint)
       self.solver.add( MkOr( [Bool(str(i)) for i in comp] ) )

   def block_up(self, frompoint):
       """Block up from a given set."""
       self.solver.add( MkOr( [Not(Bool(str(i))) for i in frompoint] ) )



def enumerate_sets(csolver, map):
    """Basic MUS/MCS enumeration, as a simple example."""
    while True:
        seed = map.next_seed()
        if seed is None:
           return
        if csolver.check_subset(seed):
           MSS = csolver.grow(seed)
           yield ("MSS", csolver.to_c_lits(MSS))
           map.block_down(MSS)
        else:
           MUS = csolver.shrink(seed)
           yield ("MUS", csolver.to_c_lits(MUS))
           map.block_up(MUS)

main()

@NikolajBjorner

Closing: optimization has for a while now been merged into the master branch.
