
SMT internalize doesn't respect timeout #4192

Open
nunoplopes opened this issue May 2, 2020 · 11 comments

@nunoplopes
Collaborator

This doesn't come back after several minutes and memory goes up to GBs:
$ z3 -t:1000 timeout.smt2

I see smt::context::internalize_rec(expr*, bool) getting called thousands of times, and it shows up frequently in smt::context::top_sort_expr.
AFAICT, the internalizer never checks the resource limits. I'm also wondering why it's taking so long with such a small formula. The expressions have a lot of sharing, so if the internalizer traverses the expression like a tree, I can see why it would never finish. But does it need to? (I'm clueless about this code; just speculating.)

timeout.zip

(this is the root cause of a crash in Alive2: https://web.ist.utl.pt/nuno.lopes/alive2/index.php?hash=1ede71ade7988ad1&test=Transforms%2FLoopVectorize%2FX86%2Funroll_selection.ll)

@NikolajBjorner
Contributor

  • The internalizer traverses the expression as a DAG, not a tree.
  • It is generally unsafe to throw an exception during push, which invokes internalization, so cancellation is disabled there.

NikolajBjorner added a commit that referenced this issue May 2, 2020
@NikolajBjorner
Contributor

@wintersteiger - it seems most of the time is spent in bit-blasting and then in internalization for theory_fpa (within quantifier instantiation).

@nunoplopes
Collaborator Author

Well, I have no clue how to fix this; I have near-zero knowledge about internalization.
Do you think there's no way of checking resources during internalization? I understand it might not be safe to throw an exception in the middle, but maybe it could switch into a mode that just creates constants for each expression, so the DAG traversal is cut short?

About FPA bit-blasting I have no clue. The example has a few FPA operations, but not that many (< 40).

hgvk94 pushed a commit to hgvk94/z3 that referenced this issue May 7, 2020
@rainoftime
Contributor

rainoftime commented May 8, 2020

Maybe related issue: #3864

@nunoplopes
Collaborator Author

I investigated this a bit. It seems that there's an issue in the SMT internalizer when dealing with formulas combining multiple theories. In the example I posted, some expressions combine BVs, arrays & FPs.

The SMT internalizer does a top-sort of the expression to be internalized and then iterates over the sub-expressions top-down, internalizing them one by one. The problem is that if the children also mix multiple theories, the same top-sort & deep-internalization behavior kicks in.
This results in quadratic behavior: each sub-expression contains most of the other sub-expressions, and deep internalization is repeated for each of them, so leaf expressions are attempted a quadratic number of times.
The fix requires a different internalization algorithm that avoids deep internalization of sub-expressions; it should do that only on the root.
@NikolajBjorner does this make sense? (I don't know exactly how to fix it, though.)

NikolajBjorner added a commit that referenced this issue May 31, 2020
@NikolajBjorner
Contributor

The code is supposed to internalize expressions bottom-up. The top-sort is only applied to non-internalized expressions, so once they are internalized they no longer participate.
I added the throttle for internalize to the top-sort selection so as not to produce something that would later fail the filter in internalize_rec, but overall the idea is that non-internalized expressions are visited only once.

@nunoplopes
Collaborator Author

You're very right, I misunderstood the code. It looks good.
I tried changing the colors from dense to sparse and I got a 4x speedup on this example. Profiling showed that most of the time was spent in memset'ing the color vectors (as AST ids quickly go over 1 million). By changing to sparse representation, smt::relevancy_propagator_imp::add_watch becomes the bottleneck.

So a dense representation makes sense for QF formulas, I guess, while a sparse one is good for quantifier instantiation, since AST ids are high by then and zeroing out the vector becomes expensive.
On the other hand, the fundamental problem is probably elsewhere: why in the world are just a few quantifier instantiations invoking so many internalization calls? If they were batched, the cost of the top-sort would go away.

@nunoplopes
Collaborator Author

An example:

    void theory_bv::process_args(app * n) {
        for (expr* arg : *n) {
            ctx.internalize(arg, false);
        }
    }

What if the internalizer had an API to process multiple arguments at once? At least we could batch all arguments of a function in one go. That would reduce calls to the internalizer by at least half.

@NikolajBjorner
Contributor

Quantifier instantiation causes FP internalization, which creates side constraints that are then internalized.

@nunoplopes
Collaborator Author

I run with rewriter.hi_fp_unspecified=true, so I don't get those huge side constraints.
It seems it's just that the encoding of fp.mul & fp.div creates big formulas. If I replace those with fp.add, it becomes several times faster.

Nevertheless, my profiler still blames memset() in internalizer's set_color. Almost half of the time is spent there.

@wintersteiger
Contributor

Yeah, there are lots of fp.mul and fp.div which create very complicated circuits and side constraints that then get bit-blasted. Then, multiply that by the number of quantifier instantiations...

You can get rid of the floats up front, e.g. via (check-sat-using (then simplify fpa2bv simplify smt)); fpa2bv is quick. I didn't have the patience to wait for smt to finish, though. But at least there's one less theory to debug. Also, after changing all the foralls into exists, I can interrupt it at random times in the debugger, and whenever I do, it's in the internalizer. I don't have any good ideas for improving that situation, except to tell you not to quantify over large bit-vectors (or floats) :-)

Re: limits: yeah, I think we should add resource checks to the internalizer; sometimes it can take a significant amount of time, and users may have very tight limits.
