
SMT internalize doesn't respect timeout #4192

Open
nunoplopes opened this issue May 2, 2020 · 11 comments

@nunoplopes
Collaborator

This doesn't come back after several minutes and memory goes up to GBs:
$ z3 -t:1000 timeout.smt2

I see smt::context::internalize_rec(expr*, bool) getting called thousands of times, and it shows up frequently in smt::context::top_sort_expr.
AFAICT, the internalizer never checks the resource limits. I'm also wondering why it's taking so long with such a small formula. The expressions have a lot of sharing, so if the internalizer traverses the expression like a tree, I can see why it would never finish. But does it need to? (I'm clueless about this code; just speculating.)

timeout.zip

(this is the root cause of a crash in Alive2: https://web.ist.utl.pt/nuno.lopes/alive2/index.php?hash=1ede71ade7988ad1&test=Transforms%2FLoopVectorize%2FX86%2Funroll_selection.ll)

@NikolajBjorner
Contributor

  • The internalizer traverses the expression as a DAG, not a tree.
  • It is generally unsafe to throw an exception during push, which invokes internalization, so cancellation is disabled there.

NikolajBjorner added a commit that referenced this issue May 2, 2020
@NikolajBjorner
Contributor

@wintersteiger - it seems most of the time is spent in bit-blasting and then in internalization for theory_fpa (within quantifier instantiation).

@nunoplopes
Collaborator Author

Well, I have no clue how to fix this; I have near-zero knowledge about internalization.
Do you think there's no way of checking resources during internalization? I understand it might not be safe to throw an exception in the middle, but maybe it could switch into a mode that just creates constants for each expression, so the DAG traversal is cut short?

About FPA bit-blasting I have no clue. The example has a few FPA operations, but not that many (< 40).

hgvk94 pushed a commit to hgvk94/z3 that referenced this issue May 7, 2020
@rainoftime
Contributor

rainoftime commented May 8, 2020

Maybe related issue: #3864

@nunoplopes
Collaborator Author

I investigated this a bit. It seems that there's an issue in the SMT internalizer when dealing with formulas combining multiple theories. In the example I posted, some expressions combine BVs, arrays & FPs.

The SMT internalizer does a top-sort of the expression to be internalized and then iterates over the sub-expressions top-down, internalizing them one by one. The problem is that if the children also mix multiple theories, the same top-sort & deep-internalization behavior kicks in.
This results in quadratic behavior: each sub-expression contains most of the other sub-expressions, and deep internalization is repeated for each of them, so leaf expressions are attempted a quadratic number of times.
The fix requires a different internalization algorithm that avoids deep internalization of sub-expressions; it should do that only on the root.
@NikolajBjorner does this make sense? (I don't know exactly how to fix it, though.)

NikolajBjorner added a commit that referenced this issue May 31, 2020
@NikolajBjorner
Contributor

The code is supposed to internalize expressions bottom-up. The top-sort is only applied to non-internalized expressions, so once they are internalized they no longer participate.
I added the throttle for internalize to the top-sort selection so as not to produce something that would later fail the filter in internalize_rec, but overall the idea is that non-internalized expressions are visited only once.

@nunoplopes
Collaborator Author

You're very right, I misunderstood the code. It looks good.
I tried changing the colors from dense to sparse and I got a 4x speedup on this example. Profiling showed that most of the time was spent in memset'ing the color vectors (as AST ids quickly go over 1 million). By changing to sparse representation, smt::relevancy_propagator_imp::add_watch becomes the bottleneck.

So a dense representation makes sense for QF formulas, I guess, while a sparse one is good for quantifier instantiation, since AST ids are high by then and zeroing out the vector becomes expensive.
On the other hand, the fundamental problem is probably elsewhere: why in the world are just a few quantifier instantiations invoking so many internalization calls? If they were batched, the cost of the top-sort would go away.

@nunoplopes
Collaborator Author

An example:

    void theory_bv::process_args(app * n) {
        for (expr* arg : *n) {
            ctx.internalize(arg, false);
        }
    }

What if the internalizer had an API to process multiple arguments at once? At least we could batch all arguments of a function in one go. That would reduce calls to the internalizer by at least half.

@NikolajBjorner
Contributor

Quantifier instantiation causes FP internalization, which creates side constraints that are then internalized.

@nunoplopes
Collaborator Author

I run with rewriter.hi_fp_unspecified=true, so I don't get those huge side constraints.
It seems it's just that the encoding of fp.mul & fp.div creates big formulas. If I replace those with fp.add, it becomes several times faster.

Nevertheless, my profiler still blames memset() in internalizer's set_color. Almost half of the time is spent there.

@wintersteiger
Contributor

Yeah, there are lots of fp.mul and fp.div which create very complicated circuits and side constraints that then get bit-blasted. Then, multiply that by the number of quantifier instantiations...

You can get rid of the floats up front, e.g. via (check-sat-using (then simplify fpa2bv simplify smt)); fpa2bv is quick. I didn't have the patience to wait for smt to finish, though. But at least there's one less theory to debug. Also, after changing all the foralls into exists, I can interrupt it at random times in the debugger, and whenever I do, it's in the internalizer. I don't have any good ideas for improving that situation, except to tell you not to quantify over large bit-vectors (or floats) :-)

Re: limits: yeah, I think we should add resource checks to the internalizer; sometimes it can take a significant amount of time, and users may have very tight limits.
