-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HolSmt: implement div and mod, fix proof replay, fix translation #1210
Conversation
The SMT-LIB term translator couldn't handle parallel definitions properly. This commit fixes that.
The previous workaround for ARITH_PROVE failing to prove some terms changed the `th_lemma_arith` proof handler to use COOPER_PROVE instead. However, COOPER_PROVE sometimes is too slow and appears to hang when replaying some Z3 proofs. This commit changes the `th_lemma_arith` handler to first try ARITH_PROVE which is much faster, and only if it fails, then try COOPER_PROVE.
SMT_NUM_ADD and SMT_NUM_SUB were meant to be used in place of INT_ADD and INT_NUM_SUB for optimization reasons. However, this causes a regression in proof replay in one of the self-tests. Since this regression is not straightforward to fix, these theorems are not being used yet. NUM_OF_INT was also added to the default set of theorems since it helps solve some `num`-related goals.
Unfortunately, these operators don't seem to be documented. However, from a quick reading of the Z3 source code, it seems that they seem to be similar to `div` and `mod`, but they seem to be used to indicate that Z3 can't prove that the divisor is non-zero.
The `th_lemma_arith` was running into several issues. The following changes were made: 1. The `generalize_ite` procedure was removed, since the arithmetic decision procedures can handle `if ... then ... else ...` now. It was causing replay failures because some terms were not provable unless the decision procedures could inspect the `if` bodies. 2. The arithmetic decision procedures don't understand `smtdiv` and `smtmod`; so before proving the term, we now rewrite it with the definitions of these functions. 3. A workaround was also implemented for some instances of the following issue: HOL-Theorem-Prover#1207 Namely, the integer decision procedures cannot decide some terms containing `ABS` and `Num`, so we rewrite them when possible.
This commit contains a workaround for the following issue: HOL-Theorem-Prover#1209
In some proof certificates, Z3 emits a `hypothesis` inference rule without discharging it later with a `lemma` rule. This causes some proofs to fail at the end of the proof replay procedure, because they end up containing an extra hypothesis. Fortunately, so far all such undischarged hypothesis rules were of the form `p = p`, so for now we can easily work around this issue by creating such theorems without adding an assumption. This workaround allows us to enable proof replay for 5 more self-tests.
Since these two decision procedures need to handle the same kind of terms (including smtdiv / smtmod rewrites), this commit merges them.
This commit adds some tactics which help prove some arithmetic proof steps that intLib.{ARITH,COOPER}_TAC can't deal with. These are basically some workarounds for: HOL-Theorem-Prover#1207 This allows us to enable proof replay for the remaining arithmetic test cases that didn't already have proof replay enabled, with the exception of the very last test case (because we can't replay nonlinear arithmetic proof steps). That said, I don't expect these workarounds to be very reliable. Ideally, it would be better to improve HOL4's linear arithmetic decision procedures.
Added one commit to update the HolSmt documentation regarding recent improvements. |
Can you put the Euclidean div and mod constants (perhaps call them |
Sure, but how would you like me to do that? Just add those definitions to For reference, I took a look at how
So, which of the above should I do? |
I think @mn200 simply meant that you can move the definitions of |
@binghe has it right; what you have already is a good starting point for future work, and it would be better to have that work happen in |
The previously-implemented workaround might have an issue where if Z3 introduces an hypothesis of the form `P = P` but then *doesn't* forget to discharge it, we might fail to replay the `lemma` proof rule because we avoided adding an assumption. Instead, this commit fixes the workaround so that we remove the extra hypothesis at the end of the proof replay procedure, i.e. only when we are sure that Z3 *did* forget to discharge the hypothesis.
Now that integerTheory contains the definitions for Euclidean div and mod, let's use that instead of redefining these operators in HolSmtTheory.
Now that HOL-Theorem-Prover#1209 is fixed, we can remove the workaround for it in HolSmt.
HOL4's integer arithmetic decision procedures cannot handle `quot` and `rem`, so proof steps containing these symbols were failing to replay. This commit rewrites goals containing these symbols to increase the chances of a successful replay. `thm_AUTO` was similarly improved, as it had the same limitation and was failing to prove some of the self-tests.
Added 7 more commits:
Note: I wasn't sure what to name the newly added definitions and theorems in Thanks! |
In case `rewrite` proof rules ever come to have indices, let's ignore them instead of erroring out.
Added one more commit which ignores indices in |
Added one more commit which partially reverts a previous one, which had added a change that was actually unnecessary. I realized this was unnecessary after thinking about it (and realizing it didn't make much sense), however I don't understand why I introduced that change in the first place, since I cannot reproduce those test failures and nothing relevant was changed in the mean time... |
It should not be necessary to rewrite these symbols with their definitions, because from the perspective of SMT solvers, these should just be functions like any others. Therefore, when replaying arithmetic proof steps, the proof handler should not need to understand what `quot` and `rem` are. This is a partial revert of the following commit: 6220a17 It's not clear to me why I introduced this change in the first place, since the tests do work fine without it. Also, there doesn't seem to be any performance regression.
Add support for parsing `declare-const` commands to the SMT-LIB parser.
The SMT-LIB file parser was ending up raising an end-of-stream exception unless the `(exit)` command was present in the SMT-LIB file, even though the `(exit)` command does not seem to be mandatory according to the SMT-LIB v2.6 standard.
Added 2 more commits, which fix issues when parsing SMT-LIB commands. This part of the parser is used only for importing SMT-LIB problems into HOL4, which is useful for testing HolSmt but is not the main use case of HolSmt. |
This commit fixes a parsing error when parsing arithmetic `th-lemma` inference rules with an `assign-bounds` parameter, i.e.: ((_ th-lemma arith assign-bounds ...) ...) The error was due to `assign-bounds` not being in the known term dictionary.
Added one more commit with a simple fix, this time fixing an error when parsing some Z3 proof certificates. |
Now that all known issues were fixed wrt. ARITH_PROVE not proving goals that COOPER_PROVE could, we can remove this workaround, thus making proof replay faster in some cases.
Thanks for all of this work! |
Hi!
This PR implements support for
DIV
,MOD
,$/
,$%
,quot
andrem
, i.e. all div and mod operators fornum
andint
. It also enables proof replay for all the self-tests that contained these operators. No changes were made outside HolSmt.In total, 78 more self-tests are now passing (with Z3 v4 proof replay). In fact, I believe all arithmetic test cases are now passing, including with proof replay, with the exception of the word tests (and one which I added and mention below).
The support for the div and mod operators were implemented using the following approach:
HolSmtTheory
calledsmtdiv
andsmtmod
, which implement the semantics of SMT-LIB's integer division and modulo operators.smtdiv
was defined based on$/
andsmtmod
based on$%
for simplicity.selftests.sml
which verifies thatsmtdiv
andsmtmod
exactly match the SMT-LIB definition.selftests.sml
).HolSmtTheory
which define HOL4's div and mod operators in terms ofsmtdiv
andsmtmod
(these are easier to prove). These theorems are now added to the proving context by default, so that SMT solvers can translate HOL4's div and mod operators into the equivalent SMT-LIB div and mod operators which they natively support.smtdiv
andsmtmod
.smtdiv
andsmtmod
(by rewriting with their definitions into equivalent HOL4$/
and$%
operations).This PR also implements the following:
let ... and ... in ...
, i.e. the one with parallel definitions, into the SMT-LIB language. This translation had never actually worked before, onlylet ... ; ... in ...
was working properly.0n = x * Num 0i
" fails unexpectedly #1203. Apparently,COOPER_PROVE
is much slower thanARITH_PROVE
in some cases, so now we try the latter before falling back to the former.NotFound
#1209.generalize_ite
procedure was removed, as it seems to be unnecessary and was causing some arithmetic proof steps to fail.smtdiv
andsmtmod
.th_lemma_arith
andrewrite
arithmetic proof handlers need the above fixes, these arithmetic proving procedures now use a common implementation.P = P
, so it's easy to discharge them ourselves.iff-false
Z3 proof inference rule.I'm happy with the results, although I was a bit surprised by how hard it was to fix proof replay for these tests. I was expecting Z3 to throw more curve-balls, but this time Z3 did not give me much trouble.
Instead, the major problems were the limitations in the HOL4 integer arithmetic decision procedures, especially #1207. I implemented some workarounds (i.e. I try to prove the goals using some additional tactics, including arithmetic simplifications) but I don't expect these to be reliable and in fact, if I change some theorems around a bit I do encounter some linear arithmetic steps which HOL4 cannot prove automatically even with my workarounds.
Anyway, I think this concludes my work for the near future, as the next issues to tackle are quite hard to solve.
Thanks!
cc @tjark