Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add polymorphism of smt v3 in smt v2.6 #53

Merged
merged 12 commits into from
Feb 3, 2021

Conversation

bobot
Copy link
Contributor

@bobot bobot commented Oct 16, 2020

Over the year adding polymorphism to the smt language have been proposed a lot of time. It will become standard in v3 as already discussed in #48. But for COLIBRI I needed to be able to parse polymorphism (since it knows little above ground term any encoding is problematic). Finally the addition is quite small since dolmen already handles it internally.

Currently the modificatio is as simple as possible:

  • In v2.6
  • Without a flag to restrict it

In order to solve that, and if you are interested, what is the right way since v3 should be mainly backward compatible with 2.6:

  • A copy of v2.6 as v3?
  • Just add a new langage but still share the code?

I don't plan to add a v3 parser now, first because v3 is still not formalised and because I believe it is be better to do it step by step.

@Gbury
Copy link
Owner

Gbury commented Oct 16, 2020

If I understand correctly, this is a compatible extension of smtlib v2.6, correct ?
In any case, i have a few remarks:

  • considering dolmen may be used to check new benchmarks for the smtlib, this feature would either need to be protected by a flag (or be only in alternative version for smtlib)
  • thinking about it, I think having the extension be a new smtlib version (we'd need to find a name though, :p), would be the nicest from the point of view of users of dolmen. It could be the default for file with the .smt2 extension, but if needed an explicit set of the input language to smtlibv2.6 would be all that is needed to ensure user that what is parsed is exactly smtlib2.6
  • I know that @OCamlPro-Coquera also worked on an extension of smtlib v2.6 to support polymorphism, called psmt2, it'd be nice to at least compare the two approaches, and ideally have them be the same thing (since psmt2 is planned to be included in dolmen at some point)

Finally, note that I recently added error messages to smtlib, though it seems I didn't correctly set up testing to fail on syntax change, I'll fix that in master (and then you'll have the fun of writing the error messages for the new cases (and potentially fixing some of the already existing ones)).

@Gbury
Copy link
Owner

Gbury commented Oct 16, 2020

Actually, the test works fine (just run make test), but the question of error messages can come after we decide where the extension will live.

@bobot
Copy link
Contributor Author

bobot commented Oct 17, 2020

I haven't checked if the SMTLIB V3 use the same choices than @OcamlPro-Coquera and others. All the proposition of extension choosed the (par (type variables) ...), I think in the same way. There are differences for the use of the AS I remember.

In fact I would propose to go full v3 even if things are not yet settled and very incomplete, we just need to have the understanding that it is completely experimental. So:

  • use .smt3 as extension
  • start just with the code of v2.6 since v3 will be as backward compatible as possible
  • add feature of v3 little by little
  • move the parser to the grammar defined in a futur v3 document when available.

Doing in this way will avoid one gigantic branch, allows to test part of v3 and makes a playground for the smtlib v3 proposal.

I can do the simple first part, add .smt3 extension and copy the smtlib2 directory into smtlib3 in this MR.

@ACoquereau
Copy link

Hi,
It's seems to me that the (par (type variables) ...) notation presented in the smt-lib3 draft can be compatible with the one we presented at smt2018.
If needed, some benchmarks in the psmt2 format are available here.

@Gbury
Copy link
Owner

Gbury commented Oct 21, 2020

@bobot would the psmt2 format work for you (until smtlib3 is released) ?

@bobot
Copy link
Contributor Author

bobot commented Oct 22, 2020

It should work. But I must admit I don't know what is the difference with the current PR, except adding a specific version psmt. @OCamlPro-Coquera is there a difference? Do you have time to check with your benchmarks? There is also a corresponding why3 branch which adds par, I just remarked that I forgot to add in it the needed as for fixing result type when needed.

@Gbury
Copy link
Owner

Gbury commented Oct 22, 2020

If everyone can be happy with it, I think the best thing to do would be to add a new version of smtlib2 that contains the required extensions for polymorphism, probably called psmt2 if everyone agrees, since the name already kind of exists. If I remember correctly, psmt2 has a dedicated different extension *.psmt2, which would help differentiate from regular smt2 files. Would that work for you @bobot ?

@bobot
Copy link
Contributor Author

bobot commented Oct 28, 2020

It should work with me, I just don't know what are the next steps and who does them.

@Gbury
Copy link
Owner

Gbury commented Oct 28, 2020

I'll start on that, and ping you once you can try it out.

@Gbury
Copy link
Owner

Gbury commented Nov 1, 2020

So, I just pushed on the repo a branch psmt2 which adds a poly version for the smtlib2 language (recognized by the .psmt2 extension), which is exactly your polymorphic extension of the smtlib2 language (which luckily coincides with the psmt2 format as created by the alt-ergo developpers, yay !). However, there's still some work to be done, and I have some other things I'd like to work on for dolmen (i.e. printing/exporting), so I'll let you finish some things.

From now on, I'd suggest you do the following:

  • set your branch for this PR to the head of the psmt2 branch to continue work until things wan be merged
  • complete the syntax error messages of the smtlib2/poly language (see https://github.com/Gbury/dolmen/blob/master/HACKING.md#hand-written-syntax-error-messages for info about how to do that; and if the explanations there are not clear/informative enough, don't hesitate to tell me and I'll try and complete things): this means adding cases for all errors in the syntax.messages file, and adding corresponding test cases as described in the HACKING.md file
  • add some good test files (as in tests that pass and do not raise an error), for both parsing and typing (resp. in tests/parsing/smtlib/poly/ and tests/typing/base-smtlib/poly)

And then we should be good to go, :)

@bobot bobot changed the base branch from master to psmt2 November 2, 2020 15:29
@bobot
Copy link
Contributor Author

bobot commented Nov 2, 2020

Thank you for the fixes and the psmt2 branch. I will look at the last points next week.

@Gbury
Copy link
Owner

Gbury commented Nov 2, 2020

Great ! (Though just to be clear,rather than setting the target of your PR to the psmt2 branch, what I was suggesting was that you set your branch to point at the current HEAD commit of psmt2, still targeting master with the PR, and continue working from there (since the branch contains all your changes applied to the new parser for psmt2 rather than smtlib2))

@bobot bobot changed the base branch from psmt2 to master November 20, 2020 09:01
@bobot bobot force-pushed the add_parametricity_in_smt branch from ece580b to a418973 Compare November 20, 2020 09:29
@bobot
Copy link
Contributor Author

bobot commented Nov 20, 2020

Ok I fixed the misunderstanding by resetting bobot:add_parametricity_in_smt to psmt2. I'm not sure that copying all smt2 into psmt2 will help maintainability, even more since smt3 will arrive with the same basic blocs, but it is not a choice for me to make 😸 .

@bobot bobot force-pushed the add_parametricity_in_smt branch from a418973 to e05bf1a Compare November 20, 2020 10:31
@bobot
Copy link
Contributor Author

bobot commented Nov 20, 2020

@Gbury I think that all the remaining points are done.

FWIW:

  • in dune 2.8 diff action will work if the reference doesn't exists. It is considered as empty.
  • Updating syntax.messages is not trivial from new.messages: perhaps I used the wrong method (meld) or it could be improved in menhir (updating and adding new states).
  • The tests framework of dolmen can be added as a datapoint to Uses-cases of generated dune files ocaml/dune#3901 , and it has the advantage to be public.

@bobot
Copy link
Contributor Author

bobot commented Nov 20, 2020

What is the version of lsp that should be used?

@Gbury
Copy link
Owner

Gbury commented Nov 20, 2020

What is the version of lsp that should be used?

The latest release (1.2 iirc) should work (previous releases have a bug related to vendored libs which makes it impossible to use in conjunction with uutf).

@bobot
Copy link
Contributor Author

bobot commented Nov 20, 2020

@ACoquereau There is a difference between smtv3 and psmt2 at least when trying the benchmark:

In common-drv-psmt2-format/BWARE-VCs--DAB/Clavier_code-B_translation-Initialisation_1.smt2.zip:

(define-fun infix_eqeq
  (par (a1) ((s1 (set a1)) (s2 (set a1))) Bool) 
  (forall ((x a1)) (= (mem x s1) (mem x s2)))
)

In smtv3 and what is implemented would be:

(define-fun infix_eqeq
 (par (a1) ((s1 (set a1)) (s2 (set a1))) Bool 
     (forall ((x a1)) (= (mem x s1) (mem x s2)))
 )
)

The smtv3 is more regular since a1 in the function body is inside the par binder not outside.

@bobot bobot force-pushed the add_parametricity_in_smt branch from f1e1a58 to 0fc6b50 Compare November 20, 2020 14:00
@bobot
Copy link
Contributor Author

bobot commented Nov 20, 2020

I have a compilation error with 1.2.0 of lsp, and the travis job is failing for another reason.

@Gbury
Copy link
Owner

Gbury commented Nov 20, 2020

  • The tests framework of dolmen can be added as a datapoint to ocaml/dune#3901 , and it has the advantage to be public.

I've just posted on the issue to describe how and why dolmen generates dune files.

@Gbury
Copy link
Owner

Gbury commented Nov 20, 2020

I have a compilation error with 1.2.0 of lsp, and the travis job is failing for another reason.

I'll take a look, ^^

src/loop/parser.ml Outdated Show resolved Hide resolved
@Gbury
Copy link
Owner

Gbury commented Nov 20, 2020

  • Updating syntax.messages is not trivial from new.messages: perhaps I used the wrong method (meld) or it could be improved in menhir (updating and adding new states).

I know, however, it seems to be by design of menhir (see https://gitlab.inria.fr/fpottier/menhir/-/issues/14 for more information)

@bobot bobot force-pushed the add_parametricity_in_smt branch from 0fc6b50 to 00e0629 Compare November 20, 2020 14:40
@Gbury
Copy link
Owner

Gbury commented Nov 20, 2020

I have a compilation error with 1.2.0 of lsp, and the travis job is failing for another reason.

I'll take a look, ^^

Okay, so lsp actually broke the API compatibility between 1.1.0 which had the vendor problem, and 1.2.0 which as far as I can tell, doesn't install all the necessary libraries (see ocaml/ocaml-lsp#311 ). It'll likely take some time to fix, so you can ignore the lsp part for now, ^^

@rbonichon
Copy link

I'm not sure the polymorphic notation we introduced back then (for veriT) needs to be retained. Pretty sure that smt3 notation will be adopted.

@bobot
Copy link
Contributor Author

bobot commented Dec 2, 2020

Indeed define-funs-rec seems wrong, I don't think the smtlib3 draft talk about that part. So it is something that must be bring up on th emailing list. I still added the three solutions in the parser. (The syntax.message seems way too cumbersome to modify, we could generate most of the case on the fly using the incremental API of menhir since it gives us the next accepted tokens in a given state). We can remove the other solutions later.

For the Why3 MR: https://gitlab.inria.fr/why3/why3/-/merge_requests/112 .

@Gbury the MR also updates for the new menhir version, and dune promote allows to update the syntax.messages.

The trigger with as are not well taken into account (last problem from the psmt2 benchmark), but I will let you see the problem, I dont' understand how attributes must be handled.

@Gbury
Copy link
Owner

Gbury commented Dec 2, 2020

The trigger with as are not well taken into account (last problem from the psmt2 benchmark), but I will let you see the problem, I dont' understand how attributes must be handled.

Can you send me the problematic file ?
I agree that attributes are not the simplest part of dolmen, I'll get on writing some doc about that one of these days, ^^

@Gbury
Copy link
Owner

Gbury commented Dec 3, 2020

Ok, so the problem you encounter with "as" is the following: there is a :pattern annotation, which is meant to add a trigger annotation to a term and has as payload an s-expression (which is not parsed the same as terms according to the smtlib doc). This s-expression is understood as a list of terms, even though syntaxically, we have (recursively), a list of s-expressions and not a list of terms. Currently the typechecker correctly handles that, except that since s-expressions do not have the same syntax rules as terms, it does not have the specific parsing rules for as and par. You did add some production to parse these, but the way the productions work is not correct.

So, basically, according to a strict interpretation of the smtlib standard, par and as shouldn't really be allowed in these pattern annotations; also note that other constructions such as indexed identifiers (e.g. (_ bv32 13) ) are also not allowed technically, which may or may not be annoying in the long term. However, since we are talking about an extension and not smtlib v2.7, we have some leeway thankfully, ^^
In this case, i'd suggest you manually add the required production rules for the as and par keywords to the rules for parsing s_expressions, copying how they are handled in terms (so I think re-using the already existing qual_identifier production rule instead of the symbol one to handle the as keyword, and using the T.par function instead of creating a new id with the name par to correctly handle par).

@bobot
Copy link
Contributor Author

bobot commented Dec 3, 2020

In this case, i'd suggest you manually add the required production rules for the as and par keywords to the rules for parsing s_expressions, copying how they are handled in terms (so I think re-using the already existing qual_identifier production rule instead of the symbol one to handle the as keyword, and using the T.par function instead of creating a new id with the name par to correctly handle par).

Ok, I will try that. But in the long run, we could have other attributes that express something else than terms, so as should not be parsed like in terms. Would not it be simpler to have a real type sexp for unknown attribute and a way to extend the attribute that should be parsed as terms? Oh yes menhir parser can't be extended in this way. But still for a fixed known list of attributes?

@Gbury
Copy link
Owner

Gbury commented Dec 3, 2020

I'm not quite sure what you'd want, so let me resume how attributes are defined by smtlib2 and handled by dolmen.

In smtlib2, attributes have the form :name payload?, where the payload is optional and is an s-expression (whose definition is different from terms). Dolmen parses that as defined by the smtlib2, by adding in the "attrs" field of the untyped term a term of the form name(payload) (i.e. the attribute name applied to the payload). Additionally, in s-expressions, application (or rather the recursion production), is not parsed the same as term application, but rather a s-exp of the form (a b c ..) is parsed as an untyped term of the form $data(a, b, c, ...) (i.e. application of a special identifier $data to the list of s-expressions). That's for parsing.

For typing, the typechecker does try and type all attributes that are attached to each term (the "try" here means that by default an error in parsing an attribute will only result in a warning and not an error). Since there are close to no attributes defined in the tff typechecker, the typing of an attribute (and its optional payload), will almost always then be routed to the typing builtin function. That allows for each language to define its own typing of attributes. In the case of smtlib2, you can see the attributes defined there :

(* Named formulas *)
| Type.Id { Id.name = ":named"; ns = Id.Attr } ->
`Tags (Base.make_op1 (module Type) env ":named" (fun _ t ->
let name = parse_symbol env t in
[Type.Any (Tag.named, name)]
))
(* Trigger annotations *)
| Type.Id { Id.name = ":pattern"; ns = Id.Attr } ->
`Tags (Base.make_op1 (module Type) env ":pattern" (fun _ t ->
let l = parse_sexpr_list env t in
let l = List.map (Type.parse_term env) l in
[Type.Any (Tag.triggers, l)]
))

The first two cases define the typing of :named and :pattern attributes. In the case of patterns, we parse the payload as a list of terms to be added as triggers, using this aux function:

let parse_sexpr_list env = function
| { Ast.term = Ast.App (
{ Ast.term = Ast.Symbol { Id.name = "$data"; ns = Id.Attr }; _ },
l); _} ->
l
| ast ->
Type._error env (Ast ast)
(Type.Expected ("a list of terms in a sexpr", None))

And finally, we have the following case which enables the typechecker to parse s-expressions as terms:

(* N-ary s-expressions in attributes *)
| Type.Id { Id.name = "$data"; ns = Id.Attr } ->
`Term (fun ast args ->
begin match args with
| f :: r -> parse_f env ast f r
| [] -> Type._error env (Ast ast)
(Type.Expected ("a non-empty s-expr", None))
end)

As such, parsing attributes works for all attributes (that's pretty much the only solution), and typing of these is very easily extensible and adaptable to each language.

However, note that currently, attributes are meant to be non-crucial information attached to terms, or said differently, the presence or absence of an attribute should not change whether a term type-checks; attributes are used as meta-data and/or hints for the consumer, which is why the current use of attributes are for triggers and naming things for later model output. information that is required to succesfuly type or prevent the typing of terms should not be in attributes but as normal nodes of the AST. This is the case for as typing annotations, which are translated using the Colon juxtaposition node of the untyped AST.

So now that the usage of attributes in dolmen is a bit clearer, we can talk about how to handle uses of as in s-expressions, as it happens in the case of typing the payload of a :pattern attribute. I find it a bit annoying that smtlib2 (and by extension psmt2) would define payload of attributes as s-expressions to then try and use them as terms. This creates a bit of a problem in my opinion, and will unnecessarily complicate most implementation in my opinion.
So currently, and according to the definition of s-expressions in smtlib2, keywords such as as and par are not allowed in s-expression (i.e. these would be syntax errors, not typing or interpretation errors). It seems that accepting these is however required when users want to use polymorphism in triggers, and thus be able to give terms as payloads. And it's not really clear to me how to handle this in dolmen: as you said, extending the parsing rules for s-expressions to include as and par feels a bit ad-hoc, on the other hand, since they are keywords, we'll need specific rules to parse them in s-expressions anyway...

@Gbury
Copy link
Owner

Gbury commented Dec 3, 2020

Basically, I'd say smtlib2 (and extension's authors) should decide whether attribute payloads are terms or s-expressions (or both, given an adequate syntax rule to distinguish the two cases), and stop trying to say different things depending on the situation.
For instance, the documentation defines in the syntax attributes payloads as s-expressions, but the documentation also says that :pattern payloads should be terms, and these two statements are contradictory.

@bobot
Copy link
Contributor Author

bobot commented Dec 3, 2020

I agree fully with your last comment. And that's why I think it would be simpler if dolmen could parse the :pattern directly as a term. However it should be done without making :pattern a keyword.

@Gbury
Copy link
Owner

Gbury commented Dec 3, 2020

I agree fully with your last comment. And that's why I think it would be simpler if dolmen could parse the :pattern directly as a term. However it should be done without making :pattern a keyword.

It would be a solution, but that's not really possible with menhir as far as I know, ^^ (plus it would be a hack in the sens that other attributes could not really do the same, except if the mechanism was somehow extensible at runtime, which would be .. complicated ; that's one of the reasons dolmen is likely never going to parse the Coq language).

By the way, I submitted a topic about that in the SMTLIB google group (it's pending for authorization currently).

@Gbury
Copy link
Owner

Gbury commented Jan 13, 2021

Sorry for not merging sooner, I'll try and merge asap.
Good catch for the to_int/floor problem in arith.

However, concerning the second one (about the timeouts), I'm not really sure about its usefulness. I'll need to add it in comments and in the docs, but there is already a way to disable the timer part of the limits, by using a timeout of 0 (see the documentation for Unix.itimer). For the size computation limits, is there a real and measurable cost to the Gc alarm (as in a test case where you can measure/observe the overhead of the alarm when setting the memory limit to infinity) ? And actually, what is the motivation behind the change ? Does it try and solve a concrete problem or is it more for code cleanliness ?

@bobot
Copy link
Contributor Author

bobot commented Jan 13, 2021

There is still the problem with the pattern. I have a problem with the namespace it is different for terms and type it seems. I haven't looked at it since just after the discussion.

I just didn't know what was the value for removing the limitations ;). (e.g the example Option module uses 300 and 1_000_000_000 default limits). And currently it is not the same for the time (0) and the memory (infinity). Finally there is only one timer of each kind in a process, so setting it to 0 would stop a timer set by another part of the processus. Instead of infinity, 0 can be used or an option, but I prefer that nothing is touched if nothing have to.

(About to_int/floor, for to_real I don't really like the use of coerce because it is hard to know the possible type arguments, moreover it seems that (to_real true) is accepted as to_real [bool,real] true. I'm not sure, I seen that in passing I will create an MR if it is true).

@Gbury
Copy link
Owner

Gbury commented Jan 13, 2021

I Have a plan to take care of the patterns properly, i'll try and work on it soon (probably next week).
Actually, I think I'll going to merge this PR as soon as I can, and you'll be able to open other PRs (or open issues) for the minor bugs you mention.

@bobot bobot force-pushed the add_parametricity_in_smt branch from 58dc3ed to f835115 Compare January 18, 2021 21:20
@bobot
Copy link
Contributor Author

bobot commented Jan 18, 2021

I removed the little fixes from this branch and I will put them to a new MR. The to_int fix was not good because Expr.Real.floor is real -> real instead of real -> int. I fixed it in another way but it could lead to discussions.

@Gbury Gbury force-pushed the add_parametricity_in_smt branch from f835115 to 7ff80c2 Compare February 3, 2021 12:55
@Gbury
Copy link
Owner

Gbury commented Feb 3, 2021

I rebased on master

@Gbury Gbury merged commit b9ebbbd into Gbury:master Feb 3, 2021
@Gbury
Copy link
Owner

Gbury commented Feb 3, 2021

Merged, thanks a lot for the work !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants