[RFC] Add polymorphism of smt v3 in smt v2.6 #53

bobot · 2020-10-16T13:03:59Z

Over the year adding polymorphism to the smt language have been proposed a lot of time. It will become standard in v3 as already discussed in #48. But for COLIBRI I needed to be able to parse polymorphism (since it knows little above ground term any encoding is problematic). Finally the addition is quite small since dolmen already handles it internally.

Currently the modificatio is as simple as possible:

In v2.6
Without a flag to restrict it

In order to solve that, and if you are interested, what is the right way since v3 should be mainly backward compatible with 2.6:

A copy of v2.6 as v3?
Just add a new langage but still share the code?

I don't plan to add a v3 parser now, first because v3 is still not formalised and because I believe it is be better to do it step by step.

Gbury · 2020-10-16T14:18:26Z

If I understand correctly, this is a compatible extension of smtlib v2.6, correct ?
In any case, i have a few remarks:

considering dolmen may be used to check new benchmarks for the smtlib, this feature would either need to be protected by a flag (or be only in alternative version for smtlib)
thinking about it, I think having the extension be a new smtlib version (we'd need to find a name though, :p), would be the nicest from the point of view of users of dolmen. It could be the default for file with the .smt2 extension, but if needed an explicit set of the input language to smtlibv2.6 would be all that is needed to ensure user that what is parsed is exactly smtlib2.6
I know that @OCamlPro-Coquera also worked on an extension of smtlib v2.6 to support polymorphism, called psmt2, it'd be nice to at least compare the two approaches, and ideally have them be the same thing (since psmt2 is planned to be included in dolmen at some point)

Finally, note that I recently added error messages to smtlib, though it seems I didn't correctly set up testing to fail on syntax change, I'll fix that in master (and then you'll have the fun of writing the error messages for the new cases (and potentially fixing some of the already existing ones)).

Gbury · 2020-10-16T15:10:21Z

Actually, the test works fine (just run make test), but the question of error messages can come after we decide where the extension will live.

bobot · 2020-10-17T10:12:16Z

I haven't checked if the SMTLIB V3 use the same choices than @OcamlPro-Coquera and others. All the proposition of extension choosed the (par (type variables) ...), I think in the same way. There are differences for the use of the AS I remember.

In fact I would propose to go full v3 even if things are not yet settled and very incomplete, we just need to have the understanding that it is completely experimental. So:

use .smt3 as extension
start just with the code of v2.6 since v3 will be as backward compatible as possible
add feature of v3 little by little
move the parser to the grammar defined in a futur v3 document when available.

Doing in this way will avoid one gigantic branch, allows to test part of v3 and makes a playground for the smtlib v3 proposal.

I can do the simple first part, add .smt3 extension and copy the smtlib2 directory into smtlib3 in this MR.

ACoquereau · 2020-10-19T08:13:03Z

Hi,
It's seems to me that the (par (type variables) ...) notation presented in the smt-lib3 draft can be compatible with the one we presented at smt2018.
If needed, some benchmarks in the psmt2 format are available here.

Gbury · 2020-10-21T21:22:08Z

@bobot would the psmt2 format work for you (until smtlib3 is released) ?

bobot · 2020-10-22T07:39:54Z

It should work. But I must admit I don't know what is the difference with the current PR, except adding a specific version psmt. @OCamlPro-Coquera is there a difference? Do you have time to check with your benchmarks? There is also a corresponding why3 branch which adds par, I just remarked that I forgot to add in it the needed as for fixing result type when needed.

Gbury · 2020-10-22T21:54:11Z

If everyone can be happy with it, I think the best thing to do would be to add a new version of smtlib2 that contains the required extensions for polymorphism, probably called psmt2 if everyone agrees, since the name already kind of exists. If I remember correctly, psmt2 has a dedicated different extension *.psmt2, which would help differentiate from regular smt2 files. Would that work for you @bobot ?

bobot · 2020-10-28T15:57:19Z

It should work with me, I just don't know what are the next steps and who does them.

Gbury · 2020-10-28T16:49:12Z

I'll start on that, and ping you once you can try it out.

src/languages/smtlib2/v2.6/parseSmtlib.mly

Gbury · 2020-11-01T14:40:29Z

So, I just pushed on the repo a branch psmt2 which adds a poly version for the smtlib2 language (recognized by the .psmt2 extension), which is exactly your polymorphic extension of the smtlib2 language (which luckily coincides with the psmt2 format as created by the alt-ergo developpers, yay !). However, there's still some work to be done, and I have some other things I'd like to work on for dolmen (i.e. printing/exporting), so I'll let you finish some things.

From now on, I'd suggest you do the following:

set your branch for this PR to the head of the psmt2 branch to continue work until things wan be merged
complete the syntax error messages of the smtlib2/poly language (see https://github.com/Gbury/dolmen/blob/master/HACKING.md#hand-written-syntax-error-messages for info about how to do that; and if the explanations there are not clear/informative enough, don't hesitate to tell me and I'll try and complete things): this means adding cases for all errors in the syntax.messages file, and adding corresponding test cases as described in the HACKING.md file
add some good test files (as in tests that pass and do not raise an error), for both parsing and typing (resp. in tests/parsing/smtlib/poly/ and tests/typing/base-smtlib/poly)

And then we should be good to go, :)

bobot · 2020-11-02T15:31:40Z

Thank you for the fixes and the psmt2 branch. I will look at the last points next week.

Gbury · 2020-11-02T17:28:08Z

Great ! (Though just to be clear,rather than setting the target of your PR to the psmt2 branch, what I was suggesting was that you set your branch to point at the current HEAD commit of psmt2, still targeting master with the PR, and continue working from there (since the branch contains all your changes applied to the new parser for psmt2 rather than smtlib2))

bobot · 2020-11-20T10:30:24Z

Ok I fixed the misunderstanding by resetting bobot:add_parametricity_in_smt to psmt2. I'm not sure that copying all smt2 into psmt2 will help maintainability, even more since smt3 will arrive with the same basic blocs, but it is not a choice for me to make 😸 .

bobot · 2020-11-20T13:33:35Z

@Gbury I think that all the remaining points are done.

FWIW:

in dune 2.8 diff action will work if the reference doesn't exists. It is considered as empty.
Updating syntax.messages is not trivial from new.messages: perhaps I used the wrong method (meld) or it could be improved in menhir (updating and adding new states).
The tests framework of dolmen can be added as a datapoint to Uses-cases of generated dune files ocaml/dune#3901 , and it has the advantage to be public.

bobot · 2020-11-20T13:38:50Z

What is the version of lsp that should be used?

Gbury · 2020-11-20T13:45:59Z

What is the version of lsp that should be used?

The latest release (1.2 iirc) should work (previous releases have a bug related to vendored libs which makes it impossible to use in conjunction with uutf).

bobot · 2020-11-20T13:56:48Z

@ACoquereau There is a difference between smtv3 and psmt2 at least when trying the benchmark:

In common-drv-psmt2-format/BWARE-VCs--DAB/Clavier_code-B_translation-Initialisation_1.smt2.zip:

(define-fun infix_eqeq
  (par (a1) ((s1 (set a1)) (s2 (set a1))) Bool) 
  (forall ((x a1)) (= (mem x s1) (mem x s2)))
)

In smtv3 and what is implemented would be:

(define-fun infix_eqeq
 (par (a1) ((s1 (set a1)) (s2 (set a1))) Bool 
     (forall ((x a1)) (= (mem x s1) (mem x s2)))
 )
)

The smtv3 is more regular since a1 in the function body is inside the par binder not outside.

bobot · 2020-11-20T14:04:32Z

I have a compilation error with 1.2.0 of lsp, and the travis job is failing for another reason.

Gbury · 2020-11-20T14:05:59Z

The tests framework of dolmen can be added as a datapoint to ocaml/dune#3901 , and it has the advantage to be public.

I've just posted on the issue to describe how and why dolmen generates dune files.

Gbury · 2020-11-20T14:06:17Z

I have a compilation error with 1.2.0 of lsp, and the travis job is failing for another reason.

I'll take a look, ^^

src/loop/parser.ml

Gbury · 2020-11-20T14:11:29Z

Updating syntax.messages is not trivial from new.messages: perhaps I used the wrong method (meld) or it could be improved in menhir (updating and adding new states).

I know, however, it seems to be by design of menhir (see https://gitlab.inria.fr/fpottier/menhir/-/issues/14 for more information)

Gbury · 2020-11-20T14:57:19Z

I have a compilation error with 1.2.0 of lsp, and the travis job is failing for another reason.

I'll take a look, ^^

Okay, so lsp actually broke the API compatibility between 1.1.0 which had the vendor problem, and 1.2.0 which as far as I can tell, doesn't install all the necessary libraries (see ocaml/ocaml-lsp#311 ). It'll likely take some time to fix, so you can ignore the lsp part for now, ^^

rbonichon · 2020-12-02T14:37:54Z

I'm not sure the polymorphic notation we introduced back then (for veriT) needs to be retained. Pretty sure that smt3 notation will be adopted.

bobot · 2020-12-02T15:35:30Z

Indeed define-funs-rec seems wrong, I don't think the smtlib3 draft talk about that part. So it is something that must be bring up on th emailing list. I still added the three solutions in the parser. (The syntax.message seems way too cumbersome to modify, we could generate most of the case on the fly using the incremental API of menhir since it gives us the next accepted tokens in a given state). We can remove the other solutions later.

For the Why3 MR: https://gitlab.inria.fr/why3/why3/-/merge_requests/112 .

@Gbury the MR also updates for the new menhir version, and dune promote allows to update the syntax.messages.

The trigger with as are not well taken into account (last problem from the psmt2 benchmark), but I will let you see the problem, I dont' understand how attributes must be handled.

Gbury · 2020-12-02T15:48:28Z

The trigger with as are not well taken into account (last problem from the psmt2 benchmark), but I will let you see the problem, I dont' understand how attributes must be handled.

Can you send me the problematic file ?
I agree that attributes are not the simplest part of dolmen, I'll get on writing some doc about that one of these days, ^^

Gbury · 2020-12-03T12:00:45Z

Ok, so the problem you encounter with "as" is the following: there is a :pattern annotation, which is meant to add a trigger annotation to a term and has as payload an s-expression (which is not parsed the same as terms according to the smtlib doc). This s-expression is understood as a list of terms, even though syntaxically, we have (recursively), a list of s-expressions and not a list of terms. Currently the typechecker correctly handles that, except that since s-expressions do not have the same syntax rules as terms, it does not have the specific parsing rules for as and par. You did add some production to parse these, but the way the productions work is not correct.

So, basically, according to a strict interpretation of the smtlib standard, par and as shouldn't really be allowed in these pattern annotations; also note that other constructions such as indexed identifiers (e.g. (_ bv32 13) ) are also not allowed technically, which may or may not be annoying in the long term. However, since we are talking about an extension and not smtlib v2.7, we have some leeway thankfully, ^^
In this case, i'd suggest you manually add the required production rules for the as and par keywords to the rules for parsing s_expressions, copying how they are handled in terms (so I think re-using the already existing qual_identifier production rule instead of the symbol one to handle the as keyword, and using the T.par function instead of creating a new id with the name par to correctly handle par).

bobot · 2020-12-03T12:57:02Z

In this case, i'd suggest you manually add the required production rules for the as and par keywords to the rules for parsing s_expressions, copying how they are handled in terms (so I think re-using the already existing qual_identifier production rule instead of the symbol one to handle the as keyword, and using the T.par function instead of creating a new id with the name par to correctly handle par).

Ok, I will try that. But in the long run, we could have other attributes that express something else than terms, so as should not be parsed like in terms. Would not it be simpler to have a real type sexp for unknown attribute and a way to extend the attribute that should be parsed as terms? Oh yes menhir parser can't be extended in this way. But still for a fixed known list of attributes?

Gbury · 2020-12-03T13:40:58Z

I'm not quite sure what you'd want, so let me resume how attributes are defined by smtlib2 and handled by dolmen.

In smtlib2, attributes have the form :name payload?, where the payload is optional and is an s-expression (whose definition is different from terms). Dolmen parses that as defined by the smtlib2, by adding in the "attrs" field of the untyped term a term of the form name(payload) (i.e. the attribute name applied to the payload). Additionally, in s-expressions, application (or rather the recursion production), is not parsed the same as term application, but rather a s-exp of the form (a b c ..) is parsed as an untyped term of the form $data(a, b, c, ...) (i.e. application of a special identifier $data to the list of s-expressions). That's for parsing.

For typing, the typechecker does try and type all attributes that are attached to each term (the "try" here means that by default an error in parsing an attribute will only result in a warning and not an error). Since there are close to no attributes defined in the tff typechecker, the typing of an attribute (and its optional payload), will almost always then be routed to the typing builtin function. That allows for each language to define its own typing of attributes. In the case of smtlib2, you can see the attributes defined there :

dolmen/src/typecheck/core.ml

Lines 129 to 142 in dfbd7ff

    
           (* Named formulas *) 
        
           | Type.Id { Id.name = ":named"; ns = Id.Attr } -> 
        
             `Tags (Base.make_op1 (module Type) env ":named" (fun _ t -> 
        
                 let name = parse_symbol env t in 
        
                 [Type.Any (Tag.named, name)] 
        
               )) 
        
           (* Trigger annotations *) 
        
           | Type.Id { Id.name = ":pattern"; ns = Id.Attr } -> 
        
             `Tags (Base.make_op1 (module Type) env ":pattern" (fun _ t -> 
        
                 let l = parse_sexpr_list env t in 
        
                 let l = List.map (Type.parse_term env) l in 
        
                 [Type.Any (Tag.triggers, l)] 
        
               ))

The first two cases define the typing of :named and :pattern attributes. In the case of patterns, we parse the payload as a list of terms to be added as triggers, using this aux function:

dolmen/src/typecheck/core.ml

Lines 83 to 90 in dfbd7ff

    
           let parse_sexpr_list env = function 
        
             | { Ast.term = Ast.App ( 
        
                 { Ast.term = Ast.Symbol { Id.name = "$data"; ns = Id.Attr }; _ }, 
        
                 l); _} -> 
        
               l 
        
             | ast -> 
        
               Type._error env (Ast ast) 
        
                 (Type.Expected ("a list of terms in a sexpr", None))

And finally, we have the following case which enables the typechecker to parse s-expressions as terms:

dolmen/src/typecheck/core.ml

Lines 144 to 151 in dfbd7ff

    
           (* N-ary s-expressions in attributes *) 
        
           | Type.Id { Id.name = "$data"; ns = Id.Attr } -> 
        
             `Term (fun ast args -> 
        
                 begin match args with 
        
                   | f :: r -> parse_f env ast f r 
        
                   | [] -> Type._error env (Ast ast) 
        
                             (Type.Expected ("a non-empty s-expr", None)) 
        
                 end)

As such, parsing attributes works for all attributes (that's pretty much the only solution), and typing of these is very easily extensible and adaptable to each language.

However, note that currently, attributes are meant to be non-crucial information attached to terms, or said differently, the presence or absence of an attribute should not change whether a term type-checks; attributes are used as meta-data and/or hints for the consumer, which is why the current use of attributes are for triggers and naming things for later model output. information that is required to succesfuly type or prevent the typing of terms should not be in attributes but as normal nodes of the AST. This is the case for as typing annotations, which are translated using the Colon juxtaposition node of the untyped AST.

So now that the usage of attributes in dolmen is a bit clearer, we can talk about how to handle uses of as in s-expressions, as it happens in the case of typing the payload of a :pattern attribute. I find it a bit annoying that smtlib2 (and by extension psmt2) would define payload of attributes as s-expressions to then try and use them as terms. This creates a bit of a problem in my opinion, and will unnecessarily complicate most implementation in my opinion.
So currently, and according to the definition of s-expressions in smtlib2, keywords such as as and par are not allowed in s-expression (i.e. these would be syntax errors, not typing or interpretation errors). It seems that accepting these is however required when users want to use polymorphism in triggers, and thus be able to give terms as payloads. And it's not really clear to me how to handle this in dolmen: as you said, extending the parsing rules for s-expressions to include as and par feels a bit ad-hoc, on the other hand, since they are keywords, we'll need specific rules to parse them in s-expressions anyway...

Gbury · 2020-12-03T13:50:31Z

Basically, I'd say smtlib2 (and extension's authors) should decide whether attribute payloads are terms or s-expressions (or both, given an adequate syntax rule to distinguish the two cases), and stop trying to say different things depending on the situation.
For instance, the documentation defines in the syntax attributes payloads as s-expressions, but the documentation also says that :pattern payloads should be terms, and these two statements are contradictory.

bobot · 2020-12-03T14:20:06Z

I agree fully with your last comment. And that's why I think it would be simpler if dolmen could parse the :pattern directly as a term. However it should be done without making :pattern a keyword.

Gbury · 2020-12-03T14:25:18Z

I agree fully with your last comment. And that's why I think it would be simpler if dolmen could parse the :pattern directly as a term. However it should be done without making :pattern a keyword.

It would be a solution, but that's not really possible with menhir as far as I know, ^^ (plus it would be a hack in the sens that other attributes could not really do the same, except if the mechanism was somehow extensible at runtime, which would be .. complicated ; that's one of the reasons dolmen is likely never going to parse the Coq language).

By the way, I submitted a topic about that in the SMTLIB google group (it's pending for authorization currently).

Gbury · 2021-01-13T13:41:33Z

Sorry for not merging sooner, I'll try and merge asap.
Good catch for the to_int/floor problem in arith.

However, concerning the second one (about the timeouts), I'm not really sure about its usefulness. I'll need to add it in comments and in the docs, but there is already a way to disable the timer part of the limits, by using a timeout of 0 (see the documentation for Unix.itimer). For the size computation limits, is there a real and measurable cost to the Gc alarm (as in a test case where you can measure/observe the overhead of the alarm when setting the memory limit to infinity) ? And actually, what is the motivation behind the change ? Does it try and solve a concrete problem or is it more for code cleanliness ?

bobot · 2021-01-13T14:32:37Z

There is still the problem with the pattern. I have a problem with the namespace it is different for terms and type it seems. I haven't looked at it since just after the discussion.

I just didn't know what was the value for removing the limitations ;). (e.g the example Option module uses 300 and 1_000_000_000 default limits). And currently it is not the same for the time (0) and the memory (infinity). Finally there is only one timer of each kind in a process, so setting it to 0 would stop a timer set by another part of the processus. Instead of infinity, 0 can be used or an option, but I prefer that nothing is touched if nothing have to.

(About to_int/floor, for to_real I don't really like the use of coerce because it is hard to know the possible type arguments, moreover it seems that (to_real true) is accepted as to_real [bool,real] true. I'm not sure, I seen that in passing I will create an MR if it is true).

Gbury · 2021-01-13T17:39:45Z

I Have a plan to take care of the patterns properly, i'll try and work on it soon (probably next week).
Actually, I think I'll going to merge this PR as soon as I can, and you'll be able to open other PRs (or open issues) for the minor bugs you mention.

bobot · 2021-01-18T21:22:44Z

I removed the little fixes from this branch and I will put them to a new MR. The to_int fix was not good because Expr.Real.floor is real -> real instead of real -> int. I fixed it in another way but it could lead to discussions.

Gbury · 2021-02-03T12:56:13Z

I rebased on master

Gbury · 2021-02-03T12:56:47Z

Merged, thanks a lot for the work !

Gbury reviewed Oct 30, 2020

View reviewed changes

src/languages/smtlib2/v2.6/parseSmtlib.mly Outdated Show resolved Hide resolved

bobot changed the base branch from master to psmt2 November 2, 2020 15:29

bobot changed the base branch from psmt2 to master November 20, 2020 09:01

bobot force-pushed the add_parametricity_in_smt branch from ece580b to a418973 Compare November 20, 2020 09:29

bobot force-pushed the add_parametricity_in_smt branch from a418973 to e05bf1a Compare November 20, 2020 10:31

bobot force-pushed the add_parametricity_in_smt branch from f1e1a58 to 0fc6b50 Compare November 20, 2020 14:00

Gbury reviewed Nov 20, 2020

View reviewed changes

src/loop/parser.ml Outdated Show resolved Hide resolved

bobot force-pushed the add_parametricity_in_smt branch from 0fc6b50 to 00e0629 Compare November 20, 2020 14:40

bobot force-pushed the add_parametricity_in_smt branch from 58dc3ed to f835115 Compare January 18, 2021 21:20

bobot and others added 12 commits February 3, 2021 12:39

Fix typos in comments

b6d3292

Add polymorphism of smt v3 in smt v2.6

362f09c

Add psmt2 (some error messages still missing)

4d07691

Integrate new states in messages for poly

6e1dd2a

Add messages for (par ...)

bede9d4

Add tests for typing of psmt2

2421611

Allows to select psmt2 with -i and fix hardcoding of smtlib latest

97f5095

Add other proposition of extension of smt2 to polymorphism

034e0d3

Try the new --merge-errors option of menhir

1fa82e1

Update for menhir 20201201

08ce3d5

Update syntax.messages

8553edd

SMTLIB Poly Parse more attributes

7ff80c2

Gbury force-pushed the add_parametricity_in_smt branch from f835115 to 7ff80c2 Compare February 3, 2021 12:55

Gbury merged commit b9ebbbd into Gbury:master Feb 3, 2021

[RFC] Add polymorphism of smt v3 in smt v2.6 #53

[RFC] Add polymorphism of smt v3 in smt v2.6 #53

Conversation

bobot commented Oct 16, 2020

Gbury commented Oct 16, 2020

Gbury commented Oct 16, 2020

bobot commented Oct 17, 2020

ACoquereau commented Oct 19, 2020

Gbury commented Oct 21, 2020

bobot commented Oct 22, 2020 • edited Loading

Gbury commented Oct 22, 2020

bobot commented Oct 28, 2020

Gbury commented Oct 28, 2020

Gbury commented Nov 1, 2020 • edited Loading

bobot commented Nov 2, 2020

Gbury commented Nov 2, 2020

bobot commented Nov 20, 2020

bobot commented Nov 20, 2020

bobot commented Nov 20, 2020

Gbury commented Nov 20, 2020

bobot commented Nov 20, 2020

bobot commented Nov 20, 2020

Gbury commented Nov 20, 2020 • edited Loading

Gbury commented Nov 20, 2020

Gbury commented Nov 20, 2020

Gbury commented Nov 20, 2020

rbonichon commented Dec 2, 2020

bobot commented Dec 2, 2020 • edited Loading

Gbury commented Dec 2, 2020

Gbury commented Dec 3, 2020

bobot commented Dec 3, 2020

Gbury commented Dec 3, 2020 • edited Loading

Gbury commented Dec 3, 2020

bobot commented Dec 3, 2020

Gbury commented Dec 3, 2020

Gbury commented Jan 13, 2021

bobot commented Jan 13, 2021 • edited Loading

Gbury commented Jan 13, 2021

bobot commented Jan 18, 2021

Gbury commented Feb 3, 2021

Gbury commented Feb 3, 2021

bobot commented Oct 22, 2020 •

edited

Loading

Gbury commented Nov 1, 2020 •

edited

Loading

Gbury commented Nov 20, 2020 •

edited

Loading

bobot commented Dec 2, 2020 •

edited

Loading

Gbury commented Dec 3, 2020 •

edited

Loading

bobot commented Jan 13, 2021 •

edited

Loading