Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require opam-version 2.1 and greater at the start of the file #43

Merged
merged 6 commits into from
Apr 16, 2021

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Mar 11, 2021

opam 2.1 does not define any new fields, so there is no need for any opam file to contain opam-version: "2.1". Given this general quietness, there is one semantic change being made for the 2.1 and subsequent formats in order to allow the lexer and parser to evolve in future. opam-version must appear first (apart from whitespace and comments) for "2.1" and later, or the file does not parse. This gives the possibility in the future that additional constructs can be added to the file format, while retaining compatibility with older lexers.

This PR enforces two requirements on the whole-file lexing:

  1. opam-version: "2.1" must be the first non-whitespace/comment field
  2. For opam-version: "2.1" and greater, repeating the opam-version field is a parsing error

In both these cases, for maximum API compatibility, Parsing.Parse_error is raised rather than a new exception. API changes in the future will be needed to take advantage of this, but the key thing in this PR is to enforce the invariant so that this can be done when it's needed in future.

Opam itself only uses OpamParser.FullPos.string, OpamParser.FullPos.channel, OpamParser.FullPos.main all of which obey this. The only additional function used is OpamParser.FullPos.value_from_string - when lexing changes are introduced, functions such as that will need a parameter to specify the permitted lexing version.

On top of this, there is then an additional future-proofing change to the library: if opam-version is greater than the library version, then the lexer and parser do a best-effort parse. In this case, Parse_error is only raised if opam-version is not first (since it must be 2.2+) or if it is repeated before the first lexer/parser error. This allows opam (and any other tool) to display a superior "this is an opam 2.2 file, so was ignored" rather than a lexer/parser error message which is more likely to result in users reporting bugs (to pinned repositories, opam-repository maintainers, etc.).

I added a few tests to ensure the Parse_error exceptions are raised. It could do with some more tests, although I have test_ed_ it! There is an additional slightly weird test which ensures that OpamBaseParser.version matches the build version.

It should be the case that the EOF token generated by the lexer on error
will immediately unravel the parser, however there's a tiny chance that
error recovery might cause the lexer to attempt reading some more
tokens. Therefore we store the position of the lexbuf and restore it
before the exception is raised.
Position the lexbuf's markers to point to the offending opam-version
line if a Parse_error is raised.
@dra27 dra27 added this to the 2.1.3 milestone Mar 26, 2021
@AltGr
Copy link
Member

AltGr commented Apr 16, 2021

LGTM. Feels a bit weird to have the parser explicitely refer to a specific field by string, with "opam" in its name;
nevertheless I think this was the right thing to do!

@AltGr AltGr merged commit 86f6841 into ocaml:master Apr 16, 2021
@dra27 dra27 deleted the 2.1-lexing branch April 22, 2021 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants