This file is incomplete, but may have interesting tips if you want to start hacking on dolmen. If you're having troubles, don't hesitate to contact me directly, ^^
You need to have ocaml installed to compile dolmen. Additionally, some ocaml dependencies are needed. The easiest way to get a working ocaml installation, and install the dependencies is to use opam. See this page for information about how to install opam on your system.
Once you have opam installed, you can use the following commands to isntall the needed dependencies to build dolmen:
opam pin add -n .
opam install --deps-only .
You can then build the project with
make
Most tests are setup through dune files, associated with tune "runtest" alias.
To run the tests:
make test
To promote the results of tests (if e.g. you have changed some error message, or added new tests):
make promote
There a few (though not that many), unit tests. These are basically to test internal functions of the library, in order to check internal assumptions that may otherwise not be easy to check with complete parsing/typing tests.
Follow the example in tests/unit/bitv/
to add more unit tests.
There are quite a lot of tests for parsing, typing and other features that make
use of the dolmen
binary to check the behaviour of the whole library. Some of
these tests verify that some well-formed inputs are correctly parsed/typed, but
most of the tests are there to check the behaviour in case of an error. It is
planned that in the long term, each error message should have a corresponding
test case (see coverage tests).
These tests are found in the subfolders tests/
(except for unit
which
is reserved for unit tests of the library functions). These tests are run by
dune, and a script is used to generate the dune files, so that adding a new
test is as easy as creating a file with an extension recognized by dolmen
under the tests
folder (in any subfolder, since they are scanned recursively
by the script).
The check performed for each of these test files is the following:
- given a test file
foo.ext
, there must exists afoo.expected
corresponding file (if it does not exists, it will be created as an empty file by the script) - for both incremental mode and full mode, do the following
- run dolmen on the input file, using additional options from the flags.dune file (one option per line, with no space)
- if
foo.expected
is empty, check that dolmen exited with error code 0, else, check that dolmen exited with error code 0 or 1 - compare the output of dolmen with the
foo.expected
file
This is done using bisect_ppx
. Once you have it installed (usually via opam),
just run make coverage
and it should open the coverage report in a browser
window for you (via xdg-open).
Dolmen now has support for customizing syntax error messages, using menhir's mechanism for syntax errors. If you want to contribute and/or help, you might first want to read menhir's manual about the error mecanism (and/or LR(1) parsers in general if you're not familiar with them).
Adding new error messages is as simple as editing the syntax.messages
file, and then re-building the project. Each error message in the
syntax.messages
is composed of the following:
- an input sentence (in the form of a list of tokens) that makes the parser automaton go into some error state
- a comment detailing the state of the parser at that point, most notably
the production being reduced/recognized; this, together with the
parseFoo.mly
file, should help one understand why the error happens, and what token would allow the parser to continue at that point - the error message properly. Dolmen currently has the following convention for how messages are written:
XXX
production parsed (on a single line)
what is expected at that point,
possibly on multiple lines
with on the first line an error message number (typically with 3 number, i.e. 042
),
on the second line the name of the construction being parsed (e.g. a term, a sort,
a function definition, etc...), and finally on the lines after that a description
of what would have been acceptable inputs (warning: there must be no blank lines,
else menhir will have trouble parsing the file).
Each syntax error must be exercised by at least one test file, typically in the
following file:
tests/parsing/language/version/errors/XXX_some_informal_description_of_the_error.ext
.
It is always better to have more than one test case, for instance, by varying the token
that produces the error (in the syntax.messages
file, the last token of each sentence
is assured to produce an error, but it is most ot the time not the only one; dolmen already
has a mechanism to inform the user of what token was read, so the error message written
should be agnostic of what the bad token is).
Note the following: dolmen currently has two modes of reading files: incremental and full.
In full mode, the whole file is read and parsed, and then each sentence sent for processing,
whereas in incremental mode, each sentence is read, parsed and processed before the next one
is read. Due to this, all parsers in dolmen have two main entry point: one entry point for
parsing a whole file, and one for parsing a single input sentence[1]. This will often
make it so that there are the same sentences twice in a *.messages
file, once for
for entry point of the file and once for the entry point of a single sentence. These
two error cases must share the exact same error message (i.e. same error number, prodution
parsed, etc...). This coherence is automatically and always checked by the tests which use
both incremental mode and full mode and expect the same output in both cases.
On syntax error message numbers (i.e. the XXX
in error messages):
- for uniformity, they are all on 3 numbers, which should be enough for all reasonable languages (mor than 1000 hand-written error messages seem a lot), but there should be no real limitation
- as much as possible, the error numbers should be contiguous starting from 0, however because editing messages by hand is already quite tiresome (and changing an error number means renaming test files that exercize it), it's not a problem is some numbers are unused/skipped
[1]: sadly, some languages (currently only the alt-ergo native format) cannot support incremental parsing because it is not possible to know when a sentence ends, except when another sentence begins. Concretely, this means such languages would have end-of-stream conflicts according to menhir. For those languages, inremental mode is not supported, but this should be an exception.
When changing the grammar/parser for a language that has hand-written error
messages, you'll need to update the syntax.messages
file according to the new
grammar. To do so, the simplest way is to run dune build @runtest
, which will
check that the syntax.messages
is correct with regards to the new grammar,
and if not, will print a diff to update the syntax.messages
file. Running
dune build @runtest --promote
will promote the diff and change the
syntax.messages
file.
Release checklist/workflow:
- update version number in
VERSION
- update version number in src/bin/main.ml
- update
CHANGES.md
with the new version number - commit the changes
- run
dune-release lint
- run
dune-release tag
- run
dune-release distrib
- run
dune-release publish distrib
- run
dune-release opam pkg
- run
dune-release opam submit
- add line for new version in
doc/scripts/index.txt
- run
./doc/script/release
to update the doc on github-pages