Some thoughts about a separate tool for this purpose, perhaps a shell or OCaml script in this repo, an improved version of the current `corpus.sh`.
TL;DR for @avsm, @rizo: I'm leaning towards a script that works on one package at a time, to run in opam's CI. It will:

1. Run over the package's `.cmti` files naturally generated by its own build system.
2. Blindly run over the package's `.mli` and `.mld` files, trying to compile them individually.
3. Filter out duplicate errors.
4. Show the result, and succeed or fail according to how severe we consider doc syntax errors to be.
The purpose of (1) is to use the build system's knowledge about project layout and preprocessing. The purpose of (2) is to stupidly try to cover anything that wasn't included by (1), and to cover `.mld` files.
Note that we can do (1) and (2) whether or not the package build succeeds: (1) can still produce some `.cmti` files, and (2) will work for some `.mli` files.
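As a very rough outline, and purely for illustration (the directory layout, flags, and pass/fail policy below are placeholders, not a design), the per-package script could look something like this; depending on the odoc version, `odoc compile` may also need `--package` or include arguments that are omitted here:

```sh
#!/bin/sh
# Illustrative outline only; names and the failure policy are placeholders.
PKG_DIR=${1:-.}   # the package's build tree / sources
LOG=$(mktemp)

# (1) run odoc over the .cmti files produced by the package's own build.
find "$PKG_DIR" -name '*.cmti' -exec odoc compile {} \; 2>>"$LOG"

# (2) blindly try the .mli and .mld files that (1) did not cover
#     (see the more detailed sketch further down).

# (3) filter out duplicate errors; (4) report them and set the exit status.
sort -u "$LOG"
test ! -s "$LOG"
```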
Thought process, starting from the last time I did a mass style check (more of a whether-odoc-is-reasonable check) using `corpus.sh`:
1. `corpus.sh` wants as large an opam switch as possible. The reason is that opam files already contain each package's own build instructions, and we want to build as many `.cmti`s that way as we can. This can be done with a command like `opam install odoc --criteria='+count(solution)'`. The script then goes over all the `.cmti`s and runs `odoc compile` on each one. This covers, to some extent, both packages that use Dune and packages that don't, as long as they pass `-bin-annot`. In my experience, using a "typical" package as the "center" of the switch (`odoc` in the command above) installs about 75% of the packages.
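For concreteness, the switch-based variant of this boils down to something like the following (it assumes the installed packages ship their `.cmti` files into the switch's lib directory; otherwise the build trees would have to be kept around):

```sh
# Build as large a switch as possible around a "typical" package.
opam install odoc --criteria='+count(solution)'

# Run odoc over every .cmti installed in the switch and collect the errors.
find "$(opam config var lib)" -name '*.cmti' \
  -exec odoc compile {} \; 2> odoc-errors.log
```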
The rest of the thoughts are about covering packages that aren't installed in (1), or files that aren't compiled.
2. List all packages, and repeat (1) with "centers" that are outside the already-installed switches.

   These consistent switches centered around different packages might have large intersections of packages that can be installed in both. We may want to install only each package and its dependencies in each subsequent switch.

   There is also the complication that packages installed in each switch might not be at their latest version, so we probably need to check for that.

   Also, some packages will not be buildable. This whole process needs to be done on a dedicated (virtual) machine or in a container, because some packages have system dependencies that need to be installed first.

   (recommended) Yet another way to do this is to rely on opam's CI, or a variant of it, and either take the `.cmti` files from it, or just the generated error messages. We likely don't need to ever run this new `corpus.sh` outside opam's CI anyway, and it's probably fine to run the script on one package at a time.

   After solving (2), we should be checking as many `.cmti` files as we can get automatically compiled by each package's own build system.
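In that per-package, opam-CI-style setting, collecting and checking the `.cmti`s produced by one package's own build could be as simple as the sketch below (the package name is hypothetical, and it again assumes the `.cmti` files get installed into the switch's lib directory):

```sh
PKG=foo   # hypothetical package name, supplied by the CI job

# Install just this package (and, implicitly, its dependencies).
opam install -y "$PKG"

# Check every .cmti that the package's build installed.
find "$(opam config var lib)/$PKG" -name '*.cmti' \
  -exec odoc compile {} \; 2> "$PKG-doc-errors.log"
```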
3. Keep the sources of each package. Also, if opam might not have downloaded the sources of some package after (2), we should try to download them separately.

   Go through the sources, find `.mld` files, and run `odoc compile` on them.

   Find all `.mli` files in the sources, and subtract the `.mli` files that we already have `.cmti` files for. Try to compile these `.mli` files using separate `ocamlc` commands, and then run `odoc compile` on the resulting `.cmti`s.

   This should cover all `.mld` files, and all `.mli`s that either were not used during successful builds in (1) or (2), or that belong to projects whose builds failed or could not be started due to constraints, except those `.mli`s that require preprocessing. We probably can't do anything intelligent about the latter.

   (not recommended by me) We should be able to find the source file of each `.cmti` by loading it using `compiler-libs` and checking locations. At least Dune seems to put correct locations into the files, even in the case of preprocessing (though this will need to be checked). However, even if the source file names are wrong, the only harm is that we will effectively try to process the corresponding `.mli` file twice, as long as no `.cmti` ever claims that its source is a real `.mli` from which it was not actually compiled (thus shadowing it).

   (recommended) Alternatively, we can just blindly try to compile all the `.mli`s in (3), and use postprocessing to remove duplicate errors; a sketch of this fallback pass follows below.
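A minimal sketch of that fallback pass, assuming the sources are unpacked under `$SRC_DIR` and that comparing raw error lines is good enough for deduplication (a real version will probably want to strip paths and line numbers first, and may need extra flags for `ocamlc` and `odoc compile`):

```sh
SRC_DIR=${1:-.}   # unpacked package sources (illustrative)
LOG=$(mktemp)

# Try every .mld file directly.
find "$SRC_DIR" -name '*.mld' -exec odoc compile {} \; 2>>"$LOG"

# Try every .mli that does not already have a .cmti next to it.
find "$SRC_DIR" -name '*.mli' | while read -r mli; do
  [ -e "${mli%.mli}.cmti" ] && continue        # already covered by the build
  if ocamlc -bin-annot -c "$mli" 2>>"$LOG"; then
    odoc compile "${mli%.mli}.cmti" 2>>"$LOG"
  fi
done

# Crude duplicate-error filtering before showing the result.
sort -u "$LOG"
```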
We may want a mode that works on the `master` branch of each repo, as that might be more useful for knowing what still needs to be fixed (rather than only what has been released). If we go with doing this per package, then the same script that runs in opam's CI should also work from the root directory of a repo checkout.
EDIT: This was prompted by #226.