Skip to content

How to style-check comments in all packages in opam (a.k.a. "odoc lint") #228

@aantron

Description

@aantron

Some thoughts about a separate tool for this purpose, perhaps a shell or OCaml script in this repo, an improved version of the current corpus.sh.

TL;DR for @avsm, @rizo: I'm leaning towards a script that works on one package at a time, to run in opam's CI. It will:

  1. Run over the package's .cmti files naturally generated by its own build system.
  2. Blindly run over the package's .mli and .mld files, trying to compile them individually.
  3. Filter out duplicate errors.
  4. Show the result and succeed/fail according to what we think about the severity of having doc syntax errors.

The purpose of (1) is to use the build system's knowledge about project layout and preprocessing. The purpose of (2) is to stupidly try to cover anything that wasn't included by (1), and cover .mld files.

Note that we can do (1) and (2) whether the package build succeeds or not. (1) can still produce some .cmti files, and (2) will work for some .mli files.


Thought process starting from the last time I did a mass style check (more like whether-odoc-is-reasonable check), using corpus.sh:

  1. corpus.sh wants as large an opam switch as possible.

    The reason for this is that opam files already contain each package's own build instructions, and we want to build as many .cmtis that way as we can.

    This can be done with a command like

    opam install odoc --criteria='+count(solution)'
    

    It then goes over all the .cmtis and runs odoc compile on each one.

    This covers, to some extent, both packages that use Dune, and packages that don't, as long as they pass -bin-annot. In my experience, using a "typical" package as the "center" of the switch (odoc in the command above) installs about 75% of the packages.

The rest of the thoughts are about covering packages that aren't installed in (1), or files that aren't compiled.


  1. List all packages, and repeat (1) with "centers" that are outside already-installed switches.

    These consistent switches centered around different packages might have large intersections of packages that can be installed in both. We may want to install only each package and its dependencies in each next switch.

    There is also the complication that packages installed in each switch might not be at their latest version, so we probably need to check that.

    Also, some packages will not be buildable. This whole process needs to be done on a dedicated (virtual) machine or in a container, because some packages have system dependencies that need to be installed first.

    (recommended) A yet other way to do this is to rely on opam's CI, or a variant of it, and either take .cmti files from it, or just the generated error messages. We likely don't actually need to ever run this new corpus.sh outside opam CI anyway, and it's probably fine to run the script on one package at a time.

After solving (2), we should be checking as many .cmti files as we can get automatically compiled by each package's own build system.


  1. Keep the sources of each package. Also, if it is possible that opam ever doesn't download the sources of any package after (2), we should try to download them separately.

    Go through the sources, find .mld files, and run odoc compile on them.

    Find all .mli files in sources, subtract .mli files that we already have .cmti files for. Try to compile these .mli files using separate ocamlc commands, and then run odoc compile on the resulting .cmtis.

    This should cover all .mld files, and all .mlis that either were not used during successful builds in (1) or (2), or .mlis in projects where the builds failed, or could not be started due to constraints, except those .mlis that require preprocessing. We probably can't do anything intelligent about the latter.

    (not recommended by me) We should be able to find the source file of each .cmti by loading it using compiler-libs, and checking locations. At least Dune seems to put correct locations into the files, even in case of preprocessing (though this will need to be checked). However, even if the source file names are wrong, the only harm is that we will effectively try to process the corresponding .mli file twice, as long as no .cmti ever claims that the source file is a real .mli from which it was not compiled (thus shadowing it).

    (recommended) Alternatively, we can just blindly try to compile all .mlis in (3), and use postprocessing to remove duplicate errors.


We may want a mode that works on the master branch of each repo, as that might be more useful for knowing what still needs to be fixed (rather than only released). If going with doing this per-package, then the same script that will work in opam's CI should also work from the root directory of a repo checkout.


EDIT: This was prompted by #226.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions