Standardization of reST section markups via CI #3975

thatlittleboy · 2022-07-11T06:41:18Z

As follow-up from #3973 , and related discussion from #3932 .

Following the numpydoc styleguide, we will be adopting the style of no newline after the section header, like so:

    Examples
    ----------
    >>> pl.col("....")

instead of

    Examples
    ----------

    >>> pl.col("....")

The suggestion here is to add a check in our documentation-CI to enforce this style as a hard check moving forward.

thatlittleboy · 2022-07-17T07:00:52Z

Initially I was thinking of writing a CLI script to detect + autofix (which was the original method I used to fix #3973), but I eventually settled on writing a custom flake8 plugin for our Polars project so it integrates well with our current dev-tools.

It is designed to be extendible as well, so we can easily add more custom style checks in the future (not necessarily just for docstrings).

Just as a preview, these are the error codes I'm considering to implement:

POL001 No blank lines are allowed after section heading in docstring.
POL002 There should be at least 1 blank line above section heading in docstring.
POL003 Docstrings must end with an empty line.

ritchie46 · 2022-07-17T07:28:24Z

I did not know you can make custom flake8 plugins. That sounds like a nice addon and fits our CI.

Just as a preview, these are the error codes I'm considering to implement:

POL001 No blank lines are allowed after section heading in docstring.
POL002 There should be at least 1 blank line above section heading in docstring.
POL003 Docstrings must end with an empty line.

I believe this agrees with: https://numpydoc.readthedocs.io/en/latest/example.html 👍

zundertj · 2022-07-17T11:42:17Z

There is a flake8 plugin that checks for numpy doc style: https://pypi.org/project/flake8-docstrings, which allows you to do:

flake8 --docstring-convention numpy

Example output on utils.py:

./utils.py:1:1: D100 Missing docstring in public module
./utils.py:45:1: D205 1 blank line required between summary line and description
./utils.py:45:1: D400 First line should end with a period
./utils.py:72:1: D103 Missing docstring in public function
./utils.py:76:1: D103 Missing docstring in public function
./utils.py:81:1: D200 One-line docstring should fit on one line with quotes
./utils.py:81:1: D400 First line should end with a period
./utils.py:81:1: D401 First line should be in imperative mood
./utils.py:121:1: D205 1 blank line required between summary line and description
./utils.py:121:1: D400 First line should end with a period
./utils.py:121:1: D401 First line should be in imperative mood
./utils.py:130:1: D103 Missing docstring in public function
./utils.py:139:1: D200 One-line docstring should fit on one line with quotes
./utils.py:145:1: D103 Missing docstring in public function
./utils.py:245:1: D200 One-line docstring should fit on one line with quotes
./utils.py:245:1: D401 First line should be in imperative mood
./utils.py:252:1: D200 One-line docstring should fit on one line with quotes
./utils.py:252:1: D400 First line should end with a period
./utils.py:259:1: D401 First line should be in imperative mood
./utils.py:282:1: D401 First line should be in imperative mood; try rephrasing

Which seems useful overall. But it does not capture the empty lines below an Examples block...

Also, it does depend on pydocstyle, which is not really actively being developed at the moment: PyCQA/pydocstyle#575 But that may still be better than trying to roll our own parser & flake8-plugin?

thatlittleboy · 2022-07-17T13:00:34Z

@zundertj Thanks, yeah I saw that plugin as well during my initial research. Unfortunately that --docstring-convention flag seems to be specified merely for convenience of docstring parsing, rather than for checking numpydoc-specific styles.

flake8-docstrings seems to only check generic docstring violations. The three "style violations" that we're trying to capture (POL001, POL002, POL003 from above) don't seem to be covered by that plugin (as you rightly pointed out), and I don't quite agree with some of the violations introduced by flake8-docstrings anyway.
So my initial option was to write a script (which are just a couple of regex's actually..) to catch these violations.

But later I thought it is better to wrap these regex'es within a Flake8 plugin so that we don't have to make a separate script call in our Makefile (in make pre-commit). We can instead hook into the automated plugin discovery of flake8 so it runs alongside the other flake8 checks.
It's not terribly complicated, just some AST parsing coupled with some regex.

Ultimately the goal here is to achieve a consistent docstring style within the project (some of these style violations will lead to wrong rendered output, cf. some of my previous PRs; some violations are just a matter of style).
I see this as the same motivation for why we run black on our python files.

But I do empathize with the concern that this will be adding yet another thing we have to maintain in the long-term. If there exists a plugin/tool that can check for the aforementioned violations, I fully agree on the sentiment to rely on existing (well-maintained) tools.
Barring which, I suppose an alternative is having this flake8 plugin I'm writing live outside of polars as a separate project.
I'll just note that many mature projects will develop a need to implement custom linting/style enforcements eventually (see pandas, airflow). So I'll lean more towards hooking into existing infrastructure from the get-go (note: doesn't have to be flake8) rather than invoking many scripts independently.

zundertj · 2022-07-17T15:02:00Z

Just flagging that there are some similar tools. Good to know you did you come across that as well, and it indeed doesn't solve our use case, although one would hope it does. So it is not surprising that every (big) Python project ends up rolling their own docstring validation script.

To mitigate the maintenance argument, I think it would best to make it an external package:

if it ends up being too complicated, we can drop it relatively easy.
there is clearly a need for a good package in the ecosystem, so other projects may want to adopt at some point and this could spread the development burden; the barrier to contribute from other projects is higher if it lives inside the polars repo, and there is clearly less visibility.
development of the docstring plugin does not have to go via this repository, so the focus stays here on actually building polars.

Downside is that a separate repo and package needs to be set up, but it seems you are up for that task? And as you said, can just be a couple of regex to start with, not expecting a full blown linter. Hooking into flake8 as a plugin seems sensible to me.

thatlittleboy · 2022-07-17T15:20:20Z

Sure, I think that makes sense. Will update here when I get around to doing that.

thatlittleboy · 2022-07-17T16:58:03Z

Okay, maybe the whole flake8 plugin business isn't necessary after all. 😅 Just realized there is validation built right into numpydoc, that should get us at least 50% of the way there.

We can either trigger the docstring validation during sphinx build, looks a little like this:

The sphinx build fails eagerly (which might be useful in CI ?).

Or use something like numpydoc-validation to run the validation as an external check on the whole py-polars package (instead of hooking into the sphinx build) -- probably more user-friendly for fixing documentation errors.

Given that this is directly from the numpydoc team, I suppose we can just go with whatever they are using rather than our own interpretation of their style (though, it doesn't seem like all of their error codes are compliant with their example..)

stinodego · 2022-07-25T21:15:15Z

I just found this issue after opening my PR #4155. I looked into pydocstyle, which is behind flake8-docstrings.

Regarding the lints you are looking for:

Number 1 is present as D412
Number 2 is present as D410/D411 (though there are edge cases where this check fails).
Number 3 is present as D413.

So I really don't see the need for a custom tool here!

zundertj added the documentation Improvements or additions to documentation label Jul 16, 2022

thatlittleboy mentioned this issue Jul 27, 2022

Docstring improvements & enable lints #4161

Merged

ritchie46 closed this as completed in #4161 Jul 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standardization of reST section markups via CI #3975

Standardization of reST section markups via CI #3975

thatlittleboy commented Jul 11, 2022

thatlittleboy commented Jul 17, 2022

ritchie46 commented Jul 17, 2022

zundertj commented Jul 17, 2022

thatlittleboy commented Jul 17, 2022

zundertj commented Jul 17, 2022 •

edited

Loading

thatlittleboy commented Jul 17, 2022

thatlittleboy commented Jul 17, 2022 •

edited

Loading

stinodego commented Jul 25, 2022

Standardization of reST section markups via CI #3975

Standardization of reST section markups via CI #3975

Comments

thatlittleboy commented Jul 11, 2022

thatlittleboy commented Jul 17, 2022

ritchie46 commented Jul 17, 2022

zundertj commented Jul 17, 2022

thatlittleboy commented Jul 17, 2022

zundertj commented Jul 17, 2022 • edited Loading

thatlittleboy commented Jul 17, 2022

thatlittleboy commented Jul 17, 2022 • edited Loading

stinodego commented Jul 25, 2022

zundertj commented Jul 17, 2022 •

edited

Loading

thatlittleboy commented Jul 17, 2022 •

edited

Loading