Using ruff to format code examples in docstrings #7146
Comments
This sounds really useful and something our formatter could do. However, I don't expect us to make progress on this in the near future, but I'm happy to talk external contributors through how this could be implemented. One question that comes to mind is what to do with code that has syntax errors, but the easiest is to just leave such code unchanged.
Without any knowledge of the code base: I think Examples blocks can already be detected as there are some docstring formatting rules in

    >>> pl.date_range(
    ...     date(2023, 1, 1), date(2023, 5, 1), "1mo", eager=True
    ... ).dt.month_end()

You'll probably find more refined logic by looking at the

I would imagine code examples with syntax errors raise an error, which would lead to that specific example not being autofixed.
I guess this brings up the question of whether this should be a lint rule with an autofix or integrated into the formatter (runs automatically on save). I assumed that this would be similar to the linked project and be part of the formatter. The big challenge will be to implement fast parsing and integrate it into the otherwise already extensive docstring formatting:

ruff/crates/ruff_python_formatter/src/expression/string.rs, lines 787 to 905 in c05e462
That's my intuition, but the question is: would the rest of the file still be formatted? I would say yes, but the formatter then needs a way to raise a warning, which we lack today.
I would imagine this would have to be part of the autoformatter.
Yes, definitely. Other code examples in the same docstring could probably still be formatted, even. Each example is a self-contained Python interpreter command.
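To illustrate why one broken example need not block the others, here is a minimal sketch in Python (not ruff's Rust implementation) that uses the standard library's `doctest.DocTestParser` to split a docstring into examples and checks each one on its own. The function name `classify_examples` and the sample docstring are made up for illustration.

```python
import ast
import doctest

# Hypothetical docstring: the first example is valid, the second is not.
DOCSTRING = """
A valid example.

>>> add( 1,2 )
3

A broken example.

>>> add(
3
"""


def classify_examples(docstring: str) -> list[tuple[str, bool]]:
    """Return (source, is_valid_python) for every doctest example found."""
    results = []
    for example in doctest.DocTestParser().get_examples(docstring):
        try:
            ast.parse(example.source)
            valid = True
        except SyntaxError:
            # Only this example is skipped; the others are unaffected.
            valid = False
        results.append((example.source, valid))
    return results
```

A formatter built along these lines would reformat only the examples flagged as valid and copy the rest through verbatim.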
@stinodego Out of curiosity, was there anything that pushed you to use blackdoc instead of blacken-docs? Also, do you use
I didn't pick blackdoc myself; it was introduced to the project at some point by someone else and I never thought to try other tools. I had heard of blacken-docs but have never tried it myself. It seems to do more than I need. We currently only format Python in docstring examples. We don't autoformat code in our README, and code in our other docs is magically imported from Python files, so we can use the ruff autoformatter for those.
I spent some time looking into this and I think I've come up with a rough proposal on how to move this forward.

## Straw implementation proposal

Tactically, my plan is to follow @MichaReiser's suggestion above

For example, all the following Python snippets have doc strings that would be

Markdown docstrings with Python snippets:

    def foo():
        '''
        Docs about foo.

        ```python
        def quux():
            x = do_something(foo())
            return x + 1
        ```
        '''
        pass

Markdown docstrings with pycon ("Python console") snippets:

    def foo():
        '''
        Docs about foo.

        ```pycon
        >>> def quux():
        ...     x = do_something(foo())
        ...     return x + 1
        ```
        '''
        pass

reStructuredText docstrings with Python snippets:

    def foo():
        '''
        Docs about foo.

        .. code-block:: python

            def quux():
                x = do_something(foo())
                return x + 1
        '''
        pass

reStructuredText docstrings with pycon ("Python console") snippets:

    def foo():
        '''
        Docs about foo.

        .. code-block:: pycon

            >>> def quux():
            ...     x = do_something(foo())
            ...     return x + 1
        '''
        pass

It's plausible we'll want to support more than this, perhaps even in the

## Prior art

I believe the main prior art in this space is blacken-docs and blackdoc. It looks like

## Testing plan

In addition to the usual snapshot tests we can write by hand, I think another

Django ran into some interesting challenges when switching over to

We should also be mindful of transformations that could make the overall

## Maybe should be in scope

### reStructuredText literal blocks

We will probably also want to support plain literal blocks. The Django

    def foo():
        '''
        Docs about foo.

        Here's an example that uses foo::

            def quux():
                x = do_something(foo())
                return x + 1
        '''
        pass

The above represents a literal block where in theory any kind of "code-like"

### Plain doctests

I went and looked through CPython and Django source code to get a "sense" of

    def _property_resolver(arg):
        """
        When arg is convertible to float, behave like operator.itemgetter(arg)
        Otherwise, chain __getitem__() and getattr().

        >>> _property_resolver(1)('abc')
        'b'
        >>> _property_resolver('1')('abc')
        Traceback (most recent call last):
        ...
        TypeError: string indices must be integers
        >>> class Foo:
        ...     a = 42
        ...     b = 3.14
        ...     c = 'Hey!'
        >>> _property_resolver('b')(Foo())
        3.14
        """
        pass

This doesn't look like a one-off to me. They appear quite commonplace. It also

## Not in scope

### A way to toggle formatting of individual code blocks

This is a feature requested in

### Formatting Python code in other formats

This means you won't be able to run the formatter over a

### Avoid Markdown/reStructuredText parsing

I think that trying to parse docstrings as Markdown or reStructuredText

With that said, I do think this means that our detection logic will necessarily
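As a rough illustration of detection without a Markdown or reStructuredText parser, the sketch below scans a docstring line by line for the fenced and `code-block` shapes shown in the proposal. It is written in Python for readability, and the names `find_code_blocks`, `FENCE_OPEN` and `REST_DIRECTIVE` are hypothetical. It assumes fences are closed by a bare fence line and that a reST block ends at the first non-blank line indented no deeper than the directive. It makes no claim to match what ruff ends up doing.

```python
import re

# Opening Markdown fence tagged as python or pycon.
FENCE_OPEN = re.compile(r"^(?P<indent>\s*)```(?P<lang>python|pycon)\s*$")
# reStructuredText code-block directive for python or pycon.
REST_DIRECTIVE = re.compile(
    r"^(?P<indent>\s*)\.\. code-block:: (?P<lang>python|pycon)\s*$"
)


def find_code_blocks(docstring: str) -> list[tuple[str, str, list[str]]]:
    """Return (kind, lang, body_lines) for each snippet found heuristically."""
    lines = docstring.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        fence = FENCE_OPEN.match(lines[i])
        directive = REST_DIRECTIVE.match(lines[i])
        if fence:
            # Markdown: collect everything up to the closing fence.
            body = []
            i += 1
            while i < len(lines) and lines[i].strip() != "```":
                body.append(lines[i])
                i += 1
            blocks.append(("markdown", fence["lang"], body))
        elif directive:
            # reST: the body is every following line that is blank or
            # indented deeper than the directive itself.
            base = len(directive["indent"])
            body = []
            i += 1
            while i < len(lines):
                line = lines[i]
                if line.strip() and len(line) - len(line.lstrip()) <= base:
                    break
                body.append(line)
                i += 1
            # Drop the blank lines that surround the body.
            while body and not body[0].strip():
                body.pop(0)
            while body and not body[-1].strip():
                body.pop()
            blocks.append(("rest", directive["lang"], body))
            continue  # lines[i] has not been examined yet
        i += 1
    return blocks
```

Because the scan only looks at line prefixes and indentation, it will misfire on unusual input (for example, a fence nested inside another fence), which is the kind of imprecision the heuristic approach accepts.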
Great to see this become more concrete! For our specific use case, the "plain doctests" you mention are most important. We follow the numpy docstring standard, and most of our docstrings look something like:

    def cool_stuff(arg):
        """
        Do cool stuff.

        Parameters
        ----------
        arg
            Some description.

        Examples
        --------
        Cool stuff with an integer.

        >>> cool_stuff(1)
        2

        Cool stuff with a string.

        >>> input = "q"
        >>> cool_stuff(input)
        'x'
        """
        pass

Reference to the NumPy doc guidelines:

Example file from the Polars code base with lots of docstrings with example sections:
Could you repurpose the format options for this case, setting the format docstring code option to false when we recurse?
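One way to picture this suggestion, as a sketch only: treat the options as an immutable value and derive a copy with docstring code formatting switched off before formatting a nested snippet. `FormatOptions`, its fields, and `options_for_nested_snippet` are hypothetical Python stand-ins, not ruff's actual (Rust) types.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FormatOptions:
    # Hypothetical stand-in for the formatter's option set.
    line_width: int = 88
    format_docstring_code: bool = False


def options_for_nested_snippet(options: FormatOptions) -> FormatOptions:
    """Options to use when formatting code found inside a docstring."""
    # Switch the feature off for the recursive call so that docstrings
    # inside the snippet are not searched for further snippets.
    return replace(options, format_docstring_code=False)
```

Everything else (line width and so on) carries over unchanged, so the nested call behaves like an ordinary format run that never recurses a second time.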
@stinodego Ah that's useful feedback, thank you! I think that means it makes sense to have plain doctests in the initial scope.

@konstin Yeah that sounds like a good idea. :-)
@stinodego So I ran the docstring code formatter (from #8811) on polars, and here's the diff I got: BurntSushi/polars@559b9d6

Most things seem to be unwrapping wrapped lines. Thoughts?
## Summary

This PR adds opt-in support for formatting doctests in docstrings. This reflects initial support and it is intended to add support for Markdown and reStructuredText Python code blocks in the future. But I believe this PR lays the groundwork, and future additions for Markdown and reST should be less costly to add.

It's strongly recommended to review this PR commit-by-commit. The last few commits in particular implement the bulk of the work here and represent the denser portions.

Some things worth mentioning:

* The formatter is itself not perfect, and it is possible for it to produce invalid Python code. Because of this, reformatted code snippets are checked for Python validity. If they aren't valid, then we (unfortunately silently) bail on formatting that code snippet.
* There are a couple places where it would be nice to at least warn the user that doctest formatting failed, but it wasn't clear to me what the best way to do that is.
* I haven't yet run this in anger on a real world code base. I think that should happen before merging.

Closes #7146

## Test Plan

* [x] Pass the local test suite.
* [x] Scrutinize ecosystem changes.
* [x] Run this formatter on extant code and scrutinize the results. (e.g., CPython, numpy.)
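The validity check described in the first bullet can be sketched in a few lines of Python. This is an illustration of the idea, not the PR's Rust code. `format_checked` and the `formatter` callback are hypothetical names.

```python
import ast
from typing import Callable


def format_checked(source: str, formatter: Callable[[str], str]) -> str:
    """Format a snippet, keeping the original if the result is not valid Python."""
    formatted = formatter(source)
    try:
        ast.parse(formatted)
    except SyntaxError:
        # The formatter produced something unparsable: silently bail
        # and leave the snippet as the author wrote it.
        return source
    return formatted
```

The fallback is silent here too. Surfacing a warning would need a separate reporting channel, which is the open question raised in the second bullet.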
@BurntSushi Thanks so much for your work - this is great! Looking forward to putting this into practice soon. About the diff you posted:
I think that accounts for most/all of the diff!
Yeah I came across this during my work and it seemed like trimming the
I don't feel very strongly either way about the trailing Seems like it occurs with indentation differences, so when defining functions or using a context manager. Those are relatively rare, at least in our docs. I'd be fine with either pruning the trailing The line length differences are a more fundamental issue - I commented my thoughts on the separate issue you opened.
I don't have any experience writing examples in Python myself, so I can't make a recommendation on whether we should enforce the extra line or not. Implementation-wise, I see two options for enforcing the extra line and I'm fine with either, although I would probably prefer 1. because the logic may not apply to other example formats and is closer to the problem it's solving.
To make sure we don't lose track of the decision about the trailing empty line or not, I created an issue about it: #8908

@MichaReiser Aye. I think we can't unconditionally add a trailing line, so we'll need a heuristic, but I also favor that approach if we decide to go that route. And in particular, if we do need a heuristic, then we probably have to go route (1) since we'll probably want to base the decision on things like "were there any other non-empty PS2 prompt lines."
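For concreteness, here is one possible shape for such a heuristic, sketched in Python. It assumes the "trailing empty line" means an empty `...` (PS2) continuation line at the end of a doctest snippet. The function name `needs_trailing_ps2` and the exact rule are illustrative guesses, not a decision recorded in this thread.

```python
def needs_trailing_ps2(snippet_lines: list[str]) -> bool:
    """Guess whether a doctest snippet should end with an empty '...' line."""
    # Collect the PS2 (continuation) lines of the snippet.
    ps2_lines = [
        line.strip() for line in snippet_lines if line.lstrip().startswith("...")
    ]
    # "Were there any other non-empty PS2 prompt lines?"
    has_nonempty_ps2 = any(line != "..." for line in ps2_lines)
    # Nothing to add if the snippet already ends with an empty PS2 line.
    already_terminated = bool(ps2_lines) and snippet_lines[-1].strip() == "..."
    return has_nonempty_ps2 and not already_terminated
```

Under this rule a single-line `>>>` example never gains a trailing line, while a multi-line definition does unless it already has one.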
This PR does the plumbing to make a new formatting option, `docstring-code-format`, available in the configuration for end users. It is disabled by default (opt-in). It is opt-in at least initially to reflect a conservative posture. The intent is to make it opt-out at some point in the future.

This was split out from #8811 in order to make #8811 easier to merge. Namely, once this is merged, docstring code snippet formatting will become available to end users. (See comments below for how we arrived at the name.)

Closes #7146

## Test Plan

Other than the standard test suite, I ran the formatter over the CPython and polars projects to ensure both that the result looked sensible and that tests still passed. At time of writing, one issue that currently appears is that reformatting code snippets trips the long line lint: https://github.com/BurntSushi/polars/actions/runs/7006619426/job/19058868021
At @pola-rs, we currently use `ruff` in combination with `black` for our Python formatting needs. We also use `blackdoc` for formatting our docstring code examples. Out of these three, `blackdoc` is by far the slowest.

With `ruff` gaining many of the capabilities of the `black` formatter, it would be great if we could also replace `blackdoc` soon and only use `ruff`.