Infer docstring style #5

analog-cbarber · 2022-01-31T16:06:39Z

It would be nice if we could simply infer the doc string style automatically and choose the appropriate
parser. This would especially be useful when:

- the code is the result of combining code from other projects using different doc styles
- you are changing the official doc-string style and don't want to have to convert all your strings at once
- you are subclassing from another library using a different doc string format and want to include inherited members

pawamoy · 2022-01-31T20:35:54Z

Allow me to move this over to griffe 🙂
EDIT: oh, it seems the issue cannot go out of the organization. Then I'll move griffe into it.

pawamoy · 2022-01-31T20:48:00Z

So, nice idea! Currently, each object can have its own docstring style attached. But you listed some cases that would still not be supported by this. I'll think about it 🙂

francois-rozet · 2022-04-18T15:47:44Z

Instead of inferring the style of the whole docstring at once, it could be easier to infer the section styles independently. The rationale is that the style of a section is usually clear while the style of a docstring could be ambiguous. In fact, I've seen mixed-style docstrings.

This approach could actually simplify the way griffe parses docstrings as a lot of the parsing is common to all styles. It would also become easier to add parsing rules for custom sections.

For the inference itself, instead of choosing a single style in mkdocs.yml, you could rank the styles in order of preference and try each of them until one matches.
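
A minimal sketch of that ranked-preference idea, not griffe's implementation: each style gets one detector regex, and the first style in the user's preference list whose detector matches wins. The names `DETECTORS` and `infer_style`, and the patterns themselves, are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately small detectors: one regex per style.
DETECTORS = {
    "google": re.compile(
        r"^[ \t]*(?:Args|Arguments|Returns|Raises|Yields):[ \t]*$", re.MULTILINE
    ),
    "numpy": re.compile(
        r"^[ \t]*(?:Parameters|Returns|Raises|Yields)[ \t]*\n[ \t]*-{3,}[ \t]*$",
        re.MULTILINE,
    ),
    "sphinx": re.compile(
        r"^[ \t]*:(?:param|returns?|rtype|raises?)\b", re.MULTILINE
    ),
}


def infer_style(text, preferences):
    """Return the first style in `preferences` whose detector matches, else None."""
    for style in preferences:
        if DETECTORS[style].search(text):
            return style
    return None


docstring = "Add two numbers.\n\nArgs:\n    a: First operand.\n    b: Second operand.\n"
print(infer_style(docstring, ["numpy", "google", "sphinx"]))  # prints "google"
```

With a docstring that mixes markers from two styles, the order of `preferences` decides which one is reported, which is the point of ranking them.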

pawamoy · 2022-04-20T21:40:03Z

> This approach could actually simplify the way griffe parses docstrings as a lot of the parsing is common to all styles. It would also become easier to add parsing rules for custom sections.

Could you expand on that? It's true that the Google and Numpy parsers have a lot in common, though subtle differences remain. What do you mean by "parsing rules for custom sections"?

francois-rozet · 2022-04-21T07:15:21Z

> This approach could actually simplify the way griffe parses docstrings as a lot of the parsing is common to all styles. It would also become easier to add parsing rules for custom sections.

I was thinking of having a docstring parse function common to all styles, which calls the "section" parsers implemented for each style.

Thinking back on that, it would work only if the spacing policy between sections is the same across all styles. But for similar enough styles you could "share" that docstring-level parse function and arbitrarily change the parsing style for each section.
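
To make the idea concrete, here is a rough sketch of a shared docstring-level function that delegates each block to per-style section parsers. It is not how griffe is structured. It assumes that sections are separated by blank lines in every style, which is exactly the shared spacing policy mentioned above, and every name in it (`parse_google_section`, `parse_numpy_section`, `SECTION_PARSERS`, `parse_docstring`) is hypothetical.

```python
import re


def parse_google_section(block):
    """Parse 'Title:' followed by indented lines; return (title, lines) or None."""
    lines = block.splitlines()
    if not lines:
        return None
    match = re.fullmatch(r"(\w[\w ]*):", lines[0].strip())
    indented = all(line.startswith((" ", "\t")) for line in lines[1:])
    if match and len(lines) > 1 and indented:
        return match.group(1), [line.strip() for line in lines[1:]]
    return None


def parse_numpy_section(block):
    """Parse 'Title' underlined with dashes; return (title, lines) or None."""
    lines = block.splitlines()
    if len(lines) >= 2 and lines[0].strip() and set(lines[1].strip()) == {"-"}:
        return lines[0].strip(), [line.strip() for line in lines[2:]]
    return None


# Hypothetical registry of section parsers, one per style.
SECTION_PARSERS = {"google": parse_google_section, "numpy": parse_numpy_section}


def parse_docstring(text, styles=("google", "numpy")):
    """Split on blank lines, then try each style's section parser on every block."""
    sections = []
    for block in re.split(r"\n\s*\n", text.strip("\n")):
        for style in styles:
            parsed = SECTION_PARSERS[style](block)
            if parsed:
                title, body = parsed
                sections.append({"style": style, "title": title, "lines": body})
                break
        else:
            # No style recognized this block: keep it as plain text.
            sections.append({"style": None, "title": None, "lines": block.splitlines()})
    return sections


text = "Summary line.\n\nArgs:\n    a: First.\n\nReturns\n-------\nint\n    The sum.\n"
for section in parse_docstring(text):
    print(section["style"], section["title"])
```

On this mixed docstring, the summary is kept as plain text, `Args` is recognized as Google style, and `Returns` as Numpy style.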

> What do you mean by "parsing rules for custom sections"?

If I understand correctly, the sections that are available are limited by the DocstringSection classes (Parameters, Returns, ...) currently implemented, which are mapped to section titles. My idea was to allow parameters-like, returns-like, examples-like, note-like ... section parsing with different section titles by letting the user add rules "section title -> parser". sphinx.ext.napoleon allows that.
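
A small sketch of what such "section title -> parser" rules could look like. This is purely illustrative and is neither griffe's nor napoleon's API; `SECTION_RULES`, `register_section`, `parse_section` and the two parsers are hypothetical names.

```python
def parse_named_items(lines):
    """Parse 'name: description' lines into (name, description) pairs."""
    items = []
    for line in lines:
        name, _, description = line.partition(":")
        items.append((name.strip(), description.strip()))
    return items


def parse_text(lines):
    """Join free-form lines into a single paragraph."""
    return " ".join(line.strip() for line in lines)


# Hypothetical rule table: section title -> parser.
SECTION_RULES = {
    "Args": parse_named_items,
    "Parameters": parse_named_items,
    "Returns": parse_text,
    "Note": parse_text,
}


def register_section(title, parser):
    """Let the user map a custom section title to an existing parser."""
    SECTION_RULES[title] = parser


def parse_section(title, lines):
    """Parse a section body with the parser registered for its title."""
    parser = SECTION_RULES.get(title, parse_text)  # unknown titles fall back to text
    return parser(lines)


register_section("Keyword Args", parse_named_items)
print(parse_section("Keyword Args", ["timeout: Seconds to wait.", "retries: Number of attempts."]))
```

Here a user-defined "Keyword Args" title reuses the parameters-like parser without any new section class.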

Also, maybe some current sections could be merged? I am thinking of Returns and Yields.

pawamoy · 2022-04-21T08:49:18Z

I'm not sure supporting multiple-style docstrings is worth the effort. Even if you're switching from one style to another in a big codebase, docstrings themselves are usually short enough that you can stick to a single style, and not just switch one or two sections in the docstring. And if it's not about switching styles, then I'd consider it a very niche use-case, again not worth the effort.

Your argument about factorizing the code still stands though. It's not the prettiest, DRYest code I've written, it can surely be refactored to something more elegant and reusable 🙂

> allow parameters-like, returns-like, examples-like

Custom titles for sections are already supported! There's an example here in Griffe's own docs (unfold the source code). I don't think more customization is welcome in docstring parsers. If you deviate from the spec (Google, Numpy, Sphinx, etc.), then all the tooling around docstrings stops working (linters, IDEs, ...). Just my opinion of course, happy to read counter-examples!

analog-cbarber · 2022-04-21T13:44:51Z

Support for docstring styles is not so great in many tools (e.g. PyCharm still doesn't support Google style very well). Obviously no one would recommend intentionally using multiple styles in the same docstring, yet it does happen sometimes.

It makes sense for linters and the like to be more strict about doc-string styles, but people want their doc generator to just magically do the "right thing" if at all possible.

I think that it would be fantastic if inference was just at the docstring level, but it would be even nicer if it could be done per section.

However, maybe anyone who wants this should provide some real-world examples of docstrings using multiple styles.

pawamoy · 2022-04-27T17:43:09Z

Here's a very rough/ugly script to count styles in packages' docstrings:

import os
import re
from contextlib import suppress
from griffe.exceptions import AliasResolutionError
from griffe.loader import GriffeLoader

# build manually or dynamically, from PDM/Poetry/pip cache for example
packages_paths = []

numpy_docstrings = 0
google_docstrings = 0
sphinx_docstrings = 0
unknown_docstrings = 0
markdown_docstrings = 0
rst_docstrings = 0

md_inline_code = re.compile(r"`[^`]+`")
rst_inline_code = re.compile(r"``[^`]+``")

def prompt_docstring(docstring):
    # Classify one docstring by substring markers and bump the matching global counter.
    global numpy_docstrings
    global google_docstrings
    global sphinx_docstrings
    global unknown_docstrings
    global markdown_docstrings
    global rst_docstrings
    if docstring:
        for markup in (
            ":meth:",
            ":func:",
            ":attr:",
            ":mod:",
            ":class:",
            ":raises:",
            ":raise ",
            ":note:",
            ":param ",
            ":return:",
            ":returns:",
            ":rtype:",
        ):
            if markup in docstring.value:
                sphinx_docstrings += 1
                return
        for markup in (
            "Args:\n  ",
            "Arguments:\n  ",
            "Attributes:\n  ",
            "Raises:\n  ",
            "Returns:\n  ",
            "Yields:\n  ",
            "Example:\n  ",
            "Examples:\n  ",
        ):
            if markup in docstring.value:
                google_docstrings += 1
                return
        for markup in (
            "Args\n----",
            "Arguments\n---------",
            "Attributes\n----------",
            "Parameters\n----------",
            "Raises\n------",
            "Returns\n-------",
            "Yields\n------",
            "Methods\n-------",
        ):
            if markup in docstring.value:
                numpy_docstrings += 1
                return
        for markup in (
            "```\n",
        ):
            if markup in docstring.value:
                markdown_docstrings += 1
                return
        for markup in (
            ".. autofunction::",
            ".. code::",
            ".. todo::",
            ".. note::",
            ".. warning::",
            ".. versionchanged::",
            ".. versionadded::",
            "::\n\n    ",
        ):
            if markup in docstring.value:
                rst_docstrings += 1
                return

        if rst_inline_code.search(docstring.value):
            rst_docstrings += 1
            return
        if md_inline_code.search(docstring.value):
            markdown_docstrings += 1
            return
        
        unknown_docstrings += 1


def iter_docstrings(obj):
    try:
        prompt_docstring(obj.docstring)
    except AliasResolutionError:
        return
    for member in obj.members.values():
        if not member.is_alias:
            iter_docstrings(member)


if __name__ == "__main__":
    loader = GriffeLoader(allow_inspection=False)
    for index, package_path in enumerate(packages_paths, 1):
        print(f"\r{index}/{len(packages_paths)}", end="")
        with suppress(Exception):
            package = loader.load_module(package_path)
            iter_docstrings(package)

    print("Google", google_docstrings)
    print("Numpy", numpy_docstrings)
    print("Sphinx", sphinx_docstrings)
    print("Markdown", markdown_docstrings)
    print("RST", rst_docstrings)
    print("Unknown", unknown_docstrings)

It does not search for multi-style docstrings. To do that, the substring search should probably be replaced with regexes, and all regexes should be tested instead of returning on the first hit.
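
As a rough sketch of that change, assuming plain strings as input rather than griffe objects: every regex is tested on every docstring, and the combination of styles found is counted. The patterns are simplified and the names `STYLE_PATTERNS`, `detect_styles` and `count_style_combinations` are hypothetical.

```python
import re
from collections import Counter

# Simplified patterns, loosely based on the markers used in the script above.
STYLE_PATTERNS = {
    "sphinx": re.compile(r"^[ \t]*:(?:param|raises?|returns?|rtype)\b", re.MULTILINE),
    "google": re.compile(
        r"^[ \t]*(?:Args|Arguments|Attributes|Raises|Returns|Yields|Examples?):[ \t]*\n[ \t]+\S",
        re.MULTILINE,
    ),
    "numpy": re.compile(
        r"^[ \t]*(?:Args|Arguments|Attributes|Parameters|Raises|Returns|Yields|Methods)[ \t]*\n[ \t]*-{3,}",
        re.MULTILINE,
    ),
}


def detect_styles(text):
    """Return every style whose pattern matches, instead of stopping at the first hit."""
    return {style for style, pattern in STYLE_PATTERNS.items() if pattern.search(text)}


def count_style_combinations(texts):
    """Count docstrings per combination of detected styles (empty tuple = unknown)."""
    counts = Counter()
    for text in texts:
        counts[tuple(sorted(detect_styles(text)))] += 1
    return counts


google_only = "Sum.\n\nArgs:\n    a: First.\n"
mixed = "Do it.\n\nArgs:\n    a: First.\n\nReturns\n-------\nint\n"
plain = "Just a summary."
print(count_style_combinations([google_only, mixed, plain]))
```

Any key with more than one style in it, such as `("google", "numpy")`, is a multi-style docstring.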

It outputs something like this:

Google 2948
Numpy 150
Sphinx 8665
Markdown 5544
RST 4883
Unknown 58256

Note that most of the packages installed on my machine are not data-science libraries, which is probably why there are so few Numpy-style docstrings. Unknown docstrings are docstrings for which it didn't detect any specific markup. They could be written in Markdown or reStructuredText. Also, the numbers here are really not to be taken seriously, as multiple versions of the same packages were scanned.

pawamoy · 2023-04-12T14:03:44Z

Note: see #132 (comment), we could use cdd to auto-detect style and parse accordingly.

pawamoy · 2024-08-09T14:24:42Z

It will be available in the next Insiders version.

SamuelMarks · 2024-08-11T02:04:50Z

Aww, you didn't use my lib

pawamoy · 2024-08-11T04:32:09Z

@SamuelMarks no indeed, as I couldn't find anything public in your library that would allow me to do this. But maybe I missed it?

pawamoy · 2024-08-11T04:39:04Z

Ah, I now see derive_docstring_format, but the tokens are quite limited. Even if they weren't, I wouldn't want to depend on your entire library just for this tiny part 😅

analog-cbarber added the feature New feature or request label Jan 31, 2022

pawamoy transferred this issue from mkdocstrings/pytkdocs Jan 31, 2022

pawamoy added the griffe: docstrings Related to docstring parsing label Feb 3, 2022

astrojuanlu mentioned this issue Mar 11, 2023

Support docstring formats: Google, NumPy, ReST #132

Closed

i-aki-y mentioned this issue Mar 4, 2024

[Doc] Style of parameter tables in API reference is being broken albumentations-team/albumentations#1555

Closed

pawamoy added the insiders Candidate for Insiders label Jun 8, 2024

pawamoy self-assigned this Jun 8, 2024

pawamoy added a commit that referenced this issue Aug 9, 2024

refactor: Add DocstringStyle literal type to prepare docstring style …

b7aaf64

…auto detection feature Issue-5: #5

pawamoy added a commit that referenced this issue Aug 9, 2024

refactor: Finish preparing docstring style auto-detection feature

03bdec6

Issue-5: #5

pawamoy closed this as completed Aug 9, 2024
