Skip to content

Compose literals for argument values in docstring #2668

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jan-grimo
Copy link

Description

Closes #2667 by addition of a compose_literal function that attempts to handle the generation of valid Python literals for default arguments in docstrings. Can handle built-in data structure nesting by recursion. Generates literals for bound enums, replaces non-enum bound class instances with an ellipsis.

Suggested changelog entry:

Attempt to generate syntactically valid literals for default function arguments in docstrings

@jan-grimo jan-grimo force-pushed the arg-literals branch 2 times, most recently from 123ebfc to 97f3ad5 Compare November 16, 2020 10:51
@jan-grimo
Copy link
Author

Seeking review @henryiii

@YannickJadoul
Copy link
Collaborator

It's interesting to see this, but I'm afraid it has practically no chance of getting merged. A few reasons:

  • @wjakob has repeatedly expressed that he's not willing to increase the binary size of all pybind11-generated extension modules, just to make the docstrings perfect.
  • This solution seems reasonably specific and long, while there's no way for users to opt-in/out to this, or extend this.
  • Isn't that what __repr__ is for, anyway, to have a full, detailed specification of a value?
  • py::argv already allows doing this, so in principle, it's not critical for this to be added to pybind11's core functionality. I suppose you could even make your own function/subclass/custom literal/... that creates the argv instance/docstring values you want?

Do you have a specification/example/... of what e.g. mypy or Sphinx expect?
If so, there were some ideas to figure out a way to have users opt-in or opt-out of changes that increase the binary size. Maybe we can add this to the list of docstring things we'd like to do, once this is possible?

@jan-grimo
Copy link
Author

jan-grimo commented Nov 17, 2020

Thank you for the detailed explanation @YannickJadoul!

not willing to increase the binary size of all pybind11-generated extension modules, just to make the docstrings perfect

Perfectly understandable, especially considering I offered no configuration to the modifications I made.

Isn't that what repr is for, anyway, to have a full, detailed specification of a value?

Quoting the docs:

If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description...> should be returned.

There's an escape hatch in there. The method does not require you to generate a valid Python literal that would evaluate back to the same value. Choosing __repr__ as the closest approximation to getting valid literals into docstrings is admittedly a good and economic choice in that for the built-in Python types it does generate valid literals, but for everything else, placing this string representation into the function signature is - strictly speaking - incorrect. Conversely, implying that every user-defined class should implement literal-generating __repr__ methods is, as the documentation suggests, not always possible: Besides classes that are too complex to express in terms of a constructor and built-in literals, for instance, as far as I can tell, I cannot change the __repr__ method of bound enums. I would argue replacement by ellipsis in docstrings is the better alternative to angle-bracket __repr__s (with the exception of bound enums, where all the necessary information is always available).

py::argv already allows doing this

I can see why you're suggesting this customization point as an alternative to my blanket solution suppressing user-defined __repr__ functions. I must admit I wasn't aware of it and I agree that it's a viable path, but I suppose I was hoping to minimize my maintenance effort. I think the point remains, and I think you agree, that in principle the docstrings could be better.

Do you have a specification/example/... of what e.g. mypy or Sphinx expect?

I admittedly don't. I tried mypy's stubgen on a pybind module of mine, saw that the generated results are clearly syntactically wrong or not satisfactory - f(*args, **kwargs) -> Any everywhere - and immediately moved on to pybind11-stubgen, a standalone project trying to process the generated docstrings into valid typechecking stubs. There too, the angle-bracket representations of enum values trip up the parser and generate incorrect stubs. I suppose I was trying to fix the encountered problem at the source.

If so, there were some ideas to figure out a way to have users opt-in or opt-out of changes that increase the binary size. Maybe we can add this to the list of docstring things we'd like to do, once this is possible?

I would appreciate that. Do you see any way forward for this PR?

@jan-grimo
Copy link
Author

It occurred to me that the only reason this PR is as large as it is, is because I'm somewhat insistent on handling bound enums. If I gave that up, the minimum solution to getting syntactically valid docstrings might be as simple as string-replacing anything in the result of __repr__ in angle brackets with an ellipsis. Might that be viable?

@jan-grimo
Copy link
Author

@sizmailov Both #2243 and #2244 strike me as necessary additions to any undertakings of generating fully valid docstrings. I've only skimmed #2244, but do you think that the same functionality is implementable at your suggested customization point?

@sizmailov
Copy link
Contributor

sizmailov commented Nov 17, 2020

Sorry, I gave a short glance at your PR without thinking too much and left non-sense (already deleted) comment.

I didn't realize that at the moment of docstring construction we deal with python objects which not necessarily have bare C++ counterpart, which disables one from introducing template-based customization point.

py::arg/py::arg_v are another candidates to do the trick. In fact I just realized that to some degree this customization point already exists since we can overload py::arg::operator=(T&& value):

namespace pybind11 {
    template<>
    py::arg_v arg::operator=(int &&value) const {
        return {std::move(*this), std::forward<int>(value), "my_custom_int_repr"};
    }
}

PYBIND11_EMBEDDED_MODULE(example, m) {
    m.def("add", [](int a, int b) { return a + b; }, py::arg("a"), py::arg("b") = 1);
   // renders `add(a: int, b: int = my_custom_int_repr) -> int`
}

This seems to be enough to customize behavior for bound C++ types and python builtin types (using py::handle as a template parameter and implementation from this PR).

@jan-grimo
Copy link
Author

I'm going to close this since the customization point suggested by @sizmailov is suitable for my use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEAT] Syntactically valid default argument docstring
3 participants