Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SequenceProtocol and MappingProtocol descriptions to the guide #1546

Merged
merged 5 commits into from
Apr 7, 2021

Conversation

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for giving some much needed love to our documentation!

I have a couple of suggestions and questions; comments below.

Also, a more general question I have wondered about: the PyResult bit of all the return values is optional - e.g. for a proto which returns PyResult<T>, just T is also an acceptable return value. Do you think it's better if:

  • we include PyResult<T> on all these return values, with a note in an # Error Handling section saying PyResult is optional
  • we just have T on all the return value, with a note in an # Error Handling section saying all return types can be wrapped in PyResult

I think that just having T on the return value is easier to read, but it's maybe less obvious for new users to then figure out how to return errors unless they read the whole document and find the Error Handling section!

guide/src/class/protocols.md Outdated Show resolved Hide resolved
guide/src/class/protocols.md Outdated Show resolved Hide resolved
guide/src/class/protocols.md Outdated Show resolved Hide resolved
Comment on lines +161 to +164
*Note:* Negative integer indexes are handled as follows: if `__len__()` is defined,
it is called and the sequence length is used to compute a positive index,
which is passed to `__getitem__()`.
If `__len__()` is not defined, the index is passed as is to the function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This differs from https://docs.python.org/3/reference/datamodel.html#object.__getitem__, which says that negative integers are are passed directly to __getitem__ which then can interpret the negative numbers as it likes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume PyO3 plugs the __getitem__ method implementation into the PySequenceMethods.sq_item slot. The C function in that slot has a slightly different interface from the Python-native __getitem__() method, as described in the C-API docs linked above. PyObject_GetItem() and PySequence_GetItem() handle the negative indexes themselves when __len__() is also implemented.

My implementations of __getitem__() always start with assert!(idx >= 0); because I also implement __len__(), and they indeed work with the negative indexes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this is a really good point. Perhaps in the documentation here we should link all these methods to their relevant slot information?

All the relationships between these methods and their slots is defined in https://github.com/PyO3/pyo3/blob/main/pyo3-macros-backend/src/defs.rs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in the documentation here we should link all these methods to their relevant slot information?

I like this idea, but the mapping between methods and slots looks not so straightforward. There are methods that do not have a C function slot like __bytes__(), __format__() and __reversed__(). And then there are methods that share the same slot like __setitem__() and __delitem__(). I think we could add a symbolic tag like PySequenceMethods.sq_item to each trait method.

Another interesting issue here is the implicit slot method probing order: for example the evaluation of obj[i] Python expression probes PyMappingMethods.mp_subscript before PySequenceMethods.sq_item. This is relevant when someone implements both of these traits for some reason.

Comment on lines +240 to +244
* `fn __reversed__(&self) -> PyResult<impl ToPyObject>`

Called (if present) by the `reversed()` built-in to implement reverse iteration.
It should return a new iterator object that iterates over all the objects in
the container in reverse order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm we should probably move __reversed__ to the PyIterProtocol trait? (I can do it in a separate PR.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, __reversed__() is a standalone method, not a PyMappingMethods slot, so there is no particular reason it should be a part of PyMappingProtocol.
However, implementing __reversed__() for sequences is not very useful, because the default implementation seems sufficient for most cases.

I personally do not like the idea of adding anything to PyIterProtocol though. It's already semantically overloaded as it encompasses both IntoIterator and Iterator Rust traits. But then the user is expected to implement a different subset of methods for each of the intended uses (as an iterator or as an iterable).

Currently, we also have to write mandatory boilerplate code like

fn __iter__(slf: PyRef<Self>) -> PyRef<Self> {
    slf
}

for every custom iterator class.

Most Rust programmers would expect a blanket implementation like

impl<I> IntoIterator for I where
    I: Iterator, 

from the stdlib to exist instead.

I'm sorry about the long rant, but PyIterProtocol was the biggest road bump for me when working with PyO3.
I really wish it was two distinct traits, like for example PyIterProtocol and PyIntoIterProtocol, so that anyone who knows how Rust iterators work could easily implement Python iterators without scratching their head about "why do I need to return PyRef<Self> instead Self here?"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I completely agree that the current situation is confusing in many ways. I wonder if alternatively here I should remove __reversed__ from #[pyproto] completely. In my opinion #[pymethods] are very easy to learn, and implementing __reversed__ in #[pymethods] will already work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea very much. There are other non-slotted methods that are included in the protocol traits, like __bytes__(). And then there are C function slots that have no corresponding PyObjectProtocol method like tp_call aka __call__().

I'm all for implementing true non-slotted methods as #[pymethods]. This way we can avoid surprises like #1465 in the future.

- Style

Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>
@ravenexp
Copy link
Contributor Author

ravenexp commented Apr 7, 2021

Also, a more general question I have wondered about: the PyResult bit of all the return values is optional - e.g. for a proto which returns PyResult<T>, just T is also an acceptable return value. Do you think it's better if:

* we include `PyResult<T>` on all these return values, with a note in an `# Error Handling` section saying _PyResult is optional_

* we just have `T` on all the return value, with a note in an `# Error Handling` section saying _all return types can be wrapped in PyResult_

I think that just having T on the return value is easier to read, but it's maybe less obvious for new users to then figure out how to return errors unless they read the whole document and find the Error Handling section!

Hmm, either way is fine with me. I mostly followed the convention PyObjectProtocol and PyNumberProtocol docs already used.
It is also not very clear whether PyResult<()> return type can be removed entirely.
New Rust programmers would probably expect the trait method signatures to match exactly, and the fact that one can implement either fn foo(&self) or fn foo(&self) -> PyResult<()> is not obvious at all.

@davidhewitt
Copy link
Member

New Rust programmers would probably expect the trait method signatures to match exactly

This is very true. I think in that case it's better to leave the documentation with PyResult<_> on all the methods.

the fact that one can implement either fn foo(&self) or fn foo(&self) -> PyResult<()> is not obvious at all.

I agree. There's a lot of magic going on in these trait implementations, which I really wish we could simplify.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion here, I'm happy that this is good to be merged to the guide as-is. Thanks very much for this; the #[pyproto] documentation is desperately in need of love!

@davidhewitt davidhewitt merged commit f5188bb into PyO3:main Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants