Add SequenceProtocol and MappingProtocol descriptions to the guide #1546

Merged
merged 5 commits into from
Apr 7, 2021
Changes from 4 commits
118 changes: 108 additions & 10 deletions guide/src/class/protocols.md
@@ -16,9 +16,9 @@ The [`PyObjectProtocol`] trait provides several basic customizations.

To customize object attribute access, define the following methods:

-* `fn __getattr__(&self, name: FromPyObject) -> PyResult<impl IntoPy<PyObject>>`
-* `fn __setattr__(&mut self, name: FromPyObject, value: FromPyObject) -> PyResult<()>`
-* `fn __delattr__(&mut self, name: FromPyObject) -> PyResult<()>`
+* `fn __getattr__(&self, name: impl FromPyObject) -> PyResult<impl IntoPy<PyObject>>`
+* `fn __setattr__(&mut self, name: impl FromPyObject, value: impl FromPyObject) -> PyResult<()>`
+* `fn __delattr__(&mut self, name: impl FromPyObject) -> PyResult<()>`

Each method corresponds to Python's `self.attr`, `self.attr = value` and `del self.attr` code.

@@ -61,7 +61,7 @@ Each method corresponds to Python's `self.attr`, `self.attr = value` and `del se

### Emulating numeric types

-The [`PyNumberProtocol`] trait allows [emulate numeric types](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types).
+The [`PyNumberProtocol`] trait allows you to emulate [numeric types](https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types).

* `fn __add__(lhs: impl FromPyObject, rhs: impl FromPyObject) -> PyResult<impl ToPyObject>`
* `fn __sub__(lhs: impl FromPyObject, rhs: impl FromPyObject) -> PyResult<impl ToPyObject>`
@@ -106,12 +106,6 @@ The reflected operations are also available:
The code generated for these methods expects all arguments to match the
signature, and raises a `TypeError` otherwise.

-*Note*: Currently implementing the method for a binary arithmetic operations
-(e.g, `__add__`) shadows the reflected operation (e.g, `__radd__`). This is
-being addressed in [#844](https://github.com/PyO3/pyo3/issues/844). to make
-these methods


This trait also supports the augmented arithmetic assignments (`+=`, `-=`,
`*=`, `@=`, `/=`, `//=`, `%=`, `**=`, `<<=`, `>>=`, `&=`, `^=`, `|=`):

@@ -147,6 +141,108 @@ Other:
* `fn __index__(&'p self) -> PyResult<impl ToPyObject>`
* `fn __round__(&'p self, ndigits: Option<impl FromPyObject>) -> PyResult<impl ToPyObject>`

### Emulating sequential containers (such as lists or tuples)

The [`PySequenceProtocol`] trait allows you to emulate
[sequential container types](https://docs.python.org/3/reference/datamodel.html#emulating-container-types).

For a sequence, the allowable keys should be the integers _k_ for which _0 <= k < N_,
where _N_ is the length of the sequence.

* `fn __len__(&self) -> PyResult<usize>`

Implements the built-in function `len()` for the sequence.

* `fn __getitem__(&self, idx: isize) -> PyResult<impl ToPyObject>`

Implements evaluation of the `self[idx]` element.
If the `idx` value is outside the set of indexes for the sequence, `IndexError` should be raised.

*Note:* Negative integer indexes are handled as follows: if `__len__()` is defined,
it is called and the sequence length is used to compute a positive index,
which is passed to `__getitem__()`.
If `__len__()` is not defined, the index is passed as is to the function.
Comment on lines +161 to +164
Member

This differs from https://docs.python.org/3/reference/datamodel.html#object.__getitem__, which says that negative integers are passed directly to __getitem__, which can then interpret them as it likes?

Contributor Author

I presume PyO3 plugs the __getitem__ method implementation into the PySequenceMethods.sq_item slot. The C function in that slot has a slightly different interface from the Python-native __getitem__() method, as described in the C-API docs linked above. PyObject_GetItem() and PySequence_GetItem() handle the negative indexes themselves when __len__() is also implemented.

My implementations of __getitem__() always start with assert!(idx >= 0); because I also implement __len__(), and they indeed work with the negative indexes.

Member

Ah yes, this is a really good point. Perhaps in the documentation here we should link all these methods to their relevant slot information?

All the relationships between these methods and their slots are defined in https://github.com/PyO3/pyo3/blob/main/pyo3-macros-backend/src/defs.rs

Contributor Author

> Perhaps in the documentation here we should link all these methods to their relevant slot information?

I like this idea, but the mapping between methods and slots does not look so straightforward. Some methods, like __bytes__(), __format__() and __reversed__(), have no C function slot, while others, like __setitem__() and __delitem__(), share the same slot. I think we could add a symbolic tag like PySequenceMethods.sq_item to each trait method.

Another interesting issue here is the implicit slot probing order: for example, evaluating the Python expression obj[i] probes PyMappingMethods.mp_subscript before PySequenceMethods.sq_item. This is relevant when someone implements both of these traits for some reason.


* `fn __setitem__(&mut self, idx: isize, value: impl FromPyObject) -> PyResult<()>`

Implements assignment to the `self[idx]` element. The note on negative indexes for `__getitem__()` applies here as well.
Should only be implemented if sequence elements can be replaced.

* `fn __delitem__(&mut self, idx: isize) -> PyResult<()>`

Implements deletion of the `self[idx]` element. The note on negative indexes for `__getitem__()` applies here as well.
Should only be implemented if sequence elements can be deleted.

* `fn __contains__(&self, item: impl FromPyObject) -> PyResult<bool>`

Implements membership test operators.
Should return true if `item` is in `self`, false otherwise.
For objects that don’t define `__contains__()`, the membership test simply
traverses the sequence until it finds a match.

* `fn __concat__(&self, other: impl FromPyObject) -> PyResult<impl ToPyObject>`

Concatenates two sequences.
Used by the `+` operator, after trying the numeric addition via
the `PyNumberProtocol` trait method.

* `fn __repeat__(&self, count: isize) -> PyResult<impl ToPyObject>`

Repeats the sequence `count` times.
Used by the `*` operator, after trying the numeric multiplication via
the `PyNumberProtocol` trait method.

* `fn __inplace_concat__(&mut self, other: impl FromPyObject) -> PyResult<Self>`

Concatenates two sequences in place. Returns the modified first operand.
Used by the `+=` operator, after trying the numeric in place addition via
the `PyNumberProtocol` trait method.

* `fn __inplace_repeat__(&mut self, count: isize) -> PyResult<Self>`

Repeats the sequence `count` times in place. Returns the modified first operand.
Used by the `*=` operator, after trying the numeric in place multiplication via
the `PyNumberProtocol` trait method.
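
The note on negative indexes under `__getitem__()` can be modelled in plain Rust. The following is only a sketch under stated assumptions: it uses no PyO3 types, the names `Numbers`, `adjust_index` and `subscript` are hypothetical, and errors are plain strings standing in for `IndexError`. It illustrates that the sequence length is added to a negative index only when a length is available, and that the adjusted index can still be out of range, so `__getitem__()` should keep its own bounds check.

```rust
/// Hypothetical stand-in for a class that implements both `__len__` and
/// `__getitem__`. No PyO3 types are involved in this model.
struct Numbers {
    items: Vec<i64>,
}

impl Numbers {
    /// Models `__len__`.
    fn len(&self) -> usize {
        self.items.len()
    }

    /// Models a `__getitem__` that receives an already adjusted index.
    /// The index may still be negative or too large, so both are checked.
    fn get_item(&self, idx: isize) -> Result<i64, String> {
        if idx < 0 {
            return Err(format!("IndexError: index {} out of range", idx));
        }
        self.items
            .get(idx as usize)
            .copied()
            .ok_or_else(|| format!("IndexError: index {} out of range", idx))
    }
}

/// Models the adjustment described in the note: a negative index is shifted
/// by the sequence length when a length is available (`Some`), and passed
/// through unchanged when it is not (`None`).
fn adjust_index(idx: isize, len: Option<usize>) -> isize {
    match len {
        Some(n) if idx < 0 => idx + n as isize,
        _ => idx,
    }
}

/// Models the evaluation of `seq[idx]` for a type that defines a length.
fn subscript(seq: &Numbers, idx: isize) -> Result<i64, String> {
    seq.get_item(adjust_index(idx, Some(seq.len())))
}
```

With a three-element sequence, `subscript(&seq, -1)` reaches `get_item` as index `2`, while `subscript(&seq, -10)` reaches it as `-7` and is rejected by the bounds check.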

### Emulating mapping containers (such as dictionaries)

The [`PyMappingProtocol`] trait allows you to emulate
[mapping container types](https://docs.python.org/3/reference/datamodel.html#emulating-container-types).

For a mapping, the keys may be Python objects of arbitrary type.

* `fn __len__(&self) -> PyResult<usize>`

Implements the built-in function `len()` for the mapping.

* `fn __getitem__(&self, key: impl FromPyObject) -> PyResult<impl ToPyObject>`

Implements evaluation of the `self[key]` element.
If `key` is of an inappropriate type, `TypeError` may be raised;
if `key` is missing (not in the container), `KeyError` should be raised.

* `fn __setitem__(&mut self, key: impl FromPyObject, value: impl FromPyObject) -> PyResult<()>`

Implements assignment to the `self[key]` element or insertion of a new `key`
mapping to `value`.
Should only be implemented if the mapping supports changes to the values for keys,
or if new keys can be added.
The same exceptions should be raised for improper key values as
for the `__getitem__()` method.

* `fn __delitem__(&mut self, key: impl FromPyObject) -> PyResult<()>`

Implements deletion of the `self[key]` element.
Should only be implemented if the mapping supports removal of keys.
The same exceptions should be raised for improper key values as
for the `__getitem__()` method.

* `fn __reversed__(&self) -> PyResult<impl ToPyObject>`

Called (if present) by the `reversed()` built-in to implement reverse iteration.
It should return a new iterator object that iterates over all the objects in
the container in reverse order.
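
The error behaviour expected from the mapping methods above can be sketched in plain Rust. This is an illustrative model rather than PyO3 code: `Registry` and its method names are hypothetical, keys are restricted to strings for brevity, and a `String` error stands in for Python's `KeyError`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a mapping class. No PyO3 types are involved.
struct Registry {
    entries: HashMap<String, i64>,
}

impl Registry {
    /// Models `__len__`.
    fn len(&self) -> usize {
        self.entries.len()
    }

    /// Models `__getitem__`: a missing key is reported as an error.
    fn get_item(&self, key: &str) -> Result<i64, String> {
        self.entries
            .get(key)
            .copied()
            .ok_or_else(|| format!("KeyError: {:?}", key))
    }

    /// Models `__setitem__`: replaces the value of an existing key or
    /// inserts a new key.
    fn set_item(&mut self, key: &str, value: i64) -> Result<(), String> {
        self.entries.insert(key.to_string(), value);
        Ok(())
    }

    /// Models `__delitem__`: deleting a missing key is reported with the
    /// same kind of error as in `get_item`.
    fn del_item(&mut self, key: &str) -> Result<(), String> {
        self.entries
            .remove(key)
            .map(|_| ())
            .ok_or_else(|| format!("KeyError: {:?}", key))
    }
}
```

In this model, reading or deleting a key that was never inserted returns the error variant, mirroring the `KeyError` that the corresponding Python operations are expected to raise.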
Comment on lines +240 to +244
Member

Hmm we should probably move __reversed__ to the PyIterProtocol trait? (I can do it in a separate PR.)

Contributor Author

Well, __reversed__() is a standalone method, not a PyMappingMethods slot, so there is no particular reason it should be a part of PyMappingProtocol.
However, implementing __reversed__() for sequences is not very useful, because the default implementation seems sufficient for most cases.

I personally do not like the idea of adding anything to PyIterProtocol though. It's already semantically overloaded as it encompasses both IntoIterator and Iterator Rust traits. But then the user is expected to implement a different subset of methods for each of the intended uses (as an iterator or as an iterable).

Currently, we also have to write mandatory boilerplate code like

fn __iter__(slf: PyRef<Self>) -> PyRef<Self> {
    slf
}

for every custom iterator class.

Most Rust programmers would expect a blanket implementation like

impl<I> IntoIterator for I where
    I: Iterator, 

from the stdlib to exist instead.

I'm sorry about the long rant, but PyIterProtocol was the biggest road bump for me when working with PyO3.
I really wish it were two distinct traits, for example PyIterProtocol and PyIntoIterProtocol, so that anyone who knows how Rust iterators work could easily implement Python iterators without scratching their head about "why do I need to return PyRef<Self> instead of Self here?"

Member

Thanks, I completely agree that the current situation is confusing in many ways. I wonder if alternatively here I should remove __reversed__ from #[pyproto] completely. In my opinion #[pymethods] are very easy to learn, and implementing __reversed__ in #[pymethods] will already work.

Contributor Author

I like this idea very much. There are other non-slotted methods that are included in the protocol traits, like __bytes__(). And then there are C function slots that have no corresponding PyObjectProtocol method like tp_call aka __call__().

I'm all for implementing true non-slotted methods as #[pymethods]. This way we can avoid surprises like #1465 in the future.


### Garbage Collector Integration

If your type owns references to other Python objects, you will need to
@@ -294,5 +390,7 @@ In Python a generator can also return a value. To express this in Rust, PyO3 pro
both `Yield` values and `Return` a final value - see its docs for further details and an example.

[`PyGCProtocol`]: {{#PYO3_DOCS_URL}}/pyo3/class/gc/trait.PyGCProtocol.html
[`PyMappingProtocol`]: {{#PYO3_DOCS_URL}}/pyo3/class/mapping/trait.PyMappingProtocol.html
[`PyNumberProtocol`]: {{#PYO3_DOCS_URL}}/pyo3/class/number/trait.PyNumberProtocol.html
[`PyObjectProtocol`]: {{#PYO3_DOCS_URL}}/pyo3/class/basic/trait.PyObjectProtocol.html
[`PySequenceProtocol`]: {{#PYO3_DOCS_URL}}/pyo3/class/sequence/trait.PySequenceProtocol.html