__anext__ should be able to return &PyAny #3190

Closed
lifthrasiir opened this issue May 29, 2023 · 18 comments · Fixed by #3661

@lifthrasiir
Contributor

lifthrasiir commented May 29, 2023

Currently __anext__ expects IterANextOutput<_, _>, which can only be converted from Option<T1> or IterANextOutput<T2, T3> where all the Ts implement IntoPy<PyObject>. But __anext__ frequently returns an awaitable, and whether the result is going to be Yield or Return is up to the awaitable and not __anext__ itself.

Searching GitHub for existing __anext__ implementations confirms this issue, and everyone seems to come up with their own workaround (I also had one):

Therefore ideally we should allow any T (that implements IntoPy<PyObject> as above), but that impl will conflict with both existing impls. While the impl for IterANextOutput<_, _> can be merged into IntoPyCallbackOutput and so is redundant, the impl for Option<T> cannot be removed because (Rust) None should be converted to PyStopAsyncIteration, not (Python) None. So the only possibility would be an autoref specialization.

On the other hand, the value of forcing IterANextOutput<_, _> from __anext__ seems unclear. Assuming the last __anext__ did return some awaitable and it yielded some value, the awaitable might or might not know that the iteration has stopped---both possibilities are plausible. The current design is only suitable if the stop is already known. Probably __anext__ should not restrict return types, and we should just make IterANextOutput implement IntoPy, ditching the Option<T> case (with proper documentation, of course).
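For illustration, a minimal sketch of the kind of workaround described above, written against the pre-fix API where __anext__ must return IterANextOutput. The MyAsyncIter type and its next_awaitable method are hypothetical, and the import path of IterANextOutput varies between pyo3 versions:

use pyo3::prelude::*;
use pyo3::pyclass::IterANextOutput; // import path may differ by pyo3 version

/// Hypothetical async iterator that delegates each step to a Python awaitable.
#[pyclass]
struct MyAsyncIter {
    /// Some Python object that can hand out the next awaitable.
    inner: PyObject,
}

#[pymethods]
impl MyAsyncIter {
    fn __aiter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }

    /// The awaitable itself decides later whether to produce a value or raise
    /// StopAsyncIteration, but the signature forces a Yield/Return choice now,
    /// so everything gets wrapped in Yield and the Return arm goes unused.
    fn __anext__(&self, py: Python<'_>) -> PyResult<IterANextOutput<PyObject, PyObject>> {
        let awaitable = self.inner.call_method0(py, "next_awaitable")?; // hypothetical method
        Ok(IterANextOutput::Yield(awaitable))
    }
}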

@adamreichold
Member

adamreichold commented May 29, 2023

On the other hand, the value of forcing IterANextOutput<_, _> from __anext__ seems unclear. Assuming the last __anext__ did return some awaitable and it yielded some value, the awaitable might or might not know that the iteration has stopped---both possibilities are plausible. The current design is only suitable if the stop is already known. Probably __anext__ should not restrict return types, and we should just make IterANextOutput implement IntoPy, ditching the Option<T> case (with proper documentation, of course).

While I am sympathetic to the simplicity of that approach, the resulting backwards incompatibility (silently compiling existing code to do something different) and the inconsistency between handling of __next__ and __anext__ appear to be significant downsides.

As an alternative, what do you think of adding a third enum variant to IterANextOutput which is basically passed to Python untouched? Something like IterANextOutput::Defer(PyObject)?
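To make the proposal concrete, a sketch of what the enum could look like with such a variant (the Defer variant is only a suggestion in this thread, not an existing pyo3 API):

use pyo3::PyObject;

#[allow(dead_code)]
enum IterANextOutput<TYield, TReturn> {
    /// This step produced a value.
    Yield(TYield),
    /// Iteration is finished; raise StopAsyncIteration carrying this value.
    Return(TReturn),
    /// Hand this object (typically an awaitable) back to Python untouched.
    Defer(PyObject),
}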

@davidhewitt
Member

Agreed there is a backwards compatibility hazard here. It should be noted that the design of __anext__ intentionally mirrors __next__; however, I'm open to concluding that was incorrect.

I think if we wanted to expand to allow T: IntoPy<PyObject> we could potentially use autoref specialization inside the generated code to first work out a "converter" and then use that type to drive the conversion.
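As a rough, standalone illustration of the autoref-specialization trick being referred to (the trait and type names below are made up; this is not pyo3's generated code), method resolution prefers the impl that applies without an extra auto-ref:

struct OptionTag;
struct GenericTag;

trait OptionConverter {
    fn converter(&self) -> OptionTag;
}

impl<T> OptionConverter for Option<T> {
    // Matches the receiver without auto-ref, so it wins for Option<T>.
    fn converter(&self) -> OptionTag {
        OptionTag
    }
}

trait GenericConverter {
    fn converter(&self) -> GenericTag;
}

impl<T> GenericConverter for &T {
    // Only reachable through one extra auto-ref, so it acts as the fallback.
    fn converter(&self) -> GenericTag {
        GenericTag
    }
}

fn main() {
    let some: Option<u32> = Some(1);
    let plain: u32 = 1;
    let _specialized: OptionTag = (&some).converter(); // Option<T> path
    let _generic: GenericTag = (&plain).converter();   // generic fallback path
}

Macro-generated code could then use the returned tag type to drive the actual conversion.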

@davidhewitt
Member

(It's worth noting that in the past these methods were limited by the design of the #[pyproto] traits. With those removed, we do have new design space here.)

@adamreichold
Member

adamreichold commented May 29, 2023

I think if we wanted to expand to allow T: IntoPy<PyObject> we could potentially use autoref specialization inside the generated code to first work out a "converter" and then use that type to drive the conversion.

While technically possible, I think this might end up hard to explain, especially if we use it to keep the Option special case and it will behave differently from "normal" #[pyfunction]s while the code looks the same. Then, the existing Option layer on top of the IterANextOutput requirement is also not particularly straightforward IMHO.

I guess in the end this is also a conflict between making the handling of __anext__ more idiomatic w.r.t. Rust (fixing the types or at least traits but providing conversions) versus Python (making this work automatically based on the inferred programmer intent). (There is a dusty closet in my mind where the trait-based protocols continue to be mourned for being what I would want the definition of the protocols to look like.)

@lifthrasiir
Contributor Author

lifthrasiir commented May 29, 2023

While I am sympathetic to the simplicity of that approach, the resulting backwards incompatibility (silently compiling existing code to do something different) and the inconsistency between handling of __next__ and __anext__ appear to be significant downsides.

I agree that existing code should work identically as much as possible. The consistency, however, is nothing if it makes everyone work around the quote-unquote consistent interface. I was surprised to learn that PyO3 does have a test for __anext__, which is the only code I've ever seen that fits this design.

If we can agree that this is a real issue and desirable to fix, I believe my initial suggestion to use an autoref specialization is indeed fully backward compatible:

  • We reserve the right to change the expansion of proc-macros as long as documented inputs continue to work as advertised. In fact, we don't even guarantee exact behavioral compatibility.
  • Therefore we are relatively free to change the internal implementation of #[pyclass]. While it is not necessary in this case, we are also safe to alter IntoPyCallbackOutput, which was never publicly documented.
  • IntoPyCallbackOutput was never implemented for a general T anyway, so using the autoref specialization to allow a general T doesn't change existing code (whose return types go straight to IntoPyCallbackOutput). The autoref specialization would be unusable if that T came from type parameters, but we don't allow those.

As an alternative, what do you think of adding a third enum variant to IterANextOutput which is basically passed to Python untouched? Something like IterANextOutput::Defer(PyObject)?

In comparison this is technically a breaking change because IterANextOutput is publicly documented.

While technically possible, I think [the autoref specialization] might end up hard to explain, especially if we use it to keep the Option special case and it will behave differently from "normal" #[pyfunction]s while the code looks the same.

Isn't this already true for ~~__iter__~~ __next__? The only difference is that we don't return a coroutine for ~~__iter__~~ __next__.

Then, the existing Option layer on top of the IterANextOutput requirement is also not particularly straightforward IMHO.

If specialization magic is not to your taste (I can relate), we can also treat the Option and IterANextOutput return types specially. The proc-macros already recognize the exact ident Option, and IterANextOutput is specific enough that we can do the same. It's easy to explain: everything works like an ordinary function, except for two cases that differ in order to stay consistent with __iter__.
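For reference, a minimal sketch of what that ident-based detection could look like in a proc-macro using the syn crate; is_special_anext_return is a made-up helper, and matching on the last path segment is a heuristic, not real type resolution:

use syn::{ReturnType, Type};

/// Hypothetical helper: does the method's written return type end in `Option`
/// or `IterANextOutput`? (Matches the final path segment only.)
fn is_special_anext_return(output: &ReturnType) -> bool {
    let ReturnType::Type(_, ty) = output else {
        return false;
    };
    let Type::Path(type_path) = &**ty else {
        return false;
    };
    type_path
        .path
        .segments
        .last()
        .map(|segment| segment.ident == "Option" || segment.ident == "IterANextOutput")
        .unwrap_or(false)
}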

@adamreichold
Member

adamreichold commented May 29, 2023

In comparison this is technically a breaking change because IterANextOutput is publicly documented.

Yes, this is a breaking change. The enum is not just publicly documented but also not marked as #[non_exhaustive]. But it is easy to explain and to use, and it should be effectively backwards compatible. I do admit that this solution would be untypical for PyO3 though. We tend to be rather Pythonic insofar as we gravitate towards using different kinds of magic when it improves the ergonomics of the common case.

Isn't this already true for __iter__? The only difference is that we don't return a coroutine for __iter__.

I am at a loss as to which behaviour you are referring to exactly. Do you mean __next__ handling Option via impl<T> IntoPyCallbackOutput<PyIterNextOutput> for Option<T>?

If so, then my problem is not just that the existing protocol is complicated, but that I think we are discussing adding an additional twist on top of it.

The current contract is:

  • Return type of __(a)next__ has to be Iter(A)NextOutput.
  • Or Option<T>, which is transformed into PyIter(A)NextOutput in a way that is distinct from how Option<T> is turned into PyObject.

But now we extend the contract of __anext__ with:

  • Or T, if it isn't Option<T>, which bypasses the above and is turned into PyObject as usual.

Personally, I would prefer to avoid adding another layer to this altogether, hence the suggestion to add IterANextOutput::Defer. This would still create a difference between __next__ and __anext__, but it would give us a way to explicitly encode that difference in the type system and would produce call sites that are somewhat self-documenting. Ideally, I would even want to enforce that IterANextOutput::Defer produces an awaitable, but I do not see how we could do that statically without pulling in something that bridges Future with Python's async I/O.

(If we do add the third option, using autoref specialization is fine IMHO and I would actually prefer that to matching on type names.)

((If we are back to the drawing board, I could even imagine copying Python's way of doing things, i.e. we always return T turned into PyObject as usual. If iteration is finished, we do panic_any(StopIteration) and our trampoline turns this into PyStopIteration. We do have to catch panics in any case so I don't think this would be a performance issue. But I think using unwinding for control flow is a big foot gun in Rust, e.g. most issues with unsafe code come from unexpected unwinding while some invariant is not yet reconstituted.))
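The unwinding idea in the last parenthetical can be illustrated with plain std, with no pyo3 involved; StopIteration here is just a local marker type standing in for the payload that a trampoline would translate into a Python StopIteration:

use std::panic::{catch_unwind, panic_any, resume_unwind, AssertUnwindSafe};

/// Local marker payload meaning "iteration finished" (not Python's exception).
struct StopIteration;

fn next_value(counter: &mut u32) -> u32 {
    if *counter >= 3 {
        // Unwind instead of returning an Option or a dedicated enum.
        // (The default panic hook will still print a message unless replaced.)
        panic_any(StopIteration);
    }
    *counter += 1;
    *counter
}

fn main() {
    let mut counter = 0;
    loop {
        // In pyo3 this catch would live in the trampoline, which would turn the
        // payload into a raised StopIteration instead of printing.
        match catch_unwind(AssertUnwindSafe(|| next_value(&mut counter))) {
            Ok(value) => println!("yielded {value}"),
            Err(payload) if payload.is::<StopIteration>() => {
                println!("iteration finished");
                break;
            }
            // Any other panic keeps unwinding as usual.
            Err(payload) => resume_unwind(payload),
        }
    }
}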

@adamreichold
Member

(Of course, adding IterANextOutput::Defer and the special handling for T which isn't Option<T> are not mutually exclusive.)

@adamreichold
Member

To add some context as to why I think specialization (whatever mechanism is used to implement it) is difficult to explain here: I think that specialization should be used when it does not need explanation at all, i.e. when the special case it captures is only special w.r.t. the quality of implementation (e.g. efficiency) but not w.r.t. the observable effects.

@lifthrasiir
Contributor Author

Do you mean __next__ handling Option via impl<T> IntoPyCallbackOutput<PyIterNextOutput> for Option<T>?

Oops, yes, I meant to say __next__ and not __iter__. Your understanding is correct.

If we do add the third option, using autoref specialization is fine IMHO and I would actually prefer that to matching on type names.

Ah, that makes sense. So you want to make the result of the specialization visible from the type, without necessarily removing that specialization. I'm okay with that. (I'm also not particularly into specialization, but I do feel proc-macros are already quite magical and specialization doesn't change that much ;-)

I could even imagine copying Python's way of doing things, i.e. we always return T turned into PyObject as usual. If iteration is finished, we do panic_any(StopIteration) and our trampoline turns this into PyStopIteration.

I think most workarounds (for __anext__) already issue PyStopAsyncIteration by themselves, so this might be workable. But the existing code will still expect Option<T> to work.


So, to recap the interim conclusion as I understand it:

  1. The current signature of __anext__ is less than helpful.
  2. The autoref specialization, on its own, doesn't break existing code.
  3. Specialization in general is hard to explain, so the types should strive to be self-explanatory. We can afford a slight backward incompatibility for that.

I believe these are enough to pinpoint what to do next. Please let me know if not.

@adamreichold
Member

I believe these are enough to pinpoint what to do next. Please let me know if not.

While I understand that this might be frustrating, I think we are still exploring the design space for now.

If we do decide to include specialization for T-but-not-Option<T> in the final design, we could also push that into a 0.19.x maintenance release, but I don't think we have reached a consensus on that final design yet. Personally, I would also like to hear more about what other maintainers think before committing the project to any direction.

@lifthrasiir
Contributor Author

Ah, I thought the design space was not very large given that we are dealing with concrete types and have only a few ways to alter them (this would not be the case if public traits were involved), and thus the constraints are strong enough. Didn't mean to hurry things; we have a lot of time to perfect this :-)

@davidhewitt
Member

Rereading PEP 492, it seems that __anext__ is defined as returning an awaitable, in which case I am tempted to argue that PyO3's current design is fundamentally wrong. We could go so far as to say that the awaitable is always responsible for raising StopAsyncIteration.

This would also map to the StreamExt trait's next() method returning impl Future<Output = Option<T>>, which we could plausibly support in the future if we had #1632.
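For concreteness, the StreamExt::next() contract looks like this (a tiny standalone example using the futures crate, nothing pyo3-specific): the returned future resolves to Some(item) while values remain and to None once the stream is exhausted, which is the role StopAsyncIteration plays on the Python side.

use futures::executor::block_on;
use futures::stream::{self, StreamExt};

fn main() {
    block_on(async {
        let mut numbers = stream::iter([1, 2, 3]);
        // Each call to next() returns a future resolving to Option<i32>.
        while let Some(item) = numbers.next().await {
            println!("got {item}");
        }
        // Once exhausted, the future resolves to None instead of a value.
        assert!(numbers.next().await.is_none());
    });
}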

Presumably this is the eventual goal we'd want to get to, where PyO3 can package the future into an awaitable automatically.

So the question becomes: how do we get there and avoid breaking users along the way? If we required __anext__ to return a future (i.e. be async), I think that would be sufficiently breaking. However, we probably also want users to be able to return Python awaitables without any overhead.

@adamreichold
Member

Rereading PEP 492, it seems that __anext__ is defined as returning an awaitable, in which case I am tempted to argue that PyO3's current design is fundamentally wrong. We could go so far as to say that the awaitable is always responsible for raising StopAsyncIteration.

Indeed, in that case our design is wrong. Additionally, the examples collected in the OP would all fit this pattern. Does anybody know of any example where the contract intended by our current API is actually used?

If that is not the case, I would say that there is little point in caring about backwards compatibility, especially since Some(Future) would be equivalent to returning Future under the normal IntoPyCallbackOutput rules?

If I understand this correctly, there would be no point in keeping IterANextOutput around at all?

(While we are here discussing this: when Iter(A)NextOutput was originally designed, was there any discussion of just using PyResult instead and making the Rust code return PyStop(Async)Iteration, as the above-linked futures already do?)

@davidhewitt
Member

davidhewitt commented Jun 1, 2023

It looks like the original optionality of __anext__ was added in #35. I expanded on it with the PyIter[A]NextOutput enums in #997. I vaguely recall being puzzled by the optionality at the time but also trusting that it had been built correctly. Unfortunately, I don't think either thought process was written down. 😁

If that is not the case, I would say that there is little point in caring about backwards compatibility, especially since Some(Future) would be equivalent to returning Future under the normal IntoPyCallbackOutput rules?

I mostly agree with the caveat that None should probably be checked for and a deprecation warning emitted with a backwards-compatibility path in the macro. Then after a couple of versions we remove that and returning None from __anext__ would become a bug.

Agreed IterANextOutput can be removed, I think even before the backwards compatibility path.

For IterNextOutput, the translation from a Rust iterator to a Python one seems reasonably well fitted. But maybe this could be removed and an alternative convenience added, along similar lines to #3195, with an automatic or opt-in iter = true option on #[pyclass]?

@adamreichold
Member

For IterNextOutput, the translation from a Rust iterator to a Python one seems reasonably well fitted. But maybe this could be removed and an alternative convenience added, along similar lines to #3195, with an automatic or opt-in iter = true option on #[pyclass]?

I think with today's PyO3 I would expect __iter__ to return anything as well and use PyResult<T> to "throw" a "StopIterationException" following the normal IntoPyCallbackOutput rules, i.e. mirror the Python approach but Rustify the error handling.

#[pyclass(iter = true)] could then be a convenience layer on top. I don't think I would want to have this without an escape hatch as implementing __iter__ could do tricks like slf: Py<Self> and I may not want my type to be a Rust iterator.

@joshua-auchincloss

For IterNextOutput, the translation from a Rust iterator to a Python one seems reasonably well fitted. But maybe this could be removed and an alternative convenience added, along similar lines to #3195, with an automatic or opt-in iter = true option on #[pyclass]?

I think with today's PyO3 I would expect __iter__ to return anything as well and use PyResult<T> to "throw" a "StopIterationException" following the normal IntoPyCallbackOutput rules, i.e. mirror the Python approach but Rustify the error handling.

#[pyclass(iter = true)] could then be a convenience layer on top. I don't think I would want to have this without an escape hatch as implementing __iter__ could do tricks like slf: Py<Self> and I may not want my type to be a Rust iterator.

Adding in here from #3202: would this approach still not conflict when users return error types, though? If the value is caught within IntoPyCallbackOutput, then there's still no way to pass errors from Rust into Python (if I'm understanding this approach correctly).

PyO3 is blending two separate protocol implementations into one protocol, when they should be different:

  1. Generators yielding values on each call to __iter__ or __next__ (the current handling implements this, but forces a yield to get this behaviour and defaults to None)
     Issue: yielding None at the end of the generator overwrites the value from the generator's ::Return.
  2. Iterators delivering a final value via StopIteration
     Issue: the return value is obfuscated through the generator yield, e.g. users are forced to return a value into StopIteration, bypassing any other errors the user wants to raise at runtime.

The two conflict with each other, and it looks like PyO3 is currently merging the logic for both, which leads to inappropriate handling at runtime (#3202). As well, adding a convenience layer is useful, but as you alluded to, this would obfuscate the actual procedure being implemented. From a user perspective, IterNextOutput is perfect for encapsulating that logic, but falls short due to having no escape when an error needs to be raised to the Python interpreter from Rust. I'd propose having some GenOutputXX and IterOutputXX, where gen allows for passing via Yield as a return value, and iter implements the protocol for re-escaping a generator until StopIteration is raised by Return. As a final encapsulation, both would also require ::Err or similar, which would raise a PyErr directly to the runtime (not StopIteration, but an actual error).
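To make that proposal a bit more concrete, one possible shape for the two suggested enums (purely hypothetical types sketched from the description above, not pyo3 API; the XX suffix is dropped here):

use pyo3::{PyErr, PyObject};

/// Generator-style step: every call either yields or finishes with a return value.
#[allow(dead_code)]
enum GenOutput {
    Yield(PyObject),
    Return(PyObject),
    /// Raise this error directly instead of folding it into StopIteration.
    Err(PyErr),
}

/// Iterator-style step: a value, an explicit stop, or a real error.
#[allow(dead_code)]
enum IterOutput {
    Value(PyObject),
    Stop,
    Err(PyErr),
}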

@adamreichold
Member

Adding in here from #3202: would this approach still not conflict when users return error types, though? If the value is caught within IntoPyCallbackOutput, then there's still no way to pass errors from Rust into Python (if I'm understanding this approach correctly).

Using the usual PyResult implementation of IntoPyCallbackOutput, you can return exception objects via the Ok variant and raise exceptions via the Err variant, e.g.

#[pyfn(m)]
fn foo(py: Python<'_>) -> PyResult<PyObject> {
    let err = pyo3::exceptions::PyRuntimeError::new_err("foo");
    Ok(err.into_py(py))
}

#[pyfn(m)]
fn bar() -> PyResult<PyObject> {
    let err = pyo3::exceptions::PyRuntimeError::new_err("bar");
    Err(err)
}

will make these tests

def test_foo():
    err = foo()
    assert type(err) == RuntimeError
    assert "foo" in str(err)

def test_bar():
    with pytest.raises(RuntimeError, match = "bar"):
        bar()

pass.

davidhewitt mentioned this issue Jun 16, 2023
davidhewitt added this to the 0.20 milestone Aug 11, 2023
@davidhewitt
Member

As a note for discussion here, tp_iternext is allowed to return NULL without setting a StopIteration value when the return does not need to carry a value.

https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_iternext
