Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contribution opportunity: improved conversion traits #4041

Closed
davidhewitt opened this issue Apr 3, 2024 · 18 comments · Fixed by #4672
Closed

Contribution opportunity: improved conversion traits #4041

davidhewitt opened this issue Apr 3, 2024 · 18 comments · Fixed by #4672

Comments

@davidhewitt
Copy link
Member

Following the introduction of the Bound API in 0.21, we have converged on three primary traits which are used to convert Rust datatypes to Python.

  • one trait for from-Python conversions: FromPyObject
  • two traits for to-Python conversions: ToPyObject and IntoPy<PyObject> (and sub-forms IntoPy<Py<PyTuple>> and IntoPy<Py<PyString>>)

There are several opportunities to contribute to PyO3 with a goal of improving these traits. Given that these traits are so central to PyO3's API, they impact developer ergonomics and performance. The right designs could improve PyO3 a lot, though any new API would need a migration pathway which is feasible for users.

Improving FromPyObject

FromPyObject is already in a migration state as its input argument is changing from a GIL Ref to a Bound smart pointer. We also have FromPyObjectBound which allows a few additional optimisations and extractions like &str at cost of complexity. We think the two lifetimes of FromPyObjectBound will be the future design of FromPyObject, though we don't yet know how that will interact with #[derive(FromPyObject)].

Beyond resolving that question, I am aware of at least two other possibilities to refine FromPyObject:

  • Rather than returning PyErr as the error type, we could do something similar to std::str::FromStr and add a type Err associated type. This would allow implementations to use just downcasting errors, or other concrete Rust errors without having to go through the relatively heavy PyErr machinery.
  • In Review conversions and ensure consistency #3226 and similar issues we have discussed how there is a possibility to define either "strict" or "lax" conversions. Maybe FromPyObject could have an extract_exact method added, which defines strict conversions, so that the #[pyfunction] macros can have a #[pyo3(exact)] annotation on arguments to control this better.

Improving to-Python traits

I'm reasonably convinced that there doesn't need to be two (or four, depending how you see it) to-Python traits for PyO3.

In addition, we have a strong need to have a fallible conversion, because there are now several panics inside PyO3 which are caused by the limitation that none of these traits can fail.

I once proposed an IntoPyObject trait in #2316. If FromPyObject is staying around, I think IntoPyObject is the right name to pair with it (though maybe we just introduce a new trait for each direction, for easier migration). I think the design in #2316 is outdated and also it probably should be fallible, as FromPyObject is also fallible.

Maybe the following is closer to the right definition:

trait IntoPyObject {
    type Target;
    type Err;

    fn into_pyobject(self, py: Python<'py>) -> Result<Bound<'py, Self::Target>, Self::Err>;
}

... but that's just a sketch, and discussion / research / prototyping would yield valuable insight.

@Icxolu
Copy link
Contributor

Icxolu commented Apr 4, 2024

One point to consider is whether IntoPyObject should take Self by value or by reference. I believe typically in such a conversion one would take it by value to potentially reuse allocations, which I think we can't do because the conversion target lives on the Python heap, not the Rust one. So maybe taking &self is more sensible, such that Rust structures do not get needlessly consumed (and dropped). Or differently: Why should I not be allowed to continue to use a Vec<T> after I turned it into a Bound<PyList>?

I guess one downside would be the identity conversion of a Python type to itself (or another python type?), which would have to perform a (relatively cheap) reference count increase (which would most likely be happening in generic code).

@adamreichold
Copy link
Member

adamreichold commented Apr 5, 2024

though we don't yet know how that will interact with #[derive(FromPyObject)].

Note #[serde(borrow)] is prior art on the issue, i.e. we could implement FromPyObject<'a, 'py> for &'a T by default and use a lifetime generic parameter if #[pyo3(borrow)] is given.

@adamreichold
Copy link
Member

One point to consider is whether IntoPyObject should take Self by value or by reference.

As with e.g. IntoIterator, I think the trait should take self (by value) and we would implement it for for references &'a T where applicable.

@adamreichold
Copy link
Member

Maybe the following is closer to the right definition:

Any particular reason to fix the target via the associated type? Why not keep this as a generic IntoPyObject<T> to support "multiple overloads"?

@Icxolu
Copy link
Contributor

Icxolu commented Apr 5, 2024

As with e.g. IntoIterator, I think the trait should take self (by value) and we would implement it for for references &'a T where applicable.

That makes sense, I agree with that. (I always forget that you can implement traits on references as well 🤦 )

Any particular reason to fix the target via the associated type? Why not keep this as a generic IntoPyObject<T> to support "multiple overloads"?

I think it could be nice to support this. For example a HashMap could be turned into both a PyDict or a PyList of tuples. In any case I think the return type should be Bound<T> (and not just T, like IntoPy currently has), such that we actually know to convert into a Python type.

@davidhewitt
Copy link
Member Author

davidhewitt commented Apr 6, 2024

As with e.g. IntoIterator, I think the trait should take self (by value) and we would implement it for for references &'a T where applicable.

This is what I was thinking too. I think &T: SomeBound might be more generic than we want, but I imagine &Bound<T>, &[T] and maybe even &Vec<T> are all options to explore.

Any particular reason to fix the target via the associated type? Why not keep this as a generic IntoPyObject<T> to support "multiple overloads"?

I think it could be nice to support this. For example a HashMap could be turned into both a PyDict or a PyList of tuples. In any case I think the return type should be Bound<T> (and not just T, like IntoPy currently has), such that we actually know to convert into a Python type.

This is an interesting idea, and it would probably make migration easier than trying to refactor to solve existing IntoPy<Py<PyTuple>> and IntoPy<Py<PyString>> cases . I'm not entirely sure how I feel about it, worth exploring for sure though.

One case related to "generic overloads" worth mentioning is (). Its current implementation of IntoPy<PyObject> produces None on the Python side, but for IntoPy<Py<PyTuple>> it produces an empty Python tuple. This is done for because -> () function return values map to Python -> None, but my feeling is that it'd be better to try to specialize that case away inside the #[pyfunction] internals and have the trait implementations both produce an empty tuple.

@davidhewitt
Copy link
Member Author

A reason against the generic overloads is type inference. Assuming the function is into_pyobject(), then if we had an associated type .into_pyobject() always is unambiguous. A generic trait (like we currently have for IntoPy) means that anything which implements more than one IntoPyObject<T> will need type hints.

Examples to consider is e.g. Vec<T>. This becomes a Python list at the moment, so I imagined the associated type would be PyList. In the generic case, we might instead want to have IntoPyObject<PyAny> and IntoPyObject<PyList> implemented.

I'm assuming that all types usable for #[pyfunction] return values will need IntoPyObject<PyAny> implemented.

Currently for Vec<T>, it only implements IntoPy<PyObject>. To get to Bound<PyList> you then need to do a downcast (optionally unchecked & unsafe).

@Icxolu
Copy link
Contributor

Icxolu commented Apr 6, 2024

A reason against the generic overloads is type inference.

True, and type hints for generic traits are always a bit awkward, because one can't use a turbo fish on the method (and thus it often interrupts method chains).

One case related to "generic overloads" worth mentioning is (). Its current implementation of IntoPy<PyObject> produces None on the Python side, but for IntoPy<Py<PyTuple>> it produces an empty Python tuple.

Hmm, I can see how this can be surprising. I guess this is a general problem of the generic case. If there is more than one implementation of IntoPyObject<T>, what should IntoPyObject<PyAny> return?

I'm assuming that all types usable for #[pyfunction] return values will need IntoPyObject<PyAny> implemented.

This might not be necessarily needed, because into_any() is available unconditionally for all Bound<T>. I think this would make it possible to convert during macro expansion.

@davidhewitt
Copy link
Member Author

This might not be necessarily needed, because into_any() is available unconditionally for all Bound<T>. I think this would make it possible to convert during macro expansion.

Potentially, yes. My worry is that if a type has two implementations of IntoPyObject<T>, then the macros will also have type inference problems. Assuming that they always use IntoPyObject<PyAny> seems like the way we'd resolve this.

@adamreichold
Copy link
Member

True, and type hints for generic traits are always a bit awkward, because one can't use a turbo fish on the method (and thus it often interrupts method chains).

Couldn't we provide an extension trait to make the turbo fish available, e.g.

trait IntoPyObjectExt {
  fn into_pyobject<T, E>(&self) -> Result<Bound<'py, T>, E> where Self: IntoPyObject<Target=T, Err=E>;
}

impl<T> IntoPyObjectExt for T {
  fn into_pyobject<T, E>(&self) -> Result<Bound<'py, T>, E> where Self: IntoPyObject<Target=T, Err=E> {
    <self as &IntoPyObject>::into_pyobject_impl(self)
  }
}

That one probably does not even compile, but I think some way of shuffling the generic parameters around should be available?

Other than that, I don't think this particularly worse than e.g. FromIterator or From/Into from std. OTOH, I am highly sceptical that we will get by with a single trait if the resulting type is fixed by an associated type.

@Icxolu
Copy link
Contributor

Icxolu commented Aug 23, 2024

Hey, while working on porting the trait bounds of our Py*Methods APIs as discussed in #4456 (comment) I also tried to port some tests to IntoPyObject and ran into a road block with the reference impls. One example that actually completely unaffected by my trait bounds changes is types::mapping::tests::test_len. I applied to following patch which should use the impls for &HashMap and then HashMap.

diff --git a/src/types/mapping.rs b/src/types/mapping.rs
index 537959719..84fa577df 100644
--- a/src/types/mapping.rs
+++ b/src/types/mapping.rs
@@ -189,19 +189,20 @@ mod tests {
     use crate::{exceptions::PyKeyError, types::PyTuple};
 
     use super::*;
+    use crate::conversion::IntoPyObject;
 
     #[test]
     fn test_len() {
         Python::with_gil(|py| {
             let mut v = HashMap::new();
-            let ob = v.to_object(py);
-            let mapping = ob.downcast_bound::<PyMapping>(py).unwrap();
+            let ob = (&v).into_pyobject(py).unwrap();
+            let mapping = ob.downcast::<PyMapping>().unwrap();
             assert_eq!(0, mapping.len().unwrap());
             assert!(mapping.is_empty().unwrap());
 
             v.insert(7, 32);
-            let ob = v.to_object(py);
-            let mapping2 = ob.downcast_bound::<PyMapping>(py).unwrap();
+            let ob = v.into_pyobject(py).unwrap();
+            let mapping2 = ob.downcast::<PyMapping>().unwrap();
             assert_eq!(1, mapping2.len().unwrap());
             assert!(!mapping2.is_empty().unwrap());
         });

Unfortunately I get a rather puzzling compile error:

error[E0275]: overflow evaluating the requirement `&instance::Bound<'_, _>: conversion::IntoPyObject<'_>`
   --> src/types/mapping.rs:198:27
    |
198 |             let ob = (&v).into_pyobject(py).unwrap();
    |                           ^^^^^^^^^^^^^
    |
    = help: consider increasing the recursion limit by adding a `#![recursion_limit = "256"]` attribute to your crate (`pyo3`)
note: required for `&Either<instance::Bound<'_, _>, _>` to implement `conversion::IntoPyObject<'_>`
   --> src/conversions/either.rs:98:21
    |
98  | impl<'a, 'py, L, R> IntoPyObject<'py> for &'a Either<L, R>
    |                     ^^^^^^^^^^^^^^^^^     ^^^^^^^^^^^^^^^^
...
102 |     <&'a L as IntoPyObject<'py>>::Error: Into<PyErr>,
    |                                          ----------- unsatisfied trait bound introduced here
    = note: 127 redundant requirements hidden
    = note: required for `&HashMap<Either<Either<Either<Either<Either<Either<Either<Either<Either<Either<Either<Either<Either<Either<..., ...>, ...>, ...>, ...>, ...>, ...>, ...>, ...>, ...>, ...>, ...>, ...>, ...>, ...>, ...>` to implement `conversion::IntoPyObject<'_>`
    = note: the full name for the type has been written to '/home/tobias/pyo3-intopyobject-apis/target/debug/deps/pyo3-8c1bd657b3c9849b.long-type-7503351058969275665.txt'
    = note: consider using `--verbose` to print the full type name to the console

For more information about this error, try `rustc --explain E0275`.
error: could not compile `pyo3` (lib test) due to 1 previous error

I don't really have an idea why rustc is trying to evaluate that particular (kind of insane) type, but this does not seem to be specific about Either. If I comment out that particular impl, the error changes to another reference impl (until there are non left). I don't see anything wrong with our reference impls. They can be recursive of course, but they aren't generally. So I think this is a compiler bug, the closest issue I could find is rust-lang/rust#37748 (not sure if this is the identical problem, but it looks pretty similar). Just cloning the map works perfectly fine, with and without these impls.

The question is now, can anyone think of a workaround here (or maybe there is something wrong with these impl that I didn't see). I already tried for a bit now, but could not make anything work so far. It would be rather unfortunate if we could not provide these reference impls.

@davidhewitt
Copy link
Member Author

I suppose the problem is that type inference searches for types which might satisfy IntoPyObject for the hashmap's bounds. The T in the impl for &Bound<T> has no bound, so it might be itself &Bound<T>, and the compiler searches endlessly? Maybe similar for other types too 😞

I feel like it's definitely a compiler bug and that we worked around a similar overflow with DerefToPyAny, maybe a similar trick helps here?

@davidhewitt
Copy link
Member Author

I assume that annotating the hashmap fixes the problem?

@Icxolu
Copy link
Contributor

Icxolu commented Aug 23, 2024

I feel like it's definitely a compiler bug and that we worked around a similar overflow with DerefToPyAny, maybe a similar trick helps here?

Hmm, I will look into that, but the advantage with DerefToPyAny is that either we control the types or it is in macro context. For IntoPyObject that's not necessarily the case. So it could be annoying /surprising for users.

I assume that annotating the hashmap fixes the problem?

Actually no, it still does not pickup the impl and wants me to clone that map...

@Icxolu
Copy link
Contributor

Icxolu commented Aug 23, 2024

Actually no, it still does not pickup the impl and wants me to clone that map...

There was an error in the impl, the BuildHasher has the wrong bound 🤦. We still need the type annotation to prevent the overflow error, but at least there is a way to make it work. (On that particular test, for the other tests in that module inference works fine, I guess I really choose the worst test to start with 🤷 )

I will provide a PR shortly to fix the bounds on the hash impls.

diff --git a/src/conversions/std/map.rs b/src/conversions/std/map.rs
index f19b5671d..e16381505 100644
--- a/src/conversions/std/map.rs
+++ b/src/conversions/std/map.rs
@@ -76,7 +76,7 @@ impl<'a, 'py, K, V, H> IntoPyObject<'py> for &'a collections::HashMap<K, V, H>
 where
     &'a K: IntoPyObject<'py> + cmp::Eq + hash::Hash,
     &'a V: IntoPyObject<'py>,
-    &'a H: hash::BuildHasher,
+    H: hash::BuildHasher,
     PyErr: From<<&'a K as IntoPyObject<'py>>::Error> + From<<&'a V as IntoPyObject<'py>>::Error>,
 {

@Icxolu
Copy link
Contributor

Icxolu commented Oct 9, 2024

Progress IntoPyObject

I thought I'll do a small progress update here to collect whats done and whats left:

@davidhewitt
Copy link
Member Author

Additional thought on IntoPyObject related to the &T cases, I wonder if we want to have something like #[derive(IntoPyObjectRef)] (or maybe #[derive(RefIntoPyObject)] which implements IntoPyObject for &T (assuming all its fields support the reference form too).

@Icxolu
Copy link
Contributor

Icxolu commented Oct 26, 2024

Interesting thought, I'll play around with the idea! I think it should be possible to reuse nearly everything of the current derive. I believe there are only real assumption made about the Self type for transparent types. and the bounds on generic types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants