-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
specialize collections holding bytes to turn into PyBytes
rather than PyList
#4417
Conversation
bec3c73
to
16120ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm very excited to see this and fix the longstanding footgun! We should probably mention this in the migration guide, as this will be a breaking change for return values of types like Vec<u8>
.
PyBytes::new_with(py, len, |buf| { | ||
let mut counter = 0; | ||
for (slot, byte) in buf.iter_mut().zip(&mut iter) { | ||
*slot = byte; | ||
counter += 1; | ||
} | ||
|
||
assert!(iter.next().is_none(), "Attempted to create PyBytes but `iter` was larger than reported by its `ExactSizeIterator` implementation."); | ||
assert_eq!(len, counter, "Attempted to create PyBytes but `iter` was smaller than reported by its `ExactSizeIterator` implementation."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is interesting, I wonder, does this have a performance impact over just using PyBytes::new
? As far as I can see, the problem is that we can't trivially get a slice, hence this approach.
I think the way to use PyBytes::new
would be to make a trait like SliceableIterator: IntoIterator<Item = Self>
which can either return a slice of all the elements in one go, or just call .into_iterator()
to do normal iteration which transfers ownership. Then we use this trait instead of I: IntoIterator
as the input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I haven't tested whether this has any performance impact. I initially tried to do this with slices rather than iterators, but ran into problems with the generics (there were always T
and &T
involved which caused all sorts of problems)
I'm not sure if I get the idea of SliceableIterator
. How would we specialize between slices and other IntoIterator
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah playing around with this locally I see exactly what you mean by the interaction between T
and &T
. I feel like there should be a solution so I'm going to continue to mull this over for a day or two.
So far I got to something like this:
pub trait SliceableIntoPyObjectIterator<'py>: Sized {
type SliceItem;
type IntoPyObjectItem: IntoPyObject<'py>;
type IntoIter: ExactSizeIterator<Item = Self::IntoPyObjectItem>;
fn into_py_object_iterator(self) -> Self::IntoIter;
fn as_slice(&self) -> &[Self::SliceItem];
}
where the bytes case then becomes
fn iter_into_pyobject<I>(
iter: I,
py: Python<'py>,
_: crate::conversion::private::Token,
) -> Result<Bound<'py, PyAny>, PyErr>
where
I: crate::conversion::SliceableIntoPyObjectIterator<'py, SliceItem = Self>,
{
Ok(PyBytes::new(py, iter.as_slice()).into_any())
}
... and the default implementation calls the .into_py_object_iterator()
method instead to consume the elements.
But as you found, I'm having trouble with T
and &T
. 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I gave this a really deep investigation and found that yes, new_with
is a bit slower than PyBytes::new
, but it's not so much slower that it's a deal breaker.
I opened #4423 and #4424 which are two attempts to try to optimise the conversion a little bit. Both branches include two commits; the first is a set of benchmarks which I was using to study performance.
Overall, I think I'm more comfortable with the approach taken in #4424 which doesn't attempt to use unsafe
to be able to then use PyBytes::new
, and instead is essentially a small tidy up on top of this branch. If users report performance issues we can always revisit the unsafe
option later. But this is already so much faster than the current Vec<u8>
conversion which creates a list of integers that I doubt users will really notice the overhead of us using PyBytes::new_with
here.
Would be interested to hear what you think of those options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the throughout investigation! I'm also way more comfortable with the iterator approach in #4424. Unsafely casting all these types seems like at footgun to me at best, even if this would be fully contained within PyO3.
For the general case I'm unsure about whether it matters for soundness to have an instantiation of impl or whether it needs to be valid for all T
in general. Having a safety guarantee about the instantiation of a generic type sounds a bit sketchy to me. If we're on the cautious side and say it has to be valid for all T
I think this is at least invalid for zero sized types, since these would only be valid for zero sized reads on their dangling pointer and not any following address.
And also for this specific case I think this is unsound for something like a Vec<&u8>
which would use <&u8>::iter_into_pyobject
, which itself would use Vec<&u8>::as_bytes_slice
and thus cast &u8
to u8
⚡
But this is already so much faster than the current
Vec<u8>
conversion which creates a list of integers that I doubt users will really notice the overhead of us usingPyBytes::new_with
here.
I agree here. This was also a reason why I initially did not put too much thought into it, since it looked like I would have at worst a similar performance characteristic to the list case we have currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we want to proceed here @davidhewitt? Should we merge this as is and then followup in #4424 or should we just cleanup #4424 and merge that directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, I just had a potentially better idea in #4442 which uses PyBytes::new
and allows &[u8]
target type to be PyBytes
. If you like it, maybe let's merge that one directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I just wasn't sure whether we were still thinking here of we reached consent 🙃
You're right, this has an immediate effect, because of the macro specialization. I totally forgot that 😅 , I will add a migration entry 👍 |
I mean I guess we can probably add another layer to the autoderef specialization to prefer |
I kind of hope that it won't be necessary 🙃 , since it's also fairly easy to work around, if that conversion is not desired. |
b8978fc
to
e1fe5ec
Compare
e1fe5ec
to
35c005b
Compare
Superseded by #4442 |
Followup to #4060
XRef #4182
This implements specialization for collections holding bytes (in the new
IntoPyObject
trait) by a provided, sealediter_into_pyobject
method. The method is intended to be overwritten by the element type (for nowu8
only) and turns an iterator of elements into an appropriate Python type in type erased form. The default implementation creates aPyList
while theu8
override returnsPyBytes
.This method is then used by the generic
IntoPyObject::into_pyobject
implementation of collection types. I updated the implementations ofVec<T>
,&[T]
,[T; N]
andSmallVec<A>
accordingly. (I hope did not forget something)From a quick test everything seems to work out. If this seems sensible to work with, I will add a few tests and the newsfragment.
Big thanks to @diliop for the previous work in #4182!