Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Extend SliceBox and unify Array creation from owned data #231

Merged
merged 4 commits into from
Jan 6, 2022
Merged

RFC: Extend SliceBox and unify Array creation from owned data #231

merged 4 commits into from
Jan 6, 2022

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented Dec 30, 2021

@kngwyu This is a follow-up to #230 and #233 and strictly optional, but I think getting rid of the pointer manipulations in Array: IntoPyArray is nice and unifying the different code paths should reduce the maintenance burden.

@adamreichold
Copy link
Member Author

The CI failure has #232 as its same root cause, however I am unsure how to fix this and still be able to store e.g. Array<A, D> within Owner which would make its size type-dependent. In this case, yielding separate PyTypeObjects for each T would certainly be preferable.

@davidhewitt Is it possible to make general statements on how often PyTypeInfo::type_object_raw is called, i.e. what the impact of having a type-ID-keyed hash table in there? Is there a way to construct a PyTypeObject without going through LazyStaticType?

@adamreichold adamreichold changed the title RFC: Simplify SliceBox and unify Array creation from owned data RFC: Extend SliceBox and unify Array creation from owned data Dec 30, 2021
@davidhewitt
Copy link
Member

Sorry for the slow reply. My laptop keyboard is having issues so I'm not keeping up with all discussions at the moment.

type_object_raw may be called quite freuently, e.g. every time this class is created and also every time a Python type needs to be type checked.

@kngwyu
Copy link
Member

kngwyu commented Jan 4, 2022

Thank you! (and sorry for my late reply... we have long new-year holidays in Japan).
The implementation looks nice, but what is the motivation to support Vec<T> directly? Efficiency?

@adamreichold
Copy link
Member Author

The implementation looks nice, but what is the motivation to support Vec directly? Efficiency?

Efficiency and simplicity: into_boxed_slice can re-allocate to shed excess capacity. Also the re-computation of the pointer to the first element is not necessary any more.

@adamreichold
Copy link
Member Author

(By the way, I think it would only make sense to review this PR after #230 and #233 are done with it as this might change significantly based on the feedback there. Especially #233 as it seems required to avoid memory unsafety.)

Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I left some comments,

src/owner.rs Outdated
Comment on lines 12 to 17
pub(crate) struct Owner {
ptr: *mut u8,
len: usize,
cap: usize,
drop: unsafe fn(*mut u8, usize, usize),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't Owner a too broad name? I prefer to PySliceContainer, SliceContainer, SliceOwner and so on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went back to the "container" naming, i.e. PySliceContainer for the type and container: C for the bindings.

src/owner.rs Outdated
let cap = 0;
let drop = drop_boxed_slice::<T>;

mem::forget(data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to Box::into_raw for readability. We use mem::forget for various usages, but into_raw has only one purpose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too. :-) The problem is that Box::into_raw will yield *mut [T], i.e. a fat pointer, for the boxed slice which does not apply to the Vec<_> case. Hence I manually deconstructed the boxed slice to be able to handle both types in a more or less uniform way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh OK I got it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://doc.rust-lang.org/std/primitive.pointer.html#method.as_mut_ptr this would work but it's still unstable too 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can use std::slice::from_raw_parts for Vec side, but either approach is not so good, so either is fine for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the current approach is a reasonable compromise as it handles both types in the same way. I did add another FIXME comment w.r.t. Box::into_raw though.

src/owner.rs Outdated
Comment on lines 34 to 39
Self {
ptr,
len,
cap,
drop,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to avoid

        Self {
            ptr: data.as_ptr() as *mut u8,
            len: data.len(),
            cap: 0,
            drop: drop_boxed_slice::<T>,
        }

?
Readability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mem::forget call consumes data which is why I had to put these into separate bindings.

src/owner.rs Outdated Show resolved Hide resolved
By providing type erasure for both Box<[T]> and Vec<T> we can avoid having to
transform Vec<T> and Array<A, D> into boxed slices which can potentially re-allocate.
Both create an array from existing data, they just differ w.r.t. how
the how the owner on the Python heap is handled.

/// Utility type to safely store Box<[_]> or Vec<_> on the Python heap
pub(crate) struct PySliceContainer {
ptr: *mut u8,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, numpy does not allow mutating the array that has a parent array, so this can be *const in theory. So, let's try it if you are also curious if it works 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think owning raw pointers are *mut by convention.

@adamreichold
Copy link
Member Author

Just noticed that I did not add a changelog entry for PyArray::borrow_from_array. Technically it does not belong here, but I added the commit to this PR nevertheless as it is a follow-up and so somewhat related.

@adamreichold adamreichold merged commit 0efa2cd into PyO3:main Jan 6, 2022
@adamreichold adamreichold deleted the simpler-slice-box branch January 6, 2022 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants