Skip to content

RFC: Check element types #257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

RFC: Check element types #257

wants to merge 1 commit into from

Conversation

adamreichold
Copy link
Member

This a single WIP commit on top of #256. I wonder if it would enable us to salvage custom element types but I have many questions:

  • Is Py<T> with T: PyTypeInfo really a sufficient bound to allow as elements of object arrays? @davidhewitt
  • It seems problematic to hand PyArray<Py<T>, D> to the check method when we are still figuring out whether the elements are indeed Py<T>?
  • Especially so when using our own iterator bindings which will produce &mut Py<T> which is almost surely UB if the element is not in fact Py<T>? So this would be mean using the FFI directly?

But finally, the array might be aliasing by Python code. So who is to say that after checking the element types, the Python code won't go ahead and replace individual elements by differently typed objects? This would suggest that we can never support object arrays containing anything but PyObject?

@adamreichold adamreichold marked this pull request as draft January 11, 2022 19:31
Comment on lines +261 to +268
let type_object = T::type_object(py);
let iterator = NpySingleIterBuilder::readwrite(array).build()?;

for element in iterator {
if !type_object.is_instance(element)? {
return Err(PyDowncastError::new(array, type_name::<T>()).into());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be probably written a bit shorter like

if NpySingleIterBuiler::readwrite(array).build()?.any(|el| !type_obj.is_instance(el)) {
    return Err(...);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but I am not sure we can use the iterators at all as they produce references to T, but we are still checking whether we actually have T or something else entirely.

const IS_COPY: bool = false;

fn check_element_types<D: Dimension>(py: Python, array: &PyArray<Self, D>) -> PyResult<()> {
Copy link
Contributor

@aldanor aldanor Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I personally don't like is that there's again logic being added into Element, so it goes from being a descriptive trait to an 'actionable' trait once again.

Perhaps something like this might be cleaner? (and then the element-type checking logic can be moved out of here somewhere into array)

pub unsafe trait Element {
    const IS_COPY: bool;
    fn py_type(py: Python) -> Option<&PyType>; // None for anything with IS_COPY=true or PyAny
    fn get_dtype(py: Python) -> &PyArrayDescr;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I personally don't like is that there's again logic being added into Element, so it goes from being a descriptive trait to an 'actionable' trait once again.

The point of this was to make the policy changeable by user code, i.e. I really known I'll get PyArray<T, D> and do not want to pay the cost of checking the types, I can implement this unsoundly myself. But I guess due to the general issues, using PyArray<PyObject, D> together with unchecked_downcast would be the way to make this work. We just need to document this...

Copy link
Contributor

@aldanor aldanor Jan 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my point was, it doesn't feel like it should be stuffed in here. Do you have an example in mind where you would implement something custom for your type that doesn't just go through the elements and checks instance types? Like, what else can you reasonably do?

Another disadantage here being, if I do want to do something custom downstream but similar to what's being done here, I would basically have to literally copy-paste rust-numpy source code that iterates over numpy array items - that's not very nice.

And finally, the "don't check this" part doesn't sound like it belongs to the type itself, but rather to the call point. If you bind it to the type, you can't do "here I know it's safe, don't check, but here it's arbitrary user input, need to check" which would be a fairly reasonable use case, I believe. This you could probably achieve by having a separate trait on PyArray<T, D> (pulling this stuff out of T: Element) with default impl that always checks instance types for any Py<T> that's not PyAny, and then add a newtype wrapper like Unchecked<PyArray<T, D>> that would override this and impl a FromPyObject for it. Just a thought. (and, Unchecked is probably too generic a name here)

(Note that you could probably also achieve the above by having an Unchecked<T> but this would be much worse from the ergonomics standpoint as now you have an array of ugly wrappers instead of a single wrapper of an array.)

@aldanor
Copy link
Contributor

aldanor commented Jan 13, 2022

But finally, the array might be aliasing by Python code. So who is to say that after checking the element types, the Python code won't go ahead and replace individual elements by differently typed objects? This would suggest that we can never support object arrays containing anything but PyObject?

There's a few distinct cases:

  1. You build an array of objects and transfer its ownership to Python. All is good. Why would we prevent this?
  2. If you get an array of objects from Python you need to somehow prevent Python to touch it while you hold it. This can only be done by setting flags.writeable := False by the master view whose flags.owndata == True - e.g. the following example would throw an exception:
x = np.arange(3)
assert x.flags.owndata
x.setflags(write=False)
y = x.view()
y.setflags(write=True)  # ValueError: cannot set WRITEABLE flag to True of this array

@adamreichold
Copy link
Member Author

You build an array of objects and transfer its ownership to Python. All is good. Why would we prevent this?

I would like to not prevent this, but I have to produce a sound interface handling both cases. Having separate PyArray and PyReadonlyArray types is already more complicated than I would like. The only option I see to still enable PyArray<Py<T>, D> for this purpose only would be to fail in extract when T != PyAny but otherwise allow this. But I feel like the ergonomics of this are bad.

@adamreichold
Copy link
Member Author

I have added stronger language to #255 based on the aliasing issue raised here.

@adamreichold
Copy link
Member Author

While considering whether to implement

The only option I see to still enable PyArray<Py, D> for this purpose only would be to fail in extract when T != PyAny but otherwise allow this.

to enable

You build an array of objects and transfer its ownership to Python. All is good. Why would we prevent this?

I noticed that even that isn't that simple: As soon as you one does something like

let array: PyArray<Py<T>, D> = ...;
callable.call1(py, (array,));

then array might stop containing Py<T>. So we would need to introduce a separate type from PyArray whose ownership could be transferred into PyArray before handing it off to Python (which can do only shared ownership). Luckily, we already have such a that, i.e. ndarray::Array.

So I guess the only thing we should provide is a zero-copy function to go from ndarray::Array<Py<T>, D> to PyArray<PyObject, D> where T: ToPyObject but Py<T>: !Element?

@adamreichold
Copy link
Member Author

Closing this in favour of #255 as I do not think this can work soundly with the one use case that can being better supported by PyArray::from_owned_object_array from #255.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants