RFC: Check element types #257

adamreichold · 2022-01-11T19:29:39Z

This a single WIP commit on top of #256. I wonder if it would enable us to salvage custom element types but I have many questions:

Is Py<T> with T: PyTypeInfo really a sufficient bound to allow as elements of object arrays? @davidhewitt
It seems problematic to hand PyArray<Py<T>, D> to the check method when we are still figuring out whether the elements are indeed Py<T>?
Especially so when using our own iterator bindings which will produce &mut Py<T> which is almost surely UB if the element is not in fact Py<T>? So this would be mean using the FFI directly?

But finally, the array might be aliasing by Python code. So who is to say that after checking the element types, the Python code won't go ahead and replace individual elements by differently typed objects? This would suggest that we can never support object arrays containing anything but PyObject?

aldanor · 2022-01-13T22:29:58Z

src/dtype.rs

+        let type_object = T::type_object(py);
+        let iterator = NpySingleIterBuilder::readwrite(array).build()?;
+
+        for element in iterator {
+            if !type_object.is_instance(element)? {
+                return Err(PyDowncastError::new(array, type_name::<T>()).into());
+            }
+        }


This could be probably written a bit shorter like

if NpySingleIterBuiler::readwrite(array).build()?.any(|el| !type_obj.is_instance(el)) { return Err(...); }

Possibly, but I am not sure we can use the iterators at all as they produce references to T, but we are still checking whether we actually have T or something else entirely.

aldanor · 2022-01-13T22:32:47Z

src/dtype.rs

    const IS_COPY: bool = false;

+    fn check_element_types<D: Dimension>(py: Python, array: &PyArray<Self, D>) -> PyResult<()> {


What I personally don't like is that there's again logic being added into Element, so it goes from being a descriptive trait to an 'actionable' trait once again.

Perhaps something like this might be cleaner? (and then the element-type checking logic can be moved out of here somewhere into array)

pub unsafe trait Element { const IS_COPY: bool; fn py_type(py: Python) -> Option<&PyType>; // None for anything with IS_COPY=true or PyAny fn get_dtype(py: Python) -> &PyArrayDescr; }

What I personally don't like is that there's again logic being added into Element, so it goes from being a descriptive trait to an 'actionable' trait once again.

The point of this was to make the policy changeable by user code, i.e. I really known I'll get PyArray<T, D> and do not want to pay the cost of checking the types, I can implement this unsoundly myself. But I guess due to the general issues, using PyArray<PyObject, D> together with unchecked_downcast would be the way to make this work. We just need to document this...

I guess my point was, it doesn't feel like it should be stuffed in here. Do you have an example in mind where you would implement something custom for your type that doesn't just go through the elements and checks instance types? Like, what else can you reasonably do?

Another disadantage here being, if I do want to do something custom downstream but similar to what's being done here, I would basically have to literally copy-paste rust-numpy source code that iterates over numpy array items - that's not very nice.

And finally, the "don't check this" part doesn't sound like it belongs to the type itself, but rather to the call point. If you bind it to the type, you can't do "here I know it's safe, don't check, but here it's arbitrary user input, need to check" which would be a fairly reasonable use case, I believe. This you could probably achieve by having a separate trait on PyArray<T, D> (pulling this stuff out of T: Element) with default impl that always checks instance types for any Py<T> that's not PyAny, and then add a newtype wrapper like Unchecked<PyArray<T, D>> that would override this and impl a FromPyObject for it. Just a thought. (and, Unchecked is probably too generic a name here)

(Note that you could probably also achieve the above by having an Unchecked<T> but this would be much worse from the ergonomics standpoint as now you have an array of ugly wrappers instead of a single wrapper of an array.)

aldanor · 2022-01-13T23:21:55Z

But finally, the array might be aliasing by Python code. So who is to say that after checking the element types, the Python code won't go ahead and replace individual elements by differently typed objects? This would suggest that we can never support object arrays containing anything but PyObject?

There's a few distinct cases:

You build an array of objects and transfer its ownership to Python. All is good. Why would we prevent this?
If you get an array of objects from Python you need to somehow prevent Python to touch it while you hold it. This can only be done by setting flags.writeable := False by the master view whose flags.owndata == True - e.g. the following example would throw an exception:

x = np.arange(3)
assert x.flags.owndata
x.setflags(write=False)
y = x.view()
y.setflags(write=True)  # ValueError: cannot set WRITEABLE flag to True of this array

adamreichold · 2022-01-14T07:43:55Z

You build an array of objects and transfer its ownership to Python. All is good. Why would we prevent this?

I would like to not prevent this, but I have to produce a sound interface handling both cases. Having separate PyArray and PyReadonlyArray types is already more complicated than I would like. The only option I see to still enable PyArray<Py<T>, D> for this purpose only would be to fail in extract when T != PyAny but otherwise allow this. But I feel like the ergonomics of this are bad.

adamreichold · 2022-01-15T13:02:50Z

I have added stronger language to #255 based on the aliasing issue raised here.

…ng Py<T> with custom types T.

adamreichold · 2022-01-15T18:16:57Z

While considering whether to implement

The only option I see to still enable PyArray<Py, D> for this purpose only would be to fail in extract when T != PyAny but otherwise allow this.

to enable

You build an array of objects and transfer its ownership to Python. All is good. Why would we prevent this?

I noticed that even that isn't that simple: As soon as you one does something like

let array: PyArray<Py<T>, D> = ...;
callable.call1(py, (array,));

then array might stop containing Py<T>. So we would need to introduce a separate type from PyArray whose ownership could be transferred into PyArray before handing it off to Python (which can do only shared ownership). Luckily, we already have such a that, i.e. ndarray::Array.

So I guess the only thing we should provide is a zero-copy function to go from ndarray::Array<Py<T>, D> to PyArray<PyObject, D> where T: ToPyObject but Py<T>: !Element?

adamreichold · 2022-01-17T18:49:56Z

Closing this in favour of #255 as I do not think this can work soundly with the one use case that can being better supported by PyArray::from_owned_object_array from #255.

adamreichold marked this pull request as draft January 11, 2022 19:31

aldanor reviewed Jan 13, 2022

View reviewed changes

WIP: Check dynamic type of individual elements to enable arrays holdi…

f31030e

…ng Py<T> with custom types T.

adamreichold force-pushed the custom-element branch from febacd6 to f31030e Compare January 15, 2022 18:11

adamreichold closed this Jan 17, 2022

adamreichold deleted the custom-element branch January 17, 2022 18:50

adamreichold mentioned this pull request Jan 19, 2022

Remove unsound custom element example. #255

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RFC: Check element types #257

RFC: Check element types #257

Uh oh!

adamreichold commented Jan 11, 2022

Uh oh!

aldanor Jan 13, 2022

Uh oh!

adamreichold Jan 14, 2022

Uh oh!

aldanor Jan 13, 2022 •

edited

Loading

Uh oh!

adamreichold Jan 14, 2022

Uh oh!

aldanor Jan 15, 2022 •

edited

Loading

Uh oh!

aldanor commented Jan 13, 2022

Uh oh!

adamreichold commented Jan 14, 2022

Uh oh!

adamreichold commented Jan 15, 2022

Uh oh!

adamreichold commented Jan 15, 2022

Uh oh!

adamreichold commented Jan 17, 2022

Uh oh!

Uh oh!

		const IS_COPY: bool = false;

		fn check_element_types<D: Dimension>(py: Python, array: &PyArray<Self, D>) -> PyResult<()> {

RFC: Check element types #257

RFC: Check element types #257

Uh oh!

Conversation

adamreichold commented Jan 11, 2022

Uh oh!

aldanor Jan 13, 2022

Choose a reason for hiding this comment

Uh oh!

adamreichold Jan 14, 2022

Choose a reason for hiding this comment

Uh oh!

aldanor Jan 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

adamreichold Jan 14, 2022

Choose a reason for hiding this comment

Uh oh!

aldanor Jan 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aldanor commented Jan 13, 2022

Uh oh!

adamreichold commented Jan 14, 2022

Uh oh!

adamreichold commented Jan 15, 2022

Uh oh!

adamreichold commented Jan 15, 2022

Uh oh!

adamreichold commented Jan 17, 2022

Uh oh!

Uh oh!

aldanor Jan 13, 2022 •

edited

Loading

aldanor Jan 15, 2022 •

edited

Loading