Discussion: dtype system and integrating record types #254
Comments
W.r.t. the question on why …
… for lack of involvement in the other design decisions. Here, we discussed that a simple …
I think having a GIL-protected lazily initialized static like the one in rust-numpy/src/slice_container.rs (lines 109 to 112, eb1068b) might be preferable to a single global data structure?
While the overhead of producing a …
That seems to be the case indeed, so we can probably replace our calls to …
I think it would actually be helpful to ignore this part completely for now and design the necessary types without any macro support first. The proc-macros should essentially only relieve programmers from writing repetitive and error-prone code, shouldn't they?
Then my initial guess was correct. I wonder though whether it's actually noticeable, i.e. has anyone tried to measure it? With all the other Python cruft on top, I would strongly suspect there'd be no visible difference at all. I understand where it comes from; it's just that there's another implicit logical requirement in the …
With record types, it's pretty much a requirement; anything else would be extremely boilerplate-ish and unsafe (with the user having to provide all offsets and descriptors for all fields manually). It might be nice to leave an option to construct descriptors manually at runtime, but I can't see a single real use case for it off the top of my head. So, the necessary types and the infrastructure have to be designed in a way that makes it trivial to integrate them into proc-macros, from the get-go.
Yes, I think it does (and so does pybind11).
Yep - thanks for pointing that out, something like that should be sufficient.
Just checked it out: if one replaces the compile-time check at line 850 (eb1068b) with something runtime like if !T::get_dtype(py).is_object() {, the difference in … For comparison, creating arange(1, 5, 1) is 31 ns. So yea... you could say it's noticeable (but only if you create hundreds of millions of arrays).
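To make the mechanism behind the compile-time variant concrete, here is a minimal, self-contained sketch (hypothetical trait and names, not the actual rust-numpy code): with an associated const, the branch is resolved during monomorphization and the dead arm is removed entirely, whereas a runtime dtype query has to execute on every call - which is roughly what the nanosecond-scale difference above measures.

trait Element {
    // whether this element type maps to an object dtype
    const IS_OBJECT: bool;
}

struct F64;

impl Element for F64 {
    const IS_OBJECT: bool = false;
}

fn new_array<T: Element>() {
    // This branch is folded away per monomorphized instance of T,
    // so the non-object path carries no runtime check at all.
    if T::IS_OBJECT {
        println!("object dtype: initialize elements as Python objects");
    } else {
        println!("plain data dtype: just allocate the buffer");
    }
}

fn main() {
    new_array::<F64>();
}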
ATM, the …
I think if anything has measurable overhead, then it is acquiring the reference to a …
Proc-macros generate code that is injected into the dependent crate, i.e. they have to use this crate's public API, and all the code they produce must be possible to write by hand. The point is not that I expect user code to manually produce those trait implementations (but avoiding the build-time overhead of proc-macros would be one reason to do so), but that we can focus the discussion on the mechanisms and leave the proc-macro UI for later.
One possible caveat with a targeted benchmark here is that deciding this at compile time is about removing code and thereby reducing instruction cache pressure, which could only be noticeable when other code shares the address space.
As a first actionable result of your findings, why not start with a PR to simplify the …
I fear that it might actually be unsound: during extraction, we only check the data type to be object, but not whether all the elements are actually of type …. I fear we need to remove this suggestion and limit ourselves to …
While I'm often guilty of micro-optimizing Rust code down to every single branch hint myself, I just don't think it matters here; the last thing you're going to want to do in numpy bindings is instantiate millions of array objects in a loop - that was kind of my point. Here's a few more thoughts on what would need to happen if …
Oh yea, and also, record types can be nested - that would be fun to implement as const as well (at runtime we can just box it all on the heap).
The note above only applies if it remains as …. If it's changed to …
Yea, I was asking myself whether I was reading that code correctly - it only checks for …. There are three options: …
Yes, I've already started on this; as a first step, I will try to push something shortly. There are actually a few more subtle dtype-related bugs I've uncovered in the process that I'll fix as well (the tests are very sparse, so I guess no one noticed anything so far...).
(This is a bit of a wall of text, so thanks in advance to whoever reads through it in its entirety. I tried to keep it as organized as possible.) Tagging @adamreichold @kngwyu @davidhewitt.
Ok, now that step 1 (#256) is ready to be merged, let's think about next steps. I'll use pybind11 as a reference here since that's what I'm most familiar with, having implemented a big chunk of it myself in the past.
Descriptor (format) strings and PEP 3118
In pybind11, dtypes can be constructed from buffer descriptor strings (see …). The benefit of using descriptor strings: they're easy to hash if a registry is used, there's no nesting, etc. - it's just a simple string that can cover any dtype possible, including scalar types, object types, multi-dimensional subarrays, recarrays, and any combination of the above. We can also generate them with proc-macros at compile time and make them …. Now, if we go this route, we'll have to delve into "format chars", "type kinds", "buffer descriptor strings" etc. There's obviously a big overlap with Buffers, …
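To give a concrete feel for what such descriptor strings look like, here is a small illustration of my own (not taken from the thread): per PEP 3118, a packed record with an i32 field x and an f64 field y is described by roughly the string below, and this is the kind of value a proc-macro could emit as a const.

// Hypothetical compile-time PEP 3118-style descriptor string for
//     struct S { x: i32, y: f64 }
// '<' = little-endian, 'i' = 4-byte int, 'd' = 8-byte float; field names follow
// each element, delimited by colons. Aligned layouts would additionally contain
// padding codes such as "4x".
const S_FORMAT: &str = "T{<i:x:<d:y:}";

fn main() {
    println!("{}", S_FORMAT);
}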
Just my first thoughts after reading:
Yeah, I really don't like the current disagreement between PyBuffer and rust-numpy.
I don't think using …
One could maybe say it is strongly wrongly typed. ;-) More seriously though, I just meant to say that it would be nice to represent the descriptors within the type system directly, which we will probably do anyway if we do not consistently delegate their interpretation to Python code.
Ok then. I think I'll try to sketch a type descriptor system (focused on buffers to start with, not numpy, as the only real difference would be the datetime/timedelta types) in a separate temporary crate and then share it here for further discussion if it works out. Here's a few random points, assuming there's some magic …
Sorry, but I still don't understand what kind of recursiveness/hierarchy should be allowed to support record types... I'll come back to the discussion after reading some PEPs 😓
I see two options off the top of my head: using arena allocation if recursion is always handled by references, or using a Cow-like type for static references or boxes. Admittedly, the ergonomics of neither approach are particularly nice, as Rust is highly explicit about memory management as usual... (If we are out for efficiency …
@kngwyu See the sketch below for where type recursion comes in (sub-array types: an array may contain any type as its element type, and that element type may itself be another sub-array). @adamreichold Yes, that's what I've already implemented (…):

pub struct FieldDescriptor<T: ValueType> {
    ty: TypeDescriptor<T>,
    name: Option<Cow<'static, str>>,
    offset: usize,
}

pub struct RecordDescriptor<T: ValueType> {
    fields: Cow<'static, [FieldDescriptor<T>]>,
    itemsize: usize,
}

pub struct ArrayDescriptor<T: ValueType> {
    ty: BoxCow<'static, TypeDescriptor<T>>,
    shape: Cow<'static, [usize]>,
}

pub enum TypeDescriptor<T: ValueType> {
    Object,
    Value(T),
    Record(RecordDescriptor<T>),
    Array(ArrayDescriptor<T>),
}
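The BoxCow referenced above isn't spelled out in the thread; a minimal sketch of what such a type could look like (an assumption on my part, not the actual implementation) is:

use std::ops::Deref;

// A Cow-like pointer that either borrows a descriptor with a long-enough
// lifetime (usable from consts/statics) or owns a heap-allocated one that was
// built at runtime.
pub enum BoxCow<'a, T> {
    Borrowed(&'a T),
    Owned(Box<T>),
}

impl<'a, T> Deref for BoxCow<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        match self {
            BoxCow::Borrowed(t) => t,
            BoxCow::Owned(t) => t,
        }
    }
}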
Ok, I have started sketching a few things out, please feel free to browse around (although it's obviously a very early WIP sketch, I figured I'd better share it asap so as not to spend too much time on something that will be thrown away). What's implemented so far: …
I'm currently pondering how to link the two …. What I've tried to do:

// pyo3
pub enum Scalar {
    // all the PEP 3118 types: bools, ints, floats, complex, char strings, etc
}

pub unsafe trait Element: Clone + Send {
    const TYPE: TypeDescriptor<Scalar>;
    // this Scalar covers most of the PEP 3118 type system
}

// numpy
pub enum Scalar {
    Base(pyo3_type_desc::Scalar),
    Datetime,  // 'M'
    Timedelta, // 'm'
}

pub unsafe trait Element: Clone + Send {
    const TYPE: TypeDescriptor<Scalar>;
    // this Scalar is either the PEP 3118 or M/m types
}

The problem is in implementing a blanket impl like this:

impl<T: pyo3_type_desc::Element> Element for T {
    const TYPE: TypeDescriptor<Scalar> = ???;
    // How do we convert TypeDescriptor<A> to TypeDescriptor<B> here?
    // See TypeDescriptor::map() - but it would not work in const context :(
}

Having a … Anyways, any thoughts welcome. I have a feeling we can do it 😺
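For context, a simplified, self-contained version of the map in question (not the real TypeDescriptor, just an illustration) shows why it can't be const: rebuilding the nested descriptor needs heap allocation, which const evaluation on stable Rust doesn't allow.

enum TypeDescriptor<T> {
    Object,
    Value(T),
    // element type + shape of a sub-array
    Array(Box<TypeDescriptor<T>>, Vec<usize>),
}

impl<T> TypeDescriptor<T> {
    fn map<U>(&self, f: fn(&T) -> U) -> TypeDescriptor<U> {
        match self {
            TypeDescriptor::Object => TypeDescriptor::Object,
            TypeDescriptor::Value(v) => TypeDescriptor::Value(f(v)),
            TypeDescriptor::Array(elem, shape) => {
                // Box::new and Vec::clone allocate, so this cannot run in a const context
                TypeDescriptor::Array(Box::new(elem.map(f)), shape.clone())
            }
        }
    }
}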
Just for completeness, the hack/solution I've mentioned in the last paragraph is basically:

// numpy
pub trait Element: Clone + Send {
    fn type_descriptor() -> TypeDescriptor<Scalar>;
}

impl<T: pyo3_type_desc::Element> Element for T {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        T::TYPE.map(|&scalar| Scalar::Base(scalar))
    }
}

So, …

unsafe impl Element for MyType {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        const TYPE: TypeDescriptor<Scalar> = ...;
        TYPE
    }
}
And here's another problem with the above system :(

// pyo3
unsafe impl<T: Element> Element for (T,) {
    const TYPE: TypeDescriptor<Scalar> = T::TYPE;
}

// numpy
unsafe impl<T: Element> Element for (T,) {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        T::type_descriptor()
    }
}

So now I'm even more unsure of how to proceed.
It's just my first thought so it might be completely bogus: could we make Element generic over the scalar type, i.e.

pub unsafe trait Element<S>: Clone + Send {
    const TYPE: TypeDescriptor<S>;
}

so that there is only one impl for tuples, like

unsafe impl<T: Element, S> Element<S> for (T,) {

and rust-numpy would not define another …. I am not sure how a trait bound for …
And testing a few things out further: in order to e.g. map generic tuples to type descriptors, we'd need something like memoffset. For example:

use memoffset::*;

fn main() {
    const O: usize = offset_of_tuple!((u8, i32, u8, i32), 0); // <-- this does not work in a const context (see below)
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 0));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 1));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 2));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 3));
}

outputs …

So while all this will probably get inlined to a compile-time constant in the end, we can't force it into a …
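In other words (a small self-contained check of my own, assuming the memoffset crate as a dependency), the same macro is perfectly usable from a regular fn, which is another argument for exposing type descriptors through a method rather than an associated const:

use memoffset::offset_of_tuple;

// Tuple layout is not guaranteed by Rust, so these offsets are whatever the
// compiler picked for this target/version - they just have to be queried from
// inside a fn rather than from a const initializer.
fn tuple_offsets() -> [usize; 4] {
    [
        offset_of_tuple!((u8, i32, u8, i32), 0),
        offset_of_tuple!((u8, i32, u8, i32), 1),
        offset_of_tuple!((u8, i32, u8, i32), 2),
        offset_of_tuple!((u8, i32, u8, i32), 3),
    ]
}

fn main() {
    println!("{:?}", tuple_offsets());
}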
@adamreichold Thanks for the input - I had this idea in the beginning but for some reason it was discarded and forgotten. Any ideas are really welcome - I think I can handle most of the technical numpy/rust details and quirks myself, but brainstorming design decisions together is exponentially more efficient. The current sketch:

pub trait ScalarDescriptor: 'static + Clone {
    fn itemsize(&self) -> usize;
    fn alignment(&self) -> usize;
}

I guess we could further restrict it by …
@adamreichold So, at first glance, this actually seems to work, see below (need to dig around it further to confirm). The magic key here is the From<Scalar> bound:

// pyo3
pub trait ScalarDescriptor: 'static + Clone + From<Scalar> {
    fn itemsize(&self) -> usize;
    fn alignment(&self) -> usize;
}

pub unsafe trait Element<S: ScalarDescriptor>: Clone + Send {
    fn type_descriptor() -> TypeDescriptor<S>;
}

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scalar { ... }

// implement `Element` for built-in int/float types etc like so:
macro_rules! impl_element {
    ($ty:ty, $expr:expr) => {
        unsafe impl<S: ScalarDescriptor> Element<S> for $ty {
            #[inline]
            fn type_descriptor() -> TypeDescriptor<S> {
                const TYPE: Scalar = $expr;
                debug_assert_eq!(std::mem::size_of::<$ty>(), TYPE.itemsize());
                TypeDescriptor::Scalar(S::from(TYPE))
            }
        }
    };
    ...
}

// <--- here (in pyo3) is where we can implement generic logic for arrays/tuples etc
// (proc-macro for record types can also be generic and numpy-independent)

// numpy
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum Scalar {
    Base(BaseScalar),
    Datetime,
    Timedelta,
}

impl From<BaseScalar> for Scalar {
    fn from(scalar: BaseScalar) -> Self {
        Self::Base(scalar)
    }
}

impl ScalarDescriptor for Scalar { ... }
To think about: alternatively, we can not require …
For scalar types, alignment is trivial. For object types, it's pointer-width, I guess? (…) But for record types... I'm a bit lost. Given that we will likely never call …
Again, I'm not exactly sure about how this should be handled, so any ideas welcome. P.S. this is the last bit left for me to finish the entire …
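For what it's worth, for a concrete #[repr(C)] Rust type both values can simply be taken from the compiler, and the usual C rule (alignment = max of the field alignments, itemsize = last offset + size, rounded up to that alignment) gives the same answer. A quick self-contained check (my own example, not from the thread):

use std::mem::{align_of, size_of};

#[repr(C)]
struct Foo {
    x: u16,
    y: u64,
}

fn main() {
    // max(align_of::<u16>(), align_of::<u64>()) == 8
    assert_eq!(align_of::<Foo>(), 8);
    // y sits at offset 8 (x padded from 2 to 8) and takes 8 bytes,
    // so the total size of 16 is already a multiple of the alignment
    assert_eq!(size_of::<Foo>(), 16);
    println!("alignment={}, itemsize={}", align_of::<Foo>(), size_of::<Foo>());
}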
I am probably missing something, but isn't calling …
Yea. It's an option for when we're going down the "Rust type -> TypeDescriptor -> numpy dtype" path. In order to enable this, there would definitely need to be an …. There are more paths, like …. One more catch is that I …
Another thing that all of this is slowly leading to (and it's one of the most interesting parts) - the logic of how to treat two type descriptors as "equivalent" or "compatible". IIUC, numpy ignores the alignment there. So the alignment information we provide to numpy will be mostly useful if new dtypes are created out of the ones we export (e.g. wrapping them in rec types).
@adamreichold Ah yea, I forgot the confusing part; here's a quote from the dtype creation logic in …:

/* Structured arrays get a sticky aligned bit */
if (align) {
    new->flags |= NPY_ALIGNED_STRUCT;
} The Normally, it will only be set when manually requested via passing |
I think that is as good/bad as any other
If I understand this somewhat correctly, then I think we should compare the alignment reported by |
It's not that simple, I think. I'll try to explain how I understand what I know:
So, I guess the question is, in the |
So far I went with the latter (leaving some todos around) - if a struct layout fits all three criteria, we will just mark it as aligned. One quirk: in 'unaligned' mode, I think numpy always just sets the alignment to 1 (regardless of struct layout). We have a choice there - either set it to 1 or set it to the actual alignment (I went with the latter for now, but we can switch it if we want). Anyways, some good progress - untested but it already compiles: …
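The three criteria aren't spelled out above, but based on the NumPy behaviour discussed in this thread, a check along roughly these lines is presumably what's meant - this is a guess on my part, not the actual implementation:

// Rough guess at an "is this layout aligned?" check: every field offset must be
// a multiple of that field's alignment, and the total itemsize must be a
// multiple of the overall (max) alignment. The exact criteria may differ.
fn is_aligned_layout(fields: &[(usize, usize)], itemsize: usize) -> bool {
    // fields: (offset, alignment) pairs
    let max_align = fields.iter().map(|&(_, a)| a).max().unwrap_or(1);
    fields.iter().all(|&(offset, align)| offset % align == 0) && itemsize % max_align == 0
}

fn main() {
    // u16 at offset 0 (align 2), u64 at offset 8 (align 8), itemsize 16 -> aligned
    assert!(is_aligned_layout(&[(0, 2), (8, 8)], 16));
    // a u64 at offset 5 can never be aligned
    assert!(!is_aligned_layout(&[(0, 2), (5, 8)], 16));
}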
Just reporting on a bit more progress: started writing tests (everything works so far) and, as an experiment, implemented …. A few questions that came up related to arrays/tuples (any thoughts welcome): …
Just wanted to say I've finally had a chance to read this thread. There's a huge amount of content here and I think you're both more knowledgeable than I on this topic. Thank you so much for working on it 👍 Regarding the PyO3 parts - yes, I would definitely welcome some love to the PyBuffer code! And we already have some implementations for const generics which automatically get enabled on new enough compilers - see https://github.com/PyO3/pyo3/blob/91648b23159acbf6b44d1245060efa86cfbdf73f/src/conversions/array.rs#L3. If there's anything you need specifically from me, please ping and I'll do my best to comment. My laptop is currently OOO so I'm quite behind on more complex tasks and am limited to when my family doesn't need me and I can be at a desktop. You have my blessing to design this as appropriate; I'll try to drop in to read again in future (hopefully it's an easier read the second time around 😂).
@davidhewitt Cheers & no worries, thanks for taking the time to skim through this, there's quite a lot indeed :) There's no rush since there's still quite a lot left to do and to test, but it's gradually converging. For now, I'm working on a shared pyo3/numpy type descriptor prototype in here. The code in …. Re: const generics, yes, I've started implementing them in exactly the same way (and even named the feature identically). Should be all good then.
(Just wanted to say that I know that I have unread stuff here since 2021-01-15. I just have not yet found the time to go through it.)
A little bit of a progress report as I found a bit of time to work on it today (all of it can be seen in the recent commits in the repo, link above):
Next thing, I'll probably work a bit on converting dtype -> type descriptor (the other direction). Also maybe sketch the …
Some more progress - due to …. In a nutshell, you can just use …
Some more progress updates (this is kinda where all of this has been heading from the start) - here's one of the most recent tests. This compiles and runs:

#[derive(Clone, numpy::Record)]
#[repr(C)]
struct Foo<T, const N: usize> {
    x: [[T; N]; 3],
    y: (PyObject, Timedelta64<units::Nanosecond>),
}

assert_eq!(
    Foo::<i64, 2>::type_descriptor(),
    td!({"x":0 => [(3, 2); <i64], "y":48 => {0 => O, 8 => m8[ns] [16, 8]} [64, 8]})
);

There's still lots of edge cases and bugs to be squashed and tests to be written etc, but on the surface, the core of it seems to be working quite nicely.
If the semantics of the same type definition with and without the flag are different on the Python side, maybe we should give the user access to it, e.g. make it a parameter to …
I think we should definitely add this and do it in the same way as std or PyO3: add impls up to length 32 using macros on older compilers and use const generics on newer ones, dropping the small-array macros when our MSRV includes support for const generics.
Personally, I would very much like this crate to provide only a minimal and efficient conversion API without providing any operations on the wrapper types themselves. (I think this also applies to arrays themselves, and we should aim to focus the API on getting from/to ndarray and nalgebra types.)
Yea, I had the same feeling. For now I've just added transparent wrapper types that can only do two things (aside from being registered in the dtype system) - be constructed from …
I've added the min-const-generics part of it, but I can also add the macro-based impl for up-to-length-32 arrays for older compiler versions. Personally, I'm not a huge fan of

#[rustversion::since(1.51)]
// impl via min const generics
#[rustversion::before(1.51)]
// impl via macros

The main benefit is that we won't break some dependent crates the day we kill the …
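For reference, a sketch of what the macro-based fallback could look like, reusing the Element<S>/TypeDescriptor names from the sketches above and a hypothetical sub-array constructor (this is an illustration, not the actual implementation):

// Pre-const-generics fallback: generate `[T; N]` impls for a fixed set of
// lengths. With min-const-generics this collapses into a single
// `impl<T, S, const N: usize>` block instead.
macro_rules! impl_element_for_arrays {
    ($($n:expr),* $(,)?) => {$(
        unsafe impl<S: ScalarDescriptor, T: Element<S>> Element<S> for [T; $n] {
            fn type_descriptor() -> TypeDescriptor<S> {
                // hypothetical helper that wraps an element descriptor into a
                // sub-array descriptor with the given shape
                TypeDescriptor::array(T::type_descriptor(), &[$n])
            }
        }
    )*};
}

impl_element_for_arrays!(0, 1, 2, 3, 4 /* ..., 32 */);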
I think, first of all, we need to categorise how a type descriptor can be attached to a type. There will be two ways:
There's an …
Yeah, I think using a crate like …
I guess this is where I am not sure ATM. Reading the NumPy C API, I get the impression that the idea is that …. I think we should always have an alignment (even if trivial, i.e. one) and we should set the …. My understanding of the …
The way it works in numpy itself (…):

PyArray_Descr *new = PyArray_DescrNewFromType(NPY_VOID); // <-- new->alignment := 1
...
if (align) { // <-- align comes from either align=True or from being called recursively
    new->alignment = maxalign; // <-- maxalign is the computed C alignment if offsets are not provided
}
...
/* Structured arrays get a sticky aligned bit */
if (align) {
    new->flags |= NPY_ALIGNED_STRUCT;
}

IIUC, … There are 4 cases in the numpy constructor itself: …
So, the following must hold true: "if …".
Back to our business - I guess we can do something like this?

struct RecordDescriptor<T> {
    ...
    alignment: usize, // always set
    is_aligned: Option<bool>,
}

Where …
Here's an illustration:

import numpy as np

def check_alignment(*, offsets=None, align=False):
    print(f'offsets={offsets}, align={align} => ', end='')
    try:
        args = {
            'names': ['x', 'y'],
            'formats': ['u2', 'u8'],
        }
        if offsets is not None:
            args['offsets'] = offsets
        dtype = np.dtype(args, align=align)
        dtype_offsets = [dtype.fields[n][1] for n in args['names']]
        print(
            f'alignment={dtype.alignment}, '
            f'isalignedstruct={dtype.isalignedstruct}, '
            f'offsets={dtype_offsets}'
        )
    except BaseException as e:
        print(f'error: "{e}"')

check_alignment(offsets=None, align=False)
check_alignment(offsets=None, align=True)
check_alignment(offsets=[0, 8], align=False)
check_alignment(offsets=[0, 8], align=True)
check_alignment(offsets=[0, 5], align=False)
check_alignment(offsets=[0, 5], align=True)

which outputs

offsets=None, align=False => alignment=1, isalignedstruct=False, offsets=[0, 2]
offsets=None, align=True => alignment=8, isalignedstruct=True, offsets=[0, 8]
offsets=[0, 8], align=False => alignment=1, isalignedstruct=False, offsets=[0, 8]
offsets=[0, 8], align=True => alignment=8, isalignedstruct=True, offsets=[0, 8]
offsets=[0, 5], align=False => alignment=1, isalignedstruct=False, offsets=[0, 5]
offsets=[0, 5], align=True => error: "offset 5 for NumPy dtype with fields is not divisible by the field alignment 8 with align=True"
The semantics look reasonable to me. I do still wonder whether …
So this means that NumPy treats this like a …?
Yes, it sort of does. But I don't think it makes a distinction between aligned and unaligned reads; all that is left to the compiler. You can see that the usage of …
Ok, re: the last comment, maybe I wasn't entirely clear. In some cases, NumPy does distinguish between aligned and unaligned arrays:

    /* TODO: We may need to distinguish aligned and itemsize-aligned */
    aligned &= PyArray_ISALIGNED(arrays[i]);
}
if (!aligned && !(self->method->flags & NPY_METH_SUPPORTS_UNALIGNED)) {
    PyErr_SetString(PyExc_ValueError,
        "method does not support unaligned input.");
    return NULL;
}

Whether the array is aligned or not …, e.g. if you select a field like …. So, in the example above, examples 2, 3 and 4 will have …
My impression is that indeed this means that NumPy will store these records unaligned? So the resulting type descriptor would match a Rust type with …
(Sorry, didn't mention that: personally, I don't think we need roundtripping. I am basically only interested in the …
Well, to start with... NumPy allows overlapping fields. As long as neither of them contains …
I think we have to do the reverse mapping, which will be used in …
What exactly do you mean by that? You can pass offsets …. It will only be 'packed' if you don't specify offsets. In this case …
I have to think about this. My first instinct is that this would rather be an argument for improving the NumPy code instead of rolling our own.
From your example in #254 (comment), I got the impression that …
This issue was moved to a discussion. You can continue the conversation there.
I've been looking at how record types can be integrated in rust-numpy and here's an unsorted collection of thoughts for discussion.
Let's look at Element:
- npy_type() is used in PyArray::new() and the like. Instead, one should use PyArray_NewFromDescr() to make use of the custom descriptor. Should all the places where npy_type() is used be split between "simple type, use New" and "user type, use NewFromDescr"? Or, alternatively, should arrays always be constructed from a descriptor? (in which case npy_type() becomes redundant and should be removed)
- Is same_type() needed at all? It is only used in FromPyObject::extract, where one could simply use PyArray_EquivTypes (like it's done in pybind11). Isn't it largely redundant? (or does it exist for optimization purposes? In which case, is it even noticeable performance-wise?)
- The DATA_TYPE constant is really only used to check if it's an object or not in 2 places, like this: …
- Element essentially is just DataType? E.g.: … DataType::Void? In which case, how does one recover the record type descriptor? (it can always be done through the numpy C API of course, via PyArrayDescr).
- … &PyArrayDescr would probably require: …
- Element should probably be implemented for tuples and fixed-size arrays.
- … #[repr(C)], #[repr(packed)] or #[repr(transparent)] … #[numpy(record, repr = "C")]. (or not)
- … Copy? (or not? technically, you could have object-type fields inside)
- … Element for it manually. This seems largely redundant, given that the DATA_TYPE will always be Object. It would be nice if any #[pyclass]-wrapped types could automatically implement Element, but it would be impossible due to the orphan rule. An alternative would be something like this: …
- … OrderedFloat<f32> or Wrapping<u64> or some PyClassFromOtherCrate? We can try doing something like what serde does for remote types.