ARROW-550: [Format] Draft experimental Tensor flatbuffer message type #435
Conversation
Change-Id: I29a980e132d31711a49ddd3a68824dfeca262d50
format/Tensor.fbs (Outdated)

/// The size of a memory increment necessary to advance 1 cell along a given
/// axis. If the strides member is null or has 0 length, then the strides
/// will be computed from the shape according to row major order
strides: [long];
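The schema comment above implies a default: when strides are omitted, a reader derives them from the shape in row-major order. A minimal C++ sketch of that interpretation (helper names and signatures are hypothetical, not part of this patch):

```cpp
// Hypothetical illustration of how a reader could interpret the strides
// field, falling back to row-major strides computed from the shape when
// the field is absent or empty.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<int64_t> RowMajorStrides(const std::vector<int64_t>& shape,
                                     int64_t item_size) {
  // Last axis is contiguous; each earlier axis advances by the product of
  // the later extents.
  std::vector<int64_t> strides(shape.size(), item_size);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * shape[i + 1];
  }
  return strides;
}

int64_t ByteOffset(const std::vector<int64_t>& indices,
                   const std::vector<int64_t>& strides) {
  // Byte offset of one cell: sum of index * stride over all axes.
  int64_t offset = 0;
  for (size_t i = 0; i < indices.size(); ++i) {
    offset += indices[i] * strides[i];
  }
  return offset;
}
```

For example, a row-major tensor of shape (2, 3, 4) with 8-byte elements would get byte strides (96, 32, 8).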
It probably makes sense to disallow negative strides. NumPy allows this, but it makes serialization very difficult
👍 the use of signed types for strides in the buffer protocol of Python is really annoying.
I think using int64_t is useful for maximum compatibility, but we should at least have a comment saying that negative strides aren't supported. I guess if someone tries to write memory with negative strides using arrow::ipc, it will need to be transformed to be all positive (more expensive).
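To make that cost concrete, here is a rough sketch (not Arrow code; the function and its parameters are made up) of what "transformed to be all positive" would involve for a 2-D buffer with possibly negative strides:

```cpp
// Hypothetical sketch: repack a 2-D buffer that may have negative strides
// into a fresh contiguous row-major buffer with positive strides, which is
// what a writer would have to do before serialization.
#include <cstdint>
#include <vector>

std::vector<uint8_t> RepackRowMajor(const uint8_t* data,
                                    int64_t rows, int64_t cols,
                                    int64_t row_stride, int64_t col_stride,
                                    int64_t item_size) {
  // `data` points at the logical (0, 0) element, as NumPy-style views do.
  std::vector<uint8_t> out(rows * cols * item_size);
  uint8_t* dst = out.data();
  for (int64_t i = 0; i < rows; ++i) {
    for (int64_t j = 0; j < cols; ++j) {
      // Source strides may be negative; the destination is densely packed.
      const uint8_t* src = data + i * row_stride + j * col_stride;
      for (int64_t b = 0; b < item_size; ++b) {
        *dst++ = src[b];
      }
    }
  }
  return out;
}
```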
Wouldn't it also make sense to add the stride of a dimension to the TensorDim?
That is possible, but today a single stride doesn't have much meaning without looking at all of the dimensions.
Nice, this looks reasonable!
Another option would be not to have strides at all and only allow two orders for serialization. A single byte would be sufficient for that, and it would improve the memory footprint while covering most use cases.
@SylvainCorlay that had been my initial thought. I'll change the spec to be either row- or column-major, and if there is sufficient demand in the future we can add non-contiguous/strided ability. I agree that arbitrary striding mostly only makes sense in-memory.
If by "either row/column-major" you mean with a boolean switch between them, I am 👍 . ( there is probably a good demand for both row-major (e.g. numpy) and column-major (e.g. julia). ) |
Change-Id: I2ffa55305e770d9aa5c1ecdea74ee65b91589a8a
@SylvainCorlay I made it an enum so that we can add a
format/Tensor.fbs (Outdated)

name: string;
}

enum TensorOrder : short {
short is two bytes long (in most implementations). Could this be a 1-byte bool, char, or unsigned char enum?
bool:

enum TensorOrder : bool {
  ROW_MAJOR = false,
  COLUMN_MAJOR = true,
};

char:

enum TensorOrder : char {
  ROW_MAJOR = 'r',
  COLUMN_MAJOR = 'c',
};
I would prefer char over bool; it is more expressive and easier to extend.
I changed it from short to byte (there is no char in Flatbuffers).
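As a size check only (this is plain C++, not the flatc-generated code), the difference being discussed is one byte of storage versus two:

```cpp
// Illustration of the size argument: a byte-backed enum occupies one byte,
// a short-backed one two.
#include <cstdint>

enum class TensorOrderByte : int8_t { ROW_MAJOR = 0, COLUMN_MAJOR = 1 };
enum class TensorOrderShort : int16_t { ROW_MAJOR = 0, COLUMN_MAJOR = 1 };

static_assert(sizeof(TensorOrderByte) == 1, "byte-backed enum is 1 byte");
static_assert(sizeof(TensorOrderShort) == 2, "short-backed enum is 2 bytes");
```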
Change-Id: I129312f80ea6f561ee5a56b2393bda2ddcddefca
Does Flatbuffers require buffers to be aligned on a certain number of bytes, or is it up to this spec to specify padding at the end of the header so that the data is aligned at a given level?
What kind of buffers are you referring to? When we write the Flatbuffer in the stream/file format with a length prefix, it includes padding to an 8-byte boundary: https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L169
Merging this to move on to a prototype of writing and reading tensors to shared memory; I will hopefully have a patch together in the next few days.
SIMD intrinsics that load data into a SIMD register have aligned and unaligned versions, and the former is generally much faster than the latter. So it helps if the main buffer of the tensor can be aligned on, e.g., a 32-byte boundary. If the entire buffer (including shape and order) is 32-byte aligned, it is good for performance that the offset of the data also has 32-byte alignment. Adding some padding after the type / shape / order information to align the data buffer can therefore make things faster at the library level.
@SylvainCorlay sounds good -- I will plan to ensure that the tensor memory is written with at least 32-byte alignment and padding. |
@JohanMabille has written a standard STL allocator for aligned allocation which, when used, guarantees that the buffer (including type/shape/order) is aligned. All that would then be required from the Arrow spec to get good SIMD performance is that the offset of the beginning of the data is a multiple of 32 bytes.
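A small sketch of the padding arithmetic discussed above (the constant and helper are illustrative, not the Arrow implementation):

```cpp
// Round the end of the metadata up to a 32-byte boundary so the tensor body
// starts aligned for SIMD loads; the constant is an assumption here.
#include <cstdint>

constexpr int64_t kSimdAlignment = 32;  // e.g. AVX-friendly alignment

// Smallest multiple of `alignment` that is >= `offset`; `alignment` must be
// a power of two.
constexpr int64_t AlignUp(int64_t offset, int64_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Example: a 121-byte metadata header would be followed by 7 bytes of
// padding so the data buffer begins at offset 128, a multiple of 32.
static_assert(AlignUp(121, kSimdAlignment) == 128, "pad header to 32 bytes");
```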
Tensor-like data occurs very frequently in scientific computing and machine learning applications that are mostly implemented in C and C++. Arrow's C++ memory management and shared memory utilities can help serve these use cases for zero-copy data transfer to other tensor-like data structures (like NumPy ndarrays, or the tensor objects used in machine learning libraries like TensorFlow or Torch).
The Tensor data structure is loosely modeled after NumPy's ndarray object and TensorFlow's tensor protocol buffers type (https://github.com/tensorflow/tensorflow/blob/754048a0453a04a761e112ae5d99c149eb9910dd/tensorflow/core/framework/tensor.proto).
cc @pcmoritz @robertnishihara @SylvainCorlay @JohanMabille