ARROW-550: [Format] Draft experimental Tensor flatbuffer message type #435

Closed
wants to merge 3 commits

Conversation

wesm (Member) commented Mar 23, 2017

Tensor-like data occurs very frequently in scientific computing and machine learning applications that are mostly implemented in C and C++. Arrow's C++ memory management and shared memory utilities can help serve these use cases for zero-copy data transfer to other tensor-like data structures (like NumPy ndarrays, or the tensor objects used in machine learning libraries like TensorFlow or Torch).

The Tensor data structure is loosely modeled after NumPy's ndarray object and TensorFlow's tensor protocol buffers type (https://github.com/tensorflow/tensorflow/blob/754048a0453a04a761e112ae5d99c149eb9910dd/tensorflow/core/framework/tensor.proto).

cc @pcmoritz @robertnishihara @SylvainCorlay @JohanMabille

Change-Id: I29a980e132d31711a49ddd3a68824dfeca262d50
/// The size of a memory increment necessary to advance 1 cell along a given
/// axis. If the strides member is null or has 0 length, then the strides
/// will be computed from the shape according to row major order
strides: [long];
wesm (Member Author)

It probably makes sense to disallow negative strides. NumPy allows this, but it makes serialization very difficult


👍 The use of signed types for strides in the Python buffer protocol is really annoying.

wesm (Member Author)

I think using int64_t is useful for maximum compatibility, but we should at least have a comment saying that negative strides aren't supported. I guess if someone tries to write memory with negative strides using arrow::ipc, it will need to be transformed to be all-positive (which is more expensive).
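As a sketch of how that transformation could look, the snippet below copies a 1-D view with a negative byte stride into a fresh contiguous buffer, so that only non-negative strides ever reach the serializer. The helper name and signature are hypothetical, not part of this patch.

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: materialize a 1-D view with an arbitrary (possibly
// negative) byte stride into a contiguous buffer. `data` points at the first
// logical element; for a negative stride the remaining elements live at
// decreasing addresses.
std::vector<uint8_t> MaterializeContiguous(const uint8_t* data, int64_t length,
                                           int64_t byte_stride, int64_t item_size) {
  std::vector<uint8_t> out(static_cast<size_t>(length * item_size));
  for (int64_t i = 0; i < length; ++i) {
    std::memcpy(out.data() + i * item_size, data + i * byte_stride,
                static_cast<size_t>(item_size));
  }
  return out;
}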

Member

Wouldn't it also make sense to add the stride of a dimension to the TensorDim?

wesm (Member Author)

That is possible, but a single stride doesn't have much meaning without looking at all of the dimensions.

robertnishihara (Contributor)

Nice, this looks reasonable!

SylvainCorlay commented Mar 24, 2017

Another option would be to not have strides at all and only allow two orders for serialization:

  • last-index-varies-faster (commonly called C order or row-major)
  • first-index-varies-faster (commonly called Fortran order or column-major).

A single byte would be sufficient for that, and it would improve the memory footprint while covering most use cases.
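To illustrate why a single byte suffices: given the shape, an order flag is all a reader needs to reconstruct contiguous strides. A rough C++ sketch, with names that are illustrative rather than taken from the spec:

#include <cstdint>
#include <vector>

enum class TensorOrder : int8_t { ROW_MAJOR, COLUMN_MAJOR };  // illustrative only

// Compute contiguous byte strides from a shape and an order flag.
std::vector<int64_t> ContiguousStrides(const std::vector<int64_t>& shape,
                                       int64_t item_size, TensorOrder order) {
  std::vector<int64_t> strides(shape.size());
  int64_t step = item_size;
  if (order == TensorOrder::ROW_MAJOR) {
    // Last index varies fastest: fill strides from the last axis backwards.
    for (size_t i = shape.size(); i-- > 0;) {
      strides[i] = step;
      step *= shape[i];
    }
  } else {
    // First index varies fastest: fill strides from the first axis forwards.
    for (size_t i = 0; i < shape.size(); ++i) {
      strides[i] = step;
      step *= shape[i];
    }
  }
  return strides;
}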

wesm (Member Author) commented Mar 24, 2017

@SylvainCorlay that had been my initial thought. I'll change the spec to be either row- or column-major, and if there is sufficient demand in the future we can add non-contiguous/strided support. I agree that arbitrary striding mostly only makes sense in memory.

SylvainCorlay

If by "either row/column-major" you mean with a boolean switch between them, I am 👍 . ( there is probably a good demand for both row-major (e.g. numpy) and column-major (e.g. julia). )

Change-Id: I2ffa55305e770d9aa5c1ecdea74ee65b91589a8a
wesm (Member Author) commented Mar 24, 2017

@SylvainCorlay I made it an enum so that we can add a TensorOrder_STRIDED option later if there is demand

name: string;
}

enum TensorOrder : short {
SylvainCorlay Mar 24, 2017

short is two bytes long (in most implementations).

Could this be a 1-byte bool, char, or unsigned char enum?

SylvainCorlay Mar 24, 2017

bool:

enum TensorOrder : bool {
    ROW_MAJOR = false,
    COLUMN_MAJOR = true,
};

char:

enum TensorOrder : char {
    ROW_MAJOR = 'r',
    COLUMN_MAJOR = 'c',
};

JohanMabille (Contributor) Mar 24, 2017

I would prefer char over bool; it is more expressive and easier to extend.

wesm (Member Author)

I changed it from short to byte (there is no char in Flatbuffers)

Change-Id: I129312f80ea6f561ee5a56b2393bda2ddcddefca
SylvainCorlay

Does Flatbuffers require buffers to be aligned to a certain number of bytes, or is it up to this spec to specify padding at the end of the header so that the data is aligned at a given level?

wesm (Member Author) commented Mar 24, 2017

What kind of buffers are you referring to? When we write the Flatbuffer in the stream/file format with a length prefix, it includes padding to an 8-byte boundary:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L169
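The calculation itself is just rounding the metadata length up to the next multiple of 8. A rough sketch of the idea, not the actual writer.cc code:

#include <cstdint>

// Round a length up to the next multiple of 8, as is done for the
// length-prefixed flatbuffer metadata (sketch only).
constexpr int64_t PaddedTo8(int64_t length) {
  return (length + 7) & ~static_cast<int64_t>(7);
}

static_assert(PaddedTo8(13) == 16, "13 bytes pad up to 16");
static_assert(PaddedTo8(16) == 16, "already-aligned lengths are unchanged");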

wesm (Member Author) commented Mar 24, 2017

Merging this so I can move on to a prototype of writing and reading tensors to shared memory; I will hopefully have a patch together in the next few days.

asfgit closed this in dc3cb30 Mar 24, 2017
SylvainCorlay commented Mar 24, 2017

SIMD intrinsics for loading data into a SIMD register come in aligned and unaligned versions. The former is generally much faster than the latter.

So it helps if the main buffer of the tensor can be aligned on a

  • 16-byte (SSE2)
  • 32-byte (AVX)

boundary.

If the entire buffer (including shape and order) is 32-byte aligned, it is good for performance that the offset of the data also has 32-byte alignment.

So adding some padding after the type / shape / order information to align the data buffer can make things faster at the library level.
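To illustrate the aligned/unaligned distinction, here is a minimal sketch assuming AVX (compiled with e.g. -mavx); this is not Arrow code:

#include <cstdint>
#include <immintrin.h>

// AVX has both aligned and unaligned load intrinsics; the aligned form is
// only valid when the address is a multiple of 32 bytes.
__m256 LoadEightFloats(const float* p) {
  if (reinterpret_cast<std::uintptr_t>(p) % 32 == 0) {
    return _mm256_load_ps(p);   // aligned load
  }
  return _mm256_loadu_ps(p);    // unaligned load: always valid, often slower
}

If the writer pads so that the data buffer always starts on a 32-byte boundary, the fast path above can be taken unconditionally.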

wesm deleted the ARROW-550 branch March 24, 2017 16:00
wesm (Member Author) commented Mar 24, 2017

@SylvainCorlay sounds good -- I will plan to ensure that the tensor memory is written with at least 32-byte alignment and padding.

SylvainCorlay commented Mar 24, 2017

@JohanMabille has written a standard STL allocator for aligned allocation which, when used, guarantees that the buffer (including type/shape/order) is aligned.

So all that would be required from the Arrow spec for good SIMD performance is that the offset of the beginning of the memory is a multiple of 32 bytes.
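A minimal sketch of such an aligned STL allocator, relying on C++17 std::aligned_alloc; this is only an illustration of the idea, not the allocator referred to above:

#include <cstdlib>
#include <new>
#include <vector>

// Allocator that guarantees every allocation is aligned to `Alignment` bytes.
// Note: std::aligned_alloc is C++17 and is not available on MSVC, which
// provides _aligned_malloc instead.
template <class T, std::size_t Alignment = 32>
struct AlignedAllocator {
  using value_type = T;

  template <class U>
  struct rebind { using other = AlignedAllocator<U, Alignment>; };

  AlignedAllocator() = default;
  template <class U>
  AlignedAllocator(const AlignedAllocator<U, Alignment>&) {}

  T* allocate(std::size_t n) {
    // aligned_alloc requires the size to be a multiple of the alignment.
    std::size_t bytes = ((n * sizeof(T) + Alignment - 1) / Alignment) * Alignment;
    if (void* p = std::aligned_alloc(Alignment, bytes)) return static_cast<T*>(p);
    throw std::bad_alloc();
  }
  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return false; }

// Usage: a vector whose buffer is always 32-byte aligned.
// std::vector<float, AlignedAllocator<float>> v(1024);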
