Description
Work is being done to introduce a Tensor&lt;T&gt; and supporting types. Because these types often represent slices of multi-dimensional memory, they need to track quite a lot of data beyond what something like Span&lt;T&gt; needs. Correspondingly, the naive approach tracks multiple nint[] to support an arbitrary number of dimensions, which means an allocation must be made per slice. Performing these allocations every time a slice is produced can get expensive, so the design should ideally avoid them for common dimension counts.
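A hypothetical sketch of that naive layout, to make the cost concrete. All names here are illustrative only, not the actual Tensor&lt;T&gt;/TensorSpan&lt;T&gt; implementation:

```csharp
using System;

var span = new NaiveTensorSpan<float>(new nint[] { 2, 3, 4 }, new nint[] { 12, 4, 1 });
var sliced = span.SliceFirst(); // two more nint[] allocations just to drop one dimension
Console.WriteLine(sliced.Lengths.Length); // rank shrank from 3 to 2

// Naive layout: every slice allocates fresh arrays for lengths and strides.
readonly struct NaiveTensorSpan<T>
{
    public readonly nint[] Lengths; // one entry per dimension
    public readonly nint[] Strides; // one entry per dimension

    public NaiveTensorSpan(nint[] lengths, nint[] strides)
    {
        Lengths = lengths;
        Strides = strides;
    }

    // Dropping the leading dimension forces two new allocations,
    // even though the rank only shrinks by one.
    public NaiveTensorSpan<T> SliceFirst()
    {
        return new NaiveTensorSpan<T>(Lengths[1..], Strides[1..]);
    }
}
```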
A simple approach would be to track a single nint[] holding rank entries for the length of each dimension followed by rank more entries for the stride of each dimension. But this still necessitates an allocation every time. The next best thing is to track the data inline for some common dimension counts. However, this quickly grows the size of TensorSpan, which itself has a negative impact: passing the struct by value requires larger copies, and it can also hurt CPU cache utilization if it grows too large.
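One way to sketch the "inline for common ranks, heap fallback for the rest" idea. The cutoff of 4 dimensions, and every name below, is an assumption for illustration rather than the actual TensorSpan layout:

```csharp
using System;
using System.Runtime.CompilerServices;

var shape = new TensorShape(stackalloc nint[] { 2, 3, 4 }, stackalloc nint[] { 12, 4, 1 });
Console.WriteLine(shape.Length(1)); // 3, served from inline storage with no allocation

// Inline slots for 4 lengths + 4 strides; 8 nints = 64 bytes in a 64-bit process.
[InlineArray(8)]
internal struct InlineDims
{
    private nint _element0;
}

internal struct TensorShape
{
    private const int MaxInlineRank = 4; // assumed cutoff for illustration

    private InlineDims _inline;  // used when _rank <= MaxInlineRank
    private nint[]? _overflow;   // allocated only when _rank > MaxInlineRank
    private readonly int _rank;

    public TensorShape(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
    {
        _rank = lengths.Length;
        _overflow = (_rank <= MaxInlineRank) ? null : new nint[2 * _rank];

        // Lengths occupy slots [0, rank); strides occupy [rank, 2 * rank).
        for (int i = 0; i < _rank; i++)
        {
            if (_overflow is null)
            {
                _inline[i] = lengths[i];
                _inline[_rank + i] = strides[i];
            }
            else
            {
                _overflow[i] = lengths[i];
                _overflow[_rank + i] = strides[i];
            }
        }
    }

    public int Rank => _rank;

    public nint Length(int dim) =>
        (_overflow is null) ? _inline[dim] : _overflow[dim];

    public nint Stride(int dim) =>
        (_overflow is null) ? _inline[_rank + dim] : _overflow[_rank + dim];
}
```

Slicing such a shape at rank 4 or below can then copy the inline struct by value instead of allocating, while rarer high-rank tensors pay for one array.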
As such, the optimal setup is likely to pick a limit that covers commonly encountered dimension counts while keeping the struct no larger than a single cache line (typically assumed to be 64 bytes).
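The back-of-envelope arithmetic for that limit, assuming a 64-bit process where sizeof(nint) is 8: each dimension needs one length and one stride, so a rank-r shape costs 2 × r × 8 = 16r bytes, and rank 4 exactly fills a 64-byte cache line before counting any other fields:

```csharp
using System;

// Bytes of length + stride data for a given rank; IntPtr.Size is the
// size of nint on the current platform (8 in a 64-bit process).
static int BytesForRank(int rank) => rank * 2 * IntPtr.Size;

Console.WriteLine(BytesForRank(4)); // 64 when IntPtr.Size == 8
```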