Description
Work is being done to introduce a Tensor&lt;T&gt; and supporting types. Because these types often represent slices of multi-dimensional memory, they need to track quite a lot of data beyond what something like Span&lt;T&gt; needs. Correspondingly, the naive approach tracks multiple nint[] to support an arbitrary number of dimensions, which means an allocation must be made per slice. Performing these allocations every time a slice is produced can get expensive, so the design should ideally avoid them for common dimension counts.
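A hypothetical sketch of that naive layout, to make the cost concrete. All names here are illustrative only, not the actual Tensor&lt;T&gt;/TensorSpan&lt;T&gt; implementation:

```csharp
using System;

var span = new NaiveTensorSpan<float>(new nint[] { 2, 3, 4 }, new nint[] { 12, 4, 1 });
var sliced = span.SliceFirst(); // two more nint[] allocations just to drop one dimension
Console.WriteLine(sliced.Lengths.Length); // rank shrank from 3 to 2

// Naive layout: every slice allocates fresh arrays for lengths and strides.
readonly struct NaiveTensorSpan<T>
{
    public readonly nint[] Lengths; // one entry per dimension
    public readonly nint[] Strides; // one entry per dimension

    public NaiveTensorSpan(nint[] lengths, nint[] strides)
    {
        Lengths = lengths;
        Strides = strides;
    }

    // Dropping the leading dimension forces two new allocations,
    // even though the rank only shrinks by one.
    public NaiveTensorSpan<T> SliceFirst()
    {
        return new NaiveTensorSpan<T>(Lengths[1..], Strides[1..]);
    }
}
```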
A simple approach would be to track a single nint[] holding rank entries for the length of each dimension followed by rank more entries for the stride of each dimension. But this still necessitates an allocation every time. The next best thing is to track the data inline for some common dimension counts. However, this quickly grows the size of TensorSpan, which itself has a negative impact: passing the struct by value requires larger copies, and it can also hurt CPU cache utilization if it grows too large.
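One way to sketch the "inline for common ranks, heap fallback for the rest" idea. The cutoff of 4 dimensions, and every name below, is an assumption for illustration rather than the actual TensorSpan layout:

```csharp
using System;
using System.Runtime.CompilerServices;

var shape = new TensorShape(stackalloc nint[] { 2, 3, 4 }, stackalloc nint[] { 12, 4, 1 });
Console.WriteLine(shape.Length(1)); // 3, served from inline storage with no allocation

// Inline slots for 4 lengths + 4 strides; 8 nints = 64 bytes in a 64-bit process.
[InlineArray(8)]
internal struct InlineDims
{
    private nint _element0;
}

internal struct TensorShape
{
    private const int MaxInlineRank = 4; // assumed cutoff for illustration

    private InlineDims _inline;  // used when _rank <= MaxInlineRank
    private nint[]? _overflow;   // allocated only when _rank > MaxInlineRank
    private readonly int _rank;

    public TensorShape(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
    {
        _rank = lengths.Length;
        _overflow = (_rank <= MaxInlineRank) ? null : new nint[2 * _rank];

        // Lengths occupy slots [0, rank); strides occupy [rank, 2 * rank).
        for (int i = 0; i < _rank; i++)
        {
            if (_overflow is null)
            {
                _inline[i] = lengths[i];
                _inline[_rank + i] = strides[i];
            }
            else
            {
                _overflow[i] = lengths[i];
                _overflow[_rank + i] = strides[i];
            }
        }
    }

    public int Rank => _rank;

    public nint Length(int dim) =>
        (_overflow is null) ? _inline[dim] : _overflow[dim];

    public nint Stride(int dim) =>
        (_overflow is null) ? _inline[_rank + dim] : _overflow[_rank + dim];
}
```

Slicing such a shape at rank 4 or below can then copy the inline struct by value instead of allocating, while rarer high-rank tensors pay for one array.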
As such, the optimal setup is likely to pick a limit that covers commonly encountered dimension counts while keeping the struct no larger than a single cache line (typically assumed to be 64 bytes).
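The back-of-envelope arithmetic for that limit, assuming a 64-bit process where sizeof(nint) is 8: each dimension needs one length and one stride, so a rank-r shape costs 2 × r × 8 = 16r bytes, and rank 4 exactly fills a 64-byte cache line before counting any other fields:

```csharp
using System;

// Bytes of length + stride data for a given rank; IntPtr.Size is the
// size of nint on the current platform (8 in a 64-bit process).
static int BytesForRank(int rank) => rank * 2 * IntPtr.Size;

Console.WriteLine(BytesForRank(4)); // 64 when IntPtr.Size == 8
```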