I don't see how we can make the API efficient and at the same time preserve the nestedness of the type. What we'd like is a slice of arrays like `[][PossibleCPU]T`, but `PossibleCPU` is only known at runtime and therefore can't be used as an array length. Ultimately the right answer might be to hide this behind `MapIterator` or similar.
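To make the "hide it behind a type" idea concrete, here is a minimal sketch of a wrapper that keeps the flat storage but exposes per-key access. `PerCPUBatch`, `NewPerCPUBatch`, `Len` and `Values` are hypothetical names invented for illustration; they are not part of the library or of #1192, and the number of possible CPUs is simply passed in as a parameter.

```go
package main

import "fmt"

// PerCPUBatch is a hypothetical wrapper (not library API). It stores the
// values for all keys in one flat slice, laid out key by key, and hides
// the index arithmetic from the caller.
type PerCPUBatch[T any] struct {
	flat []T // len(flat) == keys * cpus
	cpus int // number of possible CPUs, only known at runtime
}

// NewPerCPUBatch wraps an existing flat slice without copying it.
func NewPerCPUBatch[T any](flat []T, cpus int) (*PerCPUBatch[T], error) {
	if cpus <= 0 || len(flat)%cpus != 0 {
		return nil, fmt.Errorf("flat length %d is not a multiple of %d CPUs", len(flat), cpus)
	}
	return &PerCPUBatch[T]{flat: flat, cpus: cpus}, nil
}

// Len returns the number of keys in the batch.
func (b *PerCPUBatch[T]) Len() int { return len(b.flat) / b.cpus }

// Values returns the per-CPU values of the i-th key as a view into the
// flat storage. The capacity is clamped so an append cannot spill into
// the values of the next key.
func (b *PerCPUBatch[T]) Values(i int) []T {
	start := i * b.cpus
	return b.flat[start : start+b.cpus : start+b.cpus]
}

func main() {
	// Two keys on a machine with three possible CPUs (made-up numbers).
	batch, err := NewPerCPUBatch([]uint64{1, 2, 3, 4, 5, 6}, 3)
	if err != nil {
		panic(err)
	}
	for i := 0; i < batch.Len(); i++ {
		fmt.Println(i, batch.Values(i))
	}
}
```

With something along these lines the user never sees the flat layout, while the storage could still be handed over as a single contiguous buffer.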
The proposed API in #1192 for batch per-CPU operations is:
So instead of a nested `[][]K` we have a flat slice. This matches the memory representation the kernel expects 1:1 and opens up the possibility for zero copy optimisations later. (We currently do copies due to alignment issues, and might have to do that in the future as well.) Unfortunately it's also very confusing to write code that deals with this, both as a user and when maintaining the library.

Before the next release we should investigate how much more overhead `[][]K` would be.
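As a starting point for that investigation, the sketch below shows how a flat per-CPU slice is indexed and how a nested view can be built on top of it. It assumes the layout described above, with all values for one key stored contiguously. `nest` is a hypothetical helper, not library API. The nested view allocates one slice header per key but does not copy any elements.

```go
package main

import "fmt"

// nest is a hypothetical helper: it splits a flat per-CPU slice into one
// sub-slice per key. Each sub-slice aliases the flat storage, so only the
// outer slice of headers is allocated.
func nest[T any](flat []T, possibleCPUs int) ([][]T, error) {
	if possibleCPUs <= 0 || len(flat)%possibleCPUs != 0 {
		return nil, fmt.Errorf("flat length %d is not a multiple of %d", len(flat), possibleCPUs)
	}
	rows := make([][]T, 0, len(flat)/possibleCPUs)
	for i := 0; i < len(flat); i += possibleCPUs {
		// Three-index slicing clamps the capacity to a single key.
		rows = append(rows, flat[i:i+possibleCPUs:i+possibleCPUs])
	}
	return rows, nil
}

func main() {
	const possibleCPUs = 4 // assumed value, queried at runtime in practice

	// Two keys, four CPUs each: key 0 occupies [0,4), key 1 occupies [4,8).
	flat := []uint64{10, 11, 12, 13, 20, 21, 22, 23}

	// Flat access: the value of key i on CPU c is flat[i*possibleCPUs+c].
	fmt.Println(flat[1*possibleCPUs+2])

	rows, err := nest(flat, possibleCPUs)
	if err != nil {
		panic(err)
	}
	// Nested access reads the same element.
	fmt.Println(rows[1][2])

	// Writes through the view are visible in the flat slice.
	rows[0][3] = 99
	fmt.Println(flat[3])
}
```

The error-prone part of the flat form is the `i*possibleCPUs+c` arithmetic, which every caller has to get right. The view keeps the contiguous buffer and moves that arithmetic into one place.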