Distributed Indexes #18

shashi · 2020-08-16T22:55:25Z

For indexing distributed arrays, there are two invariants to think of:

A DArray is a chain of subarrays
The subarrays are on other processes

Maybe the indexes should be kept on the processes that hold the subarrays, in the interest of reducing communication.

You can assume there is a way to do distributed sort and sortperm.

What should be stored on the master process such that it knows enough to dispatch groupby and join operations efficiently?
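One possible answer, offered only as a sketch: if the chunks are globally sorted (via the distributed sort assumed above), the master could keep just the smallest and largest key of each chunk and use that to decide which workers need to talk to each other. The Python below is illustrative, not an existing API; `ChunkMeta`, `build_master_index`, `chunks_for_range` and `join_plan` are hypothetical names, and worker ids are plain integers standing in for remote references.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkMeta:
    worker: int   # hypothetical id of the process holding the chunk
    lo: object    # smallest key stored in the chunk
    hi: object    # largest key stored in the chunk


def build_master_index(sorted_chunks):
    """Build the master-side summary.

    sorted_chunks: list of (worker_id, sorted_keys), given in global sort
    order. Only the key range of each non-empty chunk is kept.
    """
    return [ChunkMeta(w, keys[0], keys[-1]) for w, keys in sorted_chunks if keys]


def chunks_for_range(index, lo, hi):
    """Return metadata for chunks whose key range overlaps [lo, hi]."""
    his = [m.hi for m in index]
    los = [m.lo for m in index]
    start = bisect_left(his, lo)    # first chunk with hi >= lo
    stop = bisect_right(los, hi)    # one past the last chunk with lo <= hi
    return index[start:stop]


def join_plan(left_index, right_index):
    """Pairs (left worker, right worker) that may share join keys."""
    return [
        (l.worker, r.worker)
        for l in left_index
        for r in chunks_for_range(right_index, l.lo, l.hi)
    ]
```

With this, the master stores two keys per chunk, and a join only needs to pair up workers whose ranges overlap. The same ranges tell a groupby whether a group can straddle a chunk boundary, namely when one chunk's largest key equals the next chunk's smallest key.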

Unrelated question: Can we use these operations to implement distributed arrays themselves? (e.g. matrix multiply is a groupjoin on the remote references to the subarrays)
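To make the matrix-multiply idea concrete, here is a minimal single-process sketch in Python. It assumes "groupjoin" means joining the blocks of A and B on the shared inner block index k, then grouping the products by output block (i, j) and summing. Blocks are plain nested lists standing in for remote references, and `block_matmul` is a hypothetical name.

```python
from collections import defaultdict


def matmul(a, b):
    """Multiply two small dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block_matmul(A, B):
    """A: {(i, k): block}, B: {(k, j): block}. Returns {(i, j): block}."""
    # Index B's blocks by the join key k.
    b_by_k = defaultdict(list)
    for (k, j), blk in B.items():
        b_by_k[k].append((j, blk))

    # Join on k, group by (i, j), and reduce each group with +.
    C = {}
    for (i, k), a_blk in A.items():
        for j, b_blk in b_by_k.get(k, []):
            prod = matmul(a_blk, b_blk)
            C[(i, j)] = add(C[(i, j)], prod) if (i, j) in C else prod
    return C
```

In a distributed setting the dictionary keys would live on the master while the block products and sums run on the processes holding the blocks; that placement is not modeled here.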

Thoughts?
