Distributed Indexes #18

Open
shashi opened this issue Aug 16, 2020 · 0 comments

shashi commented Aug 16, 2020

For indexing distributed arrays, there are two invariants to think of:

  1. A DArray is a chain of subarrays
  2. The subarrays are on other processes
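To make the first invariant concrete, here is a minimal sketch of how a global index could be resolved to a chunk and a local offset when only the chunk lengths are known. The function name `locate` and the 0-based indexing are assumptions for illustration, not an existing API.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(chunk_lengths, i):
    """Map a 0-based global index i to (chunk_number, local_offset).

    chunk_lengths is the length of each subarray, in chain order.
    Hypothetical helper, not taken from any existing package.
    """
    ends = list(accumulate(chunk_lengths))  # exclusive end of each chunk
    total = ends[-1] if ends else 0
    if not 0 <= i < total:
        raise IndexError(i)
    chunk = bisect_right(ends, i)           # first chunk whose end is > i
    start = ends[chunk - 1] if chunk else 0
    return chunk, i - start
```

Only the list of chunk lengths has to live on the master; the elements themselves stay on the processes that own them.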

Maybe the indexes should be kept on the worker processes that hold the subarrays, to reduce communication.

You can assume there is a way to do distributed sort and sortperm.

What should be stored on the master process such that it knows enough to dispatch groupby and join operations efficiently?
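One possible answer, sketched under the assumption that the data has already been range-partitioned by a distributed sort: the master keeps only the minimum and maximum key of each chunk. The class `ChunkDirectory` and its methods are hypothetical names used for illustration.

```python
from bisect import bisect_left, bisect_right


class ChunkDirectory:
    """Hypothetical master-side metadata: one (min_key, max_key) per chunk.

    Assumes chunks are listed in globally sorted order, so both the
    minimums and the maximums are non-decreasing.
    """

    def __init__(self, bounds):
        self.los = [lo for lo, _ in bounds]
        self.his = [hi for _, hi in bounds]

    def chunks_for_key(self, key):
        """Indices of the chunks whose key range contains key."""
        first = bisect_left(self.his, key)   # first chunk with max >= key
        last = bisect_right(self.los, key)   # one past last chunk with min <= key
        return list(range(first, last))

    def join_pairs(self, other):
        """Pairs (i, j) of chunks whose key ranges overlap."""
        pairs = []
        for i, (lo, hi) in enumerate(zip(self.los, self.his)):
            first = bisect_left(other.his, lo)
            last = bisect_right(other.los, hi)
            pairs.extend((i, j) for j in range(first, last))
        return pairs
```

With this much metadata the master can send a groupby on a key to the few chunks that may hold it, and schedule a join only on chunk pairs whose ranges overlap. Whether this is enough for unsorted or hash-partitioned data is a separate question.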

Unrelated question: Can we use these operations to implement distributed arrays themselves? (e.g. matrix multiply is a groupjoin on the remote references to the subarrays)
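To illustrate the matrix multiply idea, here is a small single-process sketch in which blocks are plain nested lists standing in for remote references. Blocks of `A` are keyed by `(i, k)` and blocks of `B` by `(k, j)`; the join is on `k` and the grouping is on `(i, j)`. All function names here are hypothetical.

```python
from collections import defaultdict


def matmul_block(a, b):
    """Dense product of two blocks given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add_block(a, b):
    """Elementwise sum of two blocks of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def blocked_matmul(A, B):
    """A maps (i, k) -> block, B maps (k, j) -> block; returns (i, j) -> block."""
    # Join side: index the blocks of B by the shared key k.
    b_by_k = defaultdict(list)
    for (k, j), blk in B.items():
        b_by_k[k].append((j, blk))

    # Group by (i, j) and reduce the joined pairs with block addition.
    C = {}
    for (i, k), a_blk in A.items():
        for j, b_blk in b_by_k[k]:
            prod = matmul_block(a_blk, b_blk)
            C[(i, j)] = add_block(C[(i, j)], prod) if (i, j) in C else prod
    return C
```

In a distributed setting the dictionary values would be references to subarrays on other processes, and `matmul_block` would run where the blocks live.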

Thoughts?
