Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order by/ Sort #174

Open
domoritz opened this issue Apr 25, 2014 · 15 comments
Open

Order by/ Sort #174

domoritz opened this issue Apr 25, 2014 · 15 comments
Assignees

Comments

@domoritz
Copy link
Member

MyriaL should support sorting. We have support for sorted scans and an in memory order by. Later, we probably want a sort merge join.

@dhalperi
Copy link
Member

(sort-merge join is not a language feature, but rather a rule in our eventual optimizer.)

@domoritz
Copy link
Member Author

But somebody has to write the optimizer rule for that. This issue is for language support and optimizer support.

@7andrew7
Copy link
Contributor

It isn't clear whether this refers to ORDER BY (a language feature) or sort-merge join (an optimizer feature). These should be two separate issues.

@domoritz
Copy link
Member Author

ORDER BY because we don't have a merge consumer in myria yet.

@senderista senderista self-assigned this May 22, 2015
@senderista
Copy link
Contributor

Distributed merge can lead to deadlock for multiple producers/multiple consumers: see discussion in http://dl.acm.org/citation.cfm?id=152611. There are various workarounds discussed in that reference.

@senderista
Copy link
Contributor

We already have a global Merge consumer that can run on the coordinator, along with local MergeJoin and InMemoryOrderBy operators. We would need to push sorts down to Postgres for data that didn't fit in memory, but we may already have what we need to support ORDER BY explicitly in MyriaL. The Raco and Myria algebras already support ORDER BY (there is a rewrite rule to convert the Raco OrderBy logical operator to the Myria MyriaInMemoryOrderBy physical operator), but there is no syntax support in the MyriaL parser. We should prioritize implementing ORDER BY in MyriaL based on user needs and limitations of the supporting operators.

@senderista
Copy link
Contributor

We also need to implement a rewrite rule to decompose a global OrderBy logical operator into a global Merge of local OrderBy operators, somewhat akin to the DecomposeGroupBy rule (do we also need a new Merge logical operator?). We should also have a rewrite rule for ORDER BY + LIMIT, so the global Merge only fetches the top k results from each local OrderBy (the existing CollectBeforeLimit rule doesn't seem to preserve order in the final output).

@domoritz
Copy link
Member Author

I totally don't remember writing uwescience/myria@555907f. Anyway, note that this is not a merge consumer! So it won't merge the inputs from children but only the inputs from different operators.

@senderista
Copy link
Contributor

Right, so you're saying we need a proper exchange operator for Merge? I'll open an issue for that, thanks!

@senderista
Copy link
Contributor

@bmyerz would appreciate feedback on adding a Merge operator to the logical algebra (possibly unnecessary since it only exists to physically implement global logical OrderBy).

@billhowe
Copy link
Contributor

I'd prefer not to see a merge operator at the logical level.

Is the use case here top k? That's a reasonable logical operator.

In fact, I'd go further and claim logical order by is always unnecessary
except at the root, just like in SQL, except for plans involving stateful
apply.

(SQL gets around that by using a separate order by clause specifically for
window functions.)

On Monday, May 9, 2016, Tobin Baker notifications@github.com wrote:

@bmyerz https://github.com/bmyerz would appreciate feedback on adding a
Merge operator to the logical algebra (possibly unnecessary since it only
exists to physically implement global logical OrderBy).


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#174 (comment)

@bmyerz
Copy link
Member

bmyerz commented May 11, 2016

@senderista I agree with Bill on this

@senderista
Copy link
Contributor

@billhowe @bmyerz OK, sounds like we just need a new Myria Merge physical operator (corresponding to the unimplemented Merge exchange operator) targeted by a rewrite rule which decomposes logical OrderBy into Merge + physical (local) OrderBy?

Also, can we go ahead and push logical OrderBy into SQL even in the absence of ORDER BY support in MyriaL? That would allow us to use Postgres indexes to optimize local sorting (and may be the only feasible alternative when local data is too large for the InMemoryOrderBy operator). Since this will pretty much be a requirement for efficiently supporting ORDER BY in MyriaL, why not get it implemented and tested first?

@bmyerz
Copy link
Member

bmyerz commented May 11, 2016

@senderista Ya, we can do the local OrderBy (and insert a Merge so that the plan is consistent with logical OrderBy).

  • add Merge to physical algebra
  • add order by translation to Merge(Order by(...
  • implement order by push down into sql
  • add order by to MyriaL

@senderista
Copy link
Contributor

Something @mbalazin pointed out is that for top-k queries with small k, we can just run local top-k queries at each worker (possibly pushing them into Postgres), and sort the combined results in memory at the coordinator using the existing InMemoryOrderBy operator. That should be a pretty simple rewrite rule in Raco, and would eliminate the dependency on a global Merge operator for the top-k scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants