Support H2O groupby queries in OmnisciOnRay engine #1541

Closed
7 of 10 tasks
ienkovich opened this issue Jun 4, 2020 · 2 comments

Comments

@ienkovich
Collaborator

ienkovich commented Jun 4, 2020

We need to cover the aggregation API used in H2O groupby queries. This issue doesn't cover other features required for the H2O benchmarks (type casts, data transfer, etc.).

Current status:

@ienkovich ienkovich added Benchmarking 🏁 Issues and pull requests for evaluating runtime new feature/request 💬 Requests and pull requests for new features labels Jun 4, 2020
@ienkovich ienkovich self-assigned this Jun 4, 2020
@ienkovich ienkovich removed Benchmarking 🏁 Issues and pull requests for evaluating runtime new feature/request 💬 Requests and pull requests for new features labels Jun 4, 2020
@ienkovich ienkovich changed the title Run H2O groupby queries on Modin using OmnisciOnRay engine Support H2O groupby queries in OmnisciOnRay engine Jun 4, 2020
@ienkovich
Collaborator Author

Current issues with remaining queries:

Query #6 - we need to implement the std and median aggregators in OmniSci.
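For reference, a minimal sketch of what such a query looks like on the pandas side. The query text is not quoted in this thread, so the column names (`id4`, `id5`, `v3`) and the toy data are assumptions modeled on the H2O benchmark layout:

```python
import pandas as pd

# Toy stand-in for the benchmark table; column names are assumed.
df = pd.DataFrame({
    "id4": [1, 1, 1, 2, 2, 2],
    "id5": [1, 1, 1, 1, 1, 1],
    "v3": [1.0, 2.0, 6.0, 10.0, 20.0, 30.0],
})

# Both aggregates are requested per (id4, id5) group in a single agg call.
# The result has MultiIndex columns: ("v3", "median") and ("v3", "std").
q6 = df.groupby(["id4", "id5"]).agg({"v3": ["median", "std"]})
print(q6)
```

Note that pandas `std` is the sample standard deviation (`ddof=1`) by default, so a backend aggregator would need to match that to give identical results.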

Query #8 - we need to use window functions to implement the groupby.head method. Unfortunately, OmniSci doesn't preserve row order within partitions when a window function is applied without ORDER BY. In our case the data is already sorted prior to the groupby call, so the translation is possible, but it first requires the sort_values implementation to become backend-specific.
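To illustrate the translation idea, here is a small pandas-only sketch. It is not the engine implementation: `cumcount` merely stands in for a `ROW_NUMBER()`-style window function, and the column names (`id6`, `v3`) and data are assumed:

```python
import pandas as pd

# Toy data; column names are assumed from the benchmark layout.
df = pd.DataFrame({
    "id6": [1, 2, 1, 2, 1],
    "v3": [5.0, 1.0, 9.0, 4.0, 7.0],
})

# The sort happens before the groupby call.
ordered = df[["id6", "v3"]].sort_values("v3", ascending=False)

# pandas form: first two rows of each group, relying on the prior sort order.
top2_head = ordered.groupby("id6").head(2)

# Window-style form: number the rows inside each partition and keep the
# first two. cumcount follows the current row order, which is exactly the
# ordering a backend would have to preserve (or re-establish via ORDER BY).
row_number = ordered.groupby("id6").cumcount()
top2_window = ordered[row_number < 2]

print(top2_head.equals(top2_window))
```

The two forms agree only because the row numbering sees the sorted order. If the partitions were reordered, the window form would need the sort key passed explicitly.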

Query #9 - here we have groupby.apply with a lambda. The passed lambda uses the unsupported corr aggregate. It also returns a new Series built from cell values extracted from the corr result. This would be very challenging to support with lazy computations.
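A sketch of the pattern being described, under the assumption that the query squares one cell of the per-group correlation matrix. The column names (`id2`, `id4`, `v1`, `v2`), the result label `r2`, and the data are illustrative:

```python
import pandas as pd

# Toy data; column names are assumed from the benchmark layout.
df = pd.DataFrame({
    "id2": ["a", "a", "a", "b", "b", "b"],
    "id4": [1, 1, 1, 1, 1, 1],
    "v1": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "v2": [2.0, 4.0, 6.0, 3.0, 2.0, 1.0],
})

# The lambda receives each group as a DataFrame, computes a 2x2 correlation
# matrix, extracts one cell, squares it, and wraps it in a new Series.
q9 = df.groupby(["id2", "id4"])[["v1", "v2"]].apply(
    lambda g: pd.Series({"r2": g.corr()["v1"]["v2"] ** 2})
)
print(q9)
```

The lambda is opaque Python code that indexes into an intermediate result, which is why it is hard to map onto a lazily built query plan.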

@ienkovich ienkovich added the HDK Related to HDK (OmniSci successor) engine or backend label Jun 18, 2020
@anmyachev
Collaborator

@ienkovich is this still relevant?
