Add roadmap to readme #1616
Thank you, @alamb @houqp @xudong963 @yjshen @liukun4515 @hntd187 @realno @pjmore for your contributions to the roadmap. I've created this PR to add the roadmap to the datafusion readme. Let me know your thoughts :)
Thanks @matthewmturner, nice work!
- Inclusion in Db-Benchmark with all quries covered
- All TPCH queries covered
Are these two duplicates?
To my knowledge, no. I thought they were two independent benchmarks that we wanted to cover. However, I don't have much experience on the TPCH side; I've only been working on the db-benchmark solution.
I don't see TPCH mentioned on db-benchmark. Would you be able to expand on how you think they are duplicates?
I'll check it after getting up
Sorry, my mistake. I believe the Db-Benchmark you mentioned is datafusion/benches/?
The db-benchmark I'm referring to is about getting datafusion included here: https://h2oai.github.io/db-benchmark/
I've opened a PR there to get it added.
README.md
Outdated
### Performance Improvements

- Predicate evaluation
- Multi-column comparisons that can't be vectorized
"can't be"?
- Multi-column comparisons that can't be vectorized
- Improve multi-column comparisons (that can't be vectorized at the moment)
README.md
Outdated
- Custom SQL support
- Split DataFusion into multiple crates
- Push based query execution and code gen
- Push based query execution and code gen
- Push based query execution and codegen
I think I will actually just write out the whole word.
Both are OK.
What is the difference between a new feature and extension? They both sound like kinda the same thing. Semantics aside this sounds good to me.
Extension is specifically referring to topics that would be in
@yahoNanJing FYI - if you want to add anything ballista related just let me know.
Thanks @matthewmturner
LGTM
Thanks to all who contributed to the roadmap -- would love to keep it as a living document
@alamb I have a reminder for myself to refresh it every 3 months.
### DataFusion Core

- Publish official Arrow2 branch
Hi @alamb! I'm curious what "arrow2" means here. Is it related to https://github.com/jorgecarleitao/arrow2?
Yes, that's what it refers to, and it has been completed. However, it's not completely up to date with master.
Yes, that is exactly what arrow2 means.
There is a branch https://github.com/apache/arrow-datafusion/tree/arrow2 and a discussion ticket #1532 that has more information if you are interested.
Thank you!
Which issue does this PR close?
Closes #1515
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?