Add roadmap to readme #1616

Merged
merged 3 commits into from
Jan 20, 2022

matthewmturner
Contributor

Which issue does this PR close?

Closes #1515

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@matthewmturner
Contributor Author

Thank you, @alamb @houqp @xudong963 @yjshen @liukun4515 @hntd187 @realno @pjmore, for your contributions to the roadmap.

I've created a PR here adding the roadmap to the DataFusion README.

Let me know your thoughts :)

@github-actions bot added the documentation label on Jan 19, 2022
Member

@xudong963 left a comment

Thanks @matthewmturner, nice work!

Comment on lines +157 to +158
- Inclusion in Db-Benchmark with all queries covered
- All TPCH queries covered
Member

Are these two duplicates?

Contributor Author

To my knowledge, no. I thought they were two independent benchmarks that we wanted to cover. However, I don't have much experience on the TPCH side; I've only been working on the db-benchmark solution.

I don't see TPCH mentioned on db-benchmark. Would you be able to expand on how you think they are duplicates?

Member

I'll check it after getting up

Member

Sorry, my mistake. I believe the Db-Benchmark you mentioned is datafusion/benches/?

Contributor Author

The db-benchmark I'm referring to is the one at https://h2oai.github.io/db-benchmark/, where I'm trying to get DataFusion included.

I've opened a PR there to get it added.

README.md Outdated
### Performance Improvements

- Predicate evaluation
- Multi-column comparisons that can't be vectorized
Member

can't be ?

Contributor

Suggested change
- Multi-column comparisons that can't be vectorized
- Improve multi-column comparisons (that can't be vectorized at the moment)

README.md Outdated

- Custom SQL support
- Split DataFusion into multiple crates
- Push based query execution and code gen
Member

@xudong963 Jan 19, 2022

Suggested change
- Push based query execution and code gen
- Push based query execution and codegen

Contributor Author

I think I will actually just write out the whole word.

Member

both are ok

@hntd187
Contributor

hntd187 commented Jan 19, 2022

What is the difference between a new feature and extension? They both sound like kinda the same thing. Semantics aside this sounds good to me.

@matthewmturner
Contributor Author

matthewmturner commented Jan 19, 2022

> What is the difference between a new feature and extension? They both sound like kinda the same thing. Semantics aside this sounds good to me.

"Extension" refers specifically to topics that would live in datafusion-contrib, as opposed to a new feature that is in the datafusion crate itself. I can make that clearer.

@matthewmturner
Contributor Author

@yahoNanJing FYI - if you want to add anything Ballista-related, just let me know.

Contributor

@alamb left a comment

README.md Outdated
### Performance Improvements

- Predicate evaluation
- Multi-column comparisons that can't be vectorized
Contributor

Suggested change
- Multi-column comparisons that can't be vectorized
- Improve multi-column comparisons (that can't be vectorized at the moment)

@liukun4515
Contributor

> Thank you, @alamb @houqp @xudong963 @yjshen @liukun4515 @hntd187 @realno @pjmore, for your contributions to the roadmap.
>
> I've created a PR here adding the roadmap to the DataFusion README.
>
> Let me know your thoughts :)

LGTM

@alamb merged commit d93cf79 into apache:master on Jan 20, 2022
@alamb
Contributor

alamb commented Jan 20, 2022

Thanks all who contributed to the roadmap -- would love to keep it as a living document

@matthewmturner
Contributor Author

@alamb I have a reminder for myself to refresh it every 3 months.


### DataFusion Core

- Publish official Arrow2 branch
Member

Hi @alamb! I'm curious what "arrow2" means here. Is it related to https://github.com/jorgecarleitao/arrow2?

Contributor Author

@matthewmturner Mar 22, 2022

Yes, that's what it refers to, and it has been completed. However, it's not completely up to date with master.

Contributor

yes, that is exactly what arrow2 means

There is a branch https://github.com/apache/arrow-datafusion/tree/arrow2 and a discussion ticket #1532 that has more information if you are interested

Member

Thank you!

Labels
documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

high level roadmap for Arrow / Datafusion
6 participants