Getting started guide for new users (who want to use DataFusion in their project) #7014

Open
6 of 8 tasks
alamb opened this issue Jul 18, 2023 · 4 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Contributor

alamb commented Jul 18, 2023

Is your feature request related to a problem or challenge?

If we want to have DataFusion used as the core of many new systems, we need it to be as easy as possible for someone to get their idea working on top of DataFusion.

I think the current user guide helps set up the basics of the project and get a "hello world" style program going, but it then leaves the reader in a "now what" situation: https://arrow.apache.org/datafusion/user-guide/example-usage.html

Describe the solution you'd like

I would like a document, perhaps similar in style to the polars user guide: https://pola-rs.github.io/polars-book/user-guide/

> This User Guide is an introduction to the Polars DataFrame library. Its goal is to introduce you to Polars by going through examples and comparing it to other solutions. Some design choices are introduced here. The guide will also introduce you to optimal usage of Polars.

Basically I am thinking of something that would have helped @bubbajoe get up to speed

The examples directory holds a bunch of examples: https://github.com/apache/arrow-datafusion/tree/main/datafusion-examples

Potential outline:

Describe alternatives you've considered

No response

Additional context

This idea was suggested by @MrPowers

@alamb alamb added enhancement New feature or request good first issue Good for newcomers labels Jul 18, 2023
@alamb
Contributor Author

alamb commented Jul 18, 2023

If someone wanted to help out the DataFusion project, helping with this one would be awesome. A good first step would be to make a skeleton of the topics above in https://github.com/apache/arrow-datafusion/tree/main/docs and leave placeholder text (like "Coming Soon").

Then we can work together on writing the content in a few different PRs

@MrPowers

This sounds great, really excited!

We'll either want two user guides or one user guide that's half in Python / half in Rust.

I guess that 99% of the users that want to query data via an API will want to do so in SQL / Python. The Python DataFrame user guide is way more important than the Rust one.

Users leveraging DataFusion to build tools for other engines (e.g. delta-rs) are much more likely to be using Rust.

Perhaps we divide the documentation as follows:

I don't think we should invest in building out the DataFusion Rust DataFrame API docs yet because it's a lower-ROI activity. We should, however, build a URL structure that allows for this.

@alamb
Contributor Author

alamb commented Jul 19, 2023

> The Python DataFrame user guide is way more important than the Rust one.

I agree this is more important for "end users" than for developers who are building with Rust.

> Perhaps we divide the documentation as follows:

That sounds great -- I filed apache/datafusion-python#432 to track the work for the python bindings

@alamb
Contributor Author

alamb commented Aug 16, 2023

I filed a bunch of tickets for follow-on work and updated the description of this ticket:
#7302
#7304
#7305
#7306
#7307
#7308

@alamb alamb removed the devrel label Oct 21, 2024
2 participants