[DOCS]: consolidate doc site content simplify navbar #5962
@@ -50,22 +49,17 @@ community.

user-guide/introduction
user-guide/example-usage
user-guide/users
I consolidated the content of these pages into other pages
user-guide/cli
user-guide/dataframe
user-guide/expressions
user-guide/sql/index
user-guide/configs
user-guide/faq
Rust Crate Documentation <https://docs.rs/crate/datafusion/>
This was redundant with the crates.io link above

.. _toc.contributor-guide:

.. toctree::
   :maxdepth: 2
This stops listing H2 headings (`##`) on the main table of contents
Hope you don't mind, but I took the opportunity to review the docs and point out a few parts that are outdated or could be improved

```toml
[dependencies]
datafusion = "11.0"
```
bump to latest here? (ditto for anywhere else version is mentioned)

## Create a main function

Update the main.rs file with your first datafusion application based on [Example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html)
Is this a self link, pointing back to this same page?
Yes, that is a good catch. This whole page has some non-trivial redundancy. I will try to fix it up
```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // register the table
    let ctx = SessionContext::new();
    ctx.register_csv("test", "<PATH_TO_YOUR_CSV_FILE>", CsvReadOptions::new()).await?;

    // create a plan to run a SQL query
    let df = ctx.sql("SELECT * FROM test").await?;

    // execute and print results
    df.show().await?;
    Ok(())
}
```
example feels kinda redundant compared with example code in above sections
I agree -- removed
docs/source/user-guide/faq.md
Outdated
Here is a comparison with similar projects that may help understand
when DataFusion might be suitable and unsuitable for your needs:

- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.
change to https link?
in c187863
docs/source/user-guide/faq.md
Outdated
- [Polars](http://pola.rs): Polars is one of the fastest DataFrame
  libraries at the time of writing. Like DataFusion, it is also
  written in Rust and uses the Apache Arrow memory model, but unlike
  DataFusion it does not provide SQL nor as many extension points.
Change to an https URL.
Also, I think Polars might support SQL now, according to their docs: https://pola-rs.github.io/polars-book/user-guide/sql.html
Yes I think you are correct.
docs/source/user-guide/faq.md
Outdated
  written in Rust and uses the Apache Arrow memory model, but unlike
  DataFusion it does not provide SQL nor as many extension points.

- [Facebook Velox](https://engineering.fb.com/2022/08/31/open-source/velox/)
could switch to github link: https://github.com/facebookincubator/velox since this link seems dead
Thanks for bringing up this topic. I also left some unrelated comments 🙇‍♂️
[Crate Documentation], to keep it as close to the source as
possible.
[crates.io documentation], to keep it as close to the source as
possible. You can find the most up to date version in the [source code].
What do you think about hosting the latest documentation generated from the source code on GitHub Pages (or another static page host)? For example, greptimedb.rs is generated from https://github.com/GreptimeTeam/greptimedb/deployments/activity_log?environment=github-pages
I think it is a great idea -- thank you @waynexia
I actually think if we could build those API docs as part of the https://github.com/apache/arrow-datafusion/blob/main/docs build, they would "automatically" get hosted on https://arrow.apache.org/datafusion/
https://arrow.apache.org/datafusion/ is published via some ASF mechanism that is similar to github pages
Specifically, this workflow pushes to the https://github.com/apache/arrow-datafusion/tree/asf-site branch, which then gets hosted via this magic yaml:
Filed #5981

```toml
[dependencies]
datafusion = { version = "11.0" , features = ["simd"]}
```
This is also outdated. I wonder if there is some way to render code or files from GitHub, so that we would not need to update this file every time and could render our example code instead. I found this, but it looks like it only works inside GitHub.
Maybe we could update the script here to automatically clean it up: https://github.com/apache/arrow-datafusion/blob/main/dev/update_datafusion_versions.py
Sounds good, I filed #5983
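For illustration only, here is a rough sketch of the kind of rewrite such a version-update step could apply to the docs snippets quoted in this thread. It is written in Rust to match the examples in the docs, not in the Python of the real `update_datafusion_versions.py`, and it does not reflect how that script works. The function names and the `99.0` version are hypothetical placeholders.

```rust
/// Hypothetical helper: rewrite the version on `datafusion = ...` dependency
/// lines, covering both `datafusion = "11.0"` and
/// `datafusion = { version = "11.0", features = [...] }`.
/// Assumes the line is written with a space before `=`, as in the docs above.
fn bump_datafusion_version(text: &str, new_version: &str) -> String {
    let mut out = String::with_capacity(text.len());
    // split_inclusive keeps each line's trailing newline, so the rest of the
    // document is copied through byte for byte.
    for line in text.split_inclusive('\n') {
        if line.trim_start().starts_with("datafusion =") {
            out.push_str(&replace_first_quoted(line, new_version));
        } else {
            out.push_str(line);
        }
    }
    out
}

/// Swap the contents of the first double-quoted string on the line.
/// On a dependency line that first string is the version in both forms.
fn replace_first_quoted(line: &str, replacement: &str) -> String {
    if let Some(start) = line.find('"') {
        if let Some(len) = line[start + 1..].find('"') {
            let end = start + 1 + len;
            return format!("{}{}{}", &line[..start + 1], replacement, &line[end..]);
        }
    }
    // No quoted string found: leave the line untouched.
    line.to_string()
}

fn main() {
    let doc = "[dependencies]\ndatafusion = { version = \"11.0\" , features = [\"simd\"]}\n";
    // "99.0" is a placeholder, not a real release number.
    print!("{}", bump_datafusion_version(doc, "99.0"));
}
```

Only the first quoted string on a matching line is touched, so feature lists such as `["simd"]` are left alone, and lines for other crates (for example `datafusion-expr = ...`) are not matched by this sketch.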
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
OK, I think this is better than what is on main, so I will merge it in. We clearly have a ways to go to have wonderful docs.
* [DOCS]: consolidate doc site content simplify navbar
* prettier
* Update docs/source/user-guide/faq.md
  Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
* Update versions to latest
* remove reundant example
* update duckdb link and polars description
* update velox link
* prettier

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Which issue does this PR close?
Closes #5935
Related to #1814 and #5501
Rationale for this change
The main page / index of the https://arrow.apache.org/datafusion/ site is somewhat disorganized and redundant, and it has so many entries that it causes issues such as #5935
Here is a screenshot of the current site:
![Screenshot 2023-04-11 at 7 20 00 AM](https://user-images.githubusercontent.com/490673/231146819-e9a80483-30ee-4667-993a-5eb49eb5ef4b.png)
What changes are included in this PR?
Are these changes tested?
I rendered the site locally and it looks better to me:
Are there any user-facing changes?