[DOCS]: consolidate doc site content simplify navbar #5962

Merged: 9 commits into apache:main, Apr 12, 2023

alamb (Contributor) commented Apr 11, 2023

Which issue does this PR close?

Closes #5935
Related to #1814 and #5501

Rationale for this change

The main page / index of the https://arrow.apache.org/datafusion/ site is somewhat disorganized and redundant, and it has so many entries that it causes issues such as #5935

Here is a screenshot of the current site:
[Screenshot 2023-04-11 at 7 20 00 AM]

What changes are included in this PR?

  1. Consolidate some top level pages into lower level pages ("improve the site navigation") to reduce the size of the initial index and organize the content better.
  2. Various small improvements

Are these changes tested?

I rendered the site locally and it looks better to me:

[Screenshot 2023-04-11 at 7 22 45 AM]

Are there any user-facing changes?

@@ -50,22 +49,17 @@ community.

user-guide/introduction
user-guide/example-usage
user-guide/users
Contributor Author

I consolidated the content of these pages into other pages

user-guide/cli
user-guide/dataframe
user-guide/expressions
user-guide/sql/index
user-guide/configs
user-guide/faq
Rust Crate Documentation <https://docs.rs/crate/datafusion/>
Contributor Author

This was redundant with the crates.io link above


.. _toc.contributor-guide:

.. toctree::
:maxdepth: 2
Contributor Author

This stops listing H2 headings (`##`) on the main table of contents

Jefffrey (Contributor) left a comment

Hope you don't mind, but I took the opportunity to review the docs and point out a few parts that are outdated or could be improved


```toml
[dependencies]
datafusion = "11.0"
```
Contributor

Bump to the latest version here? (Ditto anywhere else a version is mentioned.)


## Create a main function

Update the main.rs file with your first datafusion application based on [Example usage](https://arrow.apache.org/datafusion/user-guide/example-usage.html)
Contributor

Is this a self-link to this same page?

Contributor Author

Yes, that is a good catch. This whole page has some non-trivial redundancy. I will try to fix it up

Comment on lines 179 to 195
```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // register the table
    let ctx = SessionContext::new();
    ctx.register_csv("test", "<PATH_TO_YOUR_CSV_FILE>", CsvReadOptions::new()).await?;

    // create a plan to run a SQL query
    let df = ctx.sql("SELECT * FROM test").await?;

    // execute and print results
    df.show().await?;
    Ok(())
}
```
Contributor

This example feels redundant with the example code in the sections above

Contributor Author

I agree -- removed

Here is a comparison with similar projects that may help understand
when DataFusion might be suitable and unsuitable for your needs:

- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.
Contributor

change to https link?

Contributor Author

in c187863

Comment on lines 51 to 54
- [Polars](http://pola.rs): Polars is one of the fastest DataFrame
libraries at the time of writing. Like DataFusion, it is also
written in Rust and uses the Apache Arrow memory model, but unlike
DataFusion it does not provide SQL nor as many extension points.
Contributor

Change to an https URL.

Also, I think Polars might support SQL now, according to their docs: https://pola-rs.github.io/polars-book/user-guide/sql.html

Contributor Author

Yes I think you are correct.

written in Rust and uses the Apache Arrow memory model, but unlike
DataFusion it does not provide SQL nor as many extension points.

- [Facebook Velox](https://engineering.fb.com/2022/08/31/open-source/velox/)
Contributor

could switch to github link: https://github.com/facebookincubator/velox since this link seems dead
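
The link comments in this review (plain `http://` targets for DuckDB and Polars) could in principle be caught mechanically. The following is only a sketch under stated assumptions: `find_http_links` is a hypothetical helper, not part of the DataFusion repository, and it only recognizes Markdown inline links of the form `[text](url)`.

```python
import re

# Matches Markdown inline links whose target starts with plain "http://".
# Links that already use "https://" are not matched.
_HTTP_LINK = re.compile(r'\[([^\]]+)\]\((http://[^)\s]+)\)')


def find_http_links(markdown: str) -> list[tuple[str, str]]:
    """Return (link text, URL) pairs for every non-https inline link."""
    return _HTTP_LINK.findall(markdown)
```

Running it over a docs page would list candidates to switch to https. It does not check whether a link is dead, which would need a network request.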

waynexia (Member) left a comment

Thanks for bringing up this topic. I also left some unrelated comments 🙇‍♂️

[Crate Documentation], to keep it as close to the source as
possible.
[crates.io documentation], to keep it as close to the source as
possible. You can find the most up to date version in the [source code].
Member

What do you think about hosting the latest documentation generated from the source code on GitHub Pages (or another static page host)? Like greptimedb.rs, which is generated from https://github.com/GreptimeTeam/greptimedb/deployments/activity_log?environment=github-pages

Contributor Author

I think it is a great idea -- thank you @waynexia

I actually think if we could build those API docs as part of the https://github.com/apache/arrow-datafusion/blob/main/docs build, they would "automatically" get hosted on https://arrow.apache.org/datafusion/

https://arrow.apache.org/datafusion/ is published via some ASF mechanism that is similar to github pages

Specifically, this workflow

https://github.com/apache/arrow-datafusion/blob/388f9ec3e7f7c09dac56ee0fe074ca97a6af9d44/.github/workflows/docs.yaml#L12-L64

pushes to the https://github.com/apache/arrow-datafusion/tree/asf-site branch which then gets hosted via this magic yaml:

https://github.com/apache/arrow-datafusion/blob/388f9ec3e7f7c09dac56ee0fe074ca97a6af9d44/.asf.yaml#L48-L52

Contributor Author

Filed #5981

docs/source/user-guide/faq.md (outdated, resolved)

```toml
[dependencies]
datafusion = { version = "11.0" , features = ["simd"]}
```
Member

This is also outdated. I wonder if there is some way to render code or files from GitHub, so that we wouldn't need to update this file every time but could render our example code instead. I found this, but it looks like it only works inside GitHub.

Contributor Author

Maybe we could update the script here to automatically clean it up: https://github.com/apache/arrow-datafusion/blob/main/dev/update_datafusion_versions.py
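
As a rough illustration of what such an automatic cleanup could look like, here is a minimal sketch. It is not taken from `dev/update_datafusion_versions.py`; the helper name `bump_datafusion_version` is hypothetical, and the version `99.0` is a placeholder. It assumes the docs pin the version in one of the two forms quoted in this review: `datafusion = "11.0"` or `datafusion = { version = "11.0", ... }`.

```python
import re

# Hypothetical helper, not the contents of dev/update_datafusion_versions.py.
# Form 1: datafusion = "11.0"
_SIMPLE = re.compile(r'^(datafusion\s*=\s*")[^"]+(")', re.MULTILINE)
# Form 2: datafusion = { version = "11.0", features = [...] }
_TABLE = re.compile(
    r'^(datafusion\s*=\s*\{\s*version\s*=\s*")[^"]+(")', re.MULTILINE
)


def bump_datafusion_version(text: str, new_version: str) -> str:
    """Return `text` with every pinned datafusion version replaced."""

    def swap(match: re.Match) -> str:
        # Keep the text around the version, replace only the version itself.
        return match.group(1) + new_version + match.group(2)

    text = _SIMPLE.sub(swap, text)
    return _TABLE.sub(swap, text)


if __name__ == "__main__":
    sample = '[dependencies]\ndatafusion = "11.0"\n'
    print(bump_datafusion_version(sample, "99.0"), end="")
```

Applying a function like this to each Markdown file under the docs directory at release time would keep the snippets in step with the crate version, at the cost of relying on the snippets keeping one of these two shapes.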

Member

Sounds good, I filed #5983

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
alamb (Contributor Author) commented Apr 12, 2023

> Hope you don't mind, but I took the opportunity to review the docs and point out a few parts that are outdated or could be improved

Not at all - thank you @Jefffrey and thank you @waynexia

alamb (Contributor Author) commented Apr 12, 2023

Ok, I think this is better than what is on main so I will merge it in. We clearly have a ways to go to have wonderful docs

alamb merged commit 4c7833e into apache:main on Apr 12, 2023
alamb deleted the alamb/fix_doc_sidebar branch on April 12, 2023 at 20:22
korowa pushed a commit to korowa/arrow-datafusion that referenced this pull request Apr 13, 2023
* [DOCS]: consolidate doc site content simplify navbar

* prettier

* Update docs/source/user-guide/faq.md

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* Update versions to latest

* remove reundant example

* update duckdb link and polars description

* update velox link

* prettier

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Development

Successfully merging this pull request may close these issues.

doc site sidebar icon is glitched/bugged
3 participants