Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix running examples readme #11225

Merged

Conversation

findepi
Copy link
Member

@findepi findepi commented Jul 2, 2024

Some examples are runnable from any place (e.g. csv_sql), but some expect a specific working directory (e.g. regexp). Running from datafusion-examples/examples is tested on CI so guaranteed to work, let's put this path in the README.

As a follow-up, we should look what it would take to make examples runnable directly from an IDE such as RustRover.

@alamb
Copy link
Contributor

alamb commented Jul 2, 2024

Some examples are runnable from any place (e.g. csv_sql), but some expect a specific working directory (e.g. regexp). Running from datafusion-examples/examples is tested on CI so guaranteed to work, let's put this path in the README.

As a follow-up, we should look what it would take to make examples runnable directly from an IDE such as RustRover.

Here is one idea I had -- to move the examples to the user guide: #11217. This ensures that the examples run,

However, I don't think this really solves the problem you are after

Maybe we could look to make them as self contained as possible (perhaps provide a self contained example of querying from record batches in memory) 🤔

As a relatively new user of DataFusion what would you find most helpful?

@findepi
Copy link
Member Author

findepi commented Jul 2, 2024

Maybe we could look to make them as self contained as possible (perhaps provide a self contained example of querying from record batches in memory) 🤔

that's a good idea.
but CSV APIs still deserve some examples, so the problem remains

to move the examples to the user guide: #11217.

Improving the docs is definitely a good idea and having examples in docs tested is so nice.
it has one downside -- it feels less approachable for a new contributor.
A contributor may want to run examples directly and be able to modify them, and I assume it's less easy to do when they are embedded in markdown.
(also don't know whether cargo's examples feature is important to keep)

As a relatively new user of DataFusion what would you find most helpful?

As a user, i would prefer to have working examples within docs.
As soon as a I clone the repository, so as a contributor, I would prefer to have examples runnable from my IDE.
I don't know whether it's possible to have both without duplicating the content.

@alamb
Copy link
Contributor

alamb commented Jul 2, 2024

As a user, i would prefer to have working examples within docs.

As soon as a I clone the repository, so as a contributor, I would prefer to have examples runnable from my IDE.
I don't know whether it's possible to have both without duplicating the content.

This division makes sense. Maybe we can move the "simple" examples to the user guide (e.g. that are less than 1 page long or something) and then we can leave the more complex eamples as datafusion-examples 🤔

For things like csv reading and dataframe aggregation, once you have the code in the IDE, perhaps those examples would be better directly in the doc as run doc examples

@findepi
Copy link
Member Author

findepi commented Jul 2, 2024

For things like csv reading and dataframe aggregation, once you have the code in the IDE, perhaps those examples would be better directly in the doc as run doc examples

Might be. I think it's good reasoning to table #11219 for now, I closed that PR. Thanks for feedback!

What do you think about this PR?
As long as we have directly runnable examples, we should ensure they work. CI and README don't agree on the current working directory. We can sync CI and README. Or we can assume cwd doesn't matter (and have CI prove that), which sounds more complex.

alamb
alamb previously approved these changes Jul 3, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion @findepi -- I am sorry it took so long to review this PR. I wanted to verify the commands locally.

I tried the existing commands and they seemed to work -- so maybe I am missing something.

@@ -19,7 +19,6 @@

set -ex
cd datafusion-examples/examples/
cargo fmt --all -- --check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this? I think it is intended to ensure that cargo fmt is run on the examples.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, and i see how this is not obvious. That's why it's separate commit with an explanation:

Remove doubled fmt check from CI

Examples' format is checked in `check-fmt` job, so can be skipped in
`rust_example.sh`.

@@ -35,6 +35,9 @@ cd datafusion
# Download test data
git submodule update --init

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this locally and the existing command (in the main datafusion directory) also seems to work fine

~/Software/datafusion2$ cargo run --example csv_sql

     Running `target/debug/examples/csv_sql`
+----+-----------------------------+-----------------------------+
| c1 | MIN(aggregate_test_100.c12) | MAX(aggregate_test_100.c12) |
+----+-----------------------------+-----------------------------+
| c  | 0.2667177795079635          | 0.991517828651004           |
| a  | 0.02182578039211991         | 0.9567595541247681          |
| b  | 0.04893135681998029         | 0.9185813970744787          |
| e  | 0.01479305307777301         | 0.9965400387585364          |
| d  | 0.061029375346466685        | 0.9748360509016578          |
+----+-----------------------------+-----------------------------+
+----+------+
| c2 | c3   |
+----+------+
| 1  | -85  |
| 3  | 13   |
| 4  | -38  |
| 4  | -54  |
| 5  | 36   |
| 1  | -25  |
| 5  | -31  |
| 2  | 45   |
| 3  | 13   |
| 3  | 17   |
| 4  | 65   |
| 4  | -101 |
| 2  | -48  |
| 1  | -56  |
| 1  | -5   |
| 3  | 14   |
| 1  | 83   |
| 3  | -12  |
| 3  | -72  |
| 2  | -43  |
| 5  | -101 |
+----+------+

It does also work in datafusion-examples/examples/ but cding into that directory seems unecessary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @alamb for testing this.

let me quote the commit message, maybe it explains my reasoning better:

...
Some examples are runnable from any place (e.g. `csv_sql`), but some
expect a specific working directory (e.g. `regexp`). Running from
`datafusion-examples/examples` is tested on CI so guaranteed to work,
let's put this path in the README.
...

It does also work in datafusion-examples/examples/ but cding into that directory seems unecessary

i agree.
It happened to be necessary for the first example i tried to run. maybe i was just unlucky

per the commit message

...
As a follow-up, we should look what it would take to make examples
runnable directly from an IDE such as RustRover.

-- I also am against cd-ing into directory as unnecessary and also want the examples to be runnable from an IDE.
If we want this to be the case, this validated by the CI scripts.

@alamb alamb dismissed their stale review July 3, 2024 20:49

clicked wrong button

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement and the explanations @findepi

I think this PR makes the code better so let's merge it in and keep iterating to make things better 🙏

@alamb
Copy link
Contributor

alamb commented Jul 5, 2024

This branch sadly has some merge conflicts that need to be resolved before we can merge it:

Screenshot 2024-07-05 at 9 06 29 AM

@comphead
Copy link
Contributor

comphead commented Jul 5, 2024

This branch sadly has some merge conflicts that need to be resolved before we can merge it:

There is no conflicted files though, hope rebase solves it @findepi

Some examples are runnable from any place (e.g. `csv_sql`), but some
expect a specific working directory (e.g. `regexp`). Running from
`datafusion-examples/examples` is tested on CI so guaranteed to work,
let's put this path in the README.

As a follow-up, we should look what it would take to make examples
runnable directly from an IDE such as RustRover.
Examples' format is checked in `check-fmt` job, so can be skipped in
`rust_example.sh`.
@findepi findepi force-pushed the findepi/fix-running-examples-readme-31ea9d branch from 7e5a8d0 to 5ddec76 Compare July 5, 2024 17:09
@alamb alamb merged commit 682fc05 into apache:main Jul 6, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 6, 2024

Thanks again @findepi

@findepi findepi deleted the findepi/fix-running-examples-readme-31ea9d branch July 6, 2024 15:17
findepi added a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Fix running examples readme

Some examples are runnable from any place (e.g. `csv_sql`), but some
expect a specific working directory (e.g. `regexp`). Running from
`datafusion-examples/examples` is tested on CI so guaranteed to work,
let's put this path in the README.

As a follow-up, we should look what it would take to make examples
runnable directly from an IDE such as RustRover.

* Remove doubled fmt check from CI

Examples' format is checked in `check-fmt` job, so can be skipped in
`rust_example.sh`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants