Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for -f command to run commands from files #120

Merged
merged 5 commits into from
Sep 9, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Sep 8, 2024

This is my (very tardy) downpayment to help dft

It adds a simple "run commands from a file" mode via the -f argument and adds some basic tests

Here is an example:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion-tui$ cat ~/Downloads/q.sql
show all
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion-tui$ dft -f ~/Downloads/q.sql  | head
+-------------------------------------------------------------------------+---------------------------+
| name                                                                    | value                     |
+-------------------------------------------------------------------------+---------------------------+
| datafusion.catalog.create_default_catalog_and_schema                    | true                      |
| datafusion.catalog.default_catalog                                      | datafusion                |
| datafusion.catalog.default_schema                                       | public                    |
| datafusion.catalog.format                                               |                           |
| datafusion.catalog.has_header                                           | false                     |
| datafusion.catalog.information_schema                                   | false                     |
| datafusion.catalog.location                                             |                           |

Things I hope to do next:

  1. Add support for -c <sql string>
  2. Move the execution machinery into its own module (out of app) and start adding other integrations


DataFusion-tui provides an extensible terminal based data analysis tool that uses [DataFusion](https://github.com/apache/arrow-datafusion) as query execution engines. It has drawn inspiration and several features from `datafusion-cli`. In contrast to `datafusion-cli` a focus of `dft` is to provide an interface for leveraging DataFusions extensibility (for example connecting to `ObjectStore`s or querying custom `TableProvider`s).
`dft` provides two interfaces to the [DataFusion](https://github.com/apache/arrow-datafusion) query execution engine:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also took a shot at updating the description here to describe the two different UIs. Let me know what you think

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and I like the scripting option. Only comment from the updated description is that right now there isnt any MySQL integration.

@@ -300,32 +301,75 @@ pub async fn run_app(cli: cli::DftCli, state: state::AppState<'_>) -> Result<()>
info!("Running app with state: {:?}", state);
let mut app = App::new(state, cli.clone());

match &cli.command {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this just removes a seemingly unecessary level of wrapping of Clap argument parsing

@@ -0,0 +1,236 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here are a bunch of tests for the CLI

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably move some of these to a common or something similar test since it covers both the CLI and TUI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment I think it only covers the "CLI" as I have defined it (aka running dft without any interaction)

In terms of testing the TUI I do think that would be valuable too (like fire up the tui and send key strokes like q and x to it). I think there are some frameworks that could help here, but I haven't looked into them

What I was thinking is that we could use the "CLI" mode (aka batch mode) to test the various integrations like delta lake and other table providers and focus the UI testing on just the user interaction (rather than the details of execution)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense. we'll need to do some testing of the execution triggered by the UI but it definitely doesnt need to be concerned with details - more just making sure the results make it into app state.


/// run and execute SQL statements and commands from a file, against a context
/// with the given print options
pub async fn exec_from_file(ctx: &ExecutionContext, file: &Path) -> Result<()> {
Copy link
Contributor Author

@alamb alamb Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "process commands from a file" code -- it is based on the code in datafusion-cli

@alamb alamb requested a review from matthewmturner September 8, 2024 17:23
Comment on lines +69 to +77
## `dft` CLI

The `dft` CLI is a scriptable engine for executing queries from files. It is used in a similar manner to `datafusion-cli` but with the added benefit of being able to query from multiple data sources.

For example you can run the contents of `query.sql` with

```shell
$ dft -f query.sql
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool, love this addition

Comment on lines +64 to +66
- "Catalog File" support
- Save table definitions *and* data
- Save parquet metadata from remote object stores
Copy link
Collaborator

@matthewmturner matthewmturner Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful if we had an issue that provided more details on the aspirations here

Copy link
Contributor Author

@alamb alamb Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #122 and will link here

@@ -121,7 +121,8 @@ impl ExecutionContext {
&self.session_ctx
}

pub async fn execute_stream_sql(&mut self, query: &str) -> Result<()> {
/// Executes the specified query and prints the result to stdout
pub async fn execute_stream_sql(&self, query: &str) -> Result<()> {
let df = self.session_ctx.sql(query).await.unwrap();
let physical_plan = df.create_physical_plan().await.unwrap();
// We use small batch size because web socket stream comes in small increments (each
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to remove this comment

help = "Execute commands from file(s), then exit",
value_parser(parse_valid_file)
)]
pub file: Vec<PathBuf>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub file: Vec<PathBuf>,
pub files: Vec<PathBuf>,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call -- this was just copied from datafusion-cli. files is better

Comment on lines +30 to +32
dft provides a rich terminal UI as well as a broad array of pre-integrated
data sources, formats to make querying and analyzing data from
various sources.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dft provides a rich terminal UI as well as a broad array of pre-integrated
data sources, formats to make querying and analyzing data from
various sources.
dft provides a rich terminal UI as well as a broad array of pre-integrated
data sources and formats for querying and analyzing data from
various sources.

@@ -0,0 +1,236 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably move some of these to a common or something similar test since it covers both the CLI and TUI

@matthewmturner
Copy link
Collaborator

matthewmturner commented Sep 9, 2024

Thanks for the contribution Andrew! I think this is a great addition. I added some comments / feedback, none of which need to addressed now. I also tested the app locally with all features (with and without a config file) to ensure it still worked and it still ran as expected. Either one of us can decide to work on the comments in follow on PRs.

@matthewmturner matthewmturner merged commit 40b0901 into datafusion-contrib:main Sep 9, 2024
3 checks passed
@alamb alamb deleted the alamb/file branch September 9, 2024 12:57
@alamb alamb mentioned this pull request Sep 9, 2024
alamb added a commit to alamb/datafusion-dft that referenced this pull request Sep 9, 2024
@alamb
Copy link
Contributor Author

alamb commented Sep 9, 2024

Implemented follow ons in #123

matthewmturner pushed a commit that referenced this pull request Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants