fix: CTAS on multiple statements #12188
Codecov Report
@@ Coverage Diff @@
## master #12188 +/- ##
==========================================
- Coverage 66.61% 63.51% -3.11%
==========================================
Files 994 484 -510
Lines 49079 29771 -19308
Branches 4982 0 -4982
==========================================
- Hits 32695 18908 -13787
+ Misses 16254 10863 -5391
+ Partials 130 0 -130
Thanks for a verbose description, made this very easy to review 👍 This will surely be appreciated by advanced SQL Lab users. Minor non-blocking comment, LGTM!
if (
    query.select_as_cta
    and query.ctas_method == CtasMethod.TABLE
    and not parsed_query.is_valid_ctas()
):
    raise SqlLabException(
        _(
            "CTAS (create table as select) can only be run with a query where "
            "the last statement is a SELECT. Please make sure your query has "
            "a SELECT as its last statement. Then, try running your query again."
        )
    )
if (
    query.select_as_cta
    and query.ctas_method == CtasMethod.VIEW
    and not parsed_query.is_valid_cvas()
):
    raise SqlLabException(
        _(
            "CVAS (create view as select) can only be run with a query with "
            "a single SELECT statement. Please make sure your query has only "
            "a SELECT statement. Then, try running your query again."
        )
    )
As the error messages imply a deep understanding of how is_valid_ctas() and is_valid_cvas() work, I wonder if these should be moved into ParsedQuery as something like assert_is_valid_ctas() and assert_is_valid_cvas() and have the error messages there?
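A rough sketch of what that suggestion could look like. The ParsedQuery and SqlLabException classes below are simplified stand-ins written for this illustration, not Superset's real classes, and the statement splitting is a naive split on semicolons rather than a real SQL parser. The method names and error messages are the ones from the diff and the comment above.

```python
class SqlLabException(Exception):
    """Stand-in exception so the sketch runs on its own."""


class ParsedQuery:
    """Simplified stand-in; assumes no semicolons inside literals or comments."""

    def __init__(self, sql: str) -> None:
        self._statements = [s.strip() for s in sql.split(";") if s.strip()]

    @staticmethod
    def _is_select(statement: str) -> bool:
        # Simplification: a statement counts as a SELECT if it starts with the keyword.
        return statement.upper().startswith("SELECT")

    def is_valid_ctas(self) -> bool:
        # Valid when the last statement is a SELECT.
        return bool(self._statements) and self._is_select(self._statements[-1])

    def is_valid_cvas(self) -> bool:
        # Valid when the query is exactly one SELECT statement.
        return len(self._statements) == 1 and self._is_select(self._statements[0])

    def assert_is_valid_ctas(self) -> None:
        if not self.is_valid_ctas():
            raise SqlLabException(
                "CTAS (create table as select) can only be run with a query where "
                "the last statement is a SELECT. Please make sure your query has "
                "a SELECT as its last statement. Then, try running your query again."
            )

    def assert_is_valid_cvas(self) -> None:
        if not self.is_valid_cvas():
            raise SqlLabException(
                "CVAS (create view as select) can only be run with a query with "
                "a single SELECT statement. Please make sure your query has only "
                "a SELECT statement. Then, try running your query again."
            )
```

With this shape, the caller only has to pick the assertion that matches query.ctas_method, and the wording of each message stays next to the rule it describes.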
Ah, great point, @villebro! I'll move the exception message closer to the checks.
What would be an example use case for using CTAS?
If you mean "why materialize a dataset?", the main reason is to create a summary and/or richer dataset to speed up subsequent queries against this snapshot. If you mean "why use the Superset CTAS feature over simply writing your own…
+1 to this comment, we are actively leveraging it
* WIP
* Add unit tests for sql_parse
* Add unit tests for sql_lab

(cherry picked from commit 164db3e)
SUMMARY
The current behavior for running CTAS (create table as select) in SQL Lab is to check, for each statement, whether it's a SELECT and, if so, to rewrite it as CREATE TABLE $table_name AS $query. For example, if we have this query:

And we run CTAS by passing the table name my_table, we get this query:

The problem is that for a query with multiple statements we run CTAS for each one. For example:

When run with CTAS and my_table, this will produce these 2 queries:

The first query runs successfully, but the second fails because the table already exists.
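To make the failure concrete, here is a minimal sketch of the per-statement behavior described above. The helper name old_ctas and the two-statement query are assumptions made up for this illustration, not taken from the PR, and statements are split naively on semicolons rather than with a real SQL parser.

```python
from typing import List


def old_ctas(sql: str, table_name: str) -> List[str]:
    """Mimic the old behavior: wrap every SELECT statement in a CREATE TABLE."""
    # Naive split; assumes no semicolons inside string literals or comments.
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    rewritten = []
    for statement in statements:
        if statement.upper().startswith("SELECT"):
            rewritten.append(f"CREATE TABLE {table_name} AS {statement}")
        else:
            rewritten.append(statement)
    return rewritten


# Hypothetical query with two SELECT statements.
queries = old_ctas("SELECT 1 AS a; SELECT 2 AS b", "my_table")
for query in queries:
    print(query)
```

Both generated statements target my_table, so a database would accept the first and reject the second because the table already exists.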
This PR fixes the CTAS behavior, making it more consistent with how multiple statements work in SQL Lab. It changes SQL Lab so that:

* CTAS can be run on a query with multiple statements, as long as the last statement is a SELECT. This allows users to pre-process the data before loading it into the table. A simple example in MySQL would be:

  When run in a CTAS, this will create a table with the column foo with a row with the value 42, as expected.

  In Hive, a common pattern is to write queries like this:

  Which should also work.

* CVAS (create view as select) can only be run on a query with a single SELECT statement. Before, there was no differentiation between CTAS and CVAS (other than the query generated).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TEST PLAN
Tested manually, added unit tests.
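As an illustration of the kind of check such tests might make, the sketch below applies the rule from the summary, where the last statement must be a SELECT, and wraps only that final statement. This is not the PR's actual test code: the helper ctas_statements, the ValueError, and both sample queries are assumptions for this example, and the splitting is a naive split on semicolons.

```python
from typing import List


def ctas_statements(sql: str, table_name: str) -> List[str]:
    """Wrap only the final statement in CREATE TABLE ... AS; leave the rest as is."""
    # Naive split; assumes no semicolons inside string literals or comments.
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    if not statements or not statements[-1].upper().startswith("SELECT"):
        raise ValueError("CTAS requires the last statement to be a SELECT")
    statements[-1] = f"CREATE TABLE {table_name} AS {statements[-1]}"
    return statements


# Hypothetical MySQL-style query: a setup statement followed by a SELECT.
assert ctas_statements("SET @value = 42; SELECT @value AS foo", "my_table") == [
    "SET @value = 42",
    "CREATE TABLE my_table AS SELECT @value AS foo",
]

# Two SELECT statements now yield a single CREATE TABLE, for the last one only.
assert ctas_statements("SELECT 1 AS a; SELECT 2 AS b", "my_table") == [
    "SELECT 1 AS a",
    "CREATE TABLE my_table AS SELECT 2 AS b",
]
```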
ADDITIONAL INFORMATION