-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid query generated for test with store_failures if file_format not specified #236
Comments
@ferdyh How are you connecting to Databricks? Are you using a SQL endpoint? The query that dbt-spark should actually be running here looks like: CREATE TABLE `tests_schema`.`table_to_test` AS
SELECT a.*
FROM `source_schema`.`table_to_test` a
WHERE column1 is null You can look in your debug-level logs ( I previously noticed (in #233) that either pyodbc or the Databricks endpoint seems to rewrite queries, in a way that sometimes results in invalid syntax (!). On a hunch, I managed to get it working by configuring tests to use the Delta file format: # dbt_project.yml
tests:
+file_format: delta
+store_failures: true When I did that, the queries weren't rewritten in a way that caused syntax errors. |
@jtcohen6 I checked the queries, and indeed that is the exact query logged, but that is not the query that databricks receives. It is however fixed if I add +file_format: delta (or parquet for that matter). I'm not using SQL Endpoint, but cluster. |
Oh, interesting! It sounds like, unless the file format is specified, Databricks rewrites tests in a way that may yield syntax errors. I think a reasonable fix would be to update the Is that a change you'd be interested in contributing? Is there any reason we shouldn't include |
Only thing I can think of is that since Databricks 8.3 delta is the default format for tables. |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
Describe the bug
When add as test with a specific query and use 'store_failures', then the resulting sql ran against Databricks will result in a duplicate column name 'AS'. The query as defined in the test file:
The resulting query run against Databricks:
Steps To Reproduce
Queries provided
Expected behavior
That the columns get a valid alias, or 'AS' keyword is removed.
Screenshots and log output
If applicable, add screenshots or log output to help explain your problem.
The output of
dbt --version
:Tried with 0.21 aswell.
The operating system you're using:
The output of
python --version
:Python 3.8.10
Additional context
I have an almost similar query (only as an additional where statement) that works fine.
The text was updated successfully, but these errors were encountered: