1.8.0 not able to detect existing tables anymore #691
@thijs-nijhuis-shell can you email me your artifacts? I have not seen any report like this, and I use incremental models with 1.8.x daily. ben.cassell@databricks.com
@benc-db, I think I found the problem. Near the top of dbt.log I see this query:

```sql
select
  table_name,
  if(table_type in ('EXTERNAL', 'MANAGED', 'MANAGED_SHALLOW_CLONE'), 'table', lower(table_type)) as table_type,
  lower(data_source_format) as file_format,
  table_owner
from `catalog_name`.`information_schema`.`tables`
where table_schema = 'schema_name'
```

I assume it is used to fetch all the existing objects at once. But when I run that command myself I get no results. I think the reason is similar to this discussion and is caused by the fact that our catalog was renamed at some point. When I change the query to the following, I do see all the objects:

```sql
select
  table_name,
  if(table_type in ('EXTERNAL', 'MANAGED', 'MANAGED_SHALLOW_CLONE'), 'table', lower(table_type)) as table_type,
  lower(data_source_format) as file_format,
  table_owner
from system.`information_schema`.`tables`
where table_catalog = 'catalog_name' and table_schema = 'schema_name'
```
Interesting. Would you mind filing a ticket through your company's Databricks contact about this information_schema issue, giving the context about how it is breaking dbt? I can code a fallback like I did for MVs, but it would be more performant if Databricks just fixed the information_schema behavior to work as expected.
Describe the bug
dbt-databricks 1.8.0 no longer seems able to determine whether a table already exists. I have noticed this with both seeds and incremental models.
Seeds
We have a seed in our project called 'country'. Running
dbt build --select country
succeeds if the country table has not been created yet. But the second time you run it, it fails with an error. When I reinstall dbt-databricks 1.7.14, I can run the command over and over again, and the incremental models do create merge statements.
I saw there was a fix for rerunning seeds in 1.8.1, but that doesn't solve it for me. Also, we don't have 'persist_docs' set, nor do we have a description for this seed.
Incremental
After upgrading to 1.8.0, our incremental models are always run as a 'create or replace' statement instead of a merge. I see that locally in my target\run folder. I also see it in UC when running 'describe history catalog.schema.table_name': all recent changes are 'create or replace' instead of 'merge'.
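For reference, the Delta history check mentioned above can be run like this (catalog, schema, and table names are placeholders):

```sql
-- Inspect recent operations on the Delta table; the 'operation' column
-- shows whether dbt issued a MERGE or a CREATE OR REPLACE TABLE AS SELECT.
describe history `catalog_name`.`schema_name`.`table_name`;
```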
Steps To Reproduce
Add a seed to the project called country.csv. Use dbt_project.yml to set its target catalog and schema (not sure if that is required). Run
dbt build --select country
or simply
dbt seed
; this should work. Then run the same command again and it should fail.
For the incremental models, simply create a model, set it to incremental, and add this line:
{{ log("For model '"~model.name~"' is_incremental() is set to '"~is_incremental()~"'", True) }}
The second time the model is run, is_incremental() should be 'True', but it is not.
Expected behavior
Automated detection of whether the table already exists.
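A minimal incremental model for the reproduction above might look like this (a sketch; the source table and column names are hypothetical):

```sql
{{ config(materialized='incremental', unique_key='id') }}

{{ log("For model '" ~ model.name ~ "' is_incremental() is set to '" ~ is_incremental() ~ "'", True) }}

select id, updated_at
from {{ source('raw', 'events') }}  -- hypothetical source
{% if is_incremental() %}
-- on a second run this filter should apply, and the run should compile to a merge
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run, is_incremental() is False and the model compiles to a create-table statement; on subsequent runs it should be True and compile to a merge, which is the behavior that broke in 1.8.0.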
Screenshots and log output
See error output above.
System information
The output of dbt --version:
The operating system you're using: Windows 11 Enterprise
The output of python --version: Python 3.11.4
Additional context
I tried to debug the dbt-databricks seed materialization locally. This line yields None for me, where I would expect it to return the relation if the table exists. Oddly enough, the parameters used (database, schema, and identifier) all have the correct values.
I see the same thing happening when running from a Databricks workflow. It uses a job cluster, so it gets a fresh install of dbt-databricks 1.8.1 on each job run. No other packages installed.