[CT-1161] snapshot tables are being recreated from scratch #5824

Closed
2 tasks done
deugene opened this issue Sep 13, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@deugene

deugene commented Sep 13, 2022

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Hi! We recently discovered a problem with dbt snapshots. We use dbt-core v1.2.1 with Databricks as storage, and from time to time some of our snapshot tables are recreated from scratch for no obvious reason. We checked the data schemas: columns and their data types haven't changed. After some research, we discovered that two concurrent runs were in progress at exactly the time the tables were recreated. One of the runs finished successfully (presumably the one that recreated the tables):

15:22:53 Running with dbt=1.2.1
15:22:54 Unable to do partial parsing because config vars, config profile, or config target have changed
15:22:54 Unable to do partial parsing because env vars used in profiles.yml have changed
15:23:05 Found 173 models, 233 tests, 3 snapshots, 0 analyses, 769 macros, 0 operations, 0 seed files, 336 sources, 0 exposures, 0 metrics
15:23:05
15:23:09 Concurrency: 2 threads (target='galaxy_vets')
15:23:09
...
15:23:58 49 of 288 START snapshot dbt.todos_snapshot [RUN]
15:24:13 49 of 288 OK snapshotted dbt.todos_snapshot [OK in 14.79s]

And the other one failed with the error:

15:22:49 Running with dbt=1.2.1
15:22:49 Unable to do partial parsing because config vars, config profile, or config target have changed
15:22:49 Unable to do partial parsing because env vars used in profiles.yml have changed
15:23:01 Found 173 models, 233 tests, 3 snapshots, 0 analyses, 769 macros, 0 operations, 0 seed files, 336 sources, 0 exposures, 0 metrics
15:23:01
15:23:09 Concurrency: 2 threads (target='galaxy_vets')
15:23:09
...
15:23:59 72 of 258 START snapshot dbt.todos_snapshot [RUN]
15:24:19 72 of 258 ERROR snapshotting dbt.todos_snapshot [ERROR in 20.51s]
...
15:37:38 Runtime Error in snapshot todos_snapshot (snapshots/todos_snapshot.sql)
15:37:38 The metadata of the Delta table has been changed by a concurrent update. Please try the operation again.
15:37:38 Conflicting commit: {"timestamp":1662737044597,"operation":"CREATE OR REPLACE TABLE AS SELECT","operationParameters":{"isManaged":false,"description":null,"partitionBy":[],"properties":{}},"readVersion":5744,"isolationLevel":"WriteSerializable","isBlindAppend":false,"operationMetrics":{"numFiles":"2","numOutputRows":"2576","numOutputBytes":"291672"},"engineInfo":"Databricks-Runtime/10.4.x-scala2.12","txnId":"d3ef0304-4912-41b1-a3c7-54cd87c8e2e6"}
15:37:38 Refer to https://docs.gcp.databricks.com/delta/concurrency-control.html for more details.
15:37:38
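The "Conflicting commit" payload in the error records what the other run did to the table. As a minimal sketch, assuming the log line has been captured as a string with the terminal color codes stripped and that `timestamp` is in epoch milliseconds, the operation and commit time can be pulled out like this. The helper name `parse_conflicting_commit` is made up for illustration and is not part of dbt or Databricks.

```python
import json
from datetime import datetime, timezone

# The "Conflicting commit" line from the failed run, color codes removed.
LOG_LINE = (
    '15:37:38 Conflicting commit: {"timestamp":1662737044597,'
    '"operation":"CREATE OR REPLACE TABLE AS SELECT",'
    '"operationParameters":{"isManaged":false,"description":null,'
    '"partitionBy":[],"properties":{}},"readVersion":5744,'
    '"isolationLevel":"WriteSerializable","isBlindAppend":false,'
    '"operationMetrics":{"numFiles":"2","numOutputRows":"2576",'
    '"numOutputBytes":"291672"},'
    '"engineInfo":"Databricks-Runtime/10.4.x-scala2.12",'
    '"txnId":"d3ef0304-4912-41b1-a3c7-54cd87c8e2e6"}'
)


def parse_conflicting_commit(line):
    """Return the JSON payload that follows 'Conflicting commit: ' as a dict."""
    _, _, payload = line.partition("Conflicting commit: ")
    if not payload:
        raise ValueError("no conflicting commit found in line")
    return json.loads(payload)


commit = parse_conflicting_commit(LOG_LINE)

# Assumption: the timestamp is milliseconds since the Unix epoch.
committed_at = datetime.fromtimestamp(commit["timestamp"] // 1000, tz=timezone.utc)

print(commit["operation"])
print(committed_at.isoformat())
print(commit["operationMetrics"]["numOutputRows"])
```

The operation is a full `CREATE OR REPLACE TABLE AS SELECT`, not a merge or an insert. If the dbt log timestamps are also in UTC (an assumption), the commit time of 15:24:04 falls between the START and OK lines of the successful run, which is consistent with that run being the one that recreated the table.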

Expected Behavior

Snapshot tables are updated without being recreated.

Steps To Reproduce

Concurrently run the same snapshots with dbt-core v1.2.1 and Databricks as the database.

Relevant log output

No response

Environment

- OS: Debian Bullseye (python:3.8-slim docker image)
- Python: 3.8.14
- dbt: 1.2.1

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

We use the latest dbt-databricks adapter.

@deugene added the bug and triage labels Sep 13, 2022
@github-actions bot changed the title from "snapshot tables are being recreated from scratch" to "[CT-1161] snapshot tables are being recreated from scratch" Sep 13, 2022
@dbeatty10 self-assigned this Sep 13, 2022
@dbeatty10
Contributor

Thanks for opening @deugene !

We expect only one dbt invocation at a time (for a given set of database objects).

Are you intentionally triggering multiple concurrent dbt snapshots? If so, could you tell me more about the use case?

@deugene
Author

deugene commented Sep 14, 2022

Hi, @dbeatty10! Thanks for the answer.

We have partitioned jobs that extract and load data hourly, daily, weekly, etc. After each job we run the corresponding dbt models (we achieved this by adding the tags hourly, daily, weekly, etc. to the models). In some cases, though, daily and hourly models may have refs to the same model or snapshot, so when we run dbt build --select @tag:hourly and dbt build --select @tag:daily, both builds might try to run the same upstream models or snapshots.
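The overlap described above can be sketched in a few lines of Python. This is a simplified approximation of the `@` graph operator (selected nodes, their descendants, and all ancestors of those), not dbt's actual selector code. The project layout is hypothetical: only `todos_snapshot` comes from this thread, and `hourly_report` and `daily_report` are invented names.

```python
# Hypothetical project: node -> set of direct parents.
# Only todos_snapshot is named in the thread; the two reports are invented.
PARENTS = {
    "todos_snapshot": set(),
    "hourly_report": {"todos_snapshot"},
    "daily_report": {"todos_snapshot"},
}
TAGS = {"hourly_report": {"hourly"}, "daily_report": {"daily"}}


def ancestors(node):
    """All upstream nodes of `node`."""
    found = set()
    stack = list(PARENTS[node])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS[parent])
    return found


def descendants(node):
    """All downstream nodes of `node`."""
    return {other for other in PARENTS if node in ancestors(other)}


def at_select(tag):
    """Approximate `--select @tag:<tag>`: tagged nodes, their descendants,
    and every ancestor of all of those."""
    selected = {node for node, tags in TAGS.items() if tag in tags}
    for node in list(selected):
        selected |= descendants(node)
    for node in list(selected):
        selected |= ancestors(node)
    return selected


hourly = at_select("hourly")
daily = at_select("daily")

# Nodes that both invocations would try to build.
print(sorted(hourly & daily))
```

In this toy graph the shared node is the snapshot itself, so two builds started close together would both try to materialize it.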

@dbeatty10
Contributor

Thanks for describing your use case, @deugene. I think I understand it at a high level.

The key to resolving this for you will be to avoid concurrent dbt materializations.

The most direct way to achieve this is to have only a single invocation of dbt at a time. Assuming you have tags for daily and hourly, you would run the following once per day:
dbt build --select @tag:daily @tag:hourly

While that is running, make sure not to run the following:
dbt build --select @tag:hourly

The crucial insight is to only do one dbt build at a time.

I'm closing this for now since we intentionally don't support concurrent dbt builds. If I missed an important piece about this issue, we can re-open it. If you want to discuss the merits of concurrent dbt runs, then I'd suggest opening a discussion.
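One way to enforce "one dbt build at a time" when the hourly and daily jobs are launched from the same machine is to wrap each invocation in an exclusive file lock. The following is a minimal sketch under that assumption, not something dbt provides: `run_exclusive`, `exclusive_lock`, and the lock path are illustrative names, and `fcntl` is available on POSIX systems only.

```python
import fcntl
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "w") as lock_file:
        # Blocks until any other process holding the lock releases it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def run_exclusive(command, lock_path="/tmp/dbt-build.lock"):
    """Run `command` while holding the lock and return its exit code."""
    with exclusive_lock(lock_path):
        return subprocess.run(command).returncode


# Intended use from each scheduled job (requires dbt on PATH):
#   run_exclusive(["dbt", "build", "--select", "@tag:hourly"])
#   run_exclusive(["dbt", "build", "--select", "@tag:daily"])

# Stand-in command so the sketch runs without dbt installed.
exit_code = run_exclusive([sys.executable, "-c", "print('build finished')"])
print(exit_code)
```

With this wrapper, a second job that starts while the first is still running waits for the lock instead of racing on the same snapshot. The lock only coordinates processes that see the same lock file, so jobs running on separate machines would need a shared mechanism instead, such as a concurrency limit in the scheduler.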

@dbeatty10 closed this as not planned Sep 14, 2022
@dbeatty10 removed the triage label Sep 14, 2022
@dbeatty10 removed their assignment Sep 14, 2022