Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SUPER SLOW dbt run -m model_name #556

Closed
ashwintastic opened this issue Jan 18, 2024 · 11 comments
Closed

SUPER SLOW dbt run -m model_name #556

ashwintastic opened this issue Jan 18, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@ashwintastic
Copy link

Describe the bug

A clear and concise description of what the bug is. What command did you run? What happened?

I’m running dbt run -m model_name , which is super slow when I checked the logs dbt is checking each and every tables, views in that schema by running show extended table.

Earlier in version 1.6 it was responding faster

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

dbt run -m model_name

Expected behavior

A clear and concise description of what you expected to happen.
It should pick the model asap, but it is checking for each table, views in that schema describe table

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.


   --------------------------------LOGS---------------------------------------
   
   
   ============================== 22:47:58.530684 | d8769778-6f77-45ce-8fa3-2564bbaf55fb 
   ==============================
   22:47:58.530684 [info ] [MainThread]: Running with dbt=1.7.4
   22:47:58.533295 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager',    'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None',    'version_check': 'True', 'fail_fast': 'False', 'log_path': 'some_path\logs', 'debug': 'False', 'profiles_dir': 'some_path',    'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options':    'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': "dbt run -m some_model --vars {'param1': 'xxxx',    'param2':'yyyyy', 'param3': 'xyxyxyx' }", 'log_format': 'default', 'introspect': 'True', 'target_path': 'None', 'static_parser':    'True', 'send_anonymous_usage_stats': 'True'}
   22:37:21.670827 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5A639DE10>]}
   22:37:21.958659 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5B71DD4B0>]}
   22:37:21.960659 [info ] [MainThread]: Registered adapter: databricks=1.7.3
   22:37:22.018129 [debug] [MainThread]: checksum: d3f38ad46d5643a4a6fd32ce6a88e70ccb2066f8630b1f5ec7fec51d1271723b, vars:    {'source_dataset': 'SOME_SOURCE',
   'param2': 'xxxxxxxxxxxxxxxxxxx',
   'param1': 'XYZ'}, profile: , target: , version: 1.7.4
   
   22:37:22.912043 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
   22:37:22.918044 [debug] [MainThread]: Partial parsing: updated file: databricks://macros\SOME_SCHEMA\futuristic_columns.sql
   22:37:24.115807 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5B7E640D0>]}
   22:37:24.509945 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5B7364280>]}
   22:37:24.512072 [info ] [MainThread]: Found 110 models, 16 snapshots, 501 tests, 16 sources, 0 exposures, 0 metrics, 688 macros,    0 groups, 0 semantic models
   22:37:24.513947 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5B73642E0>]}
   22:37:24.536996 [info ] [MainThread]:
   22:37:24.539946 [debug] [MainThread]: Databricks adapter: Creating DatabricksDBTConnection. name: master, thread: (11688, 24468),    compute: 22:37:24.541971 [debug] [MainThread]: Acquiring new databricks connection 'master' 22:37:24.544117 [debug] [MainThread]:    Databricks adapter: DatabricksDBTConnection._acquire: name: master, thread: (11688, 24468), compute:, acquire_release_count: 0,    idle time: 0s
   22:37:24.545945 [debug] [MainThread]: Databricks adapter: Thread (11688, 24468) using default compute resource.
   22:37:24.549979 [debug] [ThreadPool]: Databricks adapter: Creating DatabricksDBTConnection. name: list_hive_metastore, thread: (   11688, 5936), compute: 22:37:24.551945 [debug] [ThreadPool]: Acquiring new databricks connection 'list_hive_metastore'    22:37:24.552982 [debug] [ThreadPool]: Databricks adapter: DatabricksDBTConnection._acquire: name: list_hive_metastore, thread: (   11688, 5936), compute:, acquire_release_count: 0, idle time: 0s
   22:37:24.554949 [debug] [ThreadPool]: Databricks adapter: Thread (11688, 5936) using default compute resource.
   22:37:24.555980 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore"
   22:37:24.557169 [debug] [ThreadPool]: On list_hive_metastore: GetSchemas(database=hive_metastore, schema=None)
   22:37:24.558000 [debug] [ThreadPool]: Opening a new connection, currently in state init
   22:37:28.841685 [debug] [ThreadPool]: SQL status: OK in 4.28000020980835 seconds
   22:37:28.854641 [debug] [ThreadPool]: Databricks adapter: DatabricksDBTConnection._release: name: list_hive_metastore, thread: (   11688, 5936), compute: , acquire_release_count: 1, idle time: 0s 22:37:28.892143 [debug] [ThreadPool]: Databricks adapter:    Creating DatabricksDBTConnection. name: list_hive_metastore_SOME_SCHEMA, thread: (11688, 23084), compute: 
   22:37:28.895142 [debug] [ThreadPool]: Acquiring new databricks connection 'list_hive_metastore_SOME_SCHEMA'
   22:37:28.898143 [debug] [ThreadPool]: Databricks adapter: DatabricksDBTConnection._acquire: name:    list_hive_metastore_SOME_SCHEMA, thread: (11688, 23084), compute: ``, acquire_release_count: 0, idle time: 0s
   22:37:28.900141 [debug] [ThreadPool]: Databricks adapter: Thread (11688, 23084) using default compute resource.
   22:37:28.927617 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_SCHEMA"
   22:37:28.933000 [debug] [ThreadPool]: On list_hive_metastore_SOME_SCHEMA: GetTables(database=hive_metastore, schema=SOME_SCHEMA,    identifier=None)
   22:37:28.935875 [debug] [ThreadPool]: Opening a new connection, currently in state init
   
   22:37:31.564096 [debug] [ThreadPool]: SQL status: OK in 2.630000114440918 seconds
   22:37:31.719021 [debug] [ThreadPool]: Spark adapter: NotImplemented: add_begin_query
   22:37:31.722029 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_SCHEMA"
   22:37:31.725030 [debug] [ThreadPool]: On list_hive_metastore_SOME_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "some_profile", "target_name":    "some_env", "connection_name": "list_hive_metastore_SOME_SCHEMA"} */
   
   select current_catalog()
   22:37:32.260311 [debug] [ThreadPool]: SQL status: OK in 0.5299999713897705 seconds
   22:37:32.296242 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_SCHEMA"
   22:37:32.299069 [debug] [ThreadPool]: On list_hive_metastore_SOME_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "some_profile", "target_name":    "some_env", "connection_name": "list_hive_metastore_SOME_SCHEMA"} */
   show views in hive_metastore.SOME_SCHEMA
   
   22:37:32.938768 [debug] [ThreadPool]: SQL status: OK in 0.6399999856948853 seconds
   22:37:32.980341 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_SCHEMA"
   22:37:32.983068 [debug] [ThreadPool]: On list_hive_metastore_SOME_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "some_profile", "target_name":    "some_env", "connection_name": "list_hive_metastore_SOME_SCHEMA"} /
   show table extended in hive_metastore.SOME_SCHEMA like ''
   22:38:12.162673 [debug] [ThreadPool]: Databricks adapter: Reusing DatabricksDBTConnection. name:    list_hive_metastore_SOME_O_SCHEMA, thread: (11688, 23084), compute: , acquire_release_count: 0, idle time: 0.07942080497741699s    22:38:12.212333 [debug] [ThreadPool]: Databricks adapter: DatabricksDBTConnection._acquire: name:    list_hive_metastore_SOME_O_SCHEMA, thread: (11688, 23084), compute: , acquire_release_count: 0, idle time: 0.12878203392028809s
   22:38:12.256643 [debug] [ThreadPool]: Databricks adapter: Thread (11688, 23084) using default compute resource.
   22:38:12.342730 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_O_SCHEMA"
   22:38:12.369688 [debug] [ThreadPool]: On list_hive_metastore_SOME_O_SCHEMA: GetTables(database=hive_metastore,    schema=SOME_O_SCHEMA, identifier=None)
   22:38:14.369630 [debug] [ThreadPool]: SQL status: OK in 2.0 seconds
   22:38:14.532526 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_O_SCHEMA"
   22:38:14.534479 [debug] [ThreadPool]: On list_hive_metastore_SOME_O_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "data_transformation",    "target_name": "some_env", "connection_name": "list_hive_metastore_SOME_O_SCHEMA"} */
   
   select current_catalog()
   22:38:15.058400 [debug] [ThreadPool]: SQL status: OK in 0.5199999809265137 seconds
   22:38:15.072412 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_O_SCHEMA"
   22:38:15.075356 [debug] [ThreadPool]: On list_hive_metastore_SOME_O_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "data_transformation",    "target_name": "some_env", "connection_name": "list_hive_metastore_SOME_O_SCHEMA"} */
   show views in hive_metastore.SOME_O_SCHEMA
   
   22:38:15.934159 [debug] [ThreadPool]: SQL status: OK in 0.8600000143051147 seconds
   22:38:15.952715 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_O_SCHEMA"
   22:38:15.955722 [debug] [ThreadPool]: On list_hive_metastore_SOME_O_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "data_transformation",    "target_name": "some_env", "connection_name": "list_hive_metastore_SOME_O_SCHEMA"} /
   show table extended in hive_metastore.SOME_O_SCHEMA like ''

System information

The output of dbt --version:

Core:
  - installed: 1.7.4
  - latest:    1.7.4 - Up to date!

Plugins:
  - databricks: 1.7.3 - Up to date!
  - spark:      1.7.1 - Up to date!
  pip 23.3.2 

The operating system you're using:
Wndows 11

The output of python --version:
Python 3.10.6

Additional context

Add any other context about the problem here.

@ashwintastic ashwintastic added the bug Something isn't working label Jan 18, 2024
@benc-db
Copy link
Collaborator

benc-db commented Jan 18, 2024

On list_hive_metastore_SOME_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4", "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "some_profile", "target_name": "some_env", "connection_name": "list_hive_metastore_SOME_SCHEMA"} /
show table extended in hive_metastore.SOME_SCHEMA like ''

This looks like the culprit. Thanks for the report.

@benc-db
Copy link
Collaborator

benc-db commented Jan 19, 2024

@ashwintastic can you try latest main and let me know if it improves the situation?

@josephberni
Copy link
Contributor

Curious as to if you see any improvements here @ashwintastic, I've noticed a similar problem where it takes around 3 mins to run a 4 sec model. I've attempted to use 1.7.7 also and the issue is much the same sadly.

@benc-db
Copy link
Collaborator

benc-db commented Feb 8, 2024

One core issue is that dbt enumerates every table in the schema just in case. For Databricks, metadata operations like this are pretty slow. What I did could only improve performance if you are on hive rather than UC.

@josephberni
Copy link
Contributor

One core issue is that dbt enumerates every table in the schema just in case. For Databricks, metadata operations like this are pretty slow. What I did could only improve performance if you are on hive rather than UC.

Interesting. Yeah I am also using hive not UC but sadly didn't see any improvements.

@benc-db
Copy link
Collaborator

benc-db commented Feb 8, 2024

Sorry to hear. We have an on-going conversation with dbt about how we might fix this in the future. On a large project this start up cost may not be noticeable, but it can be very annoying if you want to update a single model :/.

@daniel-gomez-mng
Copy link

Hello guys! I'm very interested in this topic. We have a use case where we run every dbt model in an Airflow task, so for every run we need to wait 1-2 minutes in order to discover the catalog. I guess it should have a reason, but our models are very slow and expensive for that. Do you have any news on how this behavior is going to improve?

Is this better with Unity Catalog?

There are some other related discussions about it in dbt-spark dbt-labs/dbt-spark#228 (comment)

@ashwintastic
Copy link
Author

I can see performance improvement using dbt=1.7.9 adapter: databricks=1.7.9. No more slowness issue. Thank you for the support

@AlanCortesK
Copy link

I can see performance improvement using dbt=1.7.9 adapter: databricks=1.7.9. No more slowness issue. Thank you for the support

What is your databricks_sql_connector_version, I installed dbt=1.7.9 adapter: databricks=1.7.9 and still having the same performance issue

@benc-db
Copy link
Collaborator

benc-db commented May 9, 2024

It depends on whether you are using UC or HMS, and how many tables you have in your schemas. As mentioned above, I'm in on-going conversations with dbt on how to improve targeted model performance. @jtcohen6 for viz.

@randypitcherii
Copy link

Hey! Any luck improving this, especially for UC uses?

If the metadata calls are expensive, could we hit something like information_schema to get this information instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants