SUPER SLOW dbt run -m model_name #556

ashwintastic · 2024-01-18T05:12:07Z

Describe the bug

A clear and concise description of what the bug is. What command did you run? What happened?

I’m running dbt run -m model_name , which is super slow when I checked the logs dbt is checking each and every tables, views in that schema by running show extended table.

Earlier in version 1.6 it was responding faster

Steps To Reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.

dbt run -m model_name

Expected behavior

A clear and concise description of what you expected to happen.
It should pick the model asap, but it is checking for each table, views in that schema describe table

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.


   --------------------------------LOGS---------------------------------------
   
   
   ============================== 22:47:58.530684 | d8769778-6f77-45ce-8fa3-2564bbaf55fb 
   ==============================
   22:47:58.530684 [info ] [MainThread]: Running with dbt=1.7.4
   22:47:58.533295 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager',    'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None',    'version_check': 'True', 'fail_fast': 'False', 'log_path': 'some_path\logs', 'debug': 'False', 'profiles_dir': 'some_path',    'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options':    'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': "dbt run -m some_model --vars {'param1': 'xxxx',    'param2':'yyyyy', 'param3': 'xyxyxyx' }", 'log_format': 'default', 'introspect': 'True', 'target_path': 'None', 'static_parser':    'True', 'send_anonymous_usage_stats': 'True'}
   22:37:21.670827 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5A639DE10>]}
   22:37:21.958659 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5B71DD4B0>]}
   22:37:21.960659 [info ] [MainThread]: Registered adapter: databricks=1.7.3
   22:37:22.018129 [debug] [MainThread]: checksum: d3f38ad46d5643a4a6fd32ce6a88e70ccb2066f8630b1f5ec7fec51d1271723b, vars:    {'source_dataset': 'SOME_SOURCE',
   'param2': 'xxxxxxxxxxxxxxxxxxx',
   'param1': 'XYZ'}, profile: , target: , version: 1.7.4
   
   22:37:22.912043 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
   22:37:22.918044 [debug] [MainThread]: Partial parsing: updated file: databricks://macros\SOME_SCHEMA\futuristic_columns.sql
   22:37:24.115807 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5B7E640D0>]}
   22:37:24.509945 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5B7364280>]}
   22:37:24.512072 [info ] [MainThread]: Found 110 models, 16 snapshots, 501 tests, 16 sources, 0 exposures, 0 metrics, 688 macros,    0 groups, 0 semantic models
   22:37:24.513947 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label':    'da6d902d-ca85-448d-a17c-23c11405b822', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at    0x000001A5B73642E0>]}
   22:37:24.536996 [info ] [MainThread]:
   22:37:24.539946 [debug] [MainThread]: Databricks adapter: Creating DatabricksDBTConnection. name: master, thread: (11688, 24468),    compute: 22:37:24.541971 [debug] [MainThread]: Acquiring new databricks connection 'master' 22:37:24.544117 [debug] [MainThread]:    Databricks adapter: DatabricksDBTConnection._acquire: name: master, thread: (11688, 24468), compute:, acquire_release_count: 0,    idle time: 0s
   22:37:24.545945 [debug] [MainThread]: Databricks adapter: Thread (11688, 24468) using default compute resource.
   22:37:24.549979 [debug] [ThreadPool]: Databricks adapter: Creating DatabricksDBTConnection. name: list_hive_metastore, thread: (   11688, 5936), compute: 22:37:24.551945 [debug] [ThreadPool]: Acquiring new databricks connection 'list_hive_metastore'    22:37:24.552982 [debug] [ThreadPool]: Databricks adapter: DatabricksDBTConnection._acquire: name: list_hive_metastore, thread: (   11688, 5936), compute:, acquire_release_count: 0, idle time: 0s
   22:37:24.554949 [debug] [ThreadPool]: Databricks adapter: Thread (11688, 5936) using default compute resource.
   22:37:24.555980 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore"
   22:37:24.557169 [debug] [ThreadPool]: On list_hive_metastore: GetSchemas(database=hive_metastore, schema=None)
   22:37:24.558000 [debug] [ThreadPool]: Opening a new connection, currently in state init
   22:37:28.841685 [debug] [ThreadPool]: SQL status: OK in 4.28000020980835 seconds
   22:37:28.854641 [debug] [ThreadPool]: Databricks adapter: DatabricksDBTConnection._release: name: list_hive_metastore, thread: (   11688, 5936), compute: , acquire_release_count: 1, idle time: 0s 22:37:28.892143 [debug] [ThreadPool]: Databricks adapter:    Creating DatabricksDBTConnection. name: list_hive_metastore_SOME_SCHEMA, thread: (11688, 23084), compute: 
   22:37:28.895142 [debug] [ThreadPool]: Acquiring new databricks connection 'list_hive_metastore_SOME_SCHEMA'
   22:37:28.898143 [debug] [ThreadPool]: Databricks adapter: DatabricksDBTConnection._acquire: name:    list_hive_metastore_SOME_SCHEMA, thread: (11688, 23084), compute: ``, acquire_release_count: 0, idle time: 0s
   22:37:28.900141 [debug] [ThreadPool]: Databricks adapter: Thread (11688, 23084) using default compute resource.
   22:37:28.927617 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_SCHEMA"
   22:37:28.933000 [debug] [ThreadPool]: On list_hive_metastore_SOME_SCHEMA: GetTables(database=hive_metastore, schema=SOME_SCHEMA,    identifier=None)
   22:37:28.935875 [debug] [ThreadPool]: Opening a new connection, currently in state init
   
   22:37:31.564096 [debug] [ThreadPool]: SQL status: OK in 2.630000114440918 seconds
   22:37:31.719021 [debug] [ThreadPool]: Spark adapter: NotImplemented: add_begin_query
   22:37:31.722029 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_SCHEMA"
   22:37:31.725030 [debug] [ThreadPool]: On list_hive_metastore_SOME_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "some_profile", "target_name":    "some_env", "connection_name": "list_hive_metastore_SOME_SCHEMA"} */
   
   select current_catalog()
   22:37:32.260311 [debug] [ThreadPool]: SQL status: OK in 0.5299999713897705 seconds
   22:37:32.296242 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_SCHEMA"
   22:37:32.299069 [debug] [ThreadPool]: On list_hive_metastore_SOME_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "some_profile", "target_name":    "some_env", "connection_name": "list_hive_metastore_SOME_SCHEMA"} */
   show views in hive_metastore.SOME_SCHEMA
   
   22:37:32.938768 [debug] [ThreadPool]: SQL status: OK in 0.6399999856948853 seconds
   22:37:32.980341 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_SCHEMA"
   22:37:32.983068 [debug] [ThreadPool]: On list_hive_metastore_SOME_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "some_profile", "target_name":    "some_env", "connection_name": "list_hive_metastore_SOME_SCHEMA"} /
   show table extended in hive_metastore.SOME_SCHEMA like ''
   22:38:12.162673 [debug] [ThreadPool]: Databricks adapter: Reusing DatabricksDBTConnection. name:    list_hive_metastore_SOME_O_SCHEMA, thread: (11688, 23084), compute: , acquire_release_count: 0, idle time: 0.07942080497741699s    22:38:12.212333 [debug] [ThreadPool]: Databricks adapter: DatabricksDBTConnection._acquire: name:    list_hive_metastore_SOME_O_SCHEMA, thread: (11688, 23084), compute: , acquire_release_count: 0, idle time: 0.12878203392028809s
   22:38:12.256643 [debug] [ThreadPool]: Databricks adapter: Thread (11688, 23084) using default compute resource.
   22:38:12.342730 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_O_SCHEMA"
   22:38:12.369688 [debug] [ThreadPool]: On list_hive_metastore_SOME_O_SCHEMA: GetTables(database=hive_metastore,    schema=SOME_O_SCHEMA, identifier=None)
   22:38:14.369630 [debug] [ThreadPool]: SQL status: OK in 2.0 seconds
   22:38:14.532526 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_O_SCHEMA"
   22:38:14.534479 [debug] [ThreadPool]: On list_hive_metastore_SOME_O_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "data_transformation",    "target_name": "some_env", "connection_name": "list_hive_metastore_SOME_O_SCHEMA"} */
   
   select current_catalog()
   22:38:15.058400 [debug] [ThreadPool]: SQL status: OK in 0.5199999809265137 seconds
   22:38:15.072412 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_O_SCHEMA"
   22:38:15.075356 [debug] [ThreadPool]: On list_hive_metastore_SOME_O_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "data_transformation",    "target_name": "some_env", "connection_name": "list_hive_metastore_SOME_O_SCHEMA"} */
   show views in hive_metastore.SOME_O_SCHEMA
   
   22:38:15.934159 [debug] [ThreadPool]: SQL status: OK in 0.8600000143051147 seconds
   22:38:15.952715 [debug] [ThreadPool]: Using databricks connection "list_hive_metastore_SOME_O_SCHEMA"
   22:38:15.955722 [debug] [ThreadPool]: On list_hive_metastore_SOME_O_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4",    "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "data_transformation",    "target_name": "some_env", "connection_name": "list_hive_metastore_SOME_O_SCHEMA"} /
   show table extended in hive_metastore.SOME_O_SCHEMA like ''

System information

The output of dbt --version:

Core:
  - installed: 1.7.4
  - latest:    1.7.4 - Up to date!

Plugins:
  - databricks: 1.7.3 - Up to date!
  - spark:      1.7.1 - Up to date!
  pip 23.3.2

The operating system you're using:
Wndows 11

The output of python --version:
Python 3.10.6

Additional context

Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

benc-db · 2024-01-18T17:18:24Z

On list_hive_metastore_SOME_SCHEMA: /* {"app": "dbt", "dbt_version": "1.7.4", "dbt_databricks_version": "1.7.3", "databricks_sql_connector_version": "2.9.3", "profile_name": "some_profile", "target_name": "some_env", "connection_name": "list_hive_metastore_SOME_SCHEMA"} /
show table extended in hive_metastore.SOME_SCHEMA like ''

This looks like the culprit. Thanks for the report.

benc-db · 2024-01-19T17:45:27Z

@ashwintastic can you try latest main and let me know if it improves the situation?

josephberni · 2024-02-08T10:50:23Z

Curious as to if you see any improvements here @ashwintastic, I've noticed a similar problem where it takes around 3 mins to run a 4 sec model. I've attempted to use 1.7.7 also and the issue is much the same sadly.

benc-db · 2024-02-08T16:41:10Z

One core issue is that dbt enumerates every table in the schema just in case. For Databricks, metadata operations like this are pretty slow. What I did could only improve performance if you are on hive rather than UC.

josephberni · 2024-02-08T16:54:36Z

One core issue is that dbt enumerates every table in the schema just in case. For Databricks, metadata operations like this are pretty slow. What I did could only improve performance if you are on hive rather than UC.

Interesting. Yeah I am also using hive not UC but sadly didn't see any improvements.

benc-db · 2024-02-08T16:57:15Z

Sorry to hear. We have an on-going conversation with dbt about how we might fix this in the future. On a large project this start up cost may not be noticeable, but it can be very annoying if you want to update a single model :/.

daniel-gomez-mng · 2024-02-19T13:22:16Z

Hello guys! I'm very interested in this topic. We have a use case where we run every dbt model in an Airflow task, so for every run we need to wait 1-2 minutes in order to discover the catalog. I guess it should have a reason, but our models are very slow and expensive for that. Do you have any news on how this behavior is going to improve?

Is this better with Unity Catalog?

There are some other related discussions about it in dbt-spark dbt-labs/dbt-spark#228 (comment)

ashwintastic · 2024-03-07T12:11:19Z

I can see performance improvement using dbt=1.7.9 adapter: databricks=1.7.9. No more slowness issue. Thank you for the support

AlanCortesK · 2024-05-09T15:28:00Z

I can see performance improvement using dbt=1.7.9 adapter: databricks=1.7.9. No more slowness issue. Thank you for the support

What is your databricks_sql_connector_version, I installed dbt=1.7.9 adapter: databricks=1.7.9 and still having the same performance issue

benc-db · 2024-05-09T16:19:06Z

It depends on whether you are using UC or HMS, and how many tables you have in your schemas. As mentioned above, I'm in on-going conversations with dbt on how to improve targeted model performance. @jtcohen6 for viz.

randypitcherii · 2024-09-19T20:20:11Z

Hey! Any luck improving this, especially for UC uses?

If the metadata calls are expensive, could we hit something like information_schema to get this information instead?

ashwintastic added the bug Something isn't working label Jan 18, 2024

benc-db mentioned this issue Jan 18, 2024

Improve hive list relations behavior #557

Merged

3 tasks

dbeatty10 mentioned this issue Jan 18, 2024

[CT-3565] SUPER SLOW dbt run -m model_name dbt-labs/dbt-core#9402

Closed

2 tasks

dbeatty10 mentioned this issue Jan 19, 2024

SUPER SLOW dbt run -m model_name dbt-labs/docs.getdbt.com#4762

Closed

ashwintastic closed this as completed Mar 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SUPER SLOW dbt run -m model_name #556

SUPER SLOW dbt run -m model_name #556

ashwintastic commented Jan 18, 2024

benc-db commented Jan 18, 2024

benc-db commented Jan 19, 2024

josephberni commented Feb 8, 2024

benc-db commented Feb 8, 2024

josephberni commented Feb 8, 2024

benc-db commented Feb 8, 2024

daniel-gomez-mng commented Feb 19, 2024

ashwintastic commented Mar 7, 2024

AlanCortesK commented May 9, 2024

benc-db commented May 9, 2024

randypitcherii commented Sep 19, 2024

SUPER SLOW dbt run -m model_name #556

SUPER SLOW dbt run -m model_name #556

Comments

ashwintastic commented Jan 18, 2024

Describe the bug

Steps To Reproduce

Expected behavior

Screenshots and log output

System information

Additional context

benc-db commented Jan 18, 2024

benc-db commented Jan 19, 2024

josephberni commented Feb 8, 2024

benc-db commented Feb 8, 2024

josephberni commented Feb 8, 2024

benc-db commented Feb 8, 2024

daniel-gomez-mng commented Feb 19, 2024

ashwintastic commented Mar 7, 2024

AlanCortesK commented May 9, 2024

benc-db commented May 9, 2024

randypitcherii commented Sep 19, 2024