[CT-1751] Config to optionally skip population of relation cache #6526

Closed · Tracked by #7162 · Fixed by #7307
jtcohen6 opened this issue Jan 5, 2023 · 1 comment
Labels: adapter_caching (Issues related to the adapter's relation cache), enhancement (New feature or request), Team:Adapters (Issues designated for the adapter area of the code)

Comments

jtcohen6 (Contributor) commented on Jan 5, 2023

Users should have the ability to turn off relation cache population, if they really need to. It should still be "on" by default.

$ dbt --no-populate-cache [run|test|...]
$ DBT_POPULATE_CACHE=0|1 dbt [run|test|...]
# profiles.yml (UserConfig)
config:
  populate_cache: true|false

This is different from entirely disabling or skipping over the cache — we're just skipping the population of the cache on startup. When dbt needs to run caching queries, I think it should still report "cache miss," and then cache the result of the metadata query, if it needs to be used again in the same invocation.

Where

def populate_adapter_cache(self, adapter, required_schemas: Set[BaseRelation] = None):
    start_populate_cache = time.perf_counter()
    if flags.CACHE_SELECTED_ONLY is True:
        adapter.set_relations_cache(self.manifest, required_schemas=required_schemas)
    else:
        adapter.set_relations_cache(self.manifest)
    cache_populate_time = time.perf_counter() - start_populate_cache
    if dbt.tracking.active_user is not None:
        dbt.tracking.track_runnable_timing(
            {"adapter_cache_construction_elapsed": cache_populate_time}
        )

def before_run(self, adapter, selected_uids: AbstractSet[str]):
    with adapter.connection_named("master"):
        self.populate_adapter_cache(adapter)
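
For example, the new setting could simply gate that call in before_run. A minimal sketch, assuming a hypothetical flags.POPULATE_CACHE attribute wired up from the CLI flag / env var / user config above (the real flag name and plumbing are open questions):

def before_run(self, adapter, selected_uids: AbstractSet[str]):
    with adapter.connection_named("master"):
        # POPULATE_CACHE is a hypothetical flag name; defaulting to True keeps
        # today's behavior when the flag isn't set.
        if getattr(flags, "POPULATE_CACHE", True):
            self.populate_adapter_cache(adapter)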

Who is it for?

YMMV

End users will need to experiment with the approach that's most efficient for them, between:

  • full cache enabled
  • "cache selected only" (docs)
  • --no-populate-cache

I expect mileage may vary between dev, CI, and prod environments.

Questions

Would this break behavior around --defer, which expects to use the relation cache to determine whether model X already exists in the dev schema, or whether its reference should be rewritten to use the schema defined in the other manifest?

not adapter.get_relation(current.database, current.schema, current.identifier)
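
If the cache starts empty, that lookup would need to fall back to a live metadata query on a miss and memoize the result for the rest of the invocation. A rough sketch of that behavior using existing adapter methods (the helper name is hypothetical, and whether adapter.cache.add is the right place to memoize is an assumption, not a proposal):

def get_relation_with_lazy_cache(adapter, database, schema, identifier):
    # Hypothetical helper: try the cache-backed lookup first; on a miss, list
    # the schema's relations once, add them to the in-memory cache, and retry.
    relation = adapter.get_relation(database, schema, identifier)
    if relation is None:
        schema_relation = adapter.Relation.create(database=database, schema=schema)
        for rel in adapter.list_relations_without_caching(schema_relation):
            adapter.cache.add(rel)
        relation = adapter.get_relation(database, schema, identifier)
    return relation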

Imagining a future where interactive compile/preview want to be both very fast, and able to correctly leverage --defer: We should also think more about making the adapter cache pluggable, as something that can live & persist outside of a single dbt-core invocation. It would be the responsibility of that other application wrapping dbt-core to handle cache invalidation (fun!).
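
One way to picture that pluggability is a small interface that the wrapping application implements and hands to the adapter, keeping population and invalidation on its side of the boundary. Purely illustrative; none of these interface names exist in dbt-core today (BaseRelation is dbt's existing relation class):

from typing import Iterable, Optional, Protocol

from dbt.adapters.base import BaseRelation


class ExternalRelationCache(Protocol):
    # Hypothetical interface for a relation cache that outlives a single
    # dbt-core invocation. The wrapping application owns invalidation.

    def lookup(self, database: str, schema: str, identifier: str) -> Optional[BaseRelation]:
        ...

    def add(self, relations: Iterable[BaseRelation]) -> None:
        ...

    def invalidate_schema(self, database: str, schema: str) -> None:
        ...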

jtcohen6 added the enhancement, Team:Execution, Team:Adapters, and adapter_caching labels on Jan 5, 2023
github-actions bot changed the title from "Config to optionally skip population of relation cache" to "[CT-1751] Config to optionally skip population of relation cache" on Jan 5, 2023
AnotherGuitar commented:

We're using the dbt-databricks adapter and would be very interested in seeing this feature get developed because compile times are taking a while for us. Thanks for writing up this issue!
