[CT-1751] Config to optionally skip population of relation cache #6526

Closed · Tracked by #7162 · Fixed by #7307
jtcohen6 opened this issue Jan 5, 2023 · 1 comment
Labels: adapter_caching (Issues related to the adapter's relation cache), enhancement (New feature or request), Team:Adapters (Issues designated for the adapter area of the code)

Comments

jtcohen6 (Contributor) commented on Jan 5, 2023

Users should have the ability to turn off relation cache population, if they really need to. It should still be "on" by default.

$ dbt --no-populate-cache [run|test|...]
$ DBT_POPULATE_CACHE=0|1 dbt [run|test|...]
# profiles.yml (UserConfig)
config:
  populate_cache: true|false

This is different from entirely disabling or skipping over the cache — we're just skipping the population of the cache on startup. When dbt needs to run caching queries, I think it should still report "cache miss," and then cache the result of the metadata query, if it needs to be used again in the same invocation.

Where

def populate_adapter_cache(self, adapter, required_schemas: Set[BaseRelation] = None):
    start_populate_cache = time.perf_counter()
    if flags.CACHE_SELECTED_ONLY is True:
        adapter.set_relations_cache(self.manifest, required_schemas=required_schemas)
    else:
        adapter.set_relations_cache(self.manifest)
    cache_populate_time = time.perf_counter() - start_populate_cache
    if dbt.tracking.active_user is not None:
        dbt.tracking.track_runnable_timing(
            {"adapter_cache_construction_elapsed": cache_populate_time}
        )

def before_run(self, adapter, selected_uids: AbstractSet[str]):
    with adapter.connection_named("master"):
        self.populate_adapter_cache(adapter)
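
For example, the new setting could simply gate that call in before_run. A minimal sketch, assuming a hypothetical flags.POPULATE_CACHE attribute wired up from the CLI flag / env var / user config above (the real flag name and plumbing are open questions):

def before_run(self, adapter, selected_uids: AbstractSet[str]):
    with adapter.connection_named("master"):
        # POPULATE_CACHE is a hypothetical flag name; defaulting to True keeps
        # today's behavior when the flag isn't set.
        if getattr(flags, "POPULATE_CACHE", True):
            self.populate_adapter_cache(adapter)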

Who is it for?

YMMV

End users will need to experiment with the approach that's most efficient for them, between:

  • full cache enabled
  • "cache selected only" (docs)
  • --no-populate-cache

I expect mileage may vary between dev, CI, and prod environments.

Questions

Would this break behavior around --defer, which expects to use the relation cache to determine whether model X already exists in the dev schema, or whether its reference should be rewritten to use the schema defined in the other manifest?

not adapter.get_relation(current.database, current.schema, current.identifier)
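
If the cache starts empty, that lookup would need to fall back to a live metadata query on a miss and memoize the result for the rest of the invocation. A rough sketch of that behavior using existing adapter methods (the helper name is hypothetical, and whether adapter.cache.add is the right place to memoize is an assumption, not a proposal):

def get_relation_with_lazy_cache(adapter, database, schema, identifier):
    # Hypothetical helper: try the cache-backed lookup first; on a miss, list
    # the schema's relations once, add them to the in-memory cache, and retry.
    relation = adapter.get_relation(database, schema, identifier)
    if relation is None:
        schema_relation = adapter.Relation.create(database=database, schema=schema)
        for rel in adapter.list_relations_without_caching(schema_relation):
            adapter.cache.add(rel)
        relation = adapter.get_relation(database, schema, identifier)
    return relation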

Imagining a future where interactive compile/preview want to be both very fast, and able to correctly leverage --defer: We should also think more about making the adapter cache pluggable, as something that can live & persist outside of a single dbt-core invocation. It would be the responsibility of that other application wrapping dbt-core to handle cache invalidation (fun!).
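
One way to picture that pluggability is a small interface that the wrapping application implements and hands to the adapter, keeping population and invalidation on its side of the boundary. Purely illustrative; none of these interface names exist in dbt-core today (BaseRelation is dbt's existing relation class):

from typing import Iterable, Optional, Protocol

from dbt.adapters.base import BaseRelation


class ExternalRelationCache(Protocol):
    # Hypothetical interface for a relation cache that outlives a single
    # dbt-core invocation. The wrapping application owns invalidation.

    def lookup(self, database: str, schema: str, identifier: str) -> Optional[BaseRelation]:
        ...

    def add(self, relations: Iterable[BaseRelation]) -> None:
        ...

    def invalidate_schema(self, database: str, schema: str) -> None:
        ...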

jtcohen6 added the enhancement, Team:Execution, Team:Adapters, and adapter_caching labels on Jan 5, 2023
github-actions bot changed the title from "Config to optionally skip population of relation cache" to "[CT-1751] Config to optionally skip population of relation cache" on Jan 5, 2023
AnotherGuitar commented:

We're using the dbt-databricks adapter and would be very interested in seeing this feature get developed because compile times are taking a while for us. Thanks for writing up this issue!
