Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot override dbt default materializations #1962

Closed
1 of 5 tasks
r-hempenstall opened this issue Nov 27, 2019 · 3 comments · Fixed by #1976
Closed
1 of 5 tasks

Cannot override dbt default materializations #1962

r-hempenstall opened this issue Nov 27, 2019 · 3 comments · Fixed by #1976
Labels
bug Something isn't working
Milestone

Comments

@r-hempenstall
Copy link

Describe the bug

I created a customised version of the incremental materialization. I then tried to use that materialization for one of my models and it didn't work -- dbt used the default incremental materialization still. This seems to be the case with other materializations as well.

Steps To Reproduce

Create a fresh dbt project backed by BigQuery using dbt init. Next create a materialization to override one of the default dbt materializations; I used the one below.

{% materialization incremental, adapter='bigquery' -%}

  {{ log('USING MY IMPLEMENTATION', info=True) }}

{%- endmaterialization %}

I then changed the default model to the following.

{{ config(materialized='incremental') }}

select 1 as id

This would be expected to log USING MY IMPLEMENTATION to my terminal and then exit with an error but when I run dbt run I get a nice log output ending with Completed Successfully. If I change the name of the materialization to my_incremental in both files then I get the log line and error message as I would expect.

Expected behavior

I would expect that my materialization would be used rather than dbt's if they both have the same name. In the example above, that would give me the log line and error message that I expected.

Screenshots and log output

The output I expected when I overrode the incremental materialization.

Running with dbt=0.15.0
Found 1 model, 0 tests, 0 snapshots, 0 analyses, 134 macros, 0 operations, 0 seed files, 0 sources

08:23:20 | Concurrency: 4 threads (target='dev')
08:23:20 | 
08:23:20 | 1 of 1 START my_table model dbt_ryan.my_first_dbt_model.............. [RUN]
USING MY IMPLEMENTATION
* Deprecation Warning: The materialization ("my_table") did not explicitly return a list
    of relations to add to the cache. By default the target relation will be
    added, but this behavior will be removed in a future version of dbt.

  For more information, see:
  https://docs.getdbt.com/v0.15/docs/creating-new-materializations#section-6-returning-relations
    

Unhandled error while executing model.my_new_package.my_first_dbt_model
'NoneType' object has no attribute 'status'
08:23:21 | 1 of 1 ERROR creating my_table model dbt_ryan.my_first_dbt_model..... [ERROR in 0.03s]
08:23:21 | 
08:23:21 | Finished running 1 my_table model in 0.90s.

Completed with 1 error and 0 warnings:

'NoneType' object has no attribute 'status'

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

The output I actually got.

Running with dbt=0.15.0
Found 1 model, 0 tests, 0 snapshots, 0 analyses, 134 macros, 0 operations, 0 seed files, 0 sources

08:30:58 | Concurrency: 4 threads (target='dev')
08:30:58 | 
08:30:58 | 1 of 1 START incremental model dbt_ryan.my_first_dbt_model........... [RUN]
08:31:02 | 1 of 1 OK created incremental model dbt_ryan.my_first_dbt_model...... [MERGE (1) in 4.07s]
08:31:03 | 
08:31:03 | Finished running 1 incremental model in 5.07s.

Completed successfully

Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.15.0
   latest version: 0.15.0

Up to date!

The operating system you're using: macOS Mojave

The output of python --version: Python 3.6.8

@r-hempenstall r-hempenstall added bug Something isn't working triage labels Nov 27, 2019
@drewbanin drewbanin removed the triage label Nov 27, 2019
@drewbanin drewbanin added this to the 0.15.1 milestone Nov 27, 2019
@beckjake
Copy link
Contributor

I don’t really understand how this bug happened in the sense that I have no idea what could've changed, but the current 0.15.0 behavior is that we do these searches:

  • search manifest.nodes for adapter-named macro (name=='materialization_view_bigquery'), first match wins
  • if that failed, search manifest.nodes for default macro (name=='materialization_view_default'), first match wins

Neither search specifies the package name, so we just grab the first matching macro. In 0.15.0, the way we build the macros dict is core-macros-first. So since we iterate over the dict to search, we happen to always find core first. That means in Python <3.6 where dicts are randomly sorted, the result would be non-deterministic? Which seems pretty bad, way worse than this!

My reading of 0.14.4 is that it's doing the same thing (loading internal packages first, iterating in dict order), so I am honestly not sure at all what changed here - I assume in <0.15 we built the dict subtly differently during parsing and it happened to work out.

My first thought was to instead add a check where we add an outer loop that includes the package name - first look in the model’s package, and then look “anywhere”. But then I realized that if there's a dependency that has a materialization macro override in it (maybe you wanted to override incremental in a common base package!), we'll miss that. How should we handle it? Is it just tough cookies, rename your materialization in that case? Or should we order the namespace search in some fashion?

On a related note: it looks a lot like adapter plugins can accidentally override the default materialization. dbt should ignore that (because core loads first) but any change here has to take that risk into account. Maybe we should only look in the dbt_postgres package for materialization_X_postgres? Or give an error if we find materialization_X_default in a plugin?

@drewbanin maybe these are some nice UX problems for you to ponder over thanksgiving :)

For now I'm going to proceed by mostly just ignoring the dependencies problem, but I don't feel great about that. We should at least document whatever we choose (and let's not document "we chose dictionary order"!).

@r-hempenstall
Copy link
Author

I can't say that it's the right way to handle it but at Ocado we decided this morning that if we're using our own materialization (even if it's based off of a default dbt one) we should use a different name for it to make it clear that the logic is not the same regardless of whether we can override materializations or not. That may or may not help with your UX problems but I thought I'd let you know the route we decided to take.

@drewbanin
Copy link
Contributor

@beckjake I did see some hard-to-reproduce instances of dbt not picking up user-space materializations on < 0.15.0. Nondeterministic ordering with < py36 does sound like a likely (and really bad!) culprit.

But then I realized that if there's a dependency that has a materialization macro override in it (maybe you wanted to override incremental in a common base package!), we'll miss that. How should we handle it?

I'm not certain I totally understand the failure mode here. I like the idea of an outer loop that searches by package. The order of resolution should be:

  1. dbt-core's internal packages
  2. all of the packages included via the packages.yml file (in arbitrary order, I guess?)
  3. the "main" project

Can you clarify what the issue would be with an approach like this?

Maybe we should only look in the dbt_postgres package for materialization_X_postgres? Or give an error if we find materialization_X_default in a plugin?

I think part of the good answer here probably includes better materialization namespacing. That might mean having to do config(materialized='dbt_utils.incremental') in order to pick a materialization overridden in an imported package. I'm relatively happy disallowing default materializations in packages (unsure how we actually differentiate between the global project and a user-space package....) but I don't think we need to couple that change with this fix.

For now I'm going to proceed by mostly just ignoring the dependencies problem, but I don't feel great about that. We should at least document whatever we choose.

Cool, strong agree. Let's resolve materializations in the following order:

  1. dbt-core's internal packages
  2. all of the packages included via the packages.yml file (in arbitrary order, I guess?)
  3. the "main" project being run

I have to check out how this works in practice, but my thinking here is that each macro source above will overwrite the ones above it. So, materialization_X_postgres in the "main" project will take precedence over materialization_X_postgres or materialization_X_default defined in the internal packages. If resolution happens in the opposite way and we grab the first matching materialization macro, then the order listed above will be inverted.

@r-hempenstall I think your approach is a good one, and I imagine it's the approach we'll want to document and recommend to avoid ambiguity/confusion for users with more complex materialization use cases.

beckjake added a commit that referenced this issue Dec 9, 2019
…lizations

add a formal search order for materializations, tests (#1962)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants