Cannot override dbt default materializations #1962

r-hempenstall · 2019-11-27T08:38:28Z

Describe the bug

I created a customised version of the incremental materialization. I then tried to use that materialization for one of my models and it didn't work -- dbt used the default incremental materialization still. This seems to be the case with other materializations as well.

Steps To Reproduce

Create a fresh dbt project backed by BigQuery using dbt init. Next create a materialization to override one of the default dbt materializations; I used the one below.

{% materialization incremental, adapter='bigquery' -%}

  {{ log('USING MY IMPLEMENTATION', info=True) }}

{%- endmaterialization %}

I then changed the default model to the following.

{{ config(materialized='incremental') }}

select 1 as id

This would be expected to log USING MY IMPLEMENTATION to my terminal and then exit with an error but when I run dbt run I get a nice log output ending with Completed Successfully. If I change the name of the materialization to my_incremental in both files then I get the log line and error message as I would expect.

Expected behavior

I would expect that my materialization would be used rather than dbt's if they both have the same name. In the example above, that would give me the log line and error message that I expected.

Screenshots and log output

The output I expected when I overrode the incremental materialization.

Running with dbt=0.15.0
Found 1 model, 0 tests, 0 snapshots, 0 analyses, 134 macros, 0 operations, 0 seed files, 0 sources

08:23:20 | Concurrency: 4 threads (target='dev')
08:23:20 | 
08:23:20 | 1 of 1 START my_table model dbt_ryan.my_first_dbt_model.............. [RUN]
USING MY IMPLEMENTATION
* Deprecation Warning: The materialization ("my_table") did not explicitly return a list
    of relations to add to the cache. By default the target relation will be
    added, but this behavior will be removed in a future version of dbt.

  For more information, see:
  https://docs.getdbt.com/v0.15/docs/creating-new-materializations#section-6-returning-relations
    

Unhandled error while executing model.my_new_package.my_first_dbt_model
'NoneType' object has no attribute 'status'
08:23:21 | 1 of 1 ERROR creating my_table model dbt_ryan.my_first_dbt_model..... [ERROR in 0.03s]
08:23:21 | 
08:23:21 | Finished running 1 my_table model in 0.90s.

Completed with 1 error and 0 warnings:

'NoneType' object has no attribute 'status'

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

The output I actually got.

Running with dbt=0.15.0
Found 1 model, 0 tests, 0 snapshots, 0 analyses, 134 macros, 0 operations, 0 seed files, 0 sources

08:30:58 | Concurrency: 4 threads (target='dev')
08:30:58 | 
08:30:58 | 1 of 1 START incremental model dbt_ryan.my_first_dbt_model........... [RUN]
08:31:02 | 1 of 1 OK created incremental model dbt_ryan.my_first_dbt_model...... [MERGE (1) in 4.07s]
08:31:03 | 
08:31:03 | Finished running 1 incremental model in 5.07s.

Completed successfully

Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

System information

Which database are you using dbt with?

The output of dbt --version:

installed version: 0.15.0
   latest version: 0.15.0

Up to date!

The operating system you're using: macOS Mojave

The output of python --version: Python 3.6.8

The text was updated successfully, but these errors were encountered:

beckjake · 2019-11-27T20:48:10Z

I don’t really understand how this bug happened in the sense that I have no idea what could've changed, but the current 0.15.0 behavior is that we do these searches:

search manifest.nodes for adapter-named macro (name=='materialization_view_bigquery'), first match wins
if that failed, search manifest.nodes for default macro (name=='materialization_view_default'), first match wins

Neither search specifies the package name, so we just grab the first matching macro. In 0.15.0, the way we build the macros dict is core-macros-first. So since we iterate over the dict to search, we happen to always find core first. That means in Python <3.6 where dicts are randomly sorted, the result would be non-deterministic? Which seems pretty bad, way worse than this!

My reading of 0.14.4 is that it's doing the same thing (loading internal packages first, iterating in dict order), so I am honestly not sure at all what changed here - I assume in <0.15 we built the dict subtly differently during parsing and it happened to work out.

My first thought was to instead add a check where we add an outer loop that includes the package name - first look in the model’s package, and then look “anywhere”. But then I realized that if there's a dependency that has a materialization macro override in it (maybe you wanted to override incremental in a common base package!), we'll miss that. How should we handle it? Is it just tough cookies, rename your materialization in that case? Or should we order the namespace search in some fashion?

On a related note: it looks a lot like adapter plugins can accidentally override the default materialization. dbt should ignore that (because core loads first) but any change here has to take that risk into account. Maybe we should only look in the dbt_postgres package for materialization_X_postgres? Or give an error if we find materialization_X_default in a plugin?

@drewbanin maybe these are some nice UX problems for you to ponder over thanksgiving :)

For now I'm going to proceed by mostly just ignoring the dependencies problem, but I don't feel great about that. We should at least document whatever we choose (and let's not document "we chose dictionary order"!).

r-hempenstall · 2019-11-28T13:20:40Z

I can't say that it's the right way to handle it but at Ocado we decided this morning that if we're using our own materialization (even if it's based off of a default dbt one) we should use a different name for it to make it clear that the logic is not the same regardless of whether we can override materializations or not. That may or may not help with your UX problems but I thought I'd let you know the route we decided to take.

drewbanin · 2019-12-02T20:15:14Z

@beckjake I did see some hard-to-reproduce instances of dbt not picking up user-space materializations on < 0.15.0. Nondeterministic ordering with < py36 does sound like a likely (and really bad!) culprit.

But then I realized that if there's a dependency that has a materialization macro override in it (maybe you wanted to override incremental in a common base package!), we'll miss that. How should we handle it?

I'm not certain I totally understand the failure mode here. I like the idea of an outer loop that searches by package. The order of resolution should be:

dbt-core's internal packages
all of the packages included via the packages.yml file (in arbitrary order, I guess?)
the "main" project

Can you clarify what the issue would be with an approach like this?

Maybe we should only look in the dbt_postgres package for materialization_X_postgres? Or give an error if we find materialization_X_default in a plugin?

I think part of the good answer here probably includes better materialization namespacing. That might mean having to do config(materialized='dbt_utils.incremental') in order to pick a materialization overridden in an imported package. I'm relatively happy disallowing default materializations in packages (unsure how we actually differentiate between the global project and a user-space package....) but I don't think we need to couple that change with this fix.

For now I'm going to proceed by mostly just ignoring the dependencies problem, but I don't feel great about that. We should at least document whatever we choose.

Cool, strong agree. Let's resolve materializations in the following order:

dbt-core's internal packages
all of the packages included via the packages.yml file (in arbitrary order, I guess?)
the "main" project being run

I have to check out how this works in practice, but my thinking here is that each macro source above will overwrite the ones above it. So, materialization_X_postgres in the "main" project will take precedence over materialization_X_postgres or materialization_X_default defined in the internal packages. If resolution happens in the opposite way and we grab the first matching materialization macro, then the order listed above will be inverted.

@r-hempenstall I think your approach is a good one, and I imagine it's the approach we'll want to document and recommend to avoid ambiguity/confusion for users with more complex materialization use cases.

…lizations add a formal search order for materializations, tests (#1962)

r-hempenstall added bug Something isn't working triage labels Nov 27, 2019

drewbanin removed the triage label Nov 27, 2019

drewbanin added this to the 0.15.1 milestone Nov 27, 2019

beckjake mentioned this issue Dec 4, 2019

add a formal search order for materializations, tests (#1962) #1976

Merged

beckjake closed this as completed in #1976 Dec 9, 2019

beckjake added a commit that referenced this issue Dec 9, 2019

Merge pull request #1976 from fishtown-analytics/fix/override-materia…

2dd604b

…lizations add a formal search order for materializations, tests (#1962)

beckjake mentioned this issue Jan 6, 2020

cannot override bultin macros #2032

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cannot override dbt default materializations #1962

Cannot override dbt default materializations #1962

r-hempenstall commented Nov 27, 2019

beckjake commented Nov 27, 2019

r-hempenstall commented Nov 28, 2019

drewbanin commented Dec 2, 2019

Cannot override dbt default materializations #1962

Cannot override dbt default materializations #1962

Comments

r-hempenstall commented Nov 27, 2019

Describe the bug

Steps To Reproduce

Expected behavior

Screenshots and log output

System information

beckjake commented Nov 27, 2019

r-hempenstall commented Nov 28, 2019

drewbanin commented Dec 2, 2019