fix: override materialization python models #8538

devmessias · 2023-08-31T21:04:04Z

resolves #8520 , resolves #9703

Problem

Current Behavior

When working within my company's workflow, we outline all configurations, metadata, etc. in the schema.yml file, with broader configurations being described in the dbt_projects.yml. However, the materialization of Python models cannot be controlled through these files. This requires us to set the configuration within the model itself, as demonstrated below:

def model(dbt, session):
    dbt.config(materialization='incremental')

If this isn't done, the materialization always defaults to 'table' in the generated manifest.json.

commit that introduces this behavior: https://github.com/dbt-labs/dbt-core/blame/48d04e81417195393af5af1f78ef695f3398f193/core/dbt/parser/base.py#L217

Expected Behavior

I would expect that the materialization of Python models could be controlled through these configuration files, just as it is observed with SQL models. Ideally, without having to specify the materialization within the model itself. When defining configurations in these files, they should be respected, and the materialization shouldn't default to 'table' in the generated manifest.json unless explicitly set to do so. The current behavior introduces an inconsistency in the way dbt models are configured and managed.

Solution

I've considered three approaches to address this bug. Firstly, I'll outline the two solutions I discarded, followed by an explanation of why I chose the third.

First Solution: (discarded)

Remove the behavior introduced by commit 48d04e8.

While this is a straightforward, it could break many existing pipelines. This would be unfavorable for many dbt users because it's a behavior available since v1.4.

Second Solution: (discarded)

The proposal here is to add a condition where the default materialization value would only used if it hasn't been predefined in a .yml file or within the model's .py file itself.

# core/dbt/parser/base.py:L192
if block.path.relative_path.endswith(".py"):
    language = ModelLanguage.python
    config.add_config_call({"_dbt_default_arg_materialized": "table"})

# core/dbt/context/context_config.py:L140
if patch_config_dict:
    result = self._update_from_config(result, patch_config_dict)

default_args = [k for k in config_call_dict if k.startswith("_dbt_default_arg_")]
result_keys = dir(result)
for key in default_args:
    config_key = key.replace("_dbt_default_arg_", "")
    if config_key in result_keys:
        del config_call_dict[key]
    else:
        config_call_dict[config_key] = config_call_dict[key]

However, this approach has two major issues:

1- The solution is messy and has a high cognitive load.
2- If the configuration for this materialization isn't defined in the model's .py or any .yml file, it defaults to creating a "view" materialization. This is because the NodeConfig class defaults to "view" as its materialization.

dbt-core/core/dbt/contracts/graph/model_config.py

Lines 438 to 441 in e5e1a27

    
           class NodeConfig(NodeAndTestConfig): 
        
               # Note: if any new fields are added with MergeBehavior, also update the 
        
               # 'mergebehavior' dictionary 
        
               materialized: str = "view"

It's just a hypothesis, but I believe this might be the very reason for commit 48d04e8 since "view" is not supported for Python models.

Third Solution (preferred):

In base.py, the proposal is to add a key-value pair to the config_call_dict.

https://github.com/devmessias/dbt-core/blob/21696c70c5c3c22003f4282405df9fad65a96ce5/core/dbt/parser/base.py#L215-L224,

Parameters described in _dbt_node_type_configs are used to overwrite the initial node configuration:

https://github.com/devmessias/dbt-core/blob/21696c70c5c3c22003f4282405df9fad65a96ce5/core/dbt/context/context_config.py#L136

Since we don't want the "_dbt_node_type_configs" to persist, I opted to remove it in the latest update.

https://github.com/devmessias/dbt-core/blob/21696c70c5c3c22003f4282405df9fad65a96ce5/core/dbt/context/context_config.py#L152-L156

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

github-actions · 2023-08-31T21:04:19Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

codecov · 2023-08-31T21:07:29Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.79%. Comparing base (7940ad5) to head (7d743da).
Report is 53 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (7940ad5) and HEAD (7d743da). Click for more details.

HEAD has 23 uploads less than BASE

Flag BASE (7940ad5) HEAD (7d743da)

unit 4 1

integration 20 0

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #8538       +/-   ##
===========================================
- Coverage   89.22%   62.79%   -26.43%     
===========================================
  Files         183      183               
  Lines       23457    23630      +173     
===========================================
- Hits        20930    14839     -6091     
- Misses       2527     8791     +6264

Flag	Coverage Δ
integration	`?`
unit	`62.79% <100.00%> (+0.70%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
Unit Tests	`62.79% <100.00%> (+0.70%)`	⬆️
Integration Tests	`62.79% <100.00%> (-26.43%)`	⬇️

devmessias · 2023-09-08T22:34:38Z

Hi @gshank,

Could I please receive feedback on this PR? One aspect I'm unable to understand is why the codecov is reducing the percentage of coverage. This PR did not introduce any branch conditions inside the code. I've noticed a similar decrease in some other PRs as well; should I be concerned about this at the moment?

gshank · 2023-09-14T14:32:10Z

I think the original "fix" (setting the materialization in the config_call_dict) was wrong, but I think this approach just doubles down on that wrong approach. This is a tricky area. I suspect that what's needed is some special setting in "update_parsed_node_config" in core/dbt/parser/base.py. The problem is detecting the difference between somebody actually setting the materialization vs it just defaulting. I'll have to poke around a bit more to see if I can find a good way to do this.

devmessias · 2023-09-18T14:09:46Z

I think the original "fix" (setting the materialization in the config_call_dict) was wrong, but I think this approach just doubles down on that wrong approach. This is a tricky area. I suspect that what's needed is some special setting in "update_parsed_node_config" in core/dbt/parser/base.py. The problem is detecting the difference between somebody actually setting the materialization vs it just defaulting. I'll have to poke around a bit more to see if I can find a good way to do this.

Thank you for your feedback, @gshank. I agree that this change may not be the ideal one.

It's a great idea to bring @aranke into this discussion, since they were responsible for the original commit that introduced this behavior.

Note: As I mentioned in the PR, it seems to me that the solution from @aranke is easier to reconcile the differences between Python and SQL models. I've tried to rethink the behavior that @aranke introduced and try to place it elsewhere but couldn't, maybe there really isn't a cleaner and fast solution yet, especially since it's a behavior that breaks the convention of all versions post 1.4.

Hey @aranke, could you share your thoughts to enrich our conversation? What do you think?

aranke · 2023-09-25T14:31:25Z

Hey @devmessias,

My original PR was just to default to table materialization for Python models, but I see that it's had this unintended side effect.
I agree with @gshank that we should probably take a step back and check what the expected behavior should be.

devmessias · 2023-09-25T16:30:00Z

A big issue is that removing the default table behavior could break numerous pipelines, as views are not supported in Python models.

devmessias · 2023-09-26T01:29:01Z

One option I am considering, given that @aranke only wanted to change the default materialization for Python (so I assume there is no point in considering future default changes), and as @gshank suggested it to be something in update_parsed_node_config, we could simply check if the language in parsed_node within the same function is a Python model and at the same time verify if the materialization is set as view, then change it to table. Since this is what causes the issue. What do you think? It would also have the advantage of being much simpler.

gshank · 2023-10-02T13:17:41Z

@devmessias I agree with your proposed solution. In update_parsed_node_config check that the language is python and that materialization is not set in the config dict, and if both of those are true, set it in the config dict.

See dbt-labs#8538 discussion for more details

devmessias · 2023-10-02T17:40:21Z

I've updated the PR with the proposed solution @gshank. Could you review the changes to ensure we're on the same page?

devmessias · 2023-10-02T18:01:48Z

Ah, I forgot to mention one thing. The default is set before the update_node_config. As a result, the following test created by @aranke , which was passing in my previous suggestion, won't pass now. Would it be better to remove this test and create a new one that checks only after the update_node_config?

gshank · 2023-10-02T18:15:29Z

Yes, that would be better. If you can find a way to do it in unit tests, that would be fine, but it might be easier to just look at the manifest for the python node in a functional test.

devmessias · 2023-10-03T12:31:14Z

Hi @gshank. I've managed to determine how to retain the unit test.
My previous approach was somewhat messy as it also had to manage config.build_config_dict(patch_config_dict=patch_config_dict) to address all issues within update_parsed_node_config. However, I've modified update_parsed_node_config to pass the model language as an argument to ContextConfigGenerator, thereby creating an appropriate NodeConfig for the Python model. I believe this method is superior. Could you please review my new commit and share your thoughts?

devmessias · 2023-10-13T13:10:48Z

Hi @aranke . What do you think about this new approach?

devmessias · 2023-10-24T15:29:09Z

Hi @gshank, just circling back on this pull request. Would appreciate your feedback when it's convenient for you. Thanks!

devmessias · 2023-12-27T00:50:11Z

Hi @dbeatty10, @gshank and @aranke, any updates on this bug/MR? I'm not sure about the dbt-core roadmap, or if bug #8520 was resolved in another MR, because if so, I can close this PR.

dbeatty10 · 2024-04-12T01:16:14Z

@graciegoheen and @peterallenwebb when you take a look at this, could you also check if it resolves #9703?

The reprex is here: #9703 (comment)

peterallenwebb · 2024-05-08T18:39:42Z

core/dbt/contracts/graph/model_config.py

+
+    resource = lookup.get(resource_type, NodeConfig)
+
+    if language == "python" and (resource == NodeConfig):


Let's use the enum value here instead of a string literal.

Here's the enum:

dbt-core/core/dbt/artifacts/resources/types.py

Lines 54 to 56 in a7af3b3

class ModelLanguage(StrEnum):

python = "python"

sql = "sql"

Here's an example of using it:

dbt-core/core/dbt/parser/base.py

Line 25 in a7af3b3

from dbt.node_types import AccessType, ModelLanguage, NodeType

dbt-core/core/dbt/parser/base.py

Line 218 in a7af3b3

language = ModelLanguage.python

peterallenwebb · 2024-05-08T18:41:47Z

core/dbt/context/context_config.py

@@ -82,8 +82,9 @@ def get_config_dict(self, resource_type: NodeType) -> Dict[str, Any]:


 class BaseContextConfigGenerator(Generic[T]):
-    def __init__(self, active_project: RuntimeConfig):
+    def __init__(self, active_project: RuntimeConfig, language: Optional[ModelLanguage] = None):


@gshank @ChenyuLInx Does this seem like the best way to do this? It seems odd to me that language would be the one element that needs to be passed separately.

@devmessias Thanks for the contribution and engaging discussion, sorry we have been a behind on this patch.

I have the same feeling when reading this code.
I can see the appeal of having a proper PythonNode with default materialization being table.
But because this node is buried deep under layers functions, I don't think having this language passed through is ideal.
I would recommend going back to what Gerda initially proposed (link)

In update_parsed_node_config check that the language is python and that materialization is not set in the config dict, and if both of those are true, set it in the config dict.

That would make this PR easier to follow. What do you think?

Hi, thank you for your feedback

It’s been quite some time since I worked on this, but if memory serves me right, the approach you’re referring to is indeed what I implemented in this commit: 8f95385. However, this change introduced another bug which was caught by a unit test, as documented here: unit test issue comment

@peterallenwebb @ChenyuLInx I believe the need to have a Python node is because the default materialization type in dbt is not supported for Python models (at least when I made this PR). This issue was not apparent before because it was being obscured by a materialization bug.

I think this the unit test is failing because the config_dict generated by config_dict = config.build_config_dict(patch_config_dict=patch_config_dict) is what's being added to the parsed node in the end.
And I see why you have the current solution which is creating a place to define a default node.

I think passing the language through multiple layers of calls is not ideal, but if we want to do something better, we have two ways:

do some large refactors to have this config consolidation function moved to a node and that would give us an easier way to define a default materialization on a python model node

slightly twist the patch_config_dict(or create a new version of it) with any config that is different from the default default, and apply it. We should add some comments to it also. (I personally prefer this for the current case, and we can refactor later on if we see a pattern developing here).

So this would look like adding the following logic here, and update the comment a bit.

if not patch_config_dict: patch_config_dict = {} if parsed_node.resource_type == NodeType.Model and parsed_node.language == ModelLanguage.python: if "materialized" not in patch_config_dict: patch_config_dict["materialized"] = "table"

@devmessias @gshank what do you think?

I agree @ChenyuLInx . It's more similar to my first proposal and will solve the default materialization issue

@devmessias are you still interested in making an update to this? No worries if not, we can take it across the finish line. Thanks either way for putting so much effort into this!

Yes, I’m still interested even though I’ve switched jobs. Now it’s a matter of honor 😆 . I will make the modifications this week and do a rebase.

Hi @devmessias just wanted to check if you're still interested in completing this PR!

Sorry @graciegoheen @ChenyuLInx @gshank, my current role has kept me really busy for quite a while, and I haven't been using dbt lately.I̶'̶l̶l̶ ̶t̶r̶y̶ ̶t̶o̶ ̶g̶e̶t̶ ̶t̶o̶ ̶i̶t̶ ̶o̶v̶e̶r̶ ̶t̶h̶e̶ ̶w̶e̶e̶k̶e̶n̶d̶,̶ ̶t̶h̶o̶u̶g̶h̶.̶

By the way, I know this might not be the best channel to ask, I applied for a position for the dbt-core team, I noticed that the position was listed as Romania (Remote), but I wasn’t sure if I could apply from Brazil, so I went ahead and applied anyway. I’ve contributed to dbt in the past, if any of you have any tips or insights, I'd really appreciate it!

updated

Okay, since the solution was similar to the first one, it was really easy to revert everything and perform a rebase. 😌 I've updated it with the suggestions and also changed the test for the custom Python model from view to be incremental (@alex-hsp). I believe views are still not supported, right?

tests/unit/test_parser.py

See dbt-labs#8538 discussion for more details

devmessias · 2024-10-07T13:24:41Z

ping everyone!

ChenyuLInx

Thanks for adding the patch!!

core/dbt/context/context_config.py

Already adjusted

devmessias requested a review from a team as a code owner August 31, 2023 21:04

devmessias requested a review from gshank August 31, 2023 21:04

cla-bot bot added the cla:yes label Aug 31, 2023

devmessias changed the title ~~[W.I.P.] fix: override materialization python models~~ fix: override materialization python models Aug 31, 2023

devmessias mentioned this pull request Sep 1, 2023

[CT-3049] [Bug] Unable to override default materialization for python models (dbt >1.4)/Hierarchical Configuration Not Applied #8520

Closed

2 tasks

devmessias force-pushed the fix/override_materialization_python_models branch from 1dd087b to f4d3551 Compare September 8, 2023 22:20

dataders added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Sep 13, 2023

devmessias added a commit to devmessias/dbt-core that referenced this pull request Oct 2, 2023

fix: change view materialization to table in python models

152ea5c

See dbt-labs#8538 discussion for more details

devmessias added a commit to devmessias/dbt-core that referenced this pull request Oct 2, 2023

fix: change view materialization to table in python models

8f95385

See dbt-labs#8538 discussion for more details

devmessias force-pushed the fix/override_materialization_python_models branch from 152ea5c to 8f95385 Compare October 2, 2023 17:36

devmessias mentioned this pull request Jan 5, 2024

Python models and external materialisation duckdb/dbt-duckdb#300

Closed

dbeatty10 added the community This PR is from a community member label Apr 3, 2024

graciegoheen requested review from peterallenwebb and removed request for gshank April 10, 2024 18:51

graciegoheen self-requested a review April 10, 2024 18:51

graciegoheen assigned peterallenwebb and graciegoheen Apr 10, 2024

dbeatty10 mentioned this pull request Apr 12, 2024

[Bug] Custom materialization not respected when passed via dbt_project.yaml or model.yaml #9703

Closed

2 tasks

peterallenwebb previously requested changes May 8, 2024

View reviewed changes

graciegoheen assigned ChenyuLInx May 21, 2024

devmessias added a commit to devmessias/dbt-core that referenced this pull request Oct 2, 2024

fix: change view materialization to table in python models

782f27d

See dbt-labs#8538 discussion for more details

devmessias added a commit to devmessias/dbt-core that referenced this pull request Oct 2, 2024

fix: change view materialization to table in python models

fff0850

See dbt-labs#8538 discussion for more details

devmessias added a commit to devmessias/dbt-core that referenced this pull request Oct 2, 2024

fix: change view materialization to table in python models

9cee3d6

See dbt-labs#8538 discussion for more details

devmessias added 7 commits October 2, 2024 14:06

fix: override materialization python models

9636aa5

chore: changie doc

7e338a1

fix: change view materialization to table in python models

55ef48c

See dbt-labs#8538 discussion for more details

feat: create PythonNodeConfig

18ed6b0

fix: revert PythonNodeModel

7fbaf77

fix: python default model should be table

6f2741f

test(python): should test for incremental not view

111ef65

devmessias force-pushed the fix/override_materialization_python_models branch from 6faca6d to 111ef65 Compare October 2, 2024 17:24

chore: changie doc

7c94bde

devmessias mentioned this pull request Oct 29, 2024

Create dbt.var.get() for python models to retrieve project variables without a yaml file #10914

Open

3 tasks

ChenyuLInx approved these changes Nov 13, 2024

View reviewed changes

core/dbt/context/context_config.py Outdated Show resolved Hide resolved

take out additional line

7d743da

ChenyuLInx merged commit 84230ce into dbt-labs:main Nov 14, 2024
53 of 54 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: override materialization python models #8538

fix: override materialization python models #8538

devmessias commented Aug 31, 2023 •

edited

Loading

github-actions bot commented Aug 31, 2023

codecov bot commented Aug 31, 2023 •

edited

Loading

devmessias commented Sep 8, 2023

gshank commented Sep 14, 2023

devmessias commented Sep 18, 2023 •

edited

Loading

aranke commented Sep 25, 2023

devmessias commented Sep 25, 2023

devmessias commented Sep 26, 2023

gshank commented Oct 2, 2023

devmessias commented Oct 2, 2023

devmessias commented Oct 2, 2023

gshank commented Oct 2, 2023

devmessias commented Oct 3, 2023

devmessias commented Oct 13, 2023

devmessias commented Oct 24, 2023

devmessias commented Dec 27, 2023

dbeatty10 commented Apr 12, 2024

peterallenwebb May 8, 2024

dbeatty10 Jul 16, 2024

peterallenwebb May 8, 2024

ChenyuLInx May 9, 2024

devmessias May 10, 2024

devmessias May 14, 2024 •

edited

Loading

ChenyuLInx May 24, 2024

devmessias May 29, 2024

ChenyuLInx Jun 3, 2024

devmessias Jun 4, 2024

graciegoheen Sep 4, 2024

devmessias Oct 2, 2024 •

edited

Loading

devmessias commented Oct 7, 2024

ChenyuLInx left a comment

	class NodeConfig(NodeAndTestConfig):
	# Note: if any new fields are added with MergeBehavior, also update the
	# 'mergebehavior' dictionary
	materialized: str = "view"


		resource = lookup.get(resource_type, NodeConfig)

		if language == "python" and (resource == NodeConfig):

fix: override materialization python models #8538

fix: override materialization python models #8538

Conversation

devmessias commented Aug 31, 2023 • edited Loading

Problem

Current Behavior

Expected Behavior

Solution

First Solution: (discarded)

Second Solution: (discarded)

Third Solution (preferred):

Checklist

github-actions bot commented Aug 31, 2023

codecov bot commented Aug 31, 2023 • edited Loading

Codecov Report

devmessias commented Sep 8, 2023

gshank commented Sep 14, 2023

devmessias commented Sep 18, 2023 • edited Loading

aranke commented Sep 25, 2023

devmessias commented Sep 25, 2023

devmessias commented Sep 26, 2023

gshank commented Oct 2, 2023

devmessias commented Oct 2, 2023

devmessias commented Oct 2, 2023

gshank commented Oct 2, 2023

devmessias commented Oct 3, 2023

devmessias commented Oct 13, 2023

devmessias commented Oct 24, 2023

devmessias commented Dec 27, 2023

dbeatty10 commented Apr 12, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

devmessias May 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

devmessias Oct 2, 2024 • edited Loading

Choose a reason for hiding this comment

devmessias commented Oct 7, 2024

ChenyuLInx left a comment

Choose a reason for hiding this comment

devmessias commented Aug 31, 2023 •

edited

Loading

codecov bot commented Aug 31, 2023 •

edited

Loading

devmessias commented Sep 18, 2023 •

edited

Loading

devmessias May 14, 2024 •

edited

Loading

devmessias Oct 2, 2024 •

edited

Loading