[CT-665] [Feature] Allow control over the outer CTE identifier generated in ephemeral materialization #5273

miro-ur · 2022-05-18T19:51:32Z

Is this your first time opening an issue?

I have read the expectations for open source contributors

Describe the Feature

We use dot notation to group and organize dbt model files, with a naming pattern of {group}.{model}, such as ingest.source and dw.dimension. Especially with complex models, this provides a level of organization and consistency that keeps files unique and makes them faster to identify and find.

We rely on config aliases to name relations properly; otherwise, adding the schema would produce an invalid fully qualified relation name like schema.layer.table. This works for both table and view materializations, but it makes the generated SQL invalid when using the ephemeral materialization, because the outer CTE identifier is generated as __dbt__cte__ingest.source.
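To make the failure concrete, here is a minimal sketch of how such an identifier ends up in the compiled SQL. The helpers `ephemeral_cte_name` and `wrap_in_cte` are hypothetical names used for illustration, not dbt functions; the sketch assumes only that the `__dbt__cte__` prefix is prepended to the model file name unchanged and that the result is emitted unquoted.

```python
CTE_PREFIX = "__dbt__cte__"


def ephemeral_cte_name(model_name: str) -> str:
    """Hypothetical stand-in: prepend the prefix to the model name unchanged."""
    return f"{CTE_PREFIX}{model_name}"


def wrap_in_cte(model_name: str, model_sql: str, outer_sql: str) -> str:
    """Hypothetical stand-in: inline an ephemeral model as an unquoted CTE."""
    cte = ephemeral_cte_name(model_name)
    return f"with {cte} as (\n  {model_sql}\n)\n{outer_sql.format(cte=cte)}"


compiled = wrap_in_cte(
    "ingest.source",          # model file name using the {group}.{model} pattern
    "select 1 as id",         # body of the ephemeral model
    "select * from {cte}",    # downstream model that refs the ephemeral one
)
print(compiled)
```

The printed statement opens with `with __dbt__cte__ingest.source as (`. Because the identifier is unquoted, the dot is not treated as part of a single name, which is what makes the statement invalid.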

If a model alias is configured, it should be used everywhere as the identifier of the output instead of the file name, regardless of the materialization used. This gives end users better control over the generated SQL and avoids errors or invalid code when model files follow an unconventional naming scheme.

Describe alternatives you've considered

An alternative might be some form of sanitization of the CTE name during generation to avoid producing an invalid CTE identifier, such as stripping non-alphanumeric characters or generating a name that is not linked to the model name (for example an incremental __dbt__cte__1), as long as it is a valid and unique CTE name that still allows stacking CTEs.
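Both alternatives could look roughly like the sketch below. The function names are hypothetical and nothing here is existing dbt code. One assumption of the sketch: underscores are kept when stripping, so that the `__dbt__cte__` prefix style stays intact.

```python
import itertools
import re

CTE_PREFIX = "__dbt__cte__"


def stripped_cte_name(model_name: str) -> str:
    """Drop every character that is not a letter, digit, or underscore."""
    return CTE_PREFIX + re.sub(r"[^A-Za-z0-9_]", "", model_name)


def make_counter_namer():
    """Return a function that hands out __dbt__cte__1, __dbt__cte__2, ... per model."""
    counter = itertools.count(1)
    assigned = {}

    def name_for(model_name: str) -> str:
        # Reuse the same CTE name when the same model is referenced again.
        if model_name not in assigned:
            assigned[model_name] = f"{CTE_PREFIX}{next(counter)}"
        return assigned[model_name]

    return name_for


print(stripped_cte_name("ingest.source"))
name_for = make_counter_namer()
print(name_for("ingest.source"), name_for("dw.dimension"), name_for("ingest.source"))
```

One trade-off to keep in mind: stripping can map two different file names (say `ingest.source` and `ingestsource`) to the same CTE name, so uniqueness would need a separate check. The counter approach is always unique within one namer but loses the link to the model name in the compiled SQL.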

Who will this benefit?

This feature would make behavior consistent with other materialization strategies (where the alias is used as the table or view name), give users control over the naming of models versus files, and allow end users to come up with creative naming conventions and organization of model files without having to worry that switching materialization will cause a failure because of the model file name.

Are you interested in contributing this feature?

Happy to contribute; looking for feedback and some pointers on this from the team.

Anything else?

No response


jtcohen6 · 2022-05-19T10:57:18Z

@miro-ur Thanks for opening!

This is something that's defined in Python today, but within the "adapter" interface, to accommodate the fact that different databases support different naming conventions:

dbt-core/core/dbt/adapters/base/relation.py

Lines 204 to 219 in 2c42fb4

    
           @staticmethod 
        
           def add_ephemeral_prefix(name: str): 
        
               return f"__dbt__cte__{name}" 
        
           @classmethod 
        
           def create_ephemeral_from_node( 
        
               cls: Type[Self], 
        
               config: HasQuoting, 
        
               node: Union[ParsedNode, CompiledNode], 
        
           ) -> Self: 
        
               # Note that ephemeral models are based on the name. 
        
               identifier = cls.add_ephemeral_prefix(node.name) 
        
               return cls.create( 
        
                   type=cls.CTE, 
        
                   identifier=identifier, 
        
               ).quote(identifier=False)

I don't know that we'd get much benefit from turning this into a macro, and it would require a lot of code plumbing to set up a Jinja rendering context where it isn't currently needed.

I think both of your alternative proposals are totally reasonable:

Use node.alias instead of node.name for creating the CTE name
Replace non-alphanumeric characters with _ when creating the CTE name
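As a rough, standalone illustration of the two options combined (not the actual change to `relation.py`), the sketch below uses a hypothetical `NodeStub` dataclass in place of dbt's node classes and a hypothetical `ephemeral_identifier` helper. It assumes the alias may be unset and falls back to the name in that case.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStub:
    """Hypothetical stand-in for a dbt node; only the two fields used here."""
    name: str
    alias: Optional[str] = None


def ephemeral_identifier(node: NodeStub) -> str:
    # Option 1: prefer the configured alias; fall back to the file-based name.
    base = node.alias or node.name
    # Option 2: replace anything outside [A-Za-z0-9_] with an underscore.
    base = re.sub(r"[^A-Za-z0-9_]", "_", base)
    return f"__dbt__cte__{base}"


print(ephemeral_identifier(NodeStub(name="ingest.source", alias="source")))
print(ephemeral_identifier(NodeStub(name="ingest.source")))
```

With the alias set, the dotted file name never reaches the CTE identifier; without it, the dot is replaced, so the identifier is a plain name either way.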

I can't think offhand of any risks or downstream implications. It will be worth verifying with our automated tests, and of course adding a new one. We have a few existing tests for "models with dots in their names" here and here, since this is a capability that we know some users rely on, and we want to avoid any future regressions.

I'd welcome a PR for this!

miro-ur · 2022-07-18T19:29:22Z

@jtcohen6 Appreciate the guidance and apologies for the delay.
I just forked the repo and will do my best to make the changes and submit a PR.

github-actions · 2023-07-14T02:08:39Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions · 2023-07-22T01:53:30Z

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

miro-ur added enhancement New feature or request triage labels May 18, 2022

github-actions bot changed the title ~~[Feature] Allow control over the outer CTE identifier generated in ephemeral materialization~~ [CT-665] [Feature] Allow control over the outer CTE identifier generated in ephemeral materialization May 18, 2022

jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Adapters Issues designated for the adapter area of the code labels May 19, 2022

jtcohen6 removed the triage label May 19, 2022

miro-ur mentioned this issue Jul 18, 2022

Feature CT-665 Allow control over the outer CTE identifier generated in ephemeral materialization #5488

Closed

6 tasks

github-actions bot added the stale Issues that have gone stale label Jul 14, 2023

github-actions bot closed this as not planned Jul 22, 2023

jeancochrane mentioned this issue Aug 16, 2023

Update dbt views to select from other dbt models where possible ccao-data/data-architecture#71

Merged

This was referenced Jun 10, 2024

Use model alias for the CTE identifier generated during ephemeral materialization dbt-labs/dbt-adapters#236

Merged

Use model alias for the CTE identifier generated during ephemeral materialization #10290

Merged

dbeatty10 reopened this Jul 10, 2024

dbeatty10 mentioned this issue Jul 19, 2024

[Bug] DBT unit tests don't work properly when source table name matches another source or model #10433

Open

2 tasks

github-actions bot removed the stale Issues that have gone stale label Jul 26, 2024

colin-rogers-dbt closed this as completed in dbt-labs/dbt-adapters#236 Aug 9, 2024

antitoine mentioned this issue Nov 29, 2024

[Regression] BigQuery: Unit tests using sources with sharded tables no longer work after CTE naming change #11075

Open

2 tasks
