Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: adapter cte generation #2712

Merged
merged 5 commits into from
Aug 19, 2020

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Aug 18, 2020

resolves #2660 (well, makes it feasible)
resolves #2609

Description

This PR makes it possible for the exasol plugin (or any others) to override their Relation class with one that has a custom add_ephemeral_prefix method. For the basic case, this should solve cte-naming issues.

The adapter class is further modified: It now provides a get_compiler method that returns a Compiler instance, so plugins can provide plugin-specific compilation changes.

In particular, it should now be possible to override how dbt creates ephemeral models. No promises, of course, but this should make it feasible for adapters that support subqueries but not CTEs to make their own "ephemeral" models. This is definitely left as an exercise to the plugin author, but overriding the handy _insert_ctes method should help. Note that _insert_ctes also does special work for making data tests work, so authors will need to re-implement that. Which is because...

This PR also makes data tests behave much more like CTEs. Instead of performing a subquery, dbt inserts a custom CTE at the end of the CTE chain that includes the test body, and then the final select clause is a select count(*) from the CTE. Adapters that need to override CTE generation will need to do this as well, but I think that's ok. My understanding is that this will make #2609 work as it should out of the box.

Most adapters should not need to change anything. Any adapters that don't support CTEs but do support subqueries might break here. We should be able tof come up with a backwards-compatibility behavior here if we need to, and make only our adapters use this behavior.

Finally, while I was working on this I found that we were not properly excluding some things from the sandbox. This isn't a huge deal, but as a matter of hygiene we should be prefixing more attributes of objects in the context with _ so jinja won't allow access.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Aug 18, 2020
@beckjake beckjake marked this pull request as ready for review August 18, 2020 18:50
@beckjake beckjake requested a review from jtcohen6 August 18, 2020 18:50
@@ -186,6 +198,90 @@ def _get_compiled_model(
f'was not an ephemeral model: {cte_id}'
)

def _inject_ctes_into_sql(self, sql: str, ctes: List[InjectedCTE]) -> str:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't new code, it moved from core/dbt/contracts/graph/compiled.py


return str(parsed)

def _model_prepend_ctes(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't new code, it moved from core/dbt/contracts/graph/compiled.py, where it was previously a method on the CompiledModel type.

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two birds with one scone—nice!

I think we ourselves will have the privilege of testing the feasibility of this change for plugin authors. We'll need to override the new default behavior for both Spark + Presto, which do not support nested CTEs. That is, a user could not have CTEs within their data test definition, because the whole thing will be wrapped in another CTE. That's an extant problem for ephemeral models (dbt-labs/dbt-spark#11), too.

I'll open a PR in utils to update the _is_ephemeral check to use adapter.add_ephemeral_prefix instead of hard-coding the literal.

core/dbt/task/test.py Outdated Show resolved Hide resolved
@beckjake
Copy link
Contributor Author

That is, a user could not have CTEs within their data test definition, because the whole thing will be wrapped in another CTE. That's an extant problem for ephemeral models (dbt-labs/dbt-spark#11), too.

I don't particularly relish the thought of implementing this, but I think this PR makes it possible to solve that issue, since you can override the compiler to flatten CTEs. It does sound tricky, we might want to end up upstreaming that to dbt core if we do write it, for other plugins (to opt into: nesting them seems better if it's available).

Anyway, it should be pretty straightforward to override the data test behavior for those adapters.

Jacob Beck added 5 commits August 18, 2020 14:33
The adapter's Relation is consulted for adding the ephemeral model prefix

Also hide some things from Jinja

Have the adapter be responsible for producing the compiler, move CTE generation into the Relation object
@beckjake beckjake force-pushed the feature/adapter-cte-generation branch from 06afbbd to d3e4d3f Compare August 18, 2020 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants