
Fix persist_docs for columns #180

Merged · 4 commits · Jun 15, 2021
Conversation

jtcohen6
Contributor

@jtcohen6 jtcohen6 commented Jun 14, 2021

follow up to #84, #170

Description

Although #170 implemented spark__alter_column_comment, we were still missing the call to the persist_docs macro itself in the relevant materializations, since relation-level docs are handled within the create_x_as macro DDL and only column-level docs need a separate step.

So this PR:

  • Defines spark__persist_docs, which adds column descriptions only
  • Adds a call to persist_docs to the table + seed materializations. It's already in the snapshot materialization, and views cannot persist column descriptions.
  • Adds an integration test modeled off the one here. Key difference: column descriptions don't show up in show table extended in ... like '*', so instead of pulling from catalog.json, we have to run describe extended for each table we want to check.
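For reference, the new macro is small. A sketch of spark__persist_docs along these lines, assuming the standard persist_docs dispatch signature from dbt's adapter macro conventions:

```sql
{% macro spark__persist_docs(relation, model, for_relation, for_columns) -%}
  {# Relation-level comments are already set in the create_x_as DDL, #}
  {# so this macro only handles column-level descriptions. #}
  {% if for_columns and config.persist_column_docs() and model.columns %}
    {% do alter_column_comment(relation, model.columns) %}
  {% endif %}
{%- endmacro %}
```

The for_relation argument is intentionally ignored here, which is why the macro description says "column descriptions only."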

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jun 14, 2021
@jtcohen6
Contributor Author

I forgot that Databricks SQL endpoints can't create parquet tables. I'll update when I get a chance tomorrow.

@@ -81,10 +81,7 @@
{%- set agate_table = load_agate_table() -%}
{%- do store_result('agate_table', response='OK', agate_table=agate_table) -%}

{{ run_hooks(pre_hooks, inside_transaction=False) }}
Contributor

@leahwicz leahwicz Jun 14, 2021


Why do the run_hooks calls change? Is it related to the persist_docs change or is this related to something else?

Contributor Author


This isn't really related to the specific change in the PR at all, I just noticed because I was editing related lines. We clearly copied these lines from the default seed materialization a few years back. The default materialization, which was originally written for Redshift/Postgres, assumes a transactional database. Some hooks may want to run before the main transaction starts (begin), after it starts, before it ends (commit), or after it ends.

Spark doesn't have transactions, so there's no need to distinguish between inside_transaction = True|False. I figured I would simplify it while here, but I'd also be fine reverting those lines for PR cleanliness.
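As a simplified illustration (not the exact materialization code), the default pattern brackets the transaction with two hook calls, while on Spark one call covers everything:

```sql
{# Default (transactional) pattern: hooks run outside, then inside, the transaction #}
{{ run_hooks(pre_hooks, inside_transaction=False) }}
-- BEGIN is issued here on transactional adapters like Postgres/Redshift
{{ run_hooks(pre_hooks, inside_transaction=True) }}

{# On Spark there is no transaction, so a single call is equivalent: #}
{{ run_hooks(pre_hooks) }}
```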

@jtcohen6
Contributor Author

I'll cherry-pick to 0.20.latest after merging

@jtcohen6 jtcohen6 merged commit a8a85c5 into master Jun 15, 2021
@jtcohen6 jtcohen6 deleted the fix/persist-docs-columns branch June 15, 2021 16:40
jtcohen6 added a commit that referenced this pull request Jun 15, 2021
* Fix persist_docs for columns

* Disable parquet model on endpoint

* Rm parquet model, not worth the fuss

* Update changelog [skip ci]
@binhnefits

Hi, I was wondering why persist_docs is not called for incremental models?

@jtcohen6
Contributor Author

jtcohen6 commented Oct 7, 2021

@binhnefits I think that was an oversight on my part! Would you be able to open a separate issue for that?

Also, if you'd be interested in contributing the fix for it: I think it would just look like adding {% do persist_docs(target_relation, model) %} to the end of the incremental materialization.
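For illustration, that suggested call would sit near the end of the incremental materialization, roughly like this (a sketch, with the existing incremental logic elided):

```sql
{% materialization incremental, adapter='spark' -%}
  {# ... existing incremental build logic ... #}

  {% do persist_docs(target_relation, model) %}

  {{ return({'relations': [target_relation]}) }}
{%- endmaterialization %}
```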
