Support TTL for BigQuery tables #2711

Merged
merged 18 commits into from
Aug 19, 2020
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,7 @@
- Upgraded snowflake-connector-python dependency to 2.2.10 and enabled the SSO token cache ([#2613](https://github.com/fishtown-analytics/dbt/issues/2613), [#2689](https://github.com/fishtown-analytics/dbt/issues/2689), [#2698](https://github.com/fishtown-analytics/dbt/pull/2698))

### Features
- Support TTL for BigQuery tables ([#2711](https://github.com/fishtown-analytics/dbt/pull/2711))
- Add better retry support when using the BigQuery adapter ([#2694](https://github.com/fishtown-analytics/dbt/pull/2694), follow-up to [#1963](https://github.com/fishtown-analytics/dbt/pull/1963))
- Added a `dispatch` method to the context adapter and deprecated `adapter_macro`. ([#2302](https://github.com/fishtown-analytics/dbt/issues/2302), [#2679](https://github.com/fishtown-analytics/dbt/pull/2679))
- The built-in schema tests now use `adapter.dispatch`, so they can be overridden for adapter plugins ([#2415](https://github.com/fishtown-analytics/dbt/issues/2415), [#2684](https://github.com/fishtown-analytics/dbt/pull/2684))
@@ -18,7 +19,7 @@

Contributors:
- [@bbhoss](https://github.com/bbhoss) ([#2677](https://github.com/fishtown-analytics/dbt/pull/2677))
- [@kconvey](https://github.com/kconvey) ([#2694](https://github.com/fishtown-analytics/dbt/pull/2694))
- [@kconvey](https://github.com/kconvey) ([#2694](https://github.com/fishtown-analytics/dbt/pull/2694), [#2711](https://github.com/fishtown-analytics/dbt/pull/2711))

## dbt 0.18.0b2 (July 30, 2020)

6 changes: 6 additions & 0 deletions plugins/bigquery/dbt/adapters/bigquery/impl.py
@@ -104,6 +104,7 @@ class BigqueryConfig(AdapterConfig):
    labels: Optional[Dict[str, str]] = None
    partitions: Optional[List[str]] = None
    grant_access_to: Optional[List[Dict[str, str]]] = None
    time_to_expiration: Optional[int] = None


class BigQueryAdapter(BaseAdapter):
@@ -745,6 +746,11 @@ def get_table_options(
            expiration = 'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)'
            opts['expiration_timestamp'] = expiration

        if (config.get('time_to_expiration') is not None) and (not temporary):
            expiration = ('TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
                          '{} hour)').format(config.get('time_to_expiration'))
Contributor

The name 'time_to_expiration' doesn't provide any hints about the unit. Maybe this could be 'hours_to_expiration'?

Contributor Author

@jtcohen6 How do you feel about this, following #2697?

Contributor

I'm totally in favor of hours_to_expiration

            opts['expiration_timestamp'] = expiration

        if config.persist_relation_docs() and 'description' in node:
            description = sql_escape(node['description'])
            opts['description'] = '"""{}"""'.format(description)
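
For readers following the naming discussion, here is a minimal standalone sketch of the option-building logic in this hunk. It is not the adapter's code: `build_table_options` and `expiration_expr` are hypothetical free functions, the config is a plain dict rather than dbt's config object, and the key is the `hours_to_expiration` name proposed in the review comments, used here only to show how the unit reads at the call site.

```python
from typing import Any, Dict, Optional

# Temporary tables keep the fixed 12-hour expiration shown in the diff.
TEMP_TABLE_HOURS = 12


def expiration_expr(hours: int) -> str:
    """Return the BigQuery SQL expression for 'now plus N hours'."""
    return 'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {} hour)'.format(hours)


def build_table_options(config: Dict[str, Any], temporary: bool) -> Dict[str, str]:
    """Hypothetical stand-in for the expiration part of get_table_options."""
    opts: Dict[str, str] = {}
    hours: Optional[int] = config.get('hours_to_expiration')
    if temporary:
        # The user-supplied value is ignored for temporary tables.
        opts['expiration_timestamp'] = expiration_expr(TEMP_TABLE_HOURS)
    elif hours is not None:
        opts['expiration_timestamp'] = expiration_expr(hours)
    return opts
```

Calling `build_table_options({'hours_to_expiration': 4}, temporary=False)` yields a single `expiration_timestamp` entry, while an empty config on a non-temporary table yields no options at all.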
@@ -0,0 +1 @@
select 1 as id
@@ -0,0 +1,37 @@
"""Test adapter specific config options."""
from test.integration.base import DBTIntegrationTest, use_profile
import textwrap
import yaml


class TestBigqueryDatePartitioning(DBTIntegrationTest):

    @property
    def schema(self):
        return "bigquery_test_022"

    @property
    def models(self):
        return "adapter-specific-models"

    @property
    def profile_config(self):
        return self.bigquery_profile()

    @property
    def project_config(self):
        return yaml.safe_load(textwrap.dedent('''\
        config-version: 2
        models:
            test:
                materialized: table
                expiring_table:
                    time_to_expiration: 4
        '''))

    @use_profile('bigquery')
    def test_time_to_expiration(self):
        _, stdout = self.run_dbt_and_capture()
        self.assertIn(
            'expiration_timestamp: TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL '
            '4 hour)', stdout)
Contributor Author

I think the example I copied here was one that expected failure on the model, so the query would be dumped in stdout.

I probably want to inspect the results from `self.run_dbt()`, but I could use a pointer to where the compiled SQL lives within the results so I can run this `assertIn` against it (the schema is a little hard to decipher sometimes). Let me know if that makes sense.

Contributor

I think you want results[index].node.injected_sql. You can look for results by node name using results[index].node.name.

Also, don't feel at all obligated to do this, but because we use pytest for tests now you are free to use the (much more ergonomic, at least to me) `assert whatever in stdout` syntax.
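
As a rough sketch of the lookup suggested here, the helper below finds a result by node name and then uses a bare `assert`. The `SimpleNamespace` objects only stand in for dbt's result objects; the `node.name` and `node.injected_sql` attributes are taken from this comment, and `find_result` is a hypothetical helper, not part of dbt.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def find_result(results: Iterable[Any], name: str) -> Any:
    """Return the first result whose node has the given name."""
    for result in results:
        if result.node.name == name:
            return result
    raise KeyError('no result for node {!r}'.format(name))


# Stand-ins for run results; in a real test these would come from self.run_dbt().
results = [
    SimpleNamespace(node=SimpleNamespace(name='other_model', injected_sql='select 2 as id')),
    SimpleNamespace(node=SimpleNamespace(name='expiring_table', injected_sql='select 1 as id')),
]

sql = find_result(results, 'expiring_table').node.injected_sql
assert 'select 1' in sql  # pytest-style assertion instead of self.assertIn
```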

Contributor Author

@beckjake The error I got makes me think injected_sql isn't what I'm looking for in results.

E AssertionError: 'expiration_timestamp: TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 hour)' not found in 'select 1 as id'

This config adds the `expiration_timestamp` as part of the DDL, and if `injected_sql` held the DDL, it should say something like `create or replace table as ...`. I can't remember whether the DDL is present in the debug output (which I believe just dumps the query), or where else the full DDL might be in the results. Any ideas?

Contributor

Ah, now that I look at this more carefully, I think this will be in the output if you run with --debug, but not the injected_sql. injected_sql contains the value that will end up as the sql value in the materialization. But this change happens ultimately in the create_table_as macro that's called from the materialization.

It is, I suppose, always possible that we don't log all our queries on bigquery? That would be pretty bad behavior on our part.

Contributor Author

I ran this on a real project locally and it doesn't look like the ddl is anywhere in run_results.json, but it definitely is in the output with --debug.
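
The check described in this thread can be sketched as a plain-text search over captured debug output. This is only an illustration: `find_expiration_hours` and the sample log text are hypothetical, and the exact way dbt renders the `OPTIONS(...)` clause in its debug log is an assumption here, so the pattern tolerates either `=` or `:` between the option name and its value.

```python
import re
from typing import Optional

# Matches e.g. "expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 hour)".
_EXPIRATION_RE = re.compile(
    r'expiration_timestamp\s*[=:]\s*'
    r'TIMESTAMP_ADD\(CURRENT_TIMESTAMP\(\),\s*INTERVAL\s+(\d+)\s+hour\)',
    re.IGNORECASE,
)


def find_expiration_hours(log_text: str) -> Optional[int]:
    """Return the hour count from the first expiration option found, else None."""
    match = _EXPIRATION_RE.search(log_text)
    return int(match.group(1)) if match else None


# Hypothetical debug output; real log formatting may differ.
sample_log = (
    'create or replace table `project`.`dataset`.`expiring_table`\n'
    'OPTIONS(expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 hour))\n'
    'as (select 1 as id);'
)
```

Searching the compiled model SQL (`select 1 as id`) with the same helper returns `None`, which mirrors the failure reported above: the option only exists in the DDL that wraps the model query.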

23 changes: 23 additions & 0 deletions test/unit/test_bigquery_adapter.py
@@ -571,6 +571,29 @@ def test_parse_partition_by(self):
}
)

    def test_time_to_expiration(self):
        adapter = self.get_adapter('oauth')

        # get_table_options stores the expression under 'expiration_timestamp'.
        expected = {
            'expiration_timestamp': 'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 hour)',
        }
        actual = adapter.get_table_options(
            config={'time_to_expiration': 4}, node={}, temporary=False)
        self.assertEqual(expected, actual)


    def test_time_to_expiration_temporary(self):
        adapter = self.get_adapter('oauth')

        # Temporary tables ignore the config and keep the fixed 12-hour expiration.
        expected = {
            'expiration_timestamp': (
                'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)'),
        }
        actual = adapter.get_table_options(
            config={'time_to_expiration': 4}, node={}, temporary=True)
        self.assertEqual(expected, actual)



class TestBigQueryFilterCatalog(unittest.TestCase):
def test__catalog_filter_table(self):