Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snowflake database quoting on 0.13 #1396

Closed
tayloramurphy opened this issue Apr 12, 2019 · 5 comments
Closed

Snowflake database quoting on 0.13 #1396

tayloramurphy opened this issue Apr 12, 2019 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@tayloramurphy
Copy link

Issue

Issue description

We're attempting to update to 0.13 and are unable to run dbt when testing.

The command that fails is:

dbt -d seed --profiles-dir profile --target prod

The key parts of the logs seem to be:

$ dbt -d seed --profiles-dir profile --target prod
Running with dbt=0.13.0
2019-04-12 14:27:33,923 (MainThread): Tracking: do not track
2019-04-12 14:27:33,980 (MainThread): Parsing macros/monthly_change.sql
...
2019-04-12 14:27:36,663 (MainThread): Parsing model.snowplow.snowplow_web_events_time
2019-04-12 14:27:36,678 (MainThread): * Deprecation Warning: The adapter function `adapter.already_exists` is deprecated and will be removed in
 a future release of dbt. Please use `adapter.get_relation` instead.
 Documentation for get_relation can be found here:
 https://docs.getdbt.com/reference#adapter

2019-04-12 14:27:36,678 (MainThread): On "snowplow_web_events_time": cache miss for schema "1149-UPGRADE-DBT-TO-0-13_ANALYTICS.ANALYTICS", this is inefficient
2019-04-12 14:27:36,692 (MainThread): Parsing macros/core.sql
...
2019-04-12 14:27:36,911 (MainThread): Parsing macros/materializations/merge.sql
2019-04-12 14:27:36,960 (MainThread): Acquiring new snowflake connection "snowplow_web_events_time".
2019-04-12 14:27:36,961 (MainThread): Opening a new connection, currently in state init
2019-04-12 14:27:37,664 (MainThread): Using snowflake connection "snowplow_web_events_time".
2019-04-12 14:27:37,664 (MainThread): On snowplow_web_events_time: BEGIN
2019-04-12 14:27:37,787 (MainThread): SQL status: SUCCESS 1 in 0.12 seconds
2019-04-12 14:27:37,788 (MainThread): Using snowflake connection "snowplow_web_events_time".
2019-04-12 14:27:37,788 (MainThread): On snowplow_web_events_time: select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case when table_type = 'BASE TABLE' then 'table'
           when table_type = 'VIEW' then 'view'
           else table_type
      end as table_type
    from 1149-UPGRADE-DBT-TO-0-13_ANALYTICS.information_schema.tables
    where table_schema ilike 'ANALYTICS'
      and table_catalog ilike '1149-upgrade-dbt-to-0-13_analytics'
2019-04-12 14:27:38,327 (MainThread): Snowflake error: 001003 (42000): 018b7a23-01d4-643a-0000-289d04fea226: SQL compilation error:
syntax error line 9 at position 9 unexpected '1149'.
2019-04-12 14:27:38,327 (MainThread): On snowplow_web_events_time: ROLLBACK
2019-04-12 14:27:38,489 (MainThread): Flushing usage events
2019-04-12 14:27:38,489 (MainThread): Encountered an error:
2019-04-12 14:27:38,489 (MainThread): Database Error
  001003 (42000): 018b7a23-01d4-643a-0000-289d04fea226: SQL compilation error:
  syntax error line 9 at position 9 unexpected '1149'.
2019-04-12 14:27:38,497 (MainThread): Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/snowflake/connections.py", line 77, in exception_handler
    yield
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/sql/connections.py", line 61, in add_query
    cursor.execute(sql, bindings)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/cursor.py", line 550, in execute
    errvalue)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 97, in errorhandler_wrapper
    cursor.errorhandler(connection, cursor, errorclass, errorvalue)
  File "/usr/local/lib/python3.7/site-packages/snowflake/connector/errors.py", line 73, in default_errorhandler
    done_format_msg=errorvalue.get(u'done_format_msg'))
snowflake.connector.errors.ProgrammingError: 001003 (42000): 018b7a23-01d4-643a-0000-289d04fea226: SQL compilation error:
syntax error line 9 at position 9 unexpected '1149'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dbt/main.py", line 79, in main
    results, succeeded = handle_and_check(args)
  File "/usr/local/lib/python3.7/site-packages/dbt/main.py", line 153, in handle_and_check
    task, res = run_from_args(parsed)
  File "/usr/local/lib/python3.7/site-packages/dbt/main.py", line 209, in run_from_args
    results = run_from_task(task, cfg, parsed)
  File "/usr/local/lib/python3.7/site-packages/dbt/main.py", line 217, in run_from_task
    result = task.run()
  File "/usr/local/lib/python3.7/site-packages/dbt/task/runnable.py", line 242, in run
    self._runtime_initialize()
  File "/usr/local/lib/python3.7/site-packages/dbt/task/runnable.py", line 50, in _runtime_initialize
    self.manifest = load_manifest(self.config)
  File "/usr/local/lib/python3.7/site-packages/dbt/task/runnable.py", line 29, in load_manifest
    internal_manifest=internal_manifest)
  File "/usr/local/lib/python3.7/site-packages/dbt/loader.py", line 173, in load_all
    internal_manifest)
  File "/usr/local/lib/python3.7/site-packages/dbt/loader.py", line 166, in _load_from_projects
    loader.load(internal_manifest=internal_manifest)
  File "/usr/local/lib/python3.7/site-packages/dbt/loader.py", line 139, in load
    self._load_nodes()
  File "/usr/local/lib/python3.7/site-packages/dbt/loader.py", line 78, in _load_nodes
    self._load_sql_nodes(ModelParser, NodeType.Model, 'source_paths')
  File "/usr/local/lib/python3.7/site-packages/dbt/loader.py", line 43, in _load_sql_nodes
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/dbt/parser/base_sql.py", line 63, in load_and_parse
    return self.parse_sql_nodes(result, tags)
  File "/usr/local/lib/python3.7/site-packages/dbt/parser/base_sql.py", line 82, in parse_sql_nodes
    node_parsed = self.parse_node(node, node_path, project, tags=tags)
  File "/usr/local/lib/python3.7/site-packages/dbt/parser/base.py", line 224, in parse_node
    self._render_with_context(parsed_node, config)
  File "/usr/local/lib/python3.7/site-packages/dbt/parser/base.py", line 160, in _render_with_context
    capture_macros=True)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 279, in get_rendered
    return render_template(template, ctx, node)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 266, in render_template
    return template.render(ctx)
  File "/usr/local/lib/python3.7/site-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/local/lib/python3.7/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.7/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "dbt-e4fc89a406e64fd8502658b5", line 1, in top-level template code
    from __future__ import division, generator_stop
  File "/usr/local/lib/python3.7/site-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 104, in call
    return macro(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jinja2/runtime.py", line 579, in _invoke
    rv = self._func(*arguments)
  File "dbt-2b309f9e7ea9717e61106a77", line 3, in template
    name = None
  File "/usr/local/lib/python3.7/site-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 104, in call
    return macro(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jinja2/runtime.py", line 579, in _invoke
    rv = self._func(*arguments)
  File "dbt-1326913a3a7b62469d9260c4", line 27, in template
    t_3.append(
  File "/usr/local/lib/python3.7/site-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 104, in call
    return macro(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jinja2/runtime.py", line 579, in _invoke
    rv = self._func(*arguments)
  File "dbt-2b309f9e7ea9717e61106a77", line 21, in template
    context.exported_vars.add('dbt_macro__snowplow_web_events_time')
  File "/usr/local/lib/python3.7/site-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/context/common.py", line 67, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/base/meta.py", line 45, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/base/impl.py", line 632, in already_exists
    model_name=model_name)
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/base/impl.py", line 607, in get_relation
    relations_list = self.list_relations(database, schema, model_name)
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/base/impl.py", line 569, in list_relations
    information_schema, schema, model_name=model_name
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/sql/impl.py", line 201, in list_relations_without_caching
    release=True
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/base/impl.py", line 856, in execute_macro
    result = macro_function(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 104, in call
    return macro(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jinja2/runtime.py", line 579, in _invoke
    rv = self._func(*arguments)
  File "dbt-1326913a3a7b62469d9260c4", line 218, in template
    l_1_sql = undefined("parameter 'sql' was not provided", name='sql')
  File "/usr/local/lib/python3.7/site-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 104, in call
    return macro(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jinja2/runtime.py", line 579, in _invoke
    rv = self._func(*arguments)
  File "dbt-1326913a3a7b62469d9260c4", line 25, in template
    pass
  File "/usr/local/lib/python3.7/site-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 104, in call
    return macro(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jinja2/runtime.py", line 579, in _invoke
    rv = self._func(*arguments)
  File "dbt-ca70e94c428665d55249974a", line 54, in template
    context.vars['dbt_macro__snowflake__create_table_as'] = l_0_dbt_macro__snowflake__create_table_as = Macro(environment, macro, 'dbt_macro__snowflake__create_table_as', ('temporary', 'relation', 'sql'), False, False, False, context.eval_ctx.autoescape)
  File "/usr/local/lib/python3.7/site-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/clients/jinja.py", line 104, in call
    return macro(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/jinja2/runtime.py", line 579, in _invoke
    rv = self._func(*arguments)
  File "dbt-6d2e15fbbd8937005642a0f8", line 10, in template
    t_1 = environment.tests['none']
  File "/usr/local/lib/python3.7/site-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/context/common.py", line 67, in wrapped
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/base/impl.py", line 229, in execute
    fetch=fetch
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/sql/connections.py", line 95, in execute
    _, cursor = self.add_query(sql, name, auto_begin)
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/snowflake/connections.py", line 224, in add_query
    abridge_sql_log=abridge_sql_log
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/sql/connections.py", line 66, in add_query
    return connection, cursor
  File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.7/site-packages/dbt/adapters/snowflake/connections.py", line 95, in exception_handler
    raise dbt.exceptions.DatabaseException(msg)
dbt.exceptions.DatabaseException: Database Error
  001003 (42000): 018b7a23-01d4-643a-0000-289d04fea226: SQL compilation error:
  syntax error line 9 at position 9 unexpected '1149'.

Results

select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case when table_type = 'BASE TABLE' then 'table'
           when table_type = 'VIEW' then 'view'
           else table_type
      end as table_type
    from 1149-UPGRADE-DBT-TO-0-13_ANALYTICS.information_schema.tables
    where table_schema ilike 'ANALYTICS'
      and table_catalog ilike '1149-upgrade-dbt-to-0-13_analytics'

is failing b/c the DB name is not being quoted at all. We're testing this in this MR https://gitlab.com/gitlab-data/analytics/merge_requests/881/diffs where you can see we've set

quoting:
  database: true
  identifier: false
  schema: false

System information

The output of dbt --version:

± dbt --version
installed version: 0.13.0
   latest version: 0.13.0

Up to date!

The operating system you're running on: Mac OSX 10.14

The python version you're using: I see this on 3.5.2 and 3.7

Steps to reproduce

In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example models, etc are all very helpful here.

I'm able to reproduce this locally by setting my profile to:

gitlab-snowflake:
  target: dev
  outputs:
    dev:
      type: snowflake
      threads: 16
      account: gitlab
      user: tmurphy
      password: 
      role: TRANSFORMER
      database: 1149-UPGRADE-DBT-TO-0-13_ANALYTICS
      warehouse: ENGINEER_XS
      schema: tmurphy_scratch

The DB name is set based on the branch name. We know it's not valid for Snowflake but when it's quoted it's fine and this wasn't a problem in 0.12

Convo from Slack:

Is it possible that the information schema isn’t getting quoted as configured? The error I found seems linked to https://github.com/fishtown-analytics/dbt/blob/027a0d2ee6d138b5c7ae05a0b20e511e6bfdf516/core/dbt/adapters/sql/impl.py#L177 which only has a quote policy for schema and identifier.

my understanding of the code flow is that information_schema comes with the quoting policy but it appears to be overwritten? https://github.com/fishtown-analytics/dbt/blob/027a0d2ee6d138b5c7ae05a0b20e511e6bfdf516/core/dbt/adapters/base/relation.py#L139 I fully acknowledge I could be talking through my hat here lol

drew.banin
Haha, yeah, this code is super complicated 😕

The big challenge is that this logic needs to account for:

  • warehouses where cross-database information schema queries are supported (like Snowflake)
  • warehouses where they are not supported (like Redshift)
  • snowflake’s uppercase-by-default identifiers

In list_relations_without_caching, the query that’s failing for you is actually executed via the call to execute_macro on L179. We then apply a quote policy to the resulting relations that are returned by the query, but I think that would happen after the information_schema query succeeds.

In the information_schema method, I think the relevant line of code is probably L147. self.quote_policy['database'] should be sourced from the adapter default (unquoted) and then overwritten with a user-supplied quote policy if provided. If there’s a bug in this code, it would probably be there!

(all line numbers based on commit 027a0d2ee6d138b5c7ae05a0b20e511e6bfdf516) (edited)

So, the very quick fix is to make your database name a valid snowflake identifier (ie. use underscores instead of dashes, prefix it with a letter instead of a number).

@drewbanin
Copy link
Contributor

Thanks for the detailed report @tayloramurphy! I will check it out

@drewbanin drewbanin added this to the 0.13.1 - bugfix release milestone Apr 15, 2019
@drewbanin
Copy link
Contributor

Hey @tayloramurphy - I was able to spend some more time trying to reproduce this. It was a tricky one to diagnose!

This is specifically an issue with the call to adapter.already_exists() in the snowplow_web_events_time model. I think the quick fix is to replace calls to already_exists() with is_incremental(). We made that change in the 0.7.x releases of the Snowplow package, though there may be other places in your codebase where this change is required.

Interestingly, is_incremental() calls get_relation(), which is also called already_exists(). I think the operative difference is that is_incremental no-ops during parsing, whereas already_exists() will actually execute the query. My assumption is that dbt doesn't configure database quoting appropriately during parsing, which would probably be in the class of "bug".

Still going to need to spend some more time digging in here to come up with a resolution, but wanted to hit you back with a workaround in the meantime!

@drewbanin drewbanin added the bug Something isn't working label Apr 16, 2019
@tayloramurphy
Copy link
Author

tayloramurphy commented Apr 16, 2019

@drewbanin This worked for the snowplow package! But now I'm seeing similar behavior elsewhere.

Running dbt -d seed --profiles-dir profile --target prod on the same MR (quoting set to true, still on 0.13) fails when truncating the table.

2019-04-16 14:56:55,753 (Thread-2): 14:56:55 | 2 of 2 START seed file ANALYTICS_staging.country_codes............... [RUN]
2019-04-16 14:56:55,889 (Thread-2): Acquiring new snowflake connection "country_codes".
2019-04-16 14:56:55,892 (Thread-2): Opening a new connection, currently in state init
2019-04-16 14:56:55,892 (Thread-1): Acquiring new snowflake connection "ping_metrics_to_stage_mapping_data".
2019-04-16 14:56:55,898 (Thread-1): Opening a new connection, currently in state init
2019-04-16 14:56:56,292 (Thread-2): Using snowflake connection "country_codes".
2019-04-16 14:56:56,293 (Thread-2): On country_codes: BEGIN
2019-04-16 14:56:56,388 (Thread-1): Using snowflake connection "ping_metrics_to_stage_mapping_data".
2019-04-16 14:56:56,388 (Thread-1): On ping_metrics_to_stage_mapping_data: BEGIN
2019-04-16 14:56:56,411 (Thread-2): SQL status: SUCCESS 1 in 0.12 seconds
2019-04-16 14:56:56,411 (Thread-2): Using snowflake connection "country_codes".
2019-04-16 14:56:56,412 (Thread-2): On country_codes: truncate table 1149-UPGRADE-DBT-TO-0-13_ANALYTICS."ANALYTICS_STAGING"."COUNTRY_CODES"
2019-04-16 14:56:56,493 (Thread-1): SQL status: SUCCESS 1 in 0.11 seconds
2019-04-16 14:56:56,494 (Thread-1): Using snowflake connection "ping_metrics_to_stage_mapping_data".
2019-04-16 14:56:56,494 (Thread-1): On ping_metrics_to_stage_mapping_data: truncate table 1149-UPGRADE-DBT-TO-0-13_ANALYTICS."ANALYTICS_STAGING"."PING_METRICS_TO_STAGE_MAPPING_DATA"
2019-04-16 14:56:56,559 (Thread-2): Snowflake error: 001003 (42000): 018b90c0-016c-b984-0000-289d05134b1e: SQL compilation error:
syntax error line 1 at position 15 unexpected '1149'.
2019-04-16 14:56:56,559 (Thread-2): On country_codes: ROLLBACK
2019-04-16 14:56:56,685 (Thread-1): Snowflake error: 001003 (42000): 018b90c0-0114-0a1e-0000-289d05133d76: SQL compilation error:
syntax error line 1 at position 15 unexpected '1149'.
2019-04-16 14:56:56,686 (Thread-1): On ping_metrics_to_stage_mapping_data: ROLLBACK
2019-04-16 14:56:56,737 (Thread-2): 14:56:56 | 2 of 2 ERROR loading seed file ANALYTICS_staging.country_codes....... [ERROR in 0.97s]
2019-04-16 14:56:56,877 (Thread-1): 14:56:56 | 1 of 2 ERROR loading seed file ANALYTICS_staging.ping_metrics_to_stage_mapping_data [ERROR in 1.12s]

Same sort of thing where the DB is not getting quoted. Let me know if you need more context / details!

@beckjake
Copy link
Contributor

Oh wow, this is an embarrassing one. I'm going to have to make a test for it, but the fix is real obvious - this should be quoting['database']

@beckjake beckjake self-assigned this Apr 16, 2019
beckjake added a commit that referenced this issue Apr 30, 2019
@beckjake
Copy link
Contributor

Fixed in #1402

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants