Command line vars are parsed inconsistently in the dbt server #2265

Closed
drewbanin opened this issue Mar 30, 2020 · 2 comments · Fixed by #2363
Labels: bug (Something isn't working), partial_parsing, rpc (Issues related to dbt's RPC server), vars

Comments

@drewbanin
Contributor

From @jtcohen6:

Bug description

The dbt server handles command line --vars arguments differently than dbt run invocations when partial parsing is enabled. It appears that at parse time, the dbt server uses the variable values declared in dbt_project.yml, whereas at runtime, it uses the vars provided with --vars in the rpc method.

Results

I would expect values passed to dbt run --vars to be parsed, first and foremost, and override any default values, before any metadata queries are run against the database. That's true in dbt CLI and not true when using the dbt server.

Steps to reproduce

Given a dbt_project.yml that includes

models:
  vars:
    environment_name: dev

and a model:

{{ config(database = var('environment_name')) }}

select 1 as fun

Run

dbt run -m my_test_model --vars 'environment_name: qa'

dbt CLI:

2020-03-27 21:33:04.743677 (MainThread): Found 6 models, 5 tests, 0 snapshots, 0 analyses, 131 macros, 1 operation, 0 seed files, 0 sources
2020-03-27 21:33:04.750941 (MainThread): 
2020-03-27 21:33:04.751394 (MainThread): Acquiring new snowflake connection "master".
2020-03-27 21:33:04.751511 (MainThread): Opening a new connection, currently in state init
2020-03-27 21:33:04.759351 (ThreadPoolExecutor-0_0): Acquiring new snowflake connection "list_analytics_qa".
2020-03-27 21:33:04.759620 (ThreadPoolExecutor-0_0): Opening a new connection, currently in state init
2020-03-27 21:33:04.884238 (ThreadPoolExecutor-0_0): Using snowflake connection "list_analytics_qa".
2020-03-27 21:33:04.884403 (ThreadPoolExecutor-0_0): On list_analytics_qa: /* {"app": "dbt", "dbt_version": "0.16.0", "profile_name": "jaffle_shop", "target_name": "dev", "connection_name": "list_analytics_qa"} */

    show terse schemas in database analytics_qa
    limit 10000

dbt Server:

2020-03-27 21:41:30.004433Z: Found 6 models, 5 tests, 0 snapshots, 0 analyses, 131 macros, 1 operation, 0 seed files, 0 sources
2020-03-27 21:41:30.010170Z: Acquiring new snowflake connection "master".
2020-03-27 21:41:30.010319Z: Opening a new connection, currently in state init
2020-03-27 21:41:30.018723Z: Acquiring new snowflake connection "list_analytics_dev".
2020-03-27 21:41:30.018850Z: Opening a new connection, currently in state init
2020-03-27 21:41:30.107756Z: Using snowflake connection "list_analytics_dev".
2020-03-27 21:41:30.107908Z: On list_analytics_dev: /* {"app": "dbt", "dbt_version": "0.16.0", "profile_name": "user", "target_name": "dev", "connection_name": "list_analytics_dev"} */

    show terse schemas in database analytics_dev
    limit 10000

In both cases I'm running dbt==0.16.0


I would expect some weirdness around the --vars flag when partial parsing is enabled. By definition, we're using the previously-parsed variables to skip re-parsing every model with our updated variables, so in some ways, this is working as intended.

I believe we should be storing a hash of CLI --vars and using that hash to determine if the partial parse cache is valid for the run or not. In this case, I suspect that something funny is happening w/r/t the dbt server's cli_args method.
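To make the hashing idea concrete, here is a minimal sketch of what "store a hash of CLI --vars and compare it" could look like. It is not dbt's actual implementation: `vars_hash` and `cache_is_valid` are hypothetical names, and it assumes the vars have already been parsed into a plain dict.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def vars_hash(cli_vars: Dict[str, Any]) -> str:
    """Return a stable digest for a dict of CLI vars (hypothetical helper)."""
    # sort_keys makes the digest independent of the order the vars were given in;
    # default=str is a fallback for values json cannot serialize natively.
    payload = json.dumps(cli_vars, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_is_valid(cached_hash: Optional[str], cli_vars: Dict[str, Any]) -> bool:
    """A cached parse is only reusable if it was built with the same vars."""
    return cached_hash == vars_hash(cli_vars)
```

With something like this, a parse cached under `environment_name: dev` would be rejected for a run invoked with `environment_name: qa`, because the two digests differ.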

@drewbanin added the bug and rpc labels on Mar 30, 2020
@drewbanin added this to the Octavius Catto milestone on Mar 30, 2020
@drewbanin
Contributor Author

@beckjake to advise on what kinds of tradeoffs we'd need to make to get something like this working more consistently.

@beckjake
Contributor

I think the trick here is that it's not partial parsing that's involved (that does correctly invalidate the cache!). Instead, it's the "API calls inherit the manifest from the server" behavior.

To fix this, I'll add a check and if vars changed, the task will re-generate the manifest at call-time. This will result in cli_args calls having to rebuild the manifest each time, which could be rough on performance.
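Roughly, the check described above could be sketched like this. This is an illustration only: `ManifestHolder` and the `build` callable are hypothetical stand-ins, not dbt's real classes or the code that landed in the fix.

```python
from typing import Any, Callable, Dict


class ManifestHolder:
    """Hypothetical stand-in for the server state that API calls inherit."""

    def __init__(self, build: Callable[[Dict[str, Any]], Any], server_vars: Dict[str, Any]):
        self._build = build
        self.server_vars = dict(server_vars)
        # The server parses once at startup with its own vars.
        self.manifest = build(self.server_vars)

    def manifest_for(self, request_vars: Dict[str, Any]) -> Any:
        # Same vars as the server: inherit the already-parsed manifest.
        if request_vars == self.server_vars:
            return self.manifest
        # Different vars: re-generate at call time. Nothing is cached here,
        # so every such call pays the full parse cost again.
        return self._build(request_vars)
```

The second branch is where the performance concern comes from: a request with non-default vars rebuilds the manifest on every call.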

In the future, we might want to make a cache that stores one manifest per --vars the server has seen, or something like that.
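A cache with one manifest per set of vars might look something like the sketch below. Again, this is only an assumption about the shape of such a cache: `ManifestCache` and `build` are hypothetical names, and a real version would need some bound on how many manifests it keeps.

```python
import json
from typing import Any, Callable, Dict


class ManifestCache:
    """Hypothetical cache: one manifest per distinct set of vars seen."""

    def __init__(self, build: Callable[[Dict[str, Any]], Any]):
        self._build = build
        self._by_vars: Dict[str, Any] = {}

    def get(self, cli_vars: Dict[str, Any]) -> Any:
        # Serialize with sorted keys so equal vars always map to the same key.
        key = json.dumps(cli_vars, sort_keys=True, default=str)
        if key not in self._by_vars:
            # First time these vars are seen: parse once and remember it.
            self._by_vars[key] = self._build(cli_vars)
        return self._by_vars[key]
```

Compared with rebuilding on every call, this trades memory for parse time: repeated calls with the same vars reuse the stored manifest.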

3 participants