Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2670] [Bug] Undocumented contract behavior/feature causing unexpected results #7824

Closed
2 tasks done
ttusing opened this issue Jun 7, 2023 · 7 comments
Closed
2 tasks done
Assignees
Labels
bug Something isn't working model_contracts model_versions Refinement Maintainer input needed stale Issues that have gone stale user docs [docs.getdbt.com] Needs better documentation

Comments

@ttusing
Copy link
Contributor

ttusing commented Jun 7, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I am submitting this as a bug and looking for advice on where to follow up.

There seems to be an undocumented feature when using contracts. When deferring to a previous run, changing contracted datatypes or disabling a contract throws an error and requires the user to use model versioning.

In my workflow, we have DBT cloud running CI and needed to change a datatype. We got the following error unexpectedly:

Breaking Change to Contract Error in model my_model
(path/to/my_model.sql)
  While comparing to previous project state, dbt detected a breaking change to an enforced contract.
  
  Columns with data_type changes: 
   - my_column (number -> number(18,4))
  
  Consider making an additive (non-breaking) change instead, if possible.
  Otherwise, create a new model version: https://docs.getdbt.com/docs/collaborate/govern/model-versions

This feature is not described anywhere. In order to address this, we turned off deferring to previous runs and are disabling our implemented contracts until behavior better matches documentation and we can decide if the feature is appropriate for us.

We don't want to use model versioning and ideally this specific behavior of changing contracts throwing errors can be configured to be turned off in project config. I am unsure what a good workflow would be for using contracts that need to change occasionally in a traditional DBT cloud CI setup.

I also believe more documentation could include guidance on adding precision and scale for Snowflake datatypes.

Expected Behavior

Enforcement steps of contracts matches the documentation page. Perhaps adding to the section "DBT will do these things differently".

https://docs.getdbt.com/docs/collaborate/govern/model-contracts

Steps To Reproduce

  1. Make a contracted model
  2. Run your project
  3. Change your model and defer to the manifest from your previous run

Relevant log output

No response

Environment

- OS: DBT Cloud
- dbt: `1.5.1`

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@ttusing ttusing added bug Something isn't working triage labels Jun 7, 2023
@github-actions github-actions bot changed the title [Bug] Undocumented contract behavior/feature causing unexpected results [CT-2670] [Bug] Undocumented contract behavior/feature causing unexpected results Jun 7, 2023
@MichelleArk MichelleArk self-assigned this Jun 8, 2023
@MichelleArk
Copy link
Contributor

MichelleArk commented Jun 8, 2023

Thank you for opening this thorough issue @ttusing!

Agreed, the documentation under Model Contracts is lacking any mention of the breaking change detection functionality. We have documented in the reference documentation for the contract config but should highlight it where we explain the model contract conceptually since it's a big piece of the overall motivation. I've opened an issue for that here: dbt-labs/docs.getdbt.com#3496


We don't want to use model versioning and ideally this specific behavior of changing contracts throwing errors can be configured to be turned off in project config. I am unsure what a good workflow would be for using contracts that need to change occasionally in a traditional DBT cloud CI setup.

Possible Workaround(s)
If the breaking contract changes are being made intentionally, an (unsafe!) option is to merge anyways on failing CI -- future CI runs will pass because the baseline state will include the breaking changes. But if that CI run would have failed on something unrelated to contracts after the breaking change check, it would streamroll over a scenario that should have failed.

With what's currently available in dbt, a safer workaround would be to:

  1. Make a breaking change and observe Breaking Change to Contract Error in CI
  2. Determine that the breaking change is intentional and model versioning isn't necessary
  3. Turn off contract enforcement for the model with breaking changes (by setting contract: {enforced: false} at the model-level) and running CI -- this way any failures unrelated to the breaking changes will be detected during CI
  4. Turn contract enforcement back on, and run the model locally (without deferral) to ensure the new contract is validated
  5. With contract enforcement still re-enabled, run CI and merge the changes despite the Breaking Change to Contract Error failure in CI

Is there a better way?
This is pretty cumbersome, but I imagine sometimes necessary when versioning is to heavy-handed a solution (perhaps when working within a single project, or if it's known that there are no downstream teams relying on the changed pieces of the model).

To streamline this workflow, we could consider a new configuration on the contract config (something like: allow_breaking_changes: true) that bypasses the breaking change error. It could either be a part of the config (and checked into version control), or a CLI flag that does the same thing for the scope of an invocation. I'd lean towards a configuration because having this setting versioned controlled would be helpful to understand at which point breaking changes were made and why.

In general I do think something like allow_breaking_changes kind of defeats the purpose of contracts from the perspective of downstream consumers in other projects -- they can no longer trust that building on top of a contracted model is having a guaranteed and stable interface to build on top of. I'm going to leave this in refinement to work out the implications of this kind of change on the broader governance ideas introduced in 1.5. It should be possible to use contract enforcement independently of versioning, or perhaps offer a lightweight way (v0?) to indicate a model is unstable despite being contracted.


I also believe more documentation could include guidance on adding precision and scale for Snowflake datatypes.

Precision and scale aren't part of the contract -- for data types that have precision (and scale), just the high-level data type is part of the contract. There is some documentation on that in the reference docs:

That said, when dbt is comparing data types, it will not compare granular details such as size, precision, or scale. We don't think you should sweat the difference between varchar(256) and varchar(257), because it doesn't really affect the experience of downstream queriers. If you need a more-precise assertion, it's always possible to accomplish by writing or using a custom test.

@MichelleArk MichelleArk added Refinement Maintainer input needed and removed triage labels Jun 8, 2023
@ttusing
Copy link
Contributor Author

ttusing commented Jun 8, 2023

Thanks! This is really helpful.

One feature idea to help this workflow that might be more in-line and I think I've seen discussed elsewhere/might already be true: when using model versions, have the latest model version materialize in the database as the usual model name instead of having a model version suffix. In this case, we could increment the model version with changes.

With regards to precision/scale, when using numeric, Snowflake defaults to 38,0, meaning that I think it is impossible to use decimals with contracts and Snowflake. I haven't fully dug into this, could be wrong about some part of this.

For some product feedback reference, we are looking at using contracts actually for our staging models. The primary motivation is that we'd like to ensure that input into our project has stable datatypes and enforce that the current columns in model YML match the SQL/database.

Think we are going to pause on implementing this since it is taking a bit more effort than expected/budgeted. I am definitely going to keep an eye on this and contribute as I have time ✨ . Contracts are a very powerful feature.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 9, 2023

Great conversation so far :)

In general I do think something like allow_breaking_changes kind of defeats the purpose of contracts from the perspective of downstream consumers in other projects

@MichelleArk I agree!


@ttusing In your original post, you said:

We don't want to use model versioning

Given your most recent reply, is it fair to say that you'd be more willing to use model versioning if we made it easier (automatic) to always have the latest version deployed into the unversioned/unsuffixed namespace? (That's this issue: #7442. There is a documented workaround in the meantime that achieves the same behavior using a custom macro + post-hook.)


I think it is impossible to use decimals with contracts and Snowflake

I take your point to be: If I specify integer below (instead of numeric), this will still work, and won't raise a contract error.

models:
  - name: my_model
    config:
      contract:
        enforced: true
    columns:
      - name: my_decimal_column
        data_type: integer
select 1.1 as my_decimal_column

After quickly looking into this: Snowflake's cursor reports back both data types as FIXED, with only the precision and scale differing. Since we don't currently enforce differences in precision/scale, it's not caught as a difference. (I dropped my breakpoint here.)

ipdb> sql
'select * from (\n        select 1.1 as my_decimal_column\n    ) as __dbt_sbq\n    where false\n    limit 0\n'
ipdb> cursor.description
[ResultMetadata(name='MY_DECIMAL_COLUMN', type_code=0, display_size=None, internal_size=None, precision=2, scale=1, is_nullable=False)]
ipdb> columns
[SnowflakeColumn(column='MY_DECIMAL_COLUMN', dtype='FIXED', char_size=None, numeric_precision=None, numeric_scale=None)]
...
ipdb> sql
'select * from (\n        select\n    cast(null as integer) as my_decimal_column\n    ) as __dbt_sbq\n    where false\n    limit 0\n'
ipdb> cursor.description
[ResultMetadata(name='MY_DECIMAL_COLUMN', type_code=0, display_size=None, internal_size=None, precision=38, scale=0, is_nullable=True)]
ipdb> columns
[SnowflakeColumn(column='MY_DECIMAL_COLUMN', dtype='FIXED', char_size=None, numeric_precision=None, numeric_scale=None)]

Is there some way for us to reliably detect that difference, without splitting other hairs? scale=0 versus scale>0?

By contrast, other data platforms give us back genuinely different data type codes/names — e.g. DECIMAL versus INT on Postgres.

@ttusing
Copy link
Contributor Author

ttusing commented Jun 11, 2023

@ttusing In your original post, you said:

We don't want to use model versioning

Given your most recent reply, is it fair to say that you'd be more willing to use model versioning if we made it easier (automatic) to always have the latest version deployed into the unversioned/unsuffixed namespace? (That's this issue: #7442. There is a documented workaround in the meantime that achieves the same behavior using a custom macro + post-hook.)

Yeah, we want to keep table names stable. We also have tooling that assumes that DBT model names match Snowflake table names.

I think it is impossible to use decimals with contracts and Snowflake

I take your point to be: If I specify integer below (instead of numeric), this will still work, and won't raise a contract error.
...

I looked into this a little more. I think it would be useful to update documentation to explain this behavior a bit more.

When specifying the Snowflake datatype for numerics (and I suppose varchars), you can specify this in the contract using the expected database syntax, i.e. data_type: number(18,4). This wasn't totally obvious to me that this would work, or that not doing it would cast my decimal to an integer of type number(32,0). I am also not entirely sure that this syntax will not break some other part of contracts in DBT.

tldr: is specifying precision and scale in this datatype property like this a supported feature? If so, I think updating documentation on how to use it and the consequences of not specifying it would be helpful to users - it was not entirely obvious to me that DBT would cast my numeric to the default Snowflake numeric precision and scale when I specified that the field was numeric in the contract.

@joellabes
Copy link
Contributor

I opened #8028 without realising that this one already existed. Worth noting that there's an easier workaround from jerco here: #8028 (comment)

In short, you can bump the version but not use any other of the versioning constructs. To quote the issue:

That's effectively what you're doing anyway, by updating the contract with a breaking change, and saying that there won't be any old version sticking around for a migration window.

Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Jan 14, 2024
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working model_contracts model_versions Refinement Maintainer input needed stale Issues that have gone stale user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

No branches or pull requests

4 participants