-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a new column is not a breaking contract change #7333
Changes from 4 commits
afa4757
897ab08
80c2857
f0a30ce
e9e58ee
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
kind: Fixes | ||
body: Adding a new column is not a breaking contract change | ||
time: 2023-04-12T13:34:38.231881+02:00 | ||
custom: | ||
Author: jtcohen6 | ||
Issue: "7332" |
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -39,7 +39,7 @@ | |||||||||||
) | ||||||||||||
from dbt.contracts.util import Replaceable, AdditionalPropertiesMixin | ||||||||||||
from dbt.events.functions import warn_or_error | ||||||||||||
from dbt.exceptions import ParsingError, InvalidAccessTypeError, ModelContractError | ||||||||||||
from dbt.exceptions import ParsingError, InvalidAccessTypeError, ContractBreakingChangeError | ||||||||||||
from dbt.events.types import ( | ||||||||||||
SeedIncreased, | ||||||||||||
SeedExceedsLimitSamePath, | ||||||||||||
|
@@ -539,29 +539,61 @@ def build_contract_checksum(self): | |||||||||||
self.contract.checksum = hashlib.new("sha256", data).hexdigest() | ||||||||||||
|
||||||||||||
def same_contract(self, old) -> bool: | ||||||||||||
# If the contract wasn't previously enforced | ||||||||||||
if old.contract.enforced is False and self.contract.enforced is False: | ||||||||||||
# Not a change | ||||||||||||
return True | ||||||||||||
if old.contract.enforced is False and self.contract.enforced is True: | ||||||||||||
# A change, but not a breaking change | ||||||||||||
return False | ||||||||||||
|
||||||||||||
breaking_change_reasons = [] | ||||||||||||
# Otherwise: the contract was previously enforced | ||||||||||||
self.build_contract_checksum() | ||||||||||||
|
||||||||||||
# If the checksums match up, the contract has not changed, so same_contract: True | ||||||||||||
if self.contract.checksum == old.contract.checksum: | ||||||||||||
return True | ||||||||||||
|
||||||||||||
# The checksums don't match up, so there has been a change. | ||||||||||||
# We need to determine if it's a **breaking** change. | ||||||||||||
# These are the categories of breaking changes: | ||||||||||||
contract_enforced_disabled: bool = False | ||||||||||||
columns_removed: List[str] = [] | ||||||||||||
column_type_changes: List[Tuple[str, str, str]] = [] | ||||||||||||
|
||||||||||||
if old.contract.enforced is True and self.contract.enforced is False: | ||||||||||||
# Breaking change: throw an error | ||||||||||||
# Note: we don't have contract.checksum for current node, so build | ||||||||||||
# Breaking change: the contract was previously enforced, and it no longer is | ||||||||||||
# Note: we don't have contract.checksum for current node, so build it now | ||||||||||||
self.build_contract_checksum() | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. have we not already done this up here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah, good catch - I'd added it up there, but it actually isn't needed (?) - we only need it in the case that dbt-core/core/dbt/contracts/graph/nodes.py Lines 571 to 575 in 6b42a71
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm pretty sure we can remove both calls to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, that's a change. Initially I only had it building when contract.enforced is True, and it's been changed so that it always builds. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually no, it's "build_contract_checksum" that checks for contract enforced. So it is a no-op, but it probably shouldn't be, because Michelle wanted to also capture that there were additional changes to the columns. So we should remove the check for contract.enforced in build_contract_checksum and only call it in parse_patch if contract.enforced is True, and THEN it won't be a no-op. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Boy this logic makes your head hurt. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gshank Right 😅 We want the full column spec to be captured in the checksum, so that additions would cause a checksum mismatch — but if the checksums don't match, we then need to dig in and understand whether a column was actually removed/changed, or just added. I'm happy with the current implementation for now, and it seems to satisfy the test cases we've added (reflecting in-the-wild use cases we'd expect) — but we can open up another ticket to revisit this logic. |
||||||||||||
breaking_change_reasons.append("contract has been disabled") | ||||||||||||
|
||||||||||||
if self.contract.checksum != old.contract.checksum: | ||||||||||||
# Breaking change, throw error | ||||||||||||
breaking_change_reasons.append("column definitions have changed") | ||||||||||||
contract_enforced_disabled = True | ||||||||||||
|
||||||||||||
for key, value in sorted(old.columns.items()): | ||||||||||||
# Has this column been removed? | ||||||||||||
if key not in self.columns.keys(): | ||||||||||||
columns_removed.append(value.name) | ||||||||||||
# Has this column's data type changed? | ||||||||||||
elif value.data_type != self.columns[key].data_type: | ||||||||||||
column_type_changes.append( | ||||||||||||
(str(value.name), str(value.data_type), str(self.columns[key].data_type)) | ||||||||||||
) | ||||||||||||
# Otherwise, this was an additive change -- not breaking | ||||||||||||
else: | ||||||||||||
continue | ||||||||||||
|
||||||||||||
# Did we find any changes that we consider breaking? If so, throw an error | ||||||||||||
if contract_enforced_disabled or columns_removed or column_type_changes: | ||||||||||||
raise ( | ||||||||||||
ContractBreakingChangeError( | ||||||||||||
contract_enforced_disabled=contract_enforced_disabled, | ||||||||||||
columns_removed=columns_removed, | ||||||||||||
column_type_changes=column_type_changes, | ||||||||||||
node=self, | ||||||||||||
) | ||||||||||||
) | ||||||||||||
|
||||||||||||
if breaking_change_reasons: | ||||||||||||
raise (ModelContractError(reasons=" and ".join(breaking_change_reasons), node=self)) | ||||||||||||
# Otherwise, the contract has still changed, so same_contract: False | ||||||||||||
else: | ||||||||||||
# no breaking changes | ||||||||||||
return True | ||||||||||||
return False | ||||||||||||
|
||||||||||||
|
||||||||||||
# ==================================== | ||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -207,22 +207,44 @@ def _fix_dupe_msg(self, path_1: str, path_2: str, name: str, type_name: str) -> | |
) | ||
|
||
|
||
class ModelContractError(DbtRuntimeError): | ||
class ContractBreakingChangeError(DbtRuntimeError): | ||
CODE = 10016 | ||
MESSAGE = "Contract Error" | ||
MESSAGE = "Breaking Change to Contract" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I changed this, but I don't actually know where this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it appears in a generic message that packages up compilation errors, but I'm not sure it does anywhere else. Let me look for it... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A few exceptions use it to construct the actual "message", but mostly it's not. Another thing we might want to clean up at some point. So I don't think it matters here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Other message use a "type" property to construct a "{type} Error", but mostly DbtInternalError |
||
|
||
def __init__(self, reasons, node=None): | ||
self.reasons = reasons | ||
def __init__( | ||
self, contract_enforced_disabled, columns_removed, column_type_changes, node=None | ||
): | ||
self.contract_enforced_disabled = contract_enforced_disabled | ||
self.columns_removed = columns_removed | ||
self.column_type_changes = column_type_changes | ||
super().__init__(self.message(), node) | ||
|
||
@property | ||
def type(self): | ||
return "Contract" | ||
return "Breaking Change to Contract" | ||
|
||
def message(self): | ||
breaking_changes = [] | ||
if self.contract_enforced_disabled: | ||
breaking_changes.append("The contract's enforcement has been disabled.") | ||
if self.columns_removed: | ||
columns_removed_str = "\n - ".join(self.columns_removed) | ||
breaking_changes.append(f"Columns were removed: \n - {columns_removed_str}") | ||
if self.column_type_changes: | ||
column_type_changes_str = "\n - ".join( | ||
[f"{c[0]} ({c[1]} -> {c[2]})" for c in self.column_type_changes] | ||
) | ||
breaking_changes.append( | ||
f"Columns with data_type changes: \n - {column_type_changes_str}" | ||
) | ||
|
||
reasons = "\n\n".join(breaking_changes) | ||
|
||
return ( | ||
f"There is a breaking change in the model contract because {self.reasons}; " | ||
"you may need to create a new version. See: https://docs.getdbt.com/docs/collaborate/publish/model-versions" | ||
"While comparing to previous project state, dbt detected a breaking change to an enforced contract." | ||
f"\n\n{reasons}\n\n" | ||
"Consider making an additive (non-breaking) change instead, if possible.\n" | ||
"Otherwise, create a new model version: https://docs.getdbt.com/docs/collaborate/publish/model-versions" | ||
) | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is wrong. If old.contract.enforced is True and self.contract.enforced is False it's a breaking change even when checksums are still equal, and this will always return True for that case. Also the "build_contract_checksum" only needs to run for self.contract.enforced is False, and the line above runs it for both True and False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Good point! Let me add a test for this
Per thread above, I don't think we need to call
build_contract_checksum
at any point withinsame_contract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per other thread, we want to capture that the contract has been disabled AND the columns are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking more closely, the current formulation of the logic does work, because
self.contract.checksum
isNone
!I do agree that a more explicit condition for this would be better, rather than implicitly depending on that behavior to remain the case, so I've added it in