-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add breakingChange to catalogDiff #17588
Conversation
@@ -8285,13 +8285,15 @@ <h3 class="field-label">Example data</h3> | |||
"fieldName" : [ "fieldName", "fieldName" ], | |||
"addField" : { }, | |||
"transformType" : "add_field", | |||
"removeField" : { } | |||
"removeField" : { }, | |||
"isBreaking" : true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was automatically generated and I'm not sure what it is or whether I should change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine.
@@ -10475,6 +10487,7 @@ <h3><a name="FieldTransform"><code>FieldTransform</code> - </a> <a class="up" hr | |||
<div class="param-enum-header">Enum:</div> | |||
<div class="param-enum">add_field</div><div class="param-enum">remove_field</div><div class="param-enum">update_field_schema</div> | |||
<div class="param">fieldName </div><div class="param-desc"><span class="param-type"><a href="#string">array[String]</a></span> A field name is a list of strings that form the path to the field. </div> | |||
<div class="param">isBreaking </div><div class="param-desc"><span class="param-type"><a href="#boolean">Boolean</a></span> </div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was also automatically generated
5b74a75
to
c4897a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frontend changes look good. ✅
@@ -4163,6 +4164,8 @@ components: | |||
- update_field_schema | |||
fieldName: | |||
$ref: "#/components/schemas/FieldName" | |||
isBreaking: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It doesn't look like we are using the isX
convention for other boolean fields in the OpenAPI config. Most appear to be just X (e.g. succeeded
, available
, retryable
, etc). I do see one example of hasX
, but I'm not sure if we have a convention and/or want to follow it with this field by using breakingChange
instead of isBreaking
. @cgardens Thoughts on whether or not we have a naming convention for OpenAPI boolean properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw isSyncing on webBackendConnectionRead, but happy to change this if it's more in line with our convention - isSyncing might be the only one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alovew I don't really have an opinion one way or another -- I just noticed that we are not using the isX
convention. Just wanted to make sure that there wasn't a decision that predates both of us to not use that convention before we introduce inconsistency in the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote we drop the is
prefix. Looking at all the other type: boolean
fields we have, one of them has the is
prefix; isSyncing
, three have a has
prefix; hasConnections
, hasSources
, hasDestination
and every other one has no prefix.
Additionally can we add a description to this field indicating what it is used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to test this but I was getting this error when trying to refresh my schema.
"Internal Server Error: Cannot invoke \"io.airbyte.api.model.generated.AirbyteCatalog.getStreams()\" because \"discovered\" is null"
Unsure if this is related to this PR.
@krishnaglick can you give me more details about how you saw this error? which source? was it a new connection & were there schema changes? |
@krishnaglick also where in the codebase the error was invoked? |
I have two postgres DB's syncing to eachother. I cleaned out one to a default, pristine state, and did a schema update. I'm going to try against master too. |
spoke with @krishnaglick on slack, and apparently he is seeing this error on master as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found potential for a couple bugs.
Does this need to be a Boolean
field or can we make it a boolean
instead?
final List<ConfiguredAirbyteStream> streamList = configuredCatalog.getStreams().stream() | ||
.filter(s -> Objects.equals(s.getStream().getNamespace(), descriptor.getNamespace()) | ||
&& s.getStream().getName().equals(descriptor.getName())) | ||
.toList(); | ||
|
||
final ConfiguredAirbyteStream stream = streamList.get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two statements could be combined into one statement which removes the need to have the streamList
variable by replacing toList()
with findFirst()
final var stream = configuredCatalog.getStreams().stream()
.filter(s -> Objects.equals(s.getStream().getNamespace(), descriptor.getNamespace())
&& s.getStream().getName().equals(descriptor.getName()))
.findFirst();
Additionally, as the stream
variable is only referenced if !streamOld.equals(streamNew)
this work would be moved to inside that if
statement. No reason to do this work if there is a chance it's going to be ignored.
The findFirst()
call returns an Optional
so you would need to check if you actually received a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there is an issue with the current code, what if the filter
call returns no results? The streamList.get(0)
line will fail with an IndexOutOfBoundsException
.
@@ -4163,6 +4164,8 @@ components: | |||
- update_field_schema | |||
fieldName: | |||
$ref: "#/components/schemas/FieldName" | |||
isBreaking: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote we drop the is
prefix. Looking at all the other type: boolean
fields we have, one of them has the is
prefix; isSyncing
, three have a has
prefix; hasConnections
, hasSources
, hasDestination
and every other one has no prefix.
Additionally can we add a description to this field indicating what it is used for?
@@ -384,4 +405,17 @@ static void combineAccumulator(final Map<List<String>, JsonNode> accumulatorLeft | |||
}); | |||
} | |||
|
|||
static Boolean transformBreaksConnection(final ConfiguredAirbyteStream configuredStream, final List<String> fieldName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this return boolean
instead of Boolean
?
if (SyncMode.INCREMENTAL == syncMode && configuredStream.getCursorField().equals(fieldName)) { | ||
return true; | ||
} | ||
|
||
final DestinationSyncMode destinationSyncMode = configuredStream.getDestinationSyncMode(); | ||
if (DestinationSyncMode.APPEND_DEDUP == destinationSyncMode && configuredStream.getPrimaryKey().contains(fieldName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential for a subtle bug here as the equals
of a list is only true when
two lists are defined to be equal if they contain the same elements in the same order
Is there a potential that these lists contain the same fields in a different order? Do we need to order these lists before comparing them?
This affects both configuredStream.getCursorField().equals(fieldName)
and configuredStream.getPrimaryKey().contains(fieldName)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought there could only ever be one cursorField, but I think changing it to contains
would fix this. I don't think there's a problem with the second one since it's using contains
instead of equals
. Is there something I'm missing about contains
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, fieldName is also a list. I didn't know why this is a list - there is also only ever one item in the fieldName list. I couldn't find examples where there were fieldNames that consisted of a list of multiple strings. I can check with the connectors team on this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colesnodgrass it looks like the way we do existing diffs between field names, the order matters. I can't find an example of this, but if one field name is a list of ["field_prefix", "field_name"] and another is ["field_name", "field_prefix"], this would be considered a different field and it would be added to the CatalogDiff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, this may be something we want to explicitly state/enforce somewhere in the code. Or maybe abstract behind a new datatype as there is a potential here for breaking this accidentally. But this doesn't need to be done as part of this PR.
if (transformBreaksConnection(configuredStream, fieldName)) { | ||
fieldTransforms.add(FieldTransform.createRemoveFieldTransform(fieldName, fieldNameToTypeOld.get(fieldName), true)); | ||
} else { | ||
fieldTransforms.add(FieldTransform.createRemoveFieldTransform(fieldName, fieldNameToTypeOld.get(fieldName), false)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the if/else
isn't necessary, can be replaced with
fieldTransforms.add(FieldTransform.createRemoveFieldTransform(
fieldName,
fieldNameToTypeOld.get(fieldName),
transformBreaksConnection(configuredStream, fieldName)
));
Sets.difference(fieldNameToTypeNew.keySet(), fieldNameToTypeOld.keySet()) | ||
.forEach(fieldName -> fieldTransforms.add(FieldTransform.createAddFieldTransform(fieldName, fieldNameToTypeNew.get(fieldName)))); | ||
.forEach(fieldName -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the curly-braces necessary here?
@@ -64,4 +67,8 @@ public UpdateFieldSchemaTransform getUpdateFieldTransform() { | |||
return updateFieldTransform; | |||
} | |||
|
|||
public Boolean getIsBreaking() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be isBreaking()
instead of getIsBreaking()
.
Also could this be a boolean
instead of a Boolean
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I agree with comments of @colesnodgrass . I'm going to leave him to approve the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment on the Optional.get()
usage.
final ConfiguredAirbyteStream stream = configuredCatalog.getStreams().stream() | ||
.filter(s -> Objects.equals(s.getStream().getNamespace(), descriptor.getNamespace()) | ||
&& s.getStream().getName().equals(descriptor.getName())) | ||
.findFirst().get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.get()
will throw a NoSuchElementException
if called on an empty Optional
.
What should happen here if we have nothing? Should streamTransoforms.add
still be called? Is nothing an error case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a case that should happen, but if it does, I think we should default to calling the transform non-breaking. I'll make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻 I think it's also worth looking into using final var
more than explicitly specifying the object type as this makes cleaning-up/refactoring this code easier.
74928be
to
cb3639b
Compare
* Add breaking field to FieldTransform on catalogDiff
isBreaking is a new field on the FieldTransform object within the CatalogDiff object. When we calculate catalog diffs, we will now pass in the connection's configuredCatalog since that has data about the sync modes & primary and cursor fields for a connection.
A FieldTransform is considered 'breaking' if:
This PR does the following: