Add breakingChange to catalogDiff #17588

Merged
merged 10 commits into from
Oct 11, 2022

Conversation

alovew
Contributor

@alovew alovew commented Oct 4, 2022

isBreaking is a new field on the FieldTransform object within the CatalogDiff object. When we calculate catalog diffs, we will now pass in the connection's configuredCatalog since that has data about the sync modes & primary and cursor fields for a connection.

A FieldTransform is considered 'breaking' if:

  1. the connection is INCREMENTAL and the cursor field was removed OR
  2. the connection is DEDUP and the primary key was removed
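
The two conditions above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `ConfiguredStream` and `removalBreaksConnection` are hypothetical names, and the enums are simplified stand-ins for `SyncMode` and `DestinationSyncMode`.

```java
import java.util.List;

final class BreakingChangeSketch {

  // Simplified stand-ins for the SyncMode and DestinationSyncMode enums (assumption).
  enum SyncMode { FULL_REFRESH, INCREMENTAL }
  enum DestinationSyncMode { APPEND, OVERWRITE, APPEND_DEDUP }

  // Hypothetical, trimmed-down view of a configured stream.
  record ConfiguredStream(SyncMode syncMode,
                          DestinationSyncMode destinationSyncMode,
                          List<String> cursorField,
                          List<List<String>> primaryKey) {}

  // A removed field breaks the connection if it is the cursor of an
  // INCREMENTAL stream or part of the primary key of an APPEND_DEDUP stream.
  static boolean removalBreaksConnection(final ConfiguredStream stream, final List<String> fieldName) {
    if (stream.syncMode() == SyncMode.INCREMENTAL && stream.cursorField().equals(fieldName)) {
      return true;
    }
    return stream.destinationSyncMode() == DestinationSyncMode.APPEND_DEDUP
        && stream.primaryKey().contains(fieldName);
  }

  public static void main(final String[] args) {
    final var stream = new ConfiguredStream(SyncMode.INCREMENTAL, DestinationSyncMode.APPEND_DEDUP,
        List.of("updated_at"), List.of(List.of("id")));
    System.out.println(removalBreaksConnection(stream, List.of("updated_at"))); // cursor field removed
    System.out.println(removalBreaksConnection(stream, List.of("email")));      // unrelated field removed
  }
}
```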

This PR does the following:

  1. Update the FieldTransform object to add this new field
  2. Populate the isBreaking field correctly within the getDiff calculation
  3. Update the API to require the isBreaking field with the CatalogDiff response
  4. Return the isBreaking field with API responses that include CatalogDiffs

@github-actions github-actions bot added area/api Related to the api area/documentation Improvements or additions to documentation area/platform issues related to the platform area/protocol area/server area/frontend Related to the Airbyte webapp labels Oct 4, 2022
@alovew alovew temporarily deployed to more-secrets October 4, 2022 23:03 Inactive
@alovew alovew temporarily deployed to more-secrets October 5, 2022 17:56 Inactive
@alovew alovew temporarily deployed to more-secrets October 5, 2022 20:52 Inactive
@alovew alovew temporarily deployed to more-secrets October 5, 2022 21:19 Inactive
@alovew alovew marked this pull request as ready for review October 5, 2022 21:20
@alovew alovew requested review from a team as code owners October 5, 2022 21:20
@@ -8285,13 +8285,15 @@ <h3 class="field-label">Example data</h3>
"fieldName" : [ "fieldName", "fieldName" ],
"addField" : { },
"transformType" : "add_field",
"removeField" : { }
"removeField" : { },
"isBreaking" : true
Contributor Author

this was automatically generated and I'm not sure what it is or whether I should change it

Contributor

I think this is fine.

@@ -10475,6 +10487,7 @@ <h3><a name="FieldTransform"><code>FieldTransform</code> - </a> <a class="up" hr
<div class="param-enum-header">Enum:</div>
<div class="param-enum">add_field</div><div class="param-enum">remove_field</div><div class="param-enum">update_field_schema</div>
<div class="param">fieldName </div><div class="param-desc"><span class="param-type"><a href="#string">array[String]</a></span> A field name is a list of strings that form the path to the field. </div>
<div class="param">isBreaking </div><div class="param-desc"><span class="param-type"><a href="#boolean">Boolean</a></span> </div>
Contributor Author

this was also automatically generated

@alovew alovew force-pushed the anne/add-breaking-change-to-catalog-diff branch from 5b74a75 to c4897a6 Compare October 5, 2022 22:18
@alovew alovew temporarily deployed to more-secrets October 5, 2022 22:20 Inactive
@alovew alovew temporarily deployed to more-secrets October 6, 2022 00:14 Inactive
Contributor

@edmundito edmundito left a comment

Frontend changes look good. ✅

@@ -4163,6 +4164,8 @@ components:
- update_field_schema
fieldName:
$ref: "#/components/schemas/FieldName"
isBreaking:
Contributor

Nit: It doesn't look like we are using the isX convention for other boolean fields in the OpenAPI config. Most appear to be just X (e.g. succeeded, available, retryable, etc). I do see one example of hasX, but I'm not sure if we have a convention and/or want to follow it with this field by using breakingChange instead of isBreaking. @cgardens Thoughts on whether or not we have a naming convention for OpenAPI boolean properties?

Contributor Author

I saw isSyncing on webBackendConnectionRead, but happy to change this if it's more in line with our convention - isSyncing might be the only one

Contributor

@alovew I don't really have an opinion one way or another -- I just noticed that we are not using the isX convention. Just wanted to make sure that there wasn't a decision that predates both of us to not use that convention before we introduce inconsistency in the API.

Member

I vote we drop the is prefix. Looking at all the other type: boolean fields we have, one of them has the is prefix (isSyncing), three have a has prefix (hasConnections, hasSources, hasDestination), and every other one has no prefix.

Additionally, can we add a description to this field indicating what it is used for?

Contributor

@krishnaglick krishnaglick left a comment

I was trying to test this but I was getting this error when trying to refresh my schema.



"Internal  Server Error: Cannot invoke  \"io.airbyte.api.model.generated.AirbyteCatalog.getStreams()\" because  \"discovered\" is null"

Unsure if this is related to this PR.

@alovew
Contributor Author

alovew commented Oct 6, 2022

@krishnaglick can you give me more details about how you saw this error? which source? was it a new connection & were there schema changes?

@alovew
Contributor Author

alovew commented Oct 6, 2022

@krishnaglick also, where in the codebase was the error thrown?

@krishnaglick
Contributor

@krishnaglick can you give me more details about how you saw this error? which source? was it a new connection & were there schema changes?

I have two Postgres DBs syncing to each other. I cleaned out one to a default, pristine state and did a schema update. I'm going to try against master too.

@alovew
Contributor Author

alovew commented Oct 6, 2022

Spoke with @krishnaglick on Slack, and apparently he is seeing this error on master as well.

Member

@colesnodgrass colesnodgrass left a comment

Found potential for a couple of bugs.

Does this need to be a Boolean field or can we make it a boolean instead?

Comment on lines 329 to 334
final List<ConfiguredAirbyteStream> streamList = configuredCatalog.getStreams().stream()
.filter(s -> Objects.equals(s.getStream().getNamespace(), descriptor.getNamespace())
&& s.getStream().getName().equals(descriptor.getName()))
.toList();

final ConfiguredAirbyteStream stream = streamList.get(0);
Member

These two statements could be combined into one statement which removes the need to have the streamList variable by replacing toList() with findFirst()

final var stream = configuredCatalog.getStreams().stream()
  .filter(s -> Objects.equals(s.getStream().getNamespace(), descriptor.getNamespace())
    && s.getStream().getName().equals(descriptor.getName()))
  .findFirst();

Additionally, since the stream variable is only referenced when !streamOld.equals(streamNew), this lookup could be moved inside that if statement. There is no reason to do the work if the result may be ignored.

The findFirst() call returns an Optional so you would need to check if you actually received a value.

Member

There is also an issue with the current code: what if the filter call returns no results? The streamList.get(0) line will fail with an IndexOutOfBoundsException.
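
A minimal sketch of the findFirst() approach suggested in this thread, assuming a hypothetical `StreamRef` record in place of `ConfiguredAirbyteStream`. The lookup returns an `Optional`, so an empty filter result is represented explicitly instead of failing on `get(0)`.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

final class StreamLookupSketch {

  // Hypothetical stand-in for a configured stream; the namespace may be null.
  record StreamRef(String namespace, String name) {}

  // Returns the first stream matching the descriptor, or Optional.empty() when none matches.
  static Optional<StreamRef> findStream(final List<StreamRef> streams, final String namespace, final String name) {
    return streams.stream()
        .filter(s -> Objects.equals(s.namespace(), namespace) && s.name().equals(name))
        .findFirst();
  }

  public static void main(final String[] args) {
    final var streams = List.of(new StreamRef("public", "users"), new StreamRef(null, "orders"));
    System.out.println(findStream(streams, "public", "users").isPresent());
    System.out.println(findStream(streams, "public", "missing").isPresent());
  }
}
```

The caller then decides what an empty result means, rather than hitting an exception at lookup time.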

@@ -384,4 +405,17 @@ static void combineAccumulator(final Map<List<String>, JsonNode> accumulatorLeft
});
}

static Boolean transformBreaksConnection(final ConfiguredAirbyteStream configuredStream, final List<String> fieldName) {
Member

Can this return boolean instead of Boolean?
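
One likely motivation for preferring the primitive, shown as a small sketch: a boxed `Boolean` can be null, and auto-unboxing a null throws a `NullPointerException`. The class and method names here are hypothetical and exist only for illustration.

```java
final class BoxedBooleanSketch {

  // Reports whether auto-unboxing the given Boolean throws, which happens only for null.
  static boolean unboxingFails(final Boolean flag) {
    try {
      final boolean ignored = flag; // auto-unboxing calls flag.booleanValue()
      return false;
    } catch (final NullPointerException e) {
      return true;
    }
  }

  public static void main(final String[] args) {
    System.out.println(unboxingFails(Boolean.TRUE));
    System.out.println(unboxingFails(null));
  }
}
```

A method declared to return `boolean` cannot hand a null to its callers, so this failure mode disappears at the type level.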

Comment on lines 410 to 415
if (SyncMode.INCREMENTAL == syncMode && configuredStream.getCursorField().equals(fieldName)) {
return true;
}

final DestinationSyncMode destinationSyncMode = configuredStream.getDestinationSyncMode();
if (DestinationSyncMode.APPEND_DEDUP == destinationSyncMode && configuredStream.getPrimaryKey().contains(fieldName)) {
Member

Potential for a subtle bug here as the equals of a list is only true when

two lists are defined to be equal if they contain the same elements in the same order

Is there a potential that these lists contain the same fields in a different order? Do we need to order these lists before comparing them?

This affects both configuredStream.getCursorField().equals(fieldName) and configuredStream.getPrimaryKey().contains(fieldName).
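
To make the concern concrete, here is a small sketch of how `List.equals` and `List.contains` behave when field names are themselves lists of path segments. The class and method names are hypothetical.

```java
import java.util.List;

final class FieldPathEqualitySketch {

  // List.equals compares element by element, in order.
  static boolean samePath(final List<String> a, final List<String> b) {
    return a.equals(b);
  }

  // List.contains uses equals on each element, so it is order-sensitive
  // when the elements are themselves lists.
  static boolean keyContains(final List<List<String>> primaryKey, final List<String> fieldName) {
    return primaryKey.contains(fieldName);
  }

  public static void main(final String[] args) {
    final List<String> path = List.of("field_prefix", "field_name");
    final List<String> reversed = List.of("field_name", "field_prefix");
    System.out.println(samePath(path, reversed));               // false
    System.out.println(keyContains(List.of(path), reversed));   // false
    System.out.println(keyContains(List.of(path), path));       // true
  }
}
```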

Contributor Author

I thought there could only ever be one cursorField, but I think changing it to contains would fix this. I don't think there's a problem with the second one since it's using contains instead of equals. Is there something I'm missing about contains?

Contributor Author

Oh I see, fieldName is also a list. I didn't know why this is a list - there is also only ever one item in the fieldName list. I couldn't find examples where there were fieldNames that consisted of a list of multiple strings. I can check with the connectors team on this though.

Contributor Author

@colesnodgrass it looks like the way we do existing diffs between field names, the order matters. I can't find an example of this, but if one field name is a list of ["field_prefix", "field_name"] and another is ["field_name", "field_prefix"], this would be considered a different field and it would be added to the CatalogDiff

Member

Interesting, this may be something we want to explicitly state/enforce somewhere in the code. Or maybe abstract behind a new datatype as there is a potential here for breaking this accidentally. But this doesn't need to be done as part of this PR.
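
As one possible shape for such a datatype (purely a sketch; `FieldPath` is a hypothetical name and nothing like it is part of this PR), a record could wrap the path segments so that the ordered comparison is stated in one place:

```java
import java.util.ArrayList;
import java.util.List;

final class FieldPathSketch {

  // Hypothetical value type: copies the segments defensively, so equality is always
  // the ordered, element-by-element comparison defined by List.equals.
  record FieldPath(List<String> segments) {
    FieldPath {
      segments = List.copyOf(segments);
    }

    static FieldPath of(final String... segments) {
      return new FieldPath(List.of(segments));
    }
  }

  public static void main(final String[] args) {
    final List<String> mutable = new ArrayList<>(List.of("field_prefix", "field_name"));
    final FieldPath path = new FieldPath(mutable);
    mutable.clear(); // does not affect the already-built path
    System.out.println(path.equals(FieldPath.of("field_prefix", "field_name")));
    System.out.println(path.equals(FieldPath.of("field_name", "field_prefix")));
  }
}
```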

Comment on lines 363 to 367
if (transformBreaksConnection(configuredStream, fieldName)) {
fieldTransforms.add(FieldTransform.createRemoveFieldTransform(fieldName, fieldNameToTypeOld.get(fieldName), true));
} else {
fieldTransforms.add(FieldTransform.createRemoveFieldTransform(fieldName, fieldNameToTypeOld.get(fieldName), false));
}
Member

the if/else isn't necessary, can be replaced with

fieldTransforms.add(FieldTransform.createRemoveFieldTransform(
  fieldName, 
  fieldNameToTypeOld.get(fieldName), 
  transformBreaksConnection(configuredStream, fieldName)
));

Sets.difference(fieldNameToTypeNew.keySet(), fieldNameToTypeOld.keySet())
-.forEach(fieldName -> fieldTransforms.add(FieldTransform.createAddFieldTransform(fieldName, fieldNameToTypeNew.get(fieldName))));
+.forEach(fieldName -> {
Member

Are the curly-braces necessary here?

@@ -64,4 +67,8 @@ public UpdateFieldSchemaTransform getUpdateFieldTransform() {
return updateFieldTransform;
}

public Boolean getIsBreaking() {
Member

This should be isBreaking() instead of getIsBreaking().

Also could this be a boolean instead of a Boolean?

Contributor

@benmoriceau benmoriceau left a comment

LGTM. I agree with @colesnodgrass's comments, so I'll leave it to him to approve the PR.

@alovew alovew requested a review from colesnodgrass October 10, 2022 21:49
@alovew alovew temporarily deployed to more-secrets October 10, 2022 22:18 Inactive
Member

@colesnodgrass colesnodgrass left a comment

See my comment on the Optional.get() usage.

final ConfiguredAirbyteStream stream = configuredCatalog.getStreams().stream()
.filter(s -> Objects.equals(s.getStream().getNamespace(), descriptor.getNamespace())
&& s.getStream().getName().equals(descriptor.getName()))
.findFirst().get();
Member

.get() will throw a NoSuchElementException if called on an empty Optional.

What should happen here if we have nothing? Should streamTransforms.add still be called? Is nothing an error case?

Contributor Author

I don't think this is a case that should happen, but if it does, I think we should default to calling the transform non-breaking. I'll make that change.
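
A rough sketch of that default, assuming the lookup yields an `Optional`. The `ConfiguredStream` record and the body of `transformBreaksConnection` below are simplified placeholders, not the merged implementation.

```java
import java.util.List;
import java.util.Optional;

final class MissingStreamDefaultSketch {

  // Hypothetical stand-in holding only what this sketch needs.
  record ConfiguredStream(List<String> cursorField) {}

  // Placeholder for the real breaking-change check.
  static boolean transformBreaksConnection(final ConfiguredStream stream, final List<String> fieldName) {
    return stream.cursorField().equals(fieldName);
  }

  // An empty Optional (no matching configured stream) is treated as non-breaking.
  static boolean isBreaking(final Optional<ConfiguredStream> stream, final List<String> fieldName) {
    return stream.map(s -> transformBreaksConnection(s, fieldName)).orElse(false);
  }

  public static void main(final String[] args) {
    final var stream = new ConfiguredStream(List.of("updated_at"));
    System.out.println(isBreaking(Optional.of(stream), List.of("updated_at")));
    System.out.println(isBreaking(Optional.empty(), List.of("updated_at")));
  }
}
```

Using `map(...).orElse(false)` avoids calling `get()` on an empty `Optional` altogether.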

@alovew alovew temporarily deployed to more-secrets October 10, 2022 22:46 Inactive
@alovew alovew temporarily deployed to more-secrets October 10, 2022 22:58 Inactive
@alovew alovew temporarily deployed to more-secrets October 10, 2022 23:16 Inactive
Member

@colesnodgrass colesnodgrass left a comment

👍🏻 I think it's also worth looking into using final var more than explicitly specifying the object type as this makes cleaning-up/refactoring this code easier.

@alovew alovew force-pushed the anne/add-breaking-change-to-catalog-diff branch from 74928be to cb3639b Compare October 10, 2022 23:52
@alovew alovew temporarily deployed to more-secrets October 10, 2022 23:55 Inactive
@alovew alovew dismissed colesnodgrass’s stale review October 11, 2022 00:32

implemented changes

@alovew alovew merged commit ca2605d into master Oct 11, 2022
@alovew alovew deleted the anne/add-breaking-change-to-catalog-diff branch October 11, 2022 01:04
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
* Add breaking field to FieldTransform on catalogDiff
Labels
area/api Related to the api area/documentation Improvements or additions to documentation area/frontend Related to the Airbyte webapp area/platform issues related to the platform area/protocol area/server
6 participants