-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🪟🎉 Connector builder: Schema inferrer UI #21154
Conversation
…airbyte into flash1293/schema-inferrer-api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a bunch of comments, but overall this is looking really great, nice work!
airbyte-webapp/src/services/connectorBuilder/ConnectorBuilderStateService.tsx
Outdated
Show resolved
Hide resolved
helpers.setValue(schemaDiff.mergedSchema); | ||
}} | ||
> | ||
<FormattedMessage id="connectorBuilder.mergeSchemaButton" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Random idea I had while playing with this: it could be cool if when hovering over the Merge properties
button, all of the "deleted" lines are no longer highlighted red, and instead just the additions are shown in the diff view.
The reasoning is that currently, the diff view shows what will happen to the existing schema if the user clicks the Overwrite
button. But there isn't a simple way for the user to see what the schema will look like if they click Merge
.
What do you think? Definitely can be done in a separate PR if we do decide to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of making it more obvious how the merge is going to work, there are other things I think might be nice (having a "merge" button per difference for example - especially for large schemas). I will start a new issue to collect these.
airbyte-webapp/src/components/connectorBuilder/Builder/StreamConfigView.tsx
Show resolved
Hide resolved
useDebounce( | ||
() => { | ||
if (editorView === "ui") { | ||
setSchemaDiff(getDiff(field.value, inferredSchema)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small weird visual thing I noticed: after clicking Import schema
, the "type": "object"
line in the Schema view moves from the top of the schema to the bottom of the schema after a short delay for some reason.
And this also means that the order of the fields in the Schema tab in the testing panel does not match the Schema tab in the Stream view.
Here is a screen recording demonstrating this:
Screen.Recording.2023-01-11.at.3.41.07.PM.mov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of the diffing the library is ordering all keys alphabetically: https://github.com/kpdecker/jsdiff/blob/master/src/diff/json.js#L72
This is a little annoying in this context, I'm not sure how bad it is though. An idea I have is to always order keys alphabetically in all the schemas to enforce some consistency as part of the formatJson
helper. What do you think? This is what I implemented on the PR. IMHO that's a nice way of making sure things are consistent, taking away one thing that can cause differences.
The only simple way I see to do the diffing respecting the original source order is to do a line-diff on the stringified schema instead of diffing the object directly, but that would cause other weird cases:
- Adding a new property will cause a difference because it adds a comma to the end of the previously last property:
"a": {
"first: "abc"
}
to
"a": {
"first: "abc",
"second": "def"
}
has the following diff:
"a": {
- "first: "abc"
+ "first: "abc",
+ "second": "def"
}
- As the diffing algorithm doesn't know about object nesting anymore, it's possible diffs will span multiple nested objects which gets confusing easily (like in git when adding and removing a function in the same place causes the zebra pattern of deleted, added and unchanged lines). Would be nice to avoid this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that trying to do line-level diffs doesn't seem like a good idea due to the reasons you listed.
I think ordering keys alphabetically everywhere should be fine. The only other ideas I have here are:
- Re-order the schemaDiff so that the keys match the
inferredSchema
key order (though with this approach we wouldn't know where to place keys that exist in declared schema but don't exist in the inferred schema) - Fork (🤮) the json diff library to remove the
sort()
call or open an issue to make that sorting configurable
But I think ordering the keys alphabetically is probably the simplest approach that guarantees the most consistency, so I'm fine with that approach. We can always revisit if people complain about the alphabetical sorting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-order the schemaDiff so that the keys match the inferredSchema key ord
I guess it's possible somehow, but the data structure returned from the library is line based, so ordering the keys based on these will get some pretty complex code parsing property names out of strings while tracking the nesting level. Definitely think it's worth pushing that out.
airbyte-webapp/src/components/connectorBuilder/StreamTestingPanel/PageDisplay.tsx
Show resolved
Hide resolved
{inferredSchema && ( | ||
<Tab className={styles.tab}> | ||
{({ selected }) => ( | ||
<Text className={classNames(styles.tabTitle, { [styles.selected]: selected })}> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design had the Schema tab be on a separate level than Records | Request | Response
, with all of those put inside of a general Results
tab, but I think I like this approach of just adding a fourth tab for Schema instead, since it seems like a simpler UI.
However, now that we are not planning to nest these tabs under a different set of tabs, I think we should just increase the font size of these tab titles, since the text feels pretty small currently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
airbyte-webapp/src/components/connectorBuilder/StreamTestingPanel/SchemaDiffView.module.scss
Outdated
Show resolved
Hide resolved
airbyte-webapp/src/components/connectorBuilder/StreamTestingPanel/SchemaDiffView.tsx
Outdated
Show resolved
Hide resolved
</pre> | ||
) : ( | ||
schemaDiff.changes.map((change, changeIndex) => ( | ||
<pre |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed switching from <pre>
to Monaco editor for the Records/Request/Response display, so that users are able to collapse and expand sections of the JSON.
It seems like it will be harder to switch to Monaco for this schema diff view, because we are manually styling the different lines of the diff. Does that sound right to you? If so, I think it is probably fine to keep <pre>
for just this view, since collapsing parts of the diff would probably be confusing anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there is a way to do the per-line styling with Monaco as well, but I would solve that as part of the switch to Monaco. Agreed that this is not super critical to use Monaco for the schema diff view.
airbyte-webapp/src/components/connectorBuilder/StreamTestingPanel/PageDisplay.tsx
Show resolved
Hide resolved
@lmossman Addressed all your points, could you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple more small comments, nothing blocking.
Changes look great to me, everything works as expected from local testing!
airbyte-webapp/src/components/connectorBuilder/Builder/StreamConfigView.tsx
Show resolved
Hide resolved
* fix stuff * add inferred schema to API * fix yaml changes * fix yaml formatting * add whitespace back * basic ui * advanced UI * Remove unused one * reset package lock * resolve merge conflicts * styling * show button and icon in the normal schema tab * restructure * handle yaml view * small fix * review comments * make monaco resize * review comments
Fixes #21060
What
Add UI part of schema auto detection (completely disabled in yaml view)
If there is no schema defined as part of the stream, an "Import schema" button is shown both in the stream schema tab and in the testing panel. This is done to teach users about the ability to import the detected schema.
If there is a schema defined already, the diff is checked whether there are incompatible changes (lines deleted from the old schema). If that's the case, there are two buttons - "Overwrite" and "Merge properties". The latter button is only doing the additive changes without losing any existing schema definition.
If the changes are purely additive, only a single "Import schema" button is shown
How
diff
library which has a special json object diffing mode.useReadStream
call into the test state service instead of it being nested in the testing panel. However, this seems cleaner anyway as it encapsulates all direct interaction with the testing API in a single place.