🪟🎉 Connector builder: Schema inferrer UI #21154

Merged: flash1293 merged 33 commits into master from flash1293/schema-inferrer-ui on Jan 13, 2023

Conversation

flash1293
Contributor

@flash1293 flash1293 commented Jan 9, 2023

Fixes #21060

What

Adds the UI part of schema auto-detection (completely disabled in the YAML view).

If there is no schema defined as part of the stream, an "Import schema" button is shown both in the stream schema tab and in the testing panel. This is done to teach users about the ability to import the detected schema.
Screenshot 2023-01-11 at 10 34 52

If a schema is already defined, the diff is checked for incompatible changes (lines that would be deleted from the old schema). If there are any, two buttons are shown: "Overwrite" and "Merge properties". The latter applies only the additive changes, so no existing schema definition is lost.
Screenshot 2023-01-10 at 16 52 52

If the changes are purely additive, only a single "Import schema" button is shown.
Screenshot 2023-01-10 at 16 53 16
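
The three button states described above can be read as a small decision over the existing and the inferred schema. The following is a minimal sketch under stated assumptions, not the code from this PR: the PR derives the decision from the diff output, whereas this sketch walks the two objects directly. The names `Json`, `hasRemovals`, `mergeAdditive` and `buttonsFor` are hypothetical.

```typescript
// Hypothetical JSON value type; not taken from the PR.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const isObject = (v: Json | undefined): v is { [key: string]: Json } =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// True if importing `inferred` verbatim would drop or change anything in `existing`.
function hasRemovals(existing: Json, inferred: Json | undefined): boolean {
  if (!isObject(existing)) {
    // Leaves and arrays are compared by their serialized form.
    return JSON.stringify(existing) !== JSON.stringify(inferred);
  }
  if (!isObject(inferred)) return true;
  for (const key of Object.keys(existing)) {
    if (hasRemovals(existing[key], inferred[key])) return true;
  }
  return false;
}

// Adds everything from `inferred` that `existing` lacks; existing values always win.
function mergeAdditive(existing: Json, inferred: Json): Json {
  if (!isObject(existing) || !isObject(inferred)) return existing;
  const merged: { [key: string]: Json } = { ...existing };
  for (const key of Object.keys(inferred)) {
    merged[key] = key in existing ? mergeAdditive(existing[key], inferred[key]) : inferred[key];
  }
  return merged;
}

// Button labels follow the PR description; the decision logic is an assumption.
function buttonsFor(existing: Json | undefined, inferred: Json): string[] {
  if (existing === undefined || !hasRemovals(existing, inferred)) return ["Import schema"];
  return ["Overwrite", "Merge properties"];
}
```

In this sketch, "Merge properties" corresponds to `mergeAdditive(existing, inferred)` and "Overwrite" to replacing the existing schema with the inferred one.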

How

  • The diffing is done using the diff library, which has a dedicated JSON object diffing mode.
  • As diffing can be expensive for large objects, it is debounced; otherwise it would run on every keystroke.
  • To be able to show the schema conflict indicator and the "Import" button in the stream config view, I had to move the useReadStream call into the test state service instead of it being nested in the testing panel. However, this seems cleaner anyway as it encapsulates all direct interaction with the testing API in a single place.
  • If the current schema text isn't valid JSON, no diff is computed at all and only the formatted schema is rendered.
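
Two of the points above, debouncing the diff and skipping it for invalid JSON, can be sketched without any library. This is an illustration only: the PR uses a `useDebounce` hook whose source is not shown in this thread, and `Scheduler`, `debounce`, `parseOrUndefined`, `SchemaView` and `planSchemaView` are hypothetical names. Treating "the formatted schema" as the inferred schema is also an assumption.

```typescript
type Cancel = () => void;
// Schedules `fn` after `delayMs` and returns a cancel handle.
// Injectable so the sketch can be exercised without real timers.
type Scheduler = (fn: () => void, delayMs: number) => Cancel;

const realScheduler: Scheduler = (fn, delayMs) => {
  const id = setTimeout(fn, delayMs);
  return () => clearTimeout(id);
};

// Only the last call within `delayMs` actually runs `fn`.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  delayMs: number,
  schedule: Scheduler = realScheduler
): (...args: A) => void {
  let cancel: Cancel | undefined;
  return (...args: A): void => {
    cancel?.(); // drop the previously scheduled run, if any
    cancel = schedule(() => fn(...args), delayMs);
  };
}

// Returns undefined instead of throwing when the text is not valid JSON.
function parseOrUndefined(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

type SchemaView =
  | { mode: "diff"; current: unknown; inferred: unknown }
  | { mode: "formatted"; text: string };

// Decides whether a diff should be computed at all.
function planSchemaView(currentText: string, inferred: unknown): SchemaView {
  const current = parseOrUndefined(currentText);
  if (current === undefined) {
    // Assumption: fall back to showing the inferred schema, formatted.
    return { mode: "formatted", text: JSON.stringify(inferred, null, 2) };
  }
  return { mode: "diff", current, inferred };
}
```

A caller would wrap the expensive diff in `debounce(...)` and feed it the editor text on every change, so that only the text present after the last keystroke is diffed.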

@octavia-squidington-iv octavia-squidington-iv added the area/frontend Related to the Airbyte webapp label Jan 9, 2023
@flash1293 flash1293 temporarily deployed to more-secrets January 9, 2023 14:24 — with GitHub Actions Inactive
@flash1293 flash1293 changed the title Connector builder: Schema inferrer UI 🪟🎉 Connector builder: Schema inferrer UI Jan 10, 2023
@flash1293 flash1293 marked this pull request as ready for review January 11, 2023 10:48
Contributor

@lmossman lmossman left a comment

Had a bunch of comments, but overall this is looking really great, nice work!

helpers.setValue(schemaDiff.mergedSchema);
}}
>
<FormattedMessage id="connectorBuilder.mergeSchemaButton" />
Contributor

Random idea I had while playing with this: it could be cool if when hovering over the Merge properties button, all of the "deleted" lines are no longer highlighted red, and instead just the additions are shown in the diff view.

The reasoning is that currently, the diff view shows what will happen to the existing schema if the user clicks the Overwrite button. But there isn't a simple way for the user to see what the schema will look like if they click Merge.

What do you think? Definitely can be done in a separate PR if we do decide to do this

Contributor Author

I like the idea of making it more obvious how the merge is going to work. There are other things I think might be nice too (a "merge" button per difference, for example, especially for large schemas). I will open a new issue to collect these.

useDebounce(
() => {
if (editorView === "ui") {
setSchemaDiff(getDiff(field.value, inferredSchema));
Contributor

One small weird visual thing I noticed: after clicking Import schema, the "type": "object" line in the Schema view moves from the top of the schema to the bottom of the schema after a short delay for some reason.

And this also means that the order of the fields in the Schema tab in the testing panel does not match the Schema tab in the Stream view.

Here is a screen recording demonstrating this:

Screen.Recording.2023-01-11.at.3.41.07.PM.mov

Contributor Author

@flash1293 flash1293 Jan 12, 2023

As part of the diffing, the library orders all keys alphabetically: https://github.com/kpdecker/jsdiff/blob/master/src/diff/json.js#L72

This is a little annoying in this context, I'm not sure how bad it is though. An idea I have is to always order keys alphabetically in all the schemas to enforce some consistency as part of the formatJson helper. What do you think? This is what I implemented on the PR. IMHO that's a nice way of making sure things are consistent, taking away one thing that can cause differences.
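
A minimal sketch of what "always order keys alphabetically as part of the formatJson helper" could look like. The body of `formatJson` is not shown in this thread, so the implementation below is an assumption, and `sortKeysDeep` is a hypothetical name. Note that `Array.prototype.sort()` without a comparator orders by UTF-16 code units, which matches alphabetical order only for plain ASCII keys of the same case.

```typescript
// Recursively rebuilds objects with their keys in sorted order.
// Arrays keep their element order; only object keys are sorted.
function sortKeysDeep(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sortKeysDeep(item));
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortKeysDeep(source[key]);
    }
    return sorted;
  }
  return value;
}

// Assumed shape of the helper: sort keys, then pretty-print with two spaces.
function formatJson(value: unknown): string {
  return JSON.stringify(sortKeysDeep(value), null, 2);
}
```

With this in place, two schemas that differ only in key order format to the same string, which removes key order as a source of differences between views.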

The only simple way I see to do the diffing respecting the original source order is to do a line-diff on the stringified schema instead of diffing the object directly, but that would cause other weird cases:

  • Adding a new property will cause a difference because it adds a comma to the end of the previously last property:
"a": {
  "first": "abc"
}

to

"a": {
  "first": "abc",
  "second": "def"
}

has the following diff:

 "a": {
-  "first": "abc"
+  "first": "abc",
+  "second": "def"
 }
  • As the diffing algorithm doesn't know about object nesting anymore, it's possible diffs will span multiple nested objects which gets confusing easily (like in git when adding and removing a function in the same place causes the zebra pattern of deleted, added and unchanged lines). Would be nice to avoid this.
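
The trailing-comma effect from the first point can be reproduced with a deliberately naive line comparison. This sketch is not a real diff algorithm and is not what the diff library does; `naiveLineDiff` is a hypothetical helper that only reports which lines exist on one side but not the other.

```typescript
// Lines present only in `before` (removed) or only in `after` (added).
function naiveLineDiff(before: string, after: string): { removed: string[]; added: string[] } {
  const a = before.split("\n");
  const b = after.split("\n");
  return {
    removed: a.filter((line) => b.indexOf(line) === -1),
    added: b.filter((line) => a.indexOf(line) === -1),
  };
}

const beforeText = JSON.stringify({ a: { first: "abc" } }, null, 2);
const afterText = JSON.stringify({ a: { first: "abc", second: "def" } }, null, 2);

// "first" is reported as removed and re-added, although only "second" is new:
// the added comma makes the two "first" lines differ as strings.
const lineDiff = naiveLineDiff(beforeText, afterText);
```

An object-level diff does not have this problem because it compares properties rather than their serialized lines.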

Contributor

I agree that trying to do line-level diffs doesn't seem like a good idea due to the reasons you listed.

I think ordering keys alphabetically everywhere should be fine. The only other ideas I have here are:

  • Re-order the schemaDiff so that the keys match the inferredSchema key order (though with this approach we wouldn't know where to place keys that exist in declared schema but don't exist in the inferred schema)
  • Fork (🤮) the json diff library to remove the sort() call or open an issue to make that sorting configurable

But I think ordering the keys alphabetically is probably the simplest approach that guarantees the most consistency, so I'm fine with that approach. We can always revisit if people complain about the alphabetical sorting
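
For reference, the first alternative (re-ordering keys to follow the inferred schema) is straightforward on plain objects, as the sketch below shows; the open question of where to place keys missing from the inferred schema is answered here by simply appending them, which is an arbitrary choice. `reorderLike` is a hypothetical helper, and it works on parsed objects, not on the diff output returned by the library.

```typescript
type Obj = Record<string, unknown>;

const isObj = (v: unknown): v is Obj =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Returns a copy of `target` whose keys follow the key order of `reference`.
// Keys that only exist in `target` are appended at the end, in their original order.
function reorderLike(target: unknown, reference: unknown): unknown {
  if (!isObj(target) || !isObj(reference)) return target;
  const result: Obj = {};
  for (const key of Object.keys(reference)) {
    if (key in target) result[key] = reorderLike(target[key], reference[key]);
  }
  for (const key of Object.keys(target)) {
    if (!(key in result)) result[key] = target[key];
  }
  return result;
}
```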

Contributor Author

Re-order the schemaDiff so that the keys match the inferredSchema key ord

I guess it's possible somehow, but the data structure returned from the library is line-based, so ordering the keys based on it would require fairly complex code that parses property names out of strings while tracking the nesting level. I definitely think it's worth deferring that.

{inferredSchema && (
<Tab className={styles.tab}>
{({ selected }) => (
<Text className={classNames(styles.tabTitle, { [styles.selected]: selected })}>
Contributor

The design had the Schema tab be on a separate level than Records | Request | Response, with all of those put inside of a general Results tab, but I think I like this approach of just adding a fourth tab for Schema instead, since it seems like a simpler UI.

However, now that we are not planning to nest these tabs under a different set of tabs, I think we should just increase the font size of these tab titles, since the text feels pretty small currently

Contributor Author

Agreed, it removes a click from going from schema to request/response which I think is nice. This is how it looks now:
Screenshot 2023-01-12 at 11 55 44

</pre>
) : (
schemaDiff.changes.map((change, changeIndex) => (
<pre
Contributor

We discussed switching from <pre> to Monaco editor for the Records/Request/Response display, so that users are able to collapse and expand sections of the JSON.

It seems like it will be harder to switch to Monaco for this schema diff view, because we are manually styling the different lines of the diff. Does that sound right to you? If so, I think it is probably fine to keep <pre> for just this view, since collapsing parts of the diff would probably be confusing anyway

Contributor Author

@flash1293 flash1293 Jan 12, 2023

Maybe there is a way to do the per-line styling with Monaco as well, but I would solve that as part of the switch to Monaco. Agreed that using Monaco for the schema diff view is not critical.

@flash1293
Contributor Author

@lmossman Addressed all your points, could you take another look?

Contributor

@lmossman lmossman left a comment

Couple more small comments, nothing blocking.

Changes look great to me, everything works as expected from local testing!

@flash1293 flash1293 enabled auto-merge (squash) January 13, 2023 10:55
@flash1293 flash1293 merged commit f6967f1 into master Jan 13, 2023
@flash1293 flash1293 deleted the flash1293/schema-inferrer-ui branch January 13, 2023 12:52
jbfbell pushed a commit that referenced this pull request Jan 13, 2023
* fix stuff

* add inferred schema to API

* fix yaml changes

* fix yaml formatting

* add whitespace back

* basic ui

* advanced UI

* Remove unused one

* reset package lock

* resolve merge conflicts

* styling

* show button and icon in the normal schema tab

* restructure

* handle yaml view

* small fix

* review comments

* make monaco resize

* review comments
Labels
area/frontend Related to the Airbyte webapp
Development

Successfully merging this pull request may close these issues.

Connector builder UI: Expose schema auto-detection
3 participants