Add openAPI spec for Connector Builder Server #17535

lmossman · 2022-10-03T22:37:45Z

What

Resolves #17424

Adds an OpenAPI spec for the Connector Builder Server, which will provide the backend functionality for the Connector Builder web application, as described in the tech spec: https://docs.google.com/document/d/11HrieUnA7oa6YsDhOVZpAVkOoZt2RWQQWJLPf7xQodY/edit#heading=h.ugjhw5e99cnr

How

Describe the solution

lmossman · 2022-10-03T22:50:09Z

connector-builder-server/src/main/openapi/openapi.yaml

+          $ref: "#/components/responses/ExceptionResponse"
+        "422":
+          $ref: "#/components/responses/InvalidInputResponse"
+  /v1/streams/list:


This endpoint solves two problems listed in the tech spec:

(tech spec link) allows the webapp to populate the stream dropdown without having to manually parse the connector definition yaml file

(tech spec link) provides the URLs for each stream so that the webapp does not need to manually parse the connector definition yaml file

Since the backend implementation of this endpoint is just parsing the connector definition file to extract the stream info, it should be very fast. This is important because this request will likely need to be submitted on every change to the yaml contents in the webapp (potentially after some short delay), so that the stream list and URLs are always accurate.
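To show why this can be cheap, here is a minimal Python sketch of what such a handler might do with an already-parsed definition. No HTTP call is involved. The key layout (`streams`, `name`, `retriever.requester.url_base`, `path`) and the `{name, url}` output shape are assumptions for illustration, not the actual CDK schema or this endpoint's response type.

```python
from typing import Any, Dict, List


def list_streams(definition: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the name and URL of every stream in a parsed connector definition.

    Hypothetical layout assumed for each stream:
    {"name": ..., "retriever": {"requester": {"url_base": ..., "path": ...}}}
    Nothing here touches the network, so it can run on every YAML edit.
    """
    streams = []
    for stream in definition.get("streams", []):
        requester = stream["retriever"]["requester"]
        # Join base URL and path without doubling the slash between them.
        url = requester["url_base"].rstrip("/") + "/" + requester["path"].lstrip("/")
        streams.append({"name": stream["name"], "url": url})
    return streams
```

Because the work is a single pass over the parsed definition, the webapp can call it after each (debounced) edit without noticeable latency.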

lmossman · 2022-10-03T22:56:50Z

connector-builder-server/src/main/openapi/openapi.yaml

+            type: object
+            required:
+              - request
+              - response


I decided to combine a request and the response for that request into a single object here, since I think every request should have a response and it may be more helpful to our users if they are presented as such. But let me know if anyone disagrees with this approach, or if it is overly complex to implement for any reason.

lmossman · 2022-10-03T22:58:03Z

connector-builder-server/src/main/openapi/openapi.yaml

+                  body:
+                    type: object
+                    description: The body of the HTTP request, if present
+                  headers:
+                    type: object
+                    description: The headers of the HTTP request


I wasn't super sure whether these should be JSON objects or just strings. I went with JSON objects here because that appears to be what they are in the debug log messages of the read command, but I'm open to feedback here.


lmossman · 2022-10-04T00:08:06Z

connector-builder-server/src/main/openapi/openapi.yaml

+        results:
+          type: array
+          description: The RECORD and STATE AirbyteMessages coming from the read operation
+          items:
+            type: string
+        logs:
+          type: array
+          description: The LOG AirbyteMessages coming from the read operation
+          items:
+            type: string


I set these to strings because I didn't see an obvious way to reference just the AirbyteMessage or AirbyteLogMessage schema from the protocol file, but I'm open to suggestions here.

I figured out how to do this after all! Just needed to reference the whole airbyte protocol in one schema, and then add more slashes when referencing that schema to access its children
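The "more slashes" trick works because a local `$ref` fragment is a JSON Pointer (RFC 6901): each segment after `#/` walks one level deeper into the document. The helper below is a hypothetical sketch of that resolution over an in-memory dict; real OpenAPI generators do this themselves, and the toy spec in the test stands in for the protocol schema that the real spec pulls in from a separate file.

```python
from typing import Any, Dict


def resolve_local_ref(document: Dict[str, Any], ref: str) -> Any:
    """Resolve a local $ref such as "#/components/schemas/X/definitions/Y".

    Each path segment after "#/" indexes one level deeper, which is why
    appending segments can reach into a referenced schema's children.
    """
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are supported: {ref!r}")
    node: Any = document
    for raw in ref[2:].split("/"):
        # JSON Pointer escaping: "~1" decodes to "/", then "~0" decodes to "~".
        segment = raw.replace("~1", "/").replace("~0", "~")
        node = node[segment]
    return node
```

For example, `"#/components/schemas/AirbyteProtocol/definitions/AirbyteMessage"` first finds the `AirbyteProtocol` schema and then descends into its `definitions` map.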

lmossman · 2022-10-04T17:23:30Z

Ideas from BG (Backlog Grooming):

- We may want to group requests, records, and logs into some shared entity (e.g. "pages").
- Read only a single page or a single slice for a stream, so that we don't need to wait for all of the data to be read.
  - Maybe the API should accept num_pages and/or num_slices arguments (maybe both, if the user wants to test pagination AND slicing behavior) in case the user wants to test pagination/slicing (these can default to 1 for now).

lmossman · 2022-10-04T20:14:25Z

connector-builder-server/src/main/openapi/openapi.yaml

+  url: "https://docs.airbyte.com/connector-development/config-based/overview/"
+
+paths:
+  /v1/stream/read:


@girarda @brianjlai see this commit for the changes to this endpoint coming from our Backlog Grooming discussion: 2b52fb0
Here's a screenshot of the updated Swagger view of this spec since it's a little easier to see what the final product looks like there:

I tried nesting pages inside of slices here, with a sliceId and pageId in each nested object.

What do you think of this structure? I thought it made sense because if a user is doing both stream slicing and pagination, then seeing each page broken down by slice makes sense. If they are not using one of those, then there will just be one element in that level, e.g. if they are not using pagination then there will just be a single element in pages.

I also wasn't super sure about sliceId and pageId -- I figured we wanted some way to identify which slices and which pages are which, but I wasn't sure if we have something to use as the "IDs" here. Since they are just strings, maybe we can just put in the URL parameter / header that changes for each slice/page?
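The slice/page nesting proposed here can be sketched as plain Python dataclasses. The class and field names are hypothetical snake_case stand-ins for the spec's objects (the per-page contents follow the later request/response/airbyteMessages shape), and `count_pages` is only there to show that a stream without pagination ends up with exactly one page per slice.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Page:
    request: Dict[str, Any]   # HTTP request sent to the source API for this page
    response: Dict[str, Any]  # HTTP response received for that request
    airbyte_messages: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class Slice:
    pages: List[Page] = field(default_factory=list)


@dataclass
class StreamRead:
    slices: List[Slice] = field(default_factory=list)


def count_pages(read: StreamRead) -> List[int]:
    """Number of pages in each slice; without pagination every entry is 1."""
    return [len(s.pages) for s in read.slices]
```

With this shape, a page's position in its slice's `pages` list already acts as its page number, so a separate ID field is optional.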

In practice, stream slices correspond to states, so we could identify them using the state object, for example:

{ "repository": "airbytehq/integration-test", "created_at": "2021-06-29T03:44:45Z" }

We don't currently persist or keep track of the number of pages within a slice, but each page corresponds to a single request/response, so we could identify them using a counter.

Got it, I think that makes sense as a way to differentiate them

I decided to just remove the sliceId and pageId fields from this. I didn't feel like they would add much value, because the slice "state" can be extracted from the results, and the page number is basically just the index of the page in the pages array

> the slice "state" can be extracted from the results

The slice represents a range ("2021-05-29T03:44:45Z" to "2021-06-29T03:44:45Z"), whereas the output state only represents the end of that range.
Do we only care about showing the end state here?
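A small sketch of the information gap being pointed out: a slice carries both ends of its range, while a cursor-style state emitted after the slice keeps only the end. The field names `start_time`/`end_time` and the `created_at` cursor are borrowed from examples in this thread and are illustrative, not a fixed CDK format.

```python
from typing import Dict, List


def date_slices(start: str, boundaries: List[str]) -> List[Dict[str, str]]:
    """Build consecutive {start_time, end_time} slices from ISO timestamps."""
    slices = []
    for end in boundaries:
        slices.append({"start_time": start, "end_time": end})
        start = end  # the next slice begins where this one ended
    return slices


def state_after(slice_: Dict[str, str]) -> Dict[str, str]:
    # A cursor-based state only records where the slice ended;
    # the slice's start is not recoverable from it.
    return {"created_at": slice_["end_time"]}
```

For the range quoted above, `state_after` yields only `{"created_at": "2021-06-29T03:44:45Z"}`, so showing the slice itself would require returning it separately.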

lmossman · 2022-10-04T20:19:45Z

connector-builder-server/src/main/openapi/openapi.yaml

+        numSlices:
+          type: integer
+          description: Number of stream slices to read from the source
+          default: 1
+        numPages:
+          type: integer
+          description: Number of pages to read from the source
+          default: 1


Added these request params for the eventual goal of allowing users to specify the number of slices/pages to request when testing

lmossman · 2022-10-04T20:20:58Z

connector-builder-server/src/main/openapi/openapi.yaml

+        state:
+          type: object
+          description: State blob to use for incremental streams


Added this state request param as well for testing incremental streams. I'll bring this up in our Low Code Builder Sync, but we will probably eventually want to have a text input where users can enter a JSON state blob, which can be passed in to this parameter.

I think this is a good addition, but note that we'll need to make sure the pane for specifying state is only visible/editable when a stream supports incremental sync, not just full refresh.


girarda · 2022-10-05T18:19:46Z

connector-builder-server/src/main/openapi/openapi.yaml

+        stream:
+          type: string
+          description: Name of the stream to read
+        numSlices:


Open question: do we want to request a number of slices, or specify which slice to request?

What would it look like to specify a specific slice to request here? Passing in a state blob, or something else?

Also, based on our discussion this morning about the Next page / Next slice behavior, I think we'll want to update this part of the API to accommodate that behavior.

I'm not sure we get much benefit from allowing developers to specify arbitrary middle slices (or middle pages, for that matter) as input via a literal number. But maybe I'm not thinking about every scenario.

I like the idea of passing in a state object during testing. It seems like it would address both cases: testing incremental syncs, and testing a slice in the middle if necessary.
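To illustrate how a passed-in state could double as "start from a middle slice", here is a hypothetical date-window slicer: with no state it starts at the configured start date, and with a state it resumes from the state's cursor. The `cursor` key and the 30-day window are assumptions made for this sketch, not how the CDK's slicers are configured.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def monthly_slices(start: date, end: date,
                   state: Optional[Dict[str, str]] = None) -> List[Dict[str, str]]:
    """Hypothetical slicer producing 30-day windows from start (or the state cursor) to end."""
    if state is not None:
        # Resume from the cursor, so a test run can begin at a slice in the middle.
        start = date.fromisoformat(state["cursor"])
    slices = []
    while start < end:
        stop = min(start + timedelta(days=30), end)
        slices.append({"start_time": start.isoformat(), "end_time": stop.isoformat()})
        start = stop
    return slices
```

Passing `state={"cursor": "2021-01-31"}` with the same start and end dates drops the first window and returns the remaining slices unchanged, which is the same mechanism an incremental sync relies on.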

girarda · 2022-10-05T18:28:51Z

connector-builder-server/src/main/openapi/openapi.yaml

+        definition:
+          $ref: "#/components/schemas/ConnectorDefinitionBody"
+          description: The config-based connector definition contents
+    StreamsListRead:


Is this how the server will share the schema (whenever we support schema detection)?

I don't think so, because detecting the schema will require making an API call to the source.

This /streams/list endpoint is meant to be very fast, as it will likely be called on every change to the yaml editor contents in order to keep the stream list up to date. I think for schema detection, we will likely just want to do that as part of the /stream/read API call, e.g. add another property to the return value of that function that contains the schema of the results.

girarda · 2022-10-05T18:30:29Z

connector-builder-server/src/main/openapi/openapi.yaml

+  url: "https://docs.airbyte.com/connector-development/config-based/overview/"
+
+paths:
+  /v1/stream/read:


Should we add endpoints for check and discover? An alternative would be to specify the operation as part of the request, but I think they're different enough to warrant separate endpoints.

Note from discussion: check/discover/spec endpoints can be added in a later phase. Phase 1 will only be focused on read. Will add a note about this to the tech spec as well

girarda · 2022-10-05T18:31:42Z

connector-builder-server/src/main/openapi/openapi.yaml

+        definition:
+          $ref: "#/components/schemas/ConnectorDefinitionBody"
+          description: The config-based connector definition contents
+    StreamsListRead:


Would it be valuable to also return the stream's slices so they can be displayed?

What do you mean exactly by "return the stream's slices"? Are you suggesting that this should return all of the possible slices that are configured for a given stream, e.g. if they are using a list slicer? And this would be displayed somewhere in the testing panel?

Discussed this over Zoom: the idea here is to have this return a list of all the slices that will be read for this stream, so that they can be displayed somewhere to provide visual feedback on how far through the stream slicing the user has made it. Added a commented-out block here, as this can be done in a future phase (will add a note to the tech spec as well).


girarda · 2022-10-05T18:34:23Z

connector-builder-server/src/main/openapi/openapi.yaml

+                      description: The HTTP request sent to the source API for this page
+                      required:
+                        - url
+                        - headers


I'm not 100% sure if headers should be required or not, but it should be consistent with the body and request parameters fields

brianjlai · 2022-10-06T01:20:55Z

connector-builder-server/src/main/openapi/openapi.yaml

+
+paths:
+  /v1/stream/read:
+    post:


How come we want these endpoints defined as POST calls? Because we're using a request body? Or, I guess, a real sync is sort of a POST call since the end result is records written to a destination, even if the test one doesn't write anything?

Yeah, I was sort of following the convention in our main Airbyte API, where we use POST calls and pass all of the inputs in the request body. Especially for this API, since we need to pass the entire connector definition contents to the server, I felt that needed to go in a request body rather than in something like a URL parameter. And we can't have request bodies for GET requests.
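A minimal sketch of the client side of this convention, using only the standard library: the whole definition travels as JSON in a POST body to `/v1/stream/read`. The base URL is hypothetical, and the body uses only the `definition` and `stream` property names that appear in this review; the real request schema may require other fields. The request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Dict


def build_stream_read_request(base_url: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/stream/read with a JSON body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/stream/read",
        data=json.dumps(body).encode("utf-8"),  # full connector definition goes here
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A large, nested definition fits naturally in a JSON body, whereas encoding it into a query string would be awkward and could run into URL length limits.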

brianjlai · 2022-10-06T01:43:48Z

connector-builder-server/src/main/openapi/openapi.yaml

+          default: 1
+        numPages:
+          type: integer
+          description: Number of pages to read from the source


Number of pages to read per-slice from the source

We should probably be very explicit in the description about what this means. Presumably we mean number of pages per slice we will retrieve. i.e. slices = 3, pages = 5 means that for each slice we'll get 5 pages worth of records.

As it reads now, it could be misread as the total number of pages across all slices, but pages are really a sub-process of slicing.
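The per-slice reading can be stated precisely with a short sketch: `num_pages` caps the pages taken from each slice, so `num_slices=3, num_pages=5` yields up to 15 pages, not 5. The function and its snake_case parameters are hypothetical mirrors of `numSlices`/`numPages`, not CDK code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_limited(slices: Iterable[Iterable[T]],
                 num_slices: int = 1, num_pages: int = 1) -> Iterator[List[T]]:
    """Yield at most num_pages pages from each of the first num_slices slices."""
    for pages in islice(slices, num_slices):
        # If pages is a lazy generator, pages past the limit are never requested.
        yield list(islice(pages, num_pages))
```

With lazy page generators this also bounds the number of HTTP requests to at most `num_slices * num_pages`.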

lmossman · 2022-10-07T00:00:09Z

connector-builder-server/src/main/openapi/openapi.yaml

+                    airbyteMessages:
+                      type: array
+                      description: The RECORD/STATE/LOG AirbyteMessages coming from the read operation for this page
+                      items:
+                        $ref: "#/components/schemas/AirbyteProtocol/definitions/AirbyteMessage"


Since I was able to figure out how to reference the actual airbyte protocol yaml definition for AirbyteMessage in this API spec, I decided to just consolidate the results and logs fields here into a single airbyteMessages field, which will contain all of the AirbyteMessages coming from the connector, and the frontend can just filter down to specific types if needed
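Filtering a single `airbyteMessages` list by message type is straightforward. The real frontend would do this in TypeScript; this Python sketch just shows the idea. It relies on each AirbyteMessage carrying a `type` field (such as `RECORD`, `STATE`, or `LOG`); the helper name is hypothetical.

```python
from typing import Any, Dict, List


def messages_of_type(messages: List[Dict[str, Any]], message_type: str) -> List[Dict[str, Any]]:
    """Keep only the AirbyteMessages whose "type" equals message_type, preserving order."""
    return [m for m in messages if m.get("type") == message_type]
```

Keeping one ordered list also preserves the interleaving of logs and records, which separate `results` and `logs` arrays would lose.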

lmossman · 2022-10-07T00:02:18Z

@brianjlai @girarda I've updated this spec based on your feedback and what we discussed on zoom, and I think this is ready for another look now. Here are the changes I made this time, not too many: 423097e


girarda · 2022-10-07T01:40:07Z

connector-builder-server/src/main/openapi/openapi.yaml

+          items:
+            type: object
+            required:
+              - pages


I think this should have another field describing the slice (it's defined as a JSON object).

@girarda By "describing the slice", are you talking about the state object that is emitted at the end of the slice? If so, that will be contained in the last page's airbyteMessages field.

Or is there some other information about the slice that should be returned here?

I mean the exact slice object, e.g.:

{ "start_time": "2021-01-01", "end_time": "2021-01-31" }

Ah, I understand now. Yeah, I think that would be useful information to show to the user. I will add a sliceDescriptor here

@girarda added: c63fd4c


sherifnada · 2022-10-14T23:23:26Z

connector-builder-server/src/main/openapi/openapi.yaml

+          type: object
+          description: The headers of the HTTP response, if any
+    ConnectorDefinitionBody:
+      $ref: ../../../../airbyte-cdk/python/airbyte_cdk/sources/declarative/config_component_schema.json


We'll need to make sure to wire up the builds to regenerate if this referenced file changes.

@sherifnada Do you have an example or know how that would be done? If not I can make a ticket

sherifnada · 2022-10-14T23:27:09Z

connector-builder-server/src/main/openapi/openapi.yaml

+          type: integer
+          description: Number of pages to read from the source for each slice
+          default: 1
+        state:


Should this be a proper STATE message?

Good callout! Yes it should be, will update

sherifnada · 2022-10-14T23:28:24Z

connector-builder-server/src/main/openapi/openapi.yaml

+        stream:
+          type: string
+          description: Name of the stream to read
+        numPages:


This currently does not exist as a first-class concept in the CDK, i.e. we'd need to make some non-trivial changes to the backend to do this. Any reason we can't just use limit to control the number of records returned?

The idea was that users may want to test out pagination by first fetching the first page of data, then clicking a "Next page" button to load the next page. Since we don't have any good way of fetching a specific page, we thought that it would be easier to implement changes to the CDK to limit the number of pages.

So when the user clicks "Next page", we would fetch 2 pages of data and display the second. Then if they click "Next page" again, we would fetch 3 pages of data and display the third, and so on.

The other motivation for this is that we may not want to fetch all pages of data every time the user clicks "Test", because some APIs may have hundreds or thousands of pages, which could make requesting all of those pages take many seconds or minutes, which wouldn't be a great user experience. So we wanted to limit the number of pages actually being fetched from the source API to mitigate this and make the "Test" experience snappy.
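The "Next page" strategy described in this reply, re-reading from the first page and stopping one page further each time, can be sketched as follows. `page_for_click` and the `read_pages` callable are hypothetical; in practice the page limit would have to be enforced inside the CDK's read loop.

```python
from itertools import islice
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def page_for_click(read_pages: Callable[[], Iterator[T]], clicks: int) -> T:
    """Return the page shown after `clicks` presses of "Next page".

    Each call restarts the read from the first page and stops after
    clicks + 1 pages, so the n-th click costs n + 1 page requests.
    """
    fetched: List[T] = list(islice(read_pages(), clicks + 1))
    if len(fetched) <= clicks:
        raise IndexError("the stream has no more pages")
    return fetched[-1]
```

The trade-off is repeated work on each click, in exchange for never needing a way to jump directly to an arbitrary page.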

Though, I think it would be fine for the first MVP to not implement this page-limiting behavior and just fetch all pages instead; maybe this is something we can add later if there is a need for it. So I can comment out this parameter for now.

sherifnada · 2022-10-14T23:30:20Z

connector-builder-server/src/main/openapi/openapi.yaml

+                  type: object
+                  required:
+                    - airbyteMessages
+                    - request


@girarda is there already a plan for how these objects specifically will be returned from the CDK?

I'm pretty sure that'll require changes on multiple layers of the CDK because Sources are not aware of HTTP requests and responses.

I have a rough proposal here: #17839.

Will bring it up for grooming


* add openapi spec
* add 'a'
* rename stream test to stream read and add logs
* move logs
* group results by slice/page and add more request params
* address PR/zoom feedback
* move request and response into their own definitions
* add sliceDescriptor
* fix type of state prop and remove numPages
* change order

lmossman added 2 commits October 3, 2022 15:35

add openapi spec

c8acf88

add 'a'

72c7a0e

lmossman commented Oct 3, 2022


lmossman added 2 commits October 3, 2022 15:53

rename stream test to stream read and add logs

d09e168

move logs

a32a05a

lmossman commented Oct 3, 2022


lmossman commented Oct 3, 2022


lmossman requested review from brianjlai and girarda October 3, 2022 23:06

lmossman commented Oct 4, 2022


group results by slice/page and add more request params

2b52fb0

lmossman commented Oct 4, 2022


girarda reviewed Oct 5, 2022


brianjlai reviewed Oct 6, 2022


address PR/zoom feedback

423097e

lmossman commented Oct 7, 2022


lmossman marked this pull request as ready for review October 7, 2022 00:00

Merge branch 'master' into lmossman/connector-builder-openapi-spec

f6b6cf8

lmossman requested review from girarda and brianjlai October 7, 2022 00:01

girarda reviewed Oct 7, 2022


move request and response into their own definitions

b7daa82

lmossman requested a review from girarda October 10, 2022 23:50

lmossman mentioned this pull request Oct 11, 2022

Implement connector builder server #17814

Closed

add sliceDescriptor

c63fd4c

girarda approved these changes Oct 12, 2022


sherifnada approved these changes Oct 14, 2022


lmossman added 2 commits October 17, 2022 13:51

fix type of state prop and remove numPages

e06e325

change order

5220cad

lmossman merged commit 19a296c into master Oct 18, 2022

lmossman deleted the lmossman/connector-builder-openapi-spec branch October 18, 2022 22:42

octavia-squidington-iii mentioned this pull request Oct 20, 2022

Bump Airbyte version from 0.40.15 to 0.40.16 #18268

Merged
