Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add openAPI spec for Connector Builder Server #17535

Merged
merged 11 commits into from
Oct 18, 2022
217 changes: 217 additions & 0 deletions connector-builder-server/src/main/openapi/openapi.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,217 @@
openapi: 3.0.0
info:
description: |
Connector Builder Server API

version: "1.0.0"
title: Connector Builder Server API
contact:
email: contact@airbyte.io
license:
name: MIT
url: "https://opensource.org/licenses/MIT"
externalDocs:
description: Find out more about Connector Builder
url: "https://docs.airbyte.com/connector-development/config-based/overview/"

paths:
/v1/stream/read:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girarda @brianjlai see this commit for the changes to this endpoint coming from our Backlog Grooming discussion: 2b52fb0
Here's a screenshot of the updated Swagger view of this spec since it's a little easier to see what the final product looks like there:
Screen Shot 2022-10-04 at 1 13 16 PM

Copy link
Contributor Author

@lmossman lmossman Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a nested grouping here of pages nested inside of slices, with a sliceId and pageId in each nested object.

What do you think of this structure? I thought it made sense because if a user is doing both stream slicing and pagination, then seeing each page broken down by slice makes sense. If they are not using one of those, then there will just be one element in that level, e.g. if they are not using pagination then there will just be a single element in pages.

I also wasn't super sure about sliceId and pageId -- I figured we wanted some way to identify which slices and which pages are which, but I wasn't sure if we have something to use as the "IDs" here. Since they are just strings, maybe we can just put in the URL parameter / header that changes for each slice/page?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in practice, stream slices correspond to states, so we could identify them using the state object, for example:

{
"repository": "airbytehq/integration-test",
"created_at": "2021-06-29T03:44:45Z"
}

we don't currently persist or keep track of the number of pages within a slice, but each page corresponds to a single request/response so we could identify them using a counter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I think that makes sense as a way to differentiate them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to just remove the sliceId and pageId fields from this. I didn't feel like they would add much value, because the slice "state" can be extracted from the results, and the page number is basically just the index of the page in the pages array

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the slice "state" can be extracted from the results
the slice represents a range ("2021-05-29T03:44:45Z" - "2021-06-29T03:44:45Z"), whereas the output state only represents the end state.
Do we only care about showing the end state here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add endpoints for check and discovery? An alternative would be to specify the operation as part of the request, but I think they're different enough to warrant different endpoints

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note from discussion: check/discover/spec endpoints can be added in a later phase. Phase 1 will only be focused on read. Will add a note about this to the tech spec as well

post:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how come we want these endpoints defined as POST calls? Because we're using a request body? Or I guess a real sync is sort of a POST call since the end result has records written to a destination, even if the test one doesnt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was sort of following the convention in our main Airbyte API where we use post calls and pass all of the inputs in the request body. Especially for this API, since we need to pass the entire connector definition contents to the server I felt like that needed to go in a request body as opposed to being something like a URL parameter. And we can't have a request bodies for get requests

summary: Reads a specific stream in the source
requestBody:
content:
application/json:
schema:
$ref: "#/components/schemas/StreamReadRequestBody"
required: true
responses:
"200":
description: Successful operation
content:
application/json:
schema:
$ref: "#/components/schemas/StreamRead"
"400":
$ref: "#/components/responses/ExceptionResponse"
"422":
$ref: "#/components/responses/InvalidInputResponse"
/v1/streams/list:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint solves two problems listed in the tech spec:

  1. (tech spec link) allows the webapp to populate the stream dropdown without having to manually parse the connector definition yaml file
  2. (tech spec link) provides the URLs for each stream so that the webapp does not need to manually parse the connector definition yaml file

Since the backend implementation of this endpoint is just parsing the connector definition file to extract the stream info, it should be very fast. This is important because this request will likely need to be submitted on every change to the yaml contents in the webapp (potentially after some short delay), so that the stream list and URLs are always accurate.

post:
summary: List all streams present in the connector definition, along with their specific request URLs
requestBody:
content:
application/json:
schema:
$ref: "#/components/schemas/StreamsListRequestBody"
required: true
responses:
"200":
description: Successful operation
content:
application/json:
schema:
$ref: "#/components/schemas/StreamsListRead"
"400":
$ref: "#/components/responses/ExceptionResponse"
"422":
$ref: "#/components/responses/InvalidInputResponse"

components:
schemas:
StreamRead:
type: object
required:
- results
properties:
results:
type: array
description: The RECORD and STATE AirbyteMessages coming from the read operation
items:
type: string
logs:
type: array
description: The LOG AirbyteMessages coming from the read operation
items:
type: string
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set these to string, because I didn't seen an obvious way to reference just the AirbyteMessage or AirbyteLogMessage schema from the protocol file, but I'm open to suggestions here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured out how to do this after all! Just needed to reference the whole airbyte protocol in one schema, and then add more slashes when referencing that schema to access its children

httpRequests:
type: array
description: HTTP requests sent to the source API, and their responses
items:
type: object
required:
- request
- response
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to combine a request and the response for that request into a single object here, since I think every request should have a response and it may be more helpful to our users if they are presented as such. But let me know if anyone disagrees with this approach, or if it is overly complex to implement for any reason.

properties:
request:
type: object
description: The HTTP request sent to the source API
required:
- url
- headers
properties:
url:
type: string
description: URL that the request was sent to
body:
type: object
description: The body of the HTTP request, if present
headers:
type: object
description: The headers of the HTTP request
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't super sure if these should be JSON objects or just strings. I went with JSON objects here because it looks like that's what they are in the debug log messages of the read command, but open to feedback here

response:
type: object
description: The response from the source API
required:
- status
- headers
properties:
status:
type: integer
description: The status of the response
body:
type: object
description: The body of the HTTP response, if present
headers:
type: object
description: The headers of the HTTP response
StreamReadRequestBody:
type: object
required:
- definition
- stream
properties:
definition:
$ref: "#/components/schemas/ConnectorDefinitionBody"
description: The config-based connector definition contents
stream:
type: string
description: Name of the stream to read
ConnectorDefinitionBody:
$ref: ../../../../airbyte-cdk/python/airbyte_cdk/sources/declarative/config_component_schema.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll need to make sure to wire up the builds to regen if this referenced file changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada Do you have an example or know how that would be done? If not I can make a ticket

StreamsListRequestBody:
type: object
required:
- definition
properties:
definition:
$ref: "#/components/schemas/ConnectorDefinitionBody"
description: The config-based connector definition contents
StreamsListRead:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this how the server will share the schema (whenever we support schema detection)?

Copy link
Contributor Author

@lmossman lmossman Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, because detecting the schema will require making an API call to the source.

This /streams/list endpoint is meant to be very fast, as it will likely be called on every change to the yaml editor contents in order to keep the stream list up to date. I think for schema detection, we will likely just want to do that as part of the /stream/read API call, e.g. add another property to the return value of that function that contains the schema of the results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be valuable to also return the stream's slices so they can be displayed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean exactly by "return the stream's slices"? Are you suggesting that this should return all of the possible slices that are configured for a given stream, e.g. if they are using a list slicer? And this would be displayed somewhere in the testing panel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this over zoom - the idea here is to have this return a list of all the slices that will be read for this stream, so that they can be displayed somewhere to provide visual feedback on how far through the stream slicing the user has made it. Added a commented-out block here as this can be done in a future phase (will add a note to the tech spec as well)

type: object
required:
- streams
properties:
streams:
type: array
items:
type: object
description: The stream names present in the connector definition
required:
- name
- url
properties:
name:
type: string
description: The name of the stream
url:
type: string
format: uri
description: The URL to which read requests will be made for this stream

# The following exception structs were copied from airbyte-api/src/main/openapi/config.yaml
InvalidInputProperty:
type: object
required:
- propertyPath
properties:
propertyPath:
type: string
invalidValue:
type: string
message:
type: string
KnownExceptionInfo:
type: object
required:
- message
properties:
message:
type: string
exceptionClassName:
type: string
exceptionStack:
type: array
items:
type: string
InvalidInputExceptionInfo:
type: object
required:
- message
- validationErrors
properties:
message:
type: string
exceptionClassName:
type: string
exceptionStack:
type: array
items:
type: string
validationErrors:
type: array
items:
$ref: "#/components/schemas/InvalidInputProperty"

responses:
InvalidInputResponse:
description: Input failed validation
content:
application/json:
schema:
$ref: "#/components/schemas/InvalidInputExceptionInfo"
ExceptionResponse:
description: Exception occurred; see message for details.
content:
application/json:
schema:
$ref: "#/components/schemas/KnownExceptionInfo"