-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add openAPI spec for Connector Builder Server #17535
Changes from all commits
c8acf88
72c7a0e
d09e168
a32a05a
2b52fb0
423097e
f6b6cf8
b7daa82
c63fd4c
e06e325
5220cad
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,243 @@ | ||
openapi: 3.0.0 | ||
info: | ||
description: | | ||
Connector Builder Server API | ||
|
||
version: "1.0.0" | ||
title: Connector Builder Server API | ||
contact: | ||
email: contact@airbyte.io | ||
license: | ||
name: MIT | ||
url: "https://opensource.org/licenses/MIT" | ||
externalDocs: | ||
description: Find out more about Connector Builder | ||
url: "https://docs.airbyte.com/connector-development/config-based/overview/" | ||
|
||
paths: | ||
/v1/stream/read: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should we add endpoints for check and discovery? An alternative would be to specify the operation as part of the request, but I think they're different enough to warrant different endpoints There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note from discussion: check/discover/spec endpoints can be added in a later phase. Phase 1 will only be focused on read. Will add a note about this to the tech spec as well |
||
post: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. how come we want these endpoints defined as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I was sort of following the convention in our main Airbyte API where we use post calls and pass all of the inputs in the request body. Especially for this API, since we need to pass the entire connector definition contents to the server I felt like that needed to go in a request body as opposed to being something like a URL parameter. And we can't have a request bodies for get requests |
||
summary: Reads a specific stream in the source. TODO in a later phase - only read a single slice of data. | ||
requestBody: | ||
content: | ||
application/json: | ||
schema: | ||
$ref: "#/components/schemas/StreamReadRequestBody" | ||
required: true | ||
responses: | ||
"200": | ||
description: Successful operation | ||
content: | ||
application/json: | ||
schema: | ||
$ref: "#/components/schemas/StreamRead" | ||
"400": | ||
$ref: "#/components/responses/ExceptionResponse" | ||
"422": | ||
$ref: "#/components/responses/InvalidInputResponse" | ||
/v1/streams/list: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This endpoint solves two problems listed in the tech spec:
Since the backend implementation of this endpoint is just parsing the connector definition file to extract the stream info, it should be very fast. This is important because this request will likely need to be submitted on every change to the yaml contents in the webapp (potentially after some short delay), so that the stream list and URLs are always accurate. |
||
post: | ||
summary: List all streams present in the connector definition, along with their specific request URLs | ||
requestBody: | ||
content: | ||
application/json: | ||
schema: | ||
$ref: "#/components/schemas/StreamsListRequestBody" | ||
required: true | ||
responses: | ||
"200": | ||
description: Successful operation | ||
content: | ||
application/json: | ||
schema: | ||
$ref: "#/components/schemas/StreamsListRead" | ||
"400": | ||
$ref: "#/components/responses/ExceptionResponse" | ||
"422": | ||
$ref: "#/components/responses/InvalidInputResponse" | ||
|
||
components: | ||
schemas: | ||
StreamRead: | ||
type: object | ||
required: | ||
- slices | ||
properties: | ||
slices: | ||
type: array | ||
description: The stream slices returned from the read command | ||
items: | ||
type: object | ||
required: | ||
- sliceDescriptor | ||
- pages | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should have another field describing the slice (it's defined as a json object) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @girarda By "describing the slice", are you talking about the state object that is emitted at the end of the slice? If so, that will be contained in the last page's Or is there some other information about the slice that should be returned here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean the exact slice object
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I understand now. Yeah, I think that would be useful information to show to the user. I will add a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
properties: | ||
sliceDescriptor: | ||
type: object | ||
description: 'An object describing the current slice, e.g. {start_time: "2021-01-01", end_time: "2021-01-31"}' | ||
pages: | ||
type: array | ||
description: The pages returned from the read command | ||
items: | ||
type: object | ||
required: | ||
- airbyteMessages | ||
- request | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @girarda is there already a plan for how these objects specifically will be returned from the CDK? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm pretty sure that'll require changes on multiple layers of the CDK because I have a rough proposal here: #17839. Will bring it up for grooming |
||
- response | ||
properties: | ||
airbyteMessages: | ||
type: array | ||
description: The RECORD/STATE/LOG AirbyteMessages coming from the read operation for this page | ||
items: | ||
$ref: "#/components/schemas/AirbyteProtocol/definitions/AirbyteMessage" | ||
Comment on lines
+88
to
+92
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since I was able to figure out how to reference the actual airbyte protocol yaml definition for AirbyteMessage in this API spec, I decided to just consolidate the |
||
request: | ||
sherifnada marked this conversation as resolved.
Show resolved
Hide resolved
|
||
$ref: "#/components/schemas/HttpRequest" | ||
response: | ||
$ref: "#/components/schemas/HttpResponse" | ||
StreamReadRequestBody: | ||
type: object | ||
required: | ||
- definition | ||
- stream | ||
properties: | ||
definition: | ||
$ref: "#/components/schemas/ConnectorDefinitionBody" | ||
description: The config-based connector definition contents | ||
stream: | ||
type: string | ||
description: Name of the stream to read | ||
state: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this be a proper STATE message? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good callout! Yes it should be, will update |
||
$ref: "#/components/schemas/AirbyteProtocol/definitions/AirbyteStateMessage" | ||
# --- Potential addition for a later phase --- | ||
# numPages: | ||
# type: integer | ||
# description: Number of pages to read from the source for each slice | ||
# default: 1 | ||
HttpRequest: | ||
type: object | ||
required: | ||
- url | ||
properties: | ||
url: | ||
type: string | ||
description: URL that the request was sent to | ||
parameters: | ||
type: object | ||
description: The request parameters that were set on the HTTP request, if any | ||
body: | ||
type: object | ||
description: The body of the HTTP request, if present | ||
headers: | ||
type: object | ||
description: The headers of the HTTP request, if any | ||
HttpResponse: | ||
type: object | ||
required: | ||
- status | ||
properties: | ||
status: | ||
type: integer | ||
description: The status of the response | ||
body: | ||
type: object | ||
description: The body of the HTTP response, if present | ||
headers: | ||
type: object | ||
description: The headers of the HTTP response, if any | ||
ConnectorDefinitionBody: | ||
$ref: ../../../../airbyte-cdk/python/airbyte_cdk/sources/declarative/config_component_schema.json | ||
lmossman marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we'll need to make sure to wire up the builds to regen if this referenced file changes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @sherifnada Do you have an example or know how that would be done? If not I can make a ticket |
||
AirbyteProtocol: | ||
$ref: ../../../../airbyte-protocol/protocol-models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml | ||
StreamsListRequestBody: | ||
type: object | ||
required: | ||
- definition | ||
properties: | ||
definition: | ||
$ref: "#/components/schemas/ConnectorDefinitionBody" | ||
description: The config-based connector definition contents | ||
StreamsListRead: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this how the server will share the schema (whenever we support schema detection)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think so, because detecting the schema will require making an API call to the source. This /streams/list endpoint is meant to be very fast, as it will likely be called on every change to the yaml editor contents in order to keep the stream list up to date. I think for schema detection, we will likely just want to do that as part of the /stream/read API call, e.g. add another property to the return value of that function that contains the schema of the results. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would it be valuable to also return the stream's slices so they can be displayed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you mean exactly by "return the stream's slices"? Are you suggesting that this should return all of the possible slices that are configured for a given stream, e.g. if they are using a list slicer? And this would be displayed somewhere in the testing panel? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Discussed this over zoom - the idea here is to have this return a list of all the slices that will be read for this stream, so that they can be displayed somewhere to provide visual feedback on how far through the stream slicing the user has made it. Added a commented-out block here as this can be done in a future phase (will add a note to the tech spec as well) |
||
type: object | ||
required: | ||
- streams | ||
properties: | ||
streams: | ||
type: array | ||
items: | ||
type: object | ||
description: The stream names present in the connector definition | ||
required: | ||
- name | ||
- url | ||
properties: | ||
name: | ||
type: string | ||
description: The name of the stream | ||
url: | ||
type: string | ||
format: uri | ||
description: The URL to which read requests will be made for this stream | ||
# --- Potential addition for a later phase --- | ||
# slices: | ||
# type: array | ||
# description: list of slices that will be retrieved for this stream | ||
# items: | ||
# type: object | ||
|
||
# The following exception structs were copied from airbyte-api/src/main/openapi/config.yaml | ||
lmossman marked this conversation as resolved.
Show resolved
Hide resolved
|
||
InvalidInputProperty: | ||
type: object | ||
required: | ||
- propertyPath | ||
properties: | ||
propertyPath: | ||
type: string | ||
invalidValue: | ||
type: string | ||
message: | ||
type: string | ||
KnownExceptionInfo: | ||
type: object | ||
required: | ||
- message | ||
properties: | ||
message: | ||
type: string | ||
exceptionClassName: | ||
type: string | ||
exceptionStack: | ||
type: array | ||
items: | ||
type: string | ||
InvalidInputExceptionInfo: | ||
type: object | ||
required: | ||
- message | ||
- validationErrors | ||
properties: | ||
message: | ||
type: string | ||
exceptionClassName: | ||
type: string | ||
exceptionStack: | ||
type: array | ||
items: | ||
type: string | ||
validationErrors: | ||
type: array | ||
items: | ||
$ref: "#/components/schemas/InvalidInputProperty" | ||
|
||
responses: | ||
InvalidInputResponse: | ||
description: Input failed validation | ||
content: | ||
application/json: | ||
schema: | ||
$ref: "#/components/schemas/InvalidInputExceptionInfo" | ||
ExceptionResponse: | ||
description: Exception occurred; see message for details. | ||
content: | ||
application/json: | ||
schema: | ||
$ref: "#/components/schemas/KnownExceptionInfo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@girarda @brianjlai see this commit for the changes to this endpoint coming from our Backlog Grooming discussion: 2b52fb0
Here's a screenshot of the updated Swagger view of this spec since it's a little easier to see what the final product looks like there:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a nested grouping here of pages nested inside of slices, with a sliceId and pageId in each nested object.
What do you think of this structure? I thought it made sense because if a user is doing both stream slicing and pagination, then seeing each page broken down by slice makes sense. If they are not using one of those, then there will just be one element in that level, e.g. if they are not using pagination then there will just be a single element in
pages
.I also wasn't super sure about
sliceId
andpageId
-- I figured we wanted some way to identify which slices and which pages are which, but I wasn't sure if we have something to use as the "IDs" here. Since they are just strings, maybe we can just put in the URL parameter / header that changes for each slice/page?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in practice, stream slices correspond to states, so we could identify them using the state object, for example:
we don't currently persist or keep track of the number of pages within a slice, but each page corresponds to a single request/response so we could identify them using a counter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I think that makes sense as a way to differentiate them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to just remove the
sliceId
andpageId
fields from this. I didn't feel like they would add much value, because the slice "state" can be extracted from the results, and the page number is basically just the index of the page in thepages
arrayThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.