-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding incremental to the data model #998
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it overall. Not mergable as-is so I'm just leaving a comment.
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modulo name comments I think this LGTM. So we are expecting users to configure incremental/full-refresh at the stream level? I think this makes sense. however our current configs expect this to be configured at the connection level so we should adjust our behavior accordingly.
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
Outdated
Show resolved
Hide resolved
docs/architecture/incremental.md
Outdated
@@ -0,0 +1,22 @@ | |||
# Configuration: Streams & Incremental Syncs | |||
|
|||
## AirbyteStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea. It is a bit confusing to use AirbyteCatalog for something that come from the source and something that is fed back to the source.
What about we do (while reusing types):
- sources produce an
AirbyteCatalog
which only contains a list of stream - sources consume a
ConfiguredAirbyteCatalog
which contains the list ofConfiguredAirbyteStreams
?
This way it is clearer that one is R/O while the other one got some configuration applied to it. ConfiguredAirbyteCatalog should ideally be backward compatible with AirbyteCatalog (not sure how long that can hold) which would mean: sync everything with full refresh.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bigger change. We can try it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will try this tomorrow. think it's bigger than i have brain to do now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this just changes the project from something that could be done last night to something that'll take a couple days. i think it's better. i had considered doing this but thought it was worth the time since this is the chokepoint for all work on incremental. i'll reallocate folks to other stuff while we massage this. will try this approach today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implemented this.
Complexity. Will there be edge cases that could be painful to handle?
On Mon, Nov 16, 2020 at 10:47 PM Charles ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
<#998 (comment)>:
> type: object
- existingJavaType: com.fasterxml.jackson.databind.JsonNode
+ required:
+ - json_schema
+ - supported_sync_modes
+ properties:
+ json_schema:
+ description: Stream schema using Json Schema specs.
+ type: object
+ existingJavaType: com.fasterxml.jackson.databind.JsonNode
+ supported_sync_modes:
+ type: array
+ items:
+ "$ref": "#/definitions/SyncMode"
+ default_cursor_field:
what is the risk you're worried about?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#998 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAIZ2IKBMCYAOJ4H7UODQ5DSQIMAPANCNFSM4TXZ6PWA>
.
--
*Michel Tricot*
Co-Founder & CEO @ Daxtarity <https://daxtarity.com/>
415-307-4864
* <https://www.linkedin.com/in/micheltricot/>*
|
Actually both work. Just a suggestion :)
Recency works well with the concept of incremental but cursor is a bit more
generic. No strong feelings about this one
On Mon, Nov 16, 2020 at 10:48 PM Charles ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_message.yaml
<#998 (comment)>:
> + items:
+ "$ref": "#/definitions/SyncMode"
+ default_cursor_field:
+ description: Path to the field that will be used to determine if a record is new or modified since the last sync. If not provided by the source, the end user will have to specify the comparable themselves.
+ type: array
+ items: string
+ configuration:
+ description: Contains information about how the data should be synced. this section is configured by the end-user and is never touched by the source.
+ required:
+ - sync_mode
+ properties:
+ # backwards compatibility: if not present, assume full_refresh.
+ sync_mode:
+ "$ref": "#/definitions/SyncMode"
+ default: full_refresh
+ cursor_field:
can you describe why you prefer recency over cursor?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#998 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAIZ2IO3NE7BBFDFWW7BLL3SQIMCFANCNFSM4TXZ6PWA>
.
--
*Michel Tricot*
Co-Founder & CEO @ Daxtarity <https://daxtarity.com/>
415-307-4864
* <https://www.linkedin.com/in/micheltricot/>*
|
@michel-tricot - here's the version with catalog and configured catalog split. working on propagating this throughout the system now. |
@@ -40,15 +41,15 @@ | |||
// todo (cgardens) - hack, remove after we've gotten rid of Schema object. | |||
public class AirbyteProtocolConverters { | |||
|
|||
public static AirbyteCatalog toCatalog(Schema schema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only ever need to convert from schema into a configured catalog (never the R/O one).
--- | ||
"$schema": http://json-schema.org/draft-07/schema# | ||
"$id": https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml | ||
title: AirbyteProtocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file needed to be renamed and restructured so that java code gen would pick up the new ConfiguredAirbyteCatalog
.
- The issue is that the codegen starts from the "main" object in the file and generates all of its dependencies.
ConfiguredAirbyteCatalog
was not a dependency of AirbyteMessage, so it was not being generated. - We need
ConfiguredAirbyteCatalog
in this file because it shares an enum withAirbyteCatalog
and the python code gen can't handle structs with dependencies in different files.
* of a JsonSchema file (instead of the main object in that file). | ||
* @return schema object processed from across all dependency files. | ||
*/ | ||
public static JsonNode getSchema(final File schemaFile, String definitionStructName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the structure of the airbyte_message / airbyte_protocol file means that we need to get the schema from an object that is not the main object in the jsonschema file. So needed to rejigger this internal lib to handle the case where we want to put the jsonschema of an object defined in "definitions" instead of the main file.
@@ -41,6 +41,6 @@ | |||
* will always be called once regardless of success or failure. | |||
* @throws Exception - any exception. | |||
*/ | |||
DestinationConsumer<AirbyteMessage> write(JsonNode config, AirbyteCatalog catalog) throws Exception; | |||
DestinationConsumer<AirbyteMessage> write(JsonNode config, ConfiguredAirbyteCatalog catalog) throws Exception; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks so much better!
Terrific! |
cool. going to start pushing new images then! |
This reverts commit 29692b4.
1b0b7c0
to
80d956a
Compare
What
Checklist
Misc
Please no more requests for name changes in AirbyteCatalog and AirbyteStream