
update incremental docs #1364

Merged
merged 5 commits into from
Dec 17, 2020


cgardens
Contributor

What

  • User Story: I read that Airbyte now supports "Incremental - Append"; what's that? (This person may not be very technical and/or may not want to spend a lot of time reading dense material.)

How

  • Restructured and reworded the incremental docs. My goal is that anyone who hears that we now support incremental sync can read all of the examples of what incremental looks like, starting from the beginning, without needing much thought or technical know-how.
  • The last two sections, Source-Defined Cursor and User-Defined Cursor, remain more technical, and I don't expect anyone to read them on a first skim. I think they still need to be documented for when someone sits down to try to understand our data model.


In this flavor of incremental records in the warehouse will never be deleted or mutated. A new copy of any new or updated records is appended to the data in the warehouse. This means you can find multiple copies of the same record twice in the warehouse and will need to de-duplicate them yourself. We provided an "at least once" guarantee of replicating each record that is present when the sync runs.
In this flavor of incremental, records in the warehouse will never be deleted or mutated. A copy of each new or updated record is appended to the data in the warehouse. This means you may find multiple copies of the same record in the warehouse and will need to de-duplicate them yourself. We provide an "at least once" guarantee of replicating each record that is present when the sync runs.
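The append-only behavior can be illustrated with a minimal Python sketch. Everything in it is hypothetical and simplified (the `Record` type, the `updated_at` cursor, and the `incremental_append_sync` helper are not Airbyte APIs): each sync appends every source record at or past the last saved cursor value and never touches rows already in the warehouse. The inclusive comparison is one way a connector can avoid skipping records that share the boundary cursor value; the cost is that such records get re-sent, which is compatible with an "at least once" guarantee.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical, simplified record; not Airbyte's actual data model.
@dataclass(frozen=True)
class Record:
    id: int
    name: str
    updated_at: int  # assumed cursor field (e.g. a timestamp)


def incremental_append_sync(
    source: List[Record],
    warehouse: List[Record],
    last_cursor: Optional[int],
) -> Tuple[List[Record], Optional[int]]:
    """Append records at or past the saved cursor; never update or delete."""
    # Inclusive comparison (>=): records sharing the boundary cursor value are
    # re-sent, which is one way duplicates arise under "at least once".
    delta = [r for r in source if last_cursor is None or r.updated_at >= last_cursor]
    new_warehouse = warehouse + delta  # append-only: existing rows are untouched
    # Advance the cursor to the newest value seen; keep the old one if nothing arrived.
    new_cursor = max((r.updated_at for r in delta), default=last_cursor)
    return new_warehouse, new_cursor


first = [Record(1, "alice", 1), Record(2, "bob", 1)]
warehouse, cursor = incremental_append_sync(first, [], None)

# Record 1 is updated; the source now returns its new version.
second = [Record(1, "alice (updated)", 2), Record(2, "bob", 1)]
warehouse, cursor = incremental_append_sync(second, warehouse, cursor)
```

After the second sync the warehouse holds four rows: both versions of record 1 and two identical copies of record 2, which is the kind of duplication described above.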
Contributor

In DBT, we could deduplicate rows in the future.

Or show the user how to do it in the DBT models we generate...
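The de-duplication itself can be sketched in a few lines of Python. This is a minimal illustration, not the DBT models Airbyte generates: it assumes each row is a dict with a hypothetical `id` primary key and `updated_at` cursor, and keeps the copy with the highest cursor value for each key (on ties, the later-appended copy wins). In a warehouse this would more typically be written in SQL, for example with a `ROW_NUMBER()` window function partitioned by the primary key.

```python
from typing import Dict, Hashable, List


def dedupe_latest(rows: List[dict], key: str = "id", cursor: str = "updated_at") -> List[dict]:
    """Keep one row per primary key: the one with the largest cursor value."""
    latest: Dict[Hashable, dict] = {}
    for row in rows:
        k = row[key]
        # >= means that on a tie the later-appended copy replaces the earlier one.
        if k not in latest or row[cursor] >= latest[k][cursor]:
            latest[k] = row
    # Dicts preserve insertion order, so output follows the order keys were first seen.
    return list(latest.values())


appended = [
    {"id": 1, "name": "alice", "updated_at": 1},
    {"id": 2, "name": "bob", "updated_at": 1},
    {"id": 1, "name": "alice (updated)", "updated_at": 2},
    {"id": 2, "name": "bob", "updated_at": 1},
]
deduped = dedupe_latest(appended)
```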

Contributor

Yeah, we should write this tutorial! @cgardens, do we already have an issue for that?

Contributor Author

No, we don't have one yet. Please go ahead and make it!

@@ -51,7 +45,7 @@ At the end of this incremental sync the data warehouse would now contain:

### Updating a Record

Let's assume that our warehouse contains all of the data that it did at the end of the previous section. Now unfortunately the king and queen lose their heads. Let's see that delta:
Let's assume that our warehouse contains all the data that it did at the end of the previous section. Now, unfortunately, the king and queen lose their heads. Let's see that delta:
Contributor

💀

@@ -72,6 +66,16 @@ The output we expect to see in the warehouse is as follows.
]
```

## Source-Defined Cursor

Some sources are able to determine the cursor that the use without any user input. For example, in the exchange rates api source, the source determines that date field should be used to determine the last record that was synced. In these cases, the source will set the `source_defined_cursor` attribute in the `AirbyteStream` (You can find a more detailed description of the configuration data model [here](catalog.md)).
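As a rough illustration of a source-defined cursor, the Python sketch below shows a source that declares its own cursor and then filters on it when reading. The dict keys mirror the `AirbyteStream` attributes discussed here, and we assume the source names its cursor through `default_cursor_field`; the stream name, the `read_rates` helper, and the in-memory rate records are made up for the example and do not reflect the real exchange rates connector.

```python
from datetime import date
from typing import Dict, Iterator, List, Optional

# Hypothetical stream declaration. The keys mirror AirbyteStream attributes,
# but this dict is an illustration, not output from a real connector.
EXCHANGE_RATES_STREAM = {
    "name": "exchange_rates",
    "supported_sync_modes": ["full_refresh", "incremental"],
    "source_defined_cursor": True,     # the source, not the user, picks the cursor
    "default_cursor_field": ["date"],  # assumed: where the source names its cursor
}


def read_rates(rates: List[Dict], state: Optional[str]) -> Iterator[Dict]:
    """Yield rate records strictly newer than the saved state (an ISO date)."""
    last = date.fromisoformat(state) if state else None
    for rec in rates:
        if last is None or date.fromisoformat(rec["date"]) > last:
            yield rec


# Made-up records for illustration only.
rates = [{"date": "2020-12-01", "rate": 1.0}, {"date": "2020-12-02", "rate": 1.1}]
new_records = list(read_rates(rates, state="2020-12-01"))
```

Because the source fixes the cursor to `date`, the user never has to choose one; the only thing carried between syncs is the last date seen.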
Contributor

Suggested change
Some sources are able to determine the cursor to use without any user input. For example, in the exchange rates API source, the source determines that the date field should be used to determine the last record that was synced. In these cases, the source will set the `source_defined_cursor` attribute in the `AirbyteStream` (you can find a more detailed description of the configuration data model [here](catalog.md)).


Some sources cannot define the cursor without user input. For example, in the postgres source, the user needs to choose which column in a database table they want to use as the `cursor_field`. The author of the source cannot predict this. In these cases, the user sets the `cursor_field` in the `ConfiguredAirbyteStream` (you can find a more detailed description of the configuration data model [here](catalog.md)).
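For a user-defined cursor, the source has to build its incremental read around whichever column the user picked. The Python sketch below assumes a Postgres-style source that turns the configured `cursor_field` into a query; the `build_incremental_query` and `quote_ident` helpers are hypothetical, and the `%s` placeholder follows the DB-API parameter style used by drivers such as psycopg2. `cursor_field` is taken as a list of strings because the Airbyte protocol represents it as a path; this sketch only supports a single top-level column.

```python
from typing import List, Optional, Tuple


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double any inner quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_incremental_query(
    table: str, cursor_field: List[str], last_value: Optional[object]
) -> Tuple[str, tuple]:
    """Build a query that reads rows past the user-chosen cursor column."""
    if len(cursor_field) != 1:
        # Assumption for this sketch: only a single top-level column is supported.
        raise ValueError("expected exactly one cursor column")
    col = quote_ident(cursor_field[0])
    sql = f"SELECT * FROM {quote_ident(table)}"
    params: tuple = ()
    if last_value is not None:
        # The value is passed as a bound parameter, never interpolated into the SQL.
        sql += f" WHERE {col} > %s"
        params = (last_value,)
    # Ordering by the cursor lets the reader save state as it goes.
    sql += f" ORDER BY {col} ASC"
    return sql, params


sql, params = build_incremental_query("users", ["updated_at"], "2020-12-01")
```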

In some cases, the source may propose a `default_cursor_field` in the `AirbyteStream`. When it does, if the user does not specify a `cursor_field` in the `ConfiguredAirbyteStream`, Airbyte will fall back on the default provided by the source. The user is allowed to override the source's `default_cursor_field` by setting the `cursor_field` value in the `ConfiguredAirbyteStream`, but they CANNOT override the `cursor_field` specified in an `AirbyteStream`.
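The precedence rules in this paragraph can be summarized as a small Python function. The dataclasses are simplified, hypothetical stand-ins that model only the cursor-related fields, and we assume a source-defined cursor is exposed via `default_cursor_field` when `source_defined_cursor` is true. The text says the user cannot override a source-defined cursor but not what happens if they try; this sketch chooses to raise an error.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-ins for the protocol objects named above;
# only the cursor-related fields are modeled.
@dataclass
class AirbyteStream:
    name: str
    source_defined_cursor: bool = False
    default_cursor_field: List[str] = field(default_factory=list)


@dataclass
class ConfiguredAirbyteStream:
    stream: AirbyteStream
    cursor_field: Optional[List[str]] = None


def resolve_cursor_field(configured: ConfiguredAirbyteStream) -> List[str]:
    stream = configured.stream
    if stream.source_defined_cursor:
        # Source-defined cursor: a differing user choice is rejected (sketch's choice).
        if configured.cursor_field and configured.cursor_field != stream.default_cursor_field:
            raise ValueError("cursor is defined by the source and cannot be overridden")
        return stream.default_cursor_field
    if configured.cursor_field:
        return configured.cursor_field  # the user's explicit choice beats the default
    if stream.default_cursor_field:
        return stream.default_cursor_field  # fall back on the source's suggestion
    raise ValueError(f"stream {stream.name!r} needs a user-selected cursor_field")


rates = AirbyteStream("rates", source_defined_cursor=True, default_cursor_field=["date"])
users = AirbyteStream("users", default_cursor_field=["updated_at"])
```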
Contributor

What info do I get from the last part of the sentence? Is it something controversial?

Contributor Author

I'm removing it. It's not relevant to someone just using the UI.

cgardens merged commit e5fe44e into master on Dec 17, 2020
cgardens deleted the cgardens/update_incremental_docs branch on December 17, 2020 17:38
davydov-d added a commit that referenced this pull request Jan 26, 2023
davydov-d added a commit that referenced this pull request Jan 27, 2023
* Source Stripe: fix  field name for subscription stream

* Source Stripe: bump version

* Source Stripe: update changelog

* #1364 Source Stripe: fix stream schemas

* #1364 source Stripe: bump major version

* auto-bump connector version

---------

Co-authored-by: Denys Davydov <davydov.den18@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>