[Docs] Update Getting Started (#34237)
Co-authored-by: Tim Roes <tim@airbyte.io>
nataliekwong and timroes authored Jan 18, 2024
1 parent 1d6e628 commit 14c6199
Showing 31 changed files with 150 additions and 78 deletions.
Binary file not shown.
17 changes: 9 additions & 8 deletions docs/cloud/managing-airbyte-cloud/configuring-connections.md
@@ -2,7 +2,7 @@
products: all
---

# Configuring connections
# Configuring Connections

A connection links a source to a destination and defines how your data will sync. After you have created a connection, you can modify any of the configuration settings or stream settings.

@@ -28,11 +28,12 @@ You can configure the following settings:

| Setting | Description |
|--------------------------------------|-------------------------------------------------------------------------------------|
| [Replication frequency](/using-airbyte/core-concepts/sync-schedules.md) | How often the data syncs |
| [Destination namespace](/using-airbyte/core-concepts/namespaces.md) | Where the replicated data is written |
| Destination stream prefix | How you identify streams from different connectors |
| [Detect and propagate schema changes](/cloud/managing-airbyte-cloud/manage-schema-changes.md) | How Airbyte handles syncs when it detects schema changes in the source |
| [Connection Data Residency](/cloud/managing-airbyte-cloud/manage-data-residency.md) | Where data will be processed |
| Connection Name | A custom name for your connection |
| [Replication frequency](/using-airbyte/core-concepts/sync-schedules.md) | How often data syncs (can be scheduled, cron, API-triggered or manual) |
| [Destination namespace](/using-airbyte/core-concepts/namespaces.md) | Where the replicated data is written to in the destination |
| Destination stream prefix | A prefix added to each table name in the destination |
| [Detect and propagate schema changes](/cloud/managing-airbyte-cloud/manage-schema-changes.md) | How Airbyte handles schema changes in the source |
| [Connection Data Residency](/cloud/managing-airbyte-cloud/manage-data-residency.md) | Where data will be processed (Cloud only) |

## Modify streams in your connection

@@ -80,9 +81,9 @@ Source-defined cursors and primary keys are selected automatically and cannot be

7. The **Stream configuration changed** dialog displays. This gives you the option to reset streams when you save the changes.

:::caution
:::tip

Airbyte recommends that you reset streams. A reset will delete data in the destination of the affected streams and then re-sync that data. Skipping a reset is discouraged and might lead to unexpected behavior.
When editing the stream configuration, Airbyte recommends that you reset streams. A reset will delete data in the destination of the affected streams and then re-sync that data. Skipping a reset is discouraged and might lead to unexpected behavior.

:::

@@ -2,7 +2,7 @@
products: all
---

# Manage the connection state
# Modifying connection state

The connection state provides additional information about incremental syncs. It includes the most recent values for the global or stream-level cursors, which can aid in debugging or determining which data will be included in the next sync.

14 changes: 7 additions & 7 deletions docs/cloud/managing-airbyte-cloud/manage-data-residency.md
@@ -2,21 +2,21 @@
products: cloud
---

# Manage data residency
# Setting data residency

In Airbyte Cloud, you can set the default data residency and choose the data residency for individual connections, which can help you comply with data localization requirements.
In Airbyte Cloud, you can set the default data residency for your workspace and also set the data residency for individual connections, which can help you comply with data localization requirements.

## Choose your default data residency
## Choose your workspace default data residency

Default data residency allows you to choose where your data is processed. Set the default data residency before creating a new source or connection so workflows that rely on the default data residency, such as fetching the schema or testing the source or destination, can process data in the correct region.
Setting a default data residency allows you to choose where your data is processed. Set the default data residency **before** creating a new source or connection so that subsequent workflows that rely on the default data residency, such as fetching the schema or testing the source or destination, can process data in the correct region.

:::note

While the data is processed in a data plane of the chosen residency, the cursor and primary key data is stored in the US control plane. If you have data that cannot be stored in the US, do not use it as a cursor or primary key.

:::

When you set the default data residency, it applies to new connections only. If you do not set the default data residency, the [Airbyte Default](configuring-connections.md) region is used. If you want to change the data residency for a connection, you can do so in its [connection settings](configuring-connections.md).
When you set the default data residency, your preference applies to new connections only. If you do not adjust the default data residency, the [Airbyte Default](configuring-connections.md) region (United States) is used. If you want to change the data residency for an individual connection, you can do so in its [connection settings](configuring-connections.md).

To choose your default data residency:

@@ -35,9 +35,9 @@ Depending on your network configuration, you may need to add [IP addresses](/ope
:::

## Choose the data residency for a connection
You can choose the data residency for your connection in the connection settings. You can also choose data residency when creating a new connection, or you can set the default data residency for your workspace.
You can also choose the data residency for an individual connection in its connection settings. You can choose the data residency when creating a new connection, or you can set the default data residency for your workspace so that it applies to all new connections.

To choose the data residency for your connection:
To choose a custom data residency for your connection:

1. In the Airbyte UI, click **Connections** and then click the connection that you want to change.

2 changes: 1 addition & 1 deletion docs/cloud/managing-airbyte-cloud/manage-schema-changes.md
@@ -2,7 +2,7 @@
products: all
---

# Manage schema changes
# Schema Change Management

You can specify for each connection how Airbyte should handle any change of schema in the source. This process helps ensure accurate and efficient data syncs, minimizing errors and saving you time and effort in managing your data pipelines.

2 changes: 2 additions & 0 deletions docs/cloud/managing-airbyte-cloud/review-connection-status.md
@@ -5,6 +5,8 @@ products: all
# Review the connection status
The connection status displays information about the connection and each stream being synced. Reviewing this summary allows you to assess the connection's current status and understand when the next sync will run.

![Connection Status](./assets/connection-status-page.png)

To review the connection status:
1. In the Airbyte UI, click **Connections**.

4 changes: 3 additions & 1 deletion docs/cloud/managing-airbyte-cloud/review-sync-history.md
@@ -5,7 +5,9 @@ products: all
# Review the sync history

The job history displays information about synced data, such as the amount of data moved, the number of records read and committed, and the total sync time. Reviewing this summary can help you monitor the sync performance and identify any potential issues.


![Job History](./assets/connection-job-history.png)

To review the sync history, click a connection in the list to view its sync history. Sync History displays the sync status or [reset](/operator-guides/reset.md) status. The sync status is defined as:

| Status | Description |
2 changes: 1 addition & 1 deletion docs/operator-guides/browsing-output-logs.md
@@ -2,7 +2,7 @@
products: all
---

# Browsing Logs
# Browsing logs

Airbyte records the full logs as a part of each sync. These logs can be used to understand the underlying operations Airbyte performs to read data from the source and write to the destination as a part of the [Airbyte Protocol](/understanding-airbyte/airbyte-protocol.md). The logs include many details, including any errors, which can be helpful when troubleshooting a failed sync.

2 changes: 1 addition & 1 deletion docs/operator-guides/reset.md
@@ -2,7 +2,7 @@
products: all
---

# Resetting Your Data
# Resetting your data

Resetting your data allows you to drop all previously synced data so that any ensuing sync can start syncing fresh. This is useful if you don't require the data replicated to your destination to be saved permanently or are just testing Airbyte.

24 changes: 10 additions & 14 deletions docs/using-airbyte/core-concepts/namespaces.md
@@ -4,15 +4,13 @@ products: all

# Namespaces

## High-Level Overview

Namespaces are used to generally organize data, separate tests and production data, and enforce permissions. In most cases, namespaces are schemas in the database you're replicating to.

As a part of connection setup, you select where in the destination you want to write your data. Note: The default configuration is **Destination default**.

| Destination Namepsace | Description |
| Destination Namespace | Description |
| ---------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| Destination default | All streams will be replicated to the single default namespace defined by the Destination. |
| Destination default | All streams will be replicated to the single default namespace defined in the Destination's settings. |
| Mirror source structure | Some sources (for example, databases) provide namespace information for a stream. If a source provides namespace information, the destination will mirror the same namespace when this configuration is set. For sources or streams where the source namespace is not known, the behavior will default to the "Destination default" option. |
| Custom format | All streams will be replicated to a single user-defined namespace. See <a href="/understanding-airbyte/namespaces#--custom-format">Custom format</a> for more details. |

@@ -58,13 +56,17 @@ When replicating multiple sources into the same destination, you may create tabl
For example, a GitHub source can be replicated into a `github` schema. However, you may have multiple connections writing from different GitHub repositories \(common in multi-tenant scenarios\).

:::tip
To keep the same table names, Airbyte recommends writing the connections to unique namespaces to avoid mixing data from the different GitHub repositories.
To write more than one table with the same name to your destination, Airbyte recommends writing the connections to unique namespaces to avoid mixing data from the different GitHub repositories.
:::

You can enter plain text (most common) or additionally add a dynamic parameter `${SOURCE_NAMESPACE}`, which uses the namespace provided by the source if available.

### Examples

:::info
If the Source does not support namespaces, the data will be replicated into the Destination's default namespace. If the Destination does not support namespaces, any preference set in the connection is ignored.
:::

The following table summarises how this works. In this example, we're looking at the replication configuration between a Postgres Source and Snowflake Destination \(with settings of schema = "my\_schema"\):

| Namespace Configuration | Source Namespace | Source Table Name | Destination Namespace | Destination Table Name |
@@ -78,21 +80,15 @@ The following table summarises how this works. In this example, we're looking at
| Custom format = `"my\_${SOURCE\_NAMESPACE}\_schema"` | public | my\_table | my\_public\_schema | my\_table |
| Custom format = " " | public | my\_table | my\_schema | my\_table |
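The namespace rules in this table can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not Airbyte's implementation: the function name, the mode strings, and the handling of a `${SOURCE_NAMESPACE}` placeholder when the source provides no namespace (substituted here with an empty string) are assumptions.

```python
from typing import Optional


def resolve_destination_namespace(
    mode: str,
    destination_default: str,
    source_namespace: Optional[str] = None,
    custom_format: str = "",
) -> str:
    """Return the namespace a stream would be written to (illustrative only)."""
    if mode == "destination_default":
        return destination_default
    if mode == "mirror_source":
        # Sources without namespace information fall back to the default.
        return source_namespace or destination_default
    if mode == "custom_format":
        # Assumption: a missing source namespace is substituted as "".
        resolved = custom_format.replace(
            "${SOURCE_NAMESPACE}", source_namespace or ""
        )
        # A blank custom format falls back to the destination default,
        # matching the last row of the table.
        return resolved.strip() or destination_default
    raise ValueError(f"unknown namespace mode: {mode}")


# Rows from the table: Postgres source namespace "public",
# Snowflake destination schema "my_schema".
print(resolve_destination_namespace(
    "custom_format", "my_schema", "public", "my_${SOURCE_NAMESPACE}_schema"
))  # my_public_schema
print(resolve_destination_namespace(
    "custom_format", "my_schema", "public", " "
))  # my_schema
```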

## Syncing Details

If the Source does not support namespaces, the data will be replicated into the Destination's default namespace. For databases, the default namespace is the schema provided in the destination configuration.

If the Destination does not support namespaces, any preference set in the connection is ignored.

## Using Namespaces with Basic Normalization

As part of the connections sync settings, it is possible to configure the namespace used by: 1. destination connectors: to store the `_airbyte_raw_*` tables. 2. basic normalization: to store the final normalized tables.
As part of the connection settings, it is possible to configure the namespace used by: 1. destination connectors: to store the `_airbyte_raw_*` tables. 2. basic normalization: to store the final normalized tables.

:::info
When basic normalization is enabled, this is the location that both your normalized and raw data will get written to. Your raw data will show up with the prefix `_airbyte_raw_` in the namespace you define. If you don't enable basic normalization, you will only receive the raw tables.
:::note

:::note
Custom transformation outputs are not affected by Airbyte's namespace settings. Where they are written depends on the configuration of the custom dbt project and how it is written to handle its [custom schemas](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-custom-schemas). In this case, the default target schema for dbt is always the destination namespace.
:::

## Requirements

26 changes: 17 additions & 9 deletions docs/using-airbyte/core-concepts/sync-schedules.md
@@ -14,7 +14,11 @@ For each connection, you can select between three options that allow a sync to r

* Only one sync per connection can run at a time.
* If a sync is scheduled to run before the previous sync finishes, the scheduled sync will start after the completion of the previous sync.
* Syncs can run at most every 60 minutes. Reach out to [Sales](https://airbyte.com/company/talk-to-sales) if you require replication more frequently than once per hour.
* Syncs can run at most every 60 minutes in Airbyte Cloud. Reach out to [Sales](https://airbyte.com/company/talk-to-sales) if you require replication more frequently than once per hour.

:::note
For Scheduled or cron scheduled syncs, Airbyte guarantees syncs will initiate with a schedule accuracy of +/- 30 minutes.
:::

## Scheduled syncs
When a scheduled connection is first created, a sync is executed immediately after creation. After that, a sync is run once the time since the last sync \(whether it was triggered manually or due to a schedule\) has exceeded the schedule interval. For example:
@@ -27,17 +31,21 @@ When a scheduled connection is first created, a sync is executed immediately aft
- **October 3rd, 5:01pm:** It has been more than 24 hours since the last sync, so a sync is run
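The interval rule described above can be sketched as a single comparison. This is a minimal sketch, not Airbyte's scheduler: the function name is hypothetical, the timestamps are made up for illustration, and it assumes the interval is measured from one reference time for the last sync.

```python
from datetime import datetime, timedelta


def is_sync_due(last_sync: datetime, now: datetime, interval: timedelta) -> bool:
    """True once the time since the last sync exceeds the schedule interval."""
    return now - last_sync > interval


every_24_hours = timedelta(hours=24)
# Hypothetical last sync, whether manual or scheduled.
last_sync = datetime(2023, 10, 2, 17, 0)

print(is_sync_due(last_sync, datetime(2023, 10, 3, 16, 59), every_24_hours))  # False
print(is_sync_due(last_sync, datetime(2023, 10, 3, 17, 1), every_24_hours))   # True
```

Because the comparison uses the last sync regardless of how it was triggered, a manual sync pushes the next scheduled run back by a full interval.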

## Cron Scheduling
If you prefer more flexibility in scheduling your sync, you can also use CRON scheduling to set a precise time of day or month.
If you prefer more precision in scheduling your sync, you can also use CRON scheduling to set a specific time of day or month.

Airbyte uses the CRON scheduler from [Quartz](http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html). We recommend reading their [documentation](http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) to learn more about how to
Airbyte uses the CRON scheduler from [Quartz](http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html). We recommend reading their [documentation](http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) to understand the required formatting. You can also refer to these examples:

When setting up the cron extpression, you will also be asked to choose a time zone the sync will run in.

:::note
For Scheduled or cron scheduled syncs, Airbyte guarantees syncs will initiate with a schedule accuracy of +/- 30 minutes.
:::
| Cron string | Sync Timing|
| - | - |
| 0 0 * * * ? | Every hour, at 0 minutes past the hour |
| 0 0 15 * * ? | At 15:00 every day |
| 0 0 15 ? * MON,TUE | At 15:00, only on Monday and Tuesday |
| 0 0 0,2,4,6 * * ? | At 12:00 AM, 02:00 AM, 04:00 AM and 06:00 AM every day |
| 0 0 */15 * * ? | At 00:00 and 15:00 every day (every 15th hour, counted from midnight) |

When setting up the cron expression, you will also be asked to choose a time zone the sync will run in.
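Quartz cron strings differ from classic Unix cron in that they start with a seconds field and accept an optional trailing year. The helper below is a hypothetical sketch (it is not part of Airbyte or Quartz) that only labels each field of an expression so the positions are easier to read. It does not validate the values.

```python
# Field order used by Quartz cron expressions; the year field is optional.
QUARTZ_FIELDS = (
    "seconds",
    "minutes",
    "hours",
    "day_of_month",
    "month",
    "day_of_week",
    "year",
)


def label_cron_fields(expression: str) -> dict:
    """Map each whitespace-separated part of a Quartz cron string to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter sequence, so a 6-field string has no "year" key.
    return dict(zip(QUARTZ_FIELDS, parts))


fields = label_cron_fields("0 0 15 * * ?")
print(fields["hours"])        # 15
print(fields["day_of_week"])  # ?
```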

## Manual Syncs
When the connection is set to replicate with `Manual` frequency, the sync will not automatically run.

It can be triggered by clicking the "Sync Now" button at any time through the UI or be triggered through the UI.
You can trigger it at any time by clicking the "Sync Now" button in the UI, or through the API.