From 29d4a67c591f3e22269cb8d45f0750f5ef8d73cd Mon Sep 17 00:00:00 2001
From: Charity Holt <38872070+charholt@users.noreply.github.com>
Date: Fri, 28 Feb 2025 14:23:17 -0500
Subject: [PATCH 1/3] Update system.md

Adding __reverse_etl schema info to public docs
---
 src/connections/reverse-etl/system.md | 28 +++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/src/connections/reverse-etl/system.md b/src/connections/reverse-etl/system.md
index cf7c8613a0..e0baee8995 100644
--- a/src/connections/reverse-etl/system.md
+++ b/src/connections/reverse-etl/system.md
@@ -16,6 +16,34 @@ For Segment to compute the data changes within your warehouse, Segment needs to
 > warning ""
 > There may be cost implications to having Segment query your warehouse tables.
 
+## Reverse ETL Schema
+When using Reverse ETL with Segment, several system tables are created within the `__segment_reverse_etl` schema in your Snowflake instance. These tables are crucial for managing the sync process efficiently and tracking state information. Below are the details of the system tables in this schema:
+
+**1. Records Table**
+
+`records_` table is located within the` __segment_reverse_etl` schema, this table contains two key columns:
+
+`record_id`: A unique identifier for each record.
+
+`checksum`: A checksum value that is used to detect changes to a record since the last sync.
+The records table helps in determining new and updated rows by comparing the checksum values during each sync. If a record’s checksum changes, it indicates that the record has been modified and should be included in the next sync. This ensures that only the necessary updates are processed, reducing the amount of data transferred.
+
+**2. Checkpoint Table**
+
+The `checkpoints_` tables are located within the __segment_reverse_etl schema, this table contains the following columns:
+
+`source_id`: Identifies the source from which the data is being synced.
+
+`model_id`: Identifies the specific model or query that is used to pull data.
+checkpoint: Stores a timestamp value that represents the last sync point for a particular model.
+The checkpoints table is used for timestamp-based checkpointing between syncs. This enables Segment to track the last successful sync for each model and avoid duplicating data when syncing, ensuring incremental and efficient data updates.
+
+### Important Considerations
+
+Do not modify or delete these tables: Altering or deleting the records and checkpoints tables can cause unpredictable behavior in the sync process. These tables are essential for maintaining the integrity of data during Reverse ETL operations.
+State management: The `__segment_reverse_etl` schema and its associated tables (records and checkpoints) manage the state of each sync, ensuring that only necessary data changes are synced and that the sync process can resume where it left off.
+
+
 ## Limits
 To provide consistent performance and reliability at scale, Segment enforces default use and rate limits for Reverse ETL.
 

From 872922fdf6e8014f7840c82161cd4ed951484d24 Mon Sep 17 00:00:00 2001
From: Charity Holt <38872070+charholt@users.noreply.github.com>
Date: Thu, 20 Mar 2025 18:18:03 -0400
Subject: [PATCH 2/3] Update system.md

removed snowflake
---
 src/connections/reverse-etl/system.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/connections/reverse-etl/system.md b/src/connections/reverse-etl/system.md
index e0baee8995..f27bceccf2 100644
--- a/src/connections/reverse-etl/system.md
+++ b/src/connections/reverse-etl/system.md
@@ -17,7 +17,7 @@ For Segment to compute the data changes within your warehouse, Segment needs to
 > There may be cost implications to having Segment query your warehouse tables.
 
 ## Reverse ETL Schema
-When using Reverse ETL with Segment, several system tables are created within the `__segment_reverse_etl` schema in your Snowflake instance. These tables are crucial for managing the sync process efficiently and tracking state information. Below are the details of the system tables in this schema:
+When using Reverse ETL with Segment, several system tables are created within the `__segment_reverse_etl` schema in your warehouse. These tables are crucial for managing the sync process efficiently and tracking state information. Below are the details of the system tables in this schema:
 
 **1. Records Table**
 

From 32d662aea5ac82db5e3ab94d305cf18f6ad6ed4b Mon Sep 17 00:00:00 2001
From: forstisabella <92472883+forstisabella@users.noreply.github.com>
Date: Fri, 21 Mar 2025 09:47:35 -0400
Subject: [PATCH 3/3] Apply suggestions from code review

---
 src/connections/reverse-etl/system.md | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/connections/reverse-etl/system.md b/src/connections/reverse-etl/system.md
index f27bceccf2..c24aaec4fe 100644
--- a/src/connections/reverse-etl/system.md
+++ b/src/connections/reverse-etl/system.md
@@ -16,31 +16,34 @@ For Segment to compute the data changes within your warehouse, Segment needs to
 > warning ""
 > There may be cost implications to having Segment query your warehouse tables.
 
-## Reverse ETL Schema
+## Reverse ETL schema
 When using Reverse ETL with Segment, several system tables are created within the `__segment_reverse_etl` schema in your warehouse. These tables are crucial for managing the sync process efficiently and tracking state information. Below are the details of the system tables in this schema:
 
-**1. Records Table**
+### Records table
 
-`records_` table is located within the` __segment_reverse_etl` schema, this table contains two key columns:
+`records_` table is located within the ` __segment_reverse_etl` schema.
 
-`record_id`: A unique identifier for each record.
+This table contains two key columns:
 
-`checksum`: A checksum value that is used to detect changes to a record since the last sync.
+- `record_id`: A unique identifier for each record.
+- `checksum`: A checksum value that is used to detect changes to a record since the last sync.
 The records table helps in determining new and updated rows by comparing the checksum values during each sync. If a record’s checksum changes, it indicates that the record has been modified and should be included in the next sync. This ensures that only the necessary updates are processed, reducing the amount of data transferred.
 
-**2. Checkpoint Table**
+### Checkpoint table
 
-The `checkpoints_` tables are located within the __segment_reverse_etl schema, this table contains the following columns:
+The `checkpoints_` tables are located within the __segment_reverse_etl schema.
 
-`source_id`: Identifies the source from which the data is being synced.
+This table contains the following columns:
+
+- `source_id`: Identifies the source from which the data is being synced.
+- `model_id`: Identifies the specific model or query that is used to pull data.
+- `checkpoint`: Stores a timestamp value that represents the last sync point for a particular model.
 
-`model_id`: Identifies the specific model or query that is used to pull data.
-checkpoint: Stores a timestamp value that represents the last sync point for a particular model.
 The checkpoints table is used for timestamp-based checkpointing between syncs. This enables Segment to track the last successful sync for each model and avoid duplicating data when syncing, ensuring incremental and efficient data updates.
 
 ### Important Considerations
 
-Do not modify or delete these tables: Altering or deleting the records and checkpoints tables can cause unpredictable behavior in the sync process. These tables are essential for maintaining the integrity of data during Reverse ETL operations.
+Do not modify or delete these tables. Altering or deleting the records and checkpoints tables can cause unpredictable behavior in the sync process. These tables are essential for maintaining the integrity of data during Reverse ETL operations.
 State management: The `__segment_reverse_etl` schema and its associated tables (records and checkpoints) manage the state of each sync, ensuring that only necessary data changes are synced and that the sync process can resume where it left off.
 
 
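
Reviewer note: the checksum comparison that the patched text attributes to the records table can be sketched roughly as below. This is an illustrative Python sketch under stated assumptions, not Segment's implementation. The function names `row_checksum` and `diff_against_records`, the choice of SHA-256 over sorted JSON, and the in-memory dict standing in for the `record_id` and `checksum` columns are all hypothetical.

```python
import hashlib
import json


def row_checksum(row: dict) -> str:
    """Hash a row deterministically (assumed scheme: SHA-256 over sorted JSON)."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_against_records(rows: dict, records: dict) -> dict:
    """Classify rows as new or updated by comparing checksums.

    rows:    record_id -> current row returned by the model query
    records: record_id -> checksum stored at the last sync
             (a stand-in for the record_id/checksum columns)
    """
    new, updated = [], []
    for record_id, row in rows.items():
        checksum = row_checksum(row)
        if record_id not in records:
            new.append(record_id)        # never synced before
        elif records[record_id] != checksum:
            updated.append(record_id)    # contents changed since last sync
    return {"new": new, "updated": updated}


# State left behind by a hypothetical previous sync.
previous = {
    "u1": row_checksum({"email": "a@example.com", "plan": "free"}),
    "u2": row_checksum({"email": "b@example.com", "plan": "pro"}),
}

# Current model results: u1 unchanged, u2 modified, u3 new.
current = {
    "u1": {"email": "a@example.com", "plan": "free"},
    "u2": {"email": "b@example.com", "plan": "team"},
    "u3": {"email": "c@example.com", "plan": "free"},
}

changes = diff_against_records(current, previous)
print(changes)  # {'new': ['u3'], 'updated': ['u2']}
```

Only `u2` and `u3` would be sent in the next sync. The unchanged `u1` is skipped, which is the data-transfer saving the section describes.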