Merge pull request stitchdata#538 from stitchdata/aug-17-support

August 17 support fixes
shedd · Aug 19, 2020 · 3803694
2 parents b6c3545 + 071d5fe
commit 3803694

Showing 6 changed files with 231 additions and 90 deletions.
diff --git a/_data/destinations/bigquery/loading-errors.yml b/_data/destinations/bigquery/loading-errors.yml
@@ -6,9 +6,12 @@
 
 # "Primary key change is not permitted"
 
+numeric-out-of-range: &numeric-out-of-range "Numeric out of range for BigQuery on [NUMERIC]"
+primary-key-change: &primary-key-change "Primary key change is not permitted"
+
 all:
 ## Primary Key change not allowed
-  - message: "Primary key change is not permitted"
+  - message: *primary-key-change
     id: "pk-change-not-permitted"
     applicable-to: "Google BigQuery v2 destinations"
     level: "critical"
@@ -25,7 +28,7 @@ all:
     fix-it: |
       Reset the table(s) mentioned in the error. This will queue a full re-replication of the table(s), which will ensure Primary Keys are correctly captured and used to de-dupe data when loading.
 
-  - message: "Numeric out of range for BigQuery on [NUMERIC]"
+  - message: *numeric-out-of-range
     id: "numeric-out-of-range"
     applicable-to: "All Google BigQuery destination versions"
     level: "warning"

diff --git a/_data/errors/extraction/databases/mongo.yml b/_data/errors/extraction/databases/mongo.yml
@@ -18,6 +18,11 @@ raw-error:
 
 # '[VALUE]' is not a valid ObjectId, it must be a 12-byte input or a 24-character hex string
 
+  oplog-age-out: &oplog-age-out |
+    Clearing state because Oplog has aged out
+    Must complete full table sync before starting oplog replication for [COLLECTION_NAME]
+
+
 documentation:
   projection-queries: &projection-queries
     category: "Projection queries"
@@ -147,4 +152,20 @@ all:
 
       {% for integration in applicable-integrations %}
       - [{{ integration.display_name }}]({{ integration.url | prepend: site.baseurl | append: "/v2" | append: "#create-a-database-user" }})
-      {% endfor %}
+      {% endfor %}
+
+  - message: *oplog-age-out
+    id: "oplog-age-out-full-table-replication"
+    applicable-to: *all-mongo
+    level: "info"
+    category: "Log-based Incremental Replication"
+    category-doc: |
+      {{ link.replication.log-based-incremental | prepend: site.baseurl | append: "#limitation--log-retention" }}
+    version: "1,2"
+    summary: "Insufficient maximum OpLog size"
+    cause: |
+      The OpLog's maximum size is insufficient, causing log files to age out before Stitch can replicate them. When this occurs, Stitch will clear the saved log position ID for any affected collection(s) and re-replicate them in full.
+    fix-it: |
+      Increase the maximum size of the OpLog using the [replSetResizeOplog](https://docs.mongodb.com/v4.0/reference/command/replSetResizeOplog/#dbcmd.replSetResizeOplog){:target="new"} command.
+
+      **Note**: As the maximum size you need depends on your database, it may take some experimentation to identify the best setting. Mongo doesn't currently recommend an OpLog size.
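
As a rough illustration of the sizing experiment the note above describes, the sketch below estimates an OpLog size from an observed write rate and a target retention window, then builds the `replSetResizeOplog` command document. The function names, the headroom factor, and the sample numbers are assumptions made for illustration, not Stitch or MongoDB recommendations. The only MongoDB behavior relied on is that the command takes `size` in megabytes with a documented minimum of 990 MB.

```python
import math

MIN_OPLOG_MB = 990  # documented lower bound for replSetResizeOplog's `size`


def estimate_oplog_size_mb(oplog_mb_per_hour, retention_hours, headroom=1.5):
    """Estimate an OpLog size (MB) that retains `retention_hours` of writes.

    `headroom` is an assumed safety multiplier for write bursts.
    """
    if oplog_mb_per_hour < 0 or retention_hours <= 0 or headroom < 1:
        raise ValueError("rate must be >= 0, retention > 0, headroom >= 1")
    needed = oplog_mb_per_hour * retention_hours * headroom
    return max(MIN_OPLOG_MB, math.ceil(needed))


def resize_command(size_mb):
    """Build the admin command document; running it is left to your shell or driver."""
    if size_mb < MIN_OPLOG_MB:
        raise ValueError("size must be at least 990 MB")
    return {"replSetResizeOplog": 1, "size": float(size_mb)}


# Hypothetical workload: 200 MB of OpLog entries per hour, 48 hours of retention.
size = estimate_oplog_size_mb(oplog_mb_per_hour=200, retention_hours=48)
command = resize_command(size)
print(command)
```

If you use PyMongo, a document like this could be passed to `client.admin.command(...)` on each replica set member you want to resize. Treat the estimate as a starting point and adjust it after observing how long entries actually remain in the OpLog.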
diff --git a/_destinations/redshift/guides/redshift-apply-encodings-sort-dist-keys.md b/_destinations/redshift/guides/redshift-apply-encodings-sort-dist-keys.md
@@ -25,15 +25,15 @@ use-tutorial-sidebar: false
 # -------------------------- #
 
 intro: |
-  {% include important.html type="single-line" content="The process we outline in this tutorial - which includes dropping tables - can lead to data corruption and other issues if done incorrectly. **Please proceed with caution or reach out to Stitch support if you have questions.**" %}
+  {% include important.html type="single-line" content="The process we outline in this tutorial - which includes dropping tables - can lead to data corruption and other issues if done incorrectly. **Proceed with caution or reach out to Stitch support if you have questions.**" %}
 
-  Want to improve your query performance? In this article, we’ll walk you through how to use encoding, Sort, and Distribution keys to streamline query processing.
+  Want to improve your query performance? In this guide, we’ll walk you through how to use encoding, SORT, and DIST (distribution) keys to streamline query processing.
 
   Before we dive into their application, here's a quick overview of each of these performance-enhancing tools.
 
   - **Encodings**, or [compression types](http://docs.aws.amazon.com/redshift/latest/dg/t_Compressing_data_on_disk.html), are used to reduce the amount of required storage space and the size of data that’s read from storage. This in turn can lead to a reduction in processing time for queries.
 
-  - **[Sort keys](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html)** determine the order in which rows in a table are stored. When properly applied, Sort Keys allow large chunks of data to be skipped during query processing. Less data to scan means a shorter processing time, thus improving the query’s performance.
+  - **[SORT keys](http://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html)** determine the order in which rows in a table are stored. When properly applied, SORT keys allow large chunks of data to be skipped during query processing. Less data to scan means a shorter processing time, thus improving the query’s performance.
 
   - **[Distribution, or DIST keys](http://docs.aws.amazon.com/redshift/latest/dg/t_Distributing_data.html)** determine where data is stored in Redshift. When data is replicated into your data warehouse, it’s stored across the compute nodes that make up the cluster. If data is heavily skewed - meaning a large amount is placed on a single node - query performance will suffer. Even distribution prevents these bottlenecks by ensuring that nodes equally share the processing load.
 
@@ -57,18 +57,18 @@ steps:
 
       We’ll use a table called `orders`, which is contained in the `rep_sales` schema.
 
-      Log into your Redshift database using your SQL client to get started.
+      To get started, log into your Redshift database using [psql](https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-from-psql.html){:target="new"}.
 
-      Use this command to retrieve the table schema, replacing `rep_sales` and `orders` with the names of your schema and table, respectively: 
+      Use this command to retrieve the table schema, replacing `rep_sales` and `orders` with the names of your schema and table, respectively:
 
-      ```sql
-      \d+ rep_sales.orders
-      ```
+      {% capture code %}\d+ rep_sales.orders
+      {% endcapture %}
+
+      {% include layout/code-snippet.html code=code language="sql" %}
 
       For the `rep_sales.orders` table, the result looks like this:
 
-      ```
-      | Column              | Data Type                  |
+      {% capture code %}| Column              | Data Type                  |
       | --------------------+----------------------------|
       | id [pk]             | BIGINT                     |
       | rep_name            | VARCHAR(128)               |
@@ -80,7 +80,9 @@ steps:
       | _sdc_batched_at     | TIMESTAMP WITHOUT TIMEZONE |
       | _sdc_table_version  | BIGINT                     |
       | _sdc_replication_id | VARCHAR(128)               |
-      ```
+      {% endcapture %}
+
+      {% include layout/code-snippet.html code=code language="sql" %}
 
       In this example, we'll perform the following:
 
@@ -101,17 +103,19 @@ steps:
 
           Retrieve the table's Primary Key using the following query:
 
-          ```sql
-          SELECT description
+          {% capture code %}SELECT description
             FROM pg_catalog.pg_description
            WHERE objoid = 'old_orders'::regclass;
-          ```
+          {% endcapture %}
+
+          {% include layout/code-snippet.html code=code language="sql" %}
 
           The result will look like the following, where `primary_keys` is an array of strings referencing the columns used as the table's Primary Key:
 
-          ```sql
-          {"primary_keys":["id"]}
-          ```
+          {% capture code %}{"primary_keys":["id"]}
+          {% endcapture %}
+
+          {% include layout/code-snippet.html code=code language="sql" %}
 
           {% include important.html first-line="**Primary Key comments**" content="Redshift doesn’t enforce the use of Primary Keys, but Stitch requires them to replicate data. In the following example, you'll see `COMMENT` being used to note the table's Primary Key. **Make sure you include the Primary Key comment in the next step, as missing or incorrectly defined Primary Key comments will cause issues with data replication.**" %}
 
@@ -128,8 +132,7 @@ steps:
 
           For the `rep_sales.orders` example table, this is the transaction that will perform the actions listed above:
 
-          ```sql
-          SET search_path to rep_sales;
+          {% capture code %}SET search_path to rep_sales;
           BEGIN;
           ALTER TABLE orders RENAME TO old_orders;
           CREATE TABLE new_orders (
@@ -154,7 +157,9 @@ steps:
           ALTER TABLE orders OWNER TO <stitch_user>;      /* Grants table ownership to Stitch */
           DROP TABLE old_orders;                        /* Drops the "old" table */
           END;
-          ```
+          {% endcapture %}
+
+          {% include layout/code-snippet.html code=code language="sql" %}
 
   - title: "Verify the table owner"
     anchor: "verify-table-owner"
@@ -163,34 +168,36 @@ steps:
 
       To verify the table's owner, run the following query and replace `rep_sales` and `orders` with the names of the schema and table, respectively:
 
-      ```sql
-      SELECT schemaname,
+      {% capture code %}SELECT schemaname,
              tablename,
              tableowner
         FROM pg_catalog.pg_tables
        WHERE schemaname = 'rep_sales'
          AND tablename = 'orders';
-      ```
+      {% endcapture %}
+
+      {% include layout/code-snippet.html code=code language="sql" %}
 
       If Stitch is not the owner of the table, run the following command:
 
-      ```sql
-      ALTER TABLE <schema_name>.<table_name> OWNER TO <stitch_user>;
-      ```
+      {% capture code %}ALTER TABLE <schema_name>.<table_name> OWNER TO <stitch_user>;
+      {% endcapture %}
+
+      {% include layout/code-snippet.html code=code language="sql" %}
 
   - title: "Verify the encoding and key application"
     anchor: "verify-application"
     content: |
-      To verify that the changes were applied correctly, retrieve the table’s schema again using this command, replacing `rep_sales` and `orders` with the names of your schema and table, respectively: 
+      To verify that the changes were applied correctly, retrieve the table’s schema again using this command, replacing `rep_sales` and `orders` with the names of your schema and table, respectively:
 
-      ```sql
-      \d+ rep_sales.orders
-      ```
+      {% capture code %}\d+ rep_sales.orders
+      {% endcapture %}
+
+      {% include layout/code-snippet.html code=code language="sql" %}
 
       In this example, if the Keys and encodings were applied correctly, the response would look something like this:
 
-      ```sql
-      | Column              | Data type                  | Encoding | Distkey | Sortkey |
+      {% capture code %}| Column              | Data type                  | Encoding | Distkey | Sortkey |
       |---------------------+----------------------------+----------+---------+---------|
       | id                  | BIGINT                     | none     | true    | true    |  
       | rep_name            | VARCHAR(128)               | bytedict | false   | false   |  
@@ -202,7 +209,9 @@ steps:
       | _sdc_batched_at     | TIMESTAMP WITHOUT TIMEZONE | none     | false   | false   |
       | _sdc_table_version  | BIGINT                     | none     | false   | false   |
       | _sdc_replication_id | VARCHAR(128)               | none     | false   | false   |
-      ```
+      {% endcapture %}
+
+      {% include layout/code-snippet.html code=code language="sql" %}
 
       For the `id` column, `Distkey` and `Sortkey` are both set to `true`, meaning that the keys were properly applied.
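
For wider tables, the same check can be scripted. The following is a minimal sketch, assuming the table output has been copied into a string in the pipe-delimited layout shown in this guide; `parse_table` and `key_columns` are hypothetical helper names, the sample is abbreviated to two rows, and the output of your SQL client may be formatted differently.

```python
def parse_table(text):
    """Parse a pipe-delimited table into a list of dicts keyed by lowercased header."""
    rows = []
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        # Skip anything that is not a table row, plus the |---+---| separator row.
        if not line.startswith("|") or set(line) <= set("|-+ "):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = [cell.lower() for cell in cells]
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def key_columns(rows, key):
    """Return the names of columns whose `key` flag (distkey or sortkey) is true."""
    return [row["column"] for row in rows if row.get(key) == "true"]


# Abbreviated sample based on the example output in this guide.
sample = """
| Column   | Data type    | Encoding | Distkey | Sortkey |
|----------+--------------+----------+---------+---------|
| id       | BIGINT       | none     | true    | true    |
| rep_name | VARCHAR(128) | bytedict | false   | false   |
"""

rows = parse_table(sample)
print("DIST key columns:", key_columns(rows, "distkey"))
print("SORT key columns:", key_columns(rows, "sortkey"))
```

An empty list for either key would suggest that the key was not applied and that the table definition in the transaction should be reviewed.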