docs: final grammar fix! pages 120-139 #1913

Merged: 3 commits, Oct 2, 2024
@@ -10,13 +10,11 @@ keywords: [airflow, github, google cloud composer]

This setup will allow you to deploy the main branch of your Airflow project from GitHub to Cloud Composer.

- Create a GitHub repository ie. by following our how-to guide on [deployment for Airflow](../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md)
- Create a GitHub repository, for example, by following our how-to guide on [deployment for Airflow](../../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer.md).

- In Google Cloud web interface, go to Source Repositories and create a repository that mirrors your
GitHub repository. This will simplify the authentication by doing it through this mirroring
service.
- In the Google Cloud web interface, go to Source Repositories and create a repository that mirrors your GitHub repository. This will simplify authentication by using this mirroring service.

- In Cloud Build, add a trigger on commit to main.
- In Cloud Build, add a trigger on commit to the main branch.

- Point it to your Cloud Build file. In our example, we place our file at `build/cloudbuild.yaml`.

@@ -26,31 +24,29 @@ This setup will allow you to deploy the main branch of your Airflow project from GitHub to Cloud Composer.

![test-composer](/img/test-composer.png)

- In your `cloudbuild.yaml`, set the bucket name
- In your `cloudbuild.yaml`, set the bucket name.

- Make sure your repository code is pushed to main.
- Make sure your repository code is pushed to the main branch.

- Run the trigger you build (in Cloud Build).
- Run the trigger you built (in Cloud Build).

- Wait a minute, and check if your files arrived in the bucket. In our case, we added a `pipedrive`
folder, and we can see it appeared.
- Wait a minute, and check if your files have arrived in the bucket. In our case, we added a `pipedrive` folder, and we can see it appeared.

![bucket-details](/img/bucket-details.png)

### Airflow setup

### Adding the libraries needed

Assuming you already spun up a Cloud Composer.
Assuming you have already spun up a Cloud Composer:

- Make sure the user you added has rights to change the base image (add libraries). I already had
these added, you may get away with less (not clear in docs):
- Make sure the user you added has rights to change the base image (add libraries). I already had these added; you may get away with fewer (not clear in docs):

- Artifact Registry Administrator;
- Artifact Registry Repository Administrator;
- Remote Build Execution Artifact Admin;

- Navigate to your composer environment and add the needed libraries. In the case of this example
pipedrive pipeline, we only need dlt, so add `dlt` library.
- Navigate to your composer environment and add the needed libraries. In the case of this example pipedrive pipeline, we only need `dlt`, so add the `dlt` library.

![add-package](/img/add-package.png)

13 changes: 7 additions & 6 deletions docs/website/docs/reference/explainers/how-dlt-works.md
@@ -7,8 +7,8 @@ keywords: [architecture, extract, normalize, load]
# How `dlt` works

`dlt` automatically turns JSON returned by any [source](../../general-usage/glossary.md#source)
(e.g. an API) into a live dataset stored in the
[destination](../../general-usage/glossary.md#destination) of your choice (e.g. Google BigQuery). It
(e.g., an API) into a live dataset stored in the
[destination](../../general-usage/glossary.md#destination) of your choice (e.g., Google BigQuery). It
does this by first [extracting](how-dlt-works.md#extract) the JSON data, then
[normalizing](how-dlt-works.md#normalize) it to a schema, and finally [loading](how-dlt-works#load)
it to the location where you will store it.
@@ -24,14 +24,15 @@ JSON and provides it to `dlt` as input, which then normalizes that data.
## Normalize

The configurable normalization engine in `dlt` recursively unpacks this nested structure into
relational tables (i.e. inferring data types, linking tables to create nested relationships,
relational tables (i.e., inferring data types, linking tables to create nested relationships,
etc.), making it ready to be loaded. This creates a
[schema](../../general-usage/glossary.md#schema), which will automatically evolve to any future
source data changes (e.g. new fields or tables).
[schema](../../general-usage/glossary.md#schema), which will automatically evolve to accommodate any future
source data changes (e.g., new fields or tables).
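
To make the unpacking concrete, here is a minimal sketch, assuming a local `duckdb` destination and an illustrative `users` table; the nested list is unpacked into a `users__annotations` child table linked back to `users`:

```py
import dlt

# Nested JSON as an API might return it
data = [
    {"id": 1, "name": "Alice", "annotations": [{"label": "vip"}, {"label": "eu"}]},
]

# Illustrative pipeline; duckdb is used only so the sketch runs locally
pipeline = dlt.pipeline(
    pipeline_name="normalize_demo",
    destination="duckdb",
    dataset_name="demo_data",
)

# dlt infers column types, creates the users table plus a users__annotations
# child table, and stores the resulting schema alongside the data
load_info = pipeline.run(data, table_name="users")
print(load_info)
```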

## Load

The data is then loaded into your chosen [destination](../../general-usage/glossary.md#destination).
`dlt` uses configurable, idempotent, atomic loads that ensure data safely ends up there. For
example, you don't need to worry about the size of the data you are loading and if the process is
example, you don't need to worry about the size of the data you are loading, and if the process is
interrupted, it is safe to retry without creating errors.
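
A small sketch of that retry behaviour under the same illustrative local setup; calling `run()` again with no arguments attempts to finish any pending load package instead of loading the data a second time:

```py
import dlt

pipeline = dlt.pipeline(
    pipeline_name="normalize_demo",  # illustrative name
    destination="duckdb",
    dataset_name="demo_data",
)

data = [{"id": i} for i in range(1_000)]

try:
    pipeline.run(data, table_name="items")
except Exception:
    # If the previous run was interrupted mid-load, a bare run() picks up the
    # pending load package; it does not extract or load the data twice.
    pipeline.run()
```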

12 changes: 6 additions & 6 deletions docs/website/docs/walkthroughs/add-a-verified-source.md
@@ -27,7 +27,7 @@ List available sources to see their names and descriptions:

```sh
dlt init --list-sources
```

Now pick one of the source names, for example `pipedrive` and a destination i.e. `bigquery`:
Now pick one of the source names, for example, `pipedrive` and a destination, i.e., `bigquery`:

```sh
dlt init pipedrive bigquery
```

@@ -80,7 +80,7 @@ For adding them locally or on your orchestrator, please see the following guide

## 3. Customize or write a pipeline script

Once you initialized the pipeline, you will have a sample file `pipedrive_pipeline.py`.
Once you have initialized the pipeline, you will have a sample file `pipedrive_pipeline.py`.

This is the developer's suggested way to use the pipeline, so you can use it as a starting point -
in our case, we can choose to run a method that loads all data, or we can choose which endpoints
@@ -95,7 +95,7 @@ You can modify an existing verified source in place.
- If that modification is generally useful for anyone using this source, consider contributing it
back via a PR. This way, we can ensure it is tested and maintained.
- If that modification is not a generally shared case, then you are responsible for maintaining it.
We suggest making any of your own customisations modular is possible, so you can keep pulling the
We suggest making any of your own customizations modular if possible, so you can keep pulling the
updated source from the community repo in the event of source maintenance.

## 5. Add more sources to your project
@@ -120,7 +120,7 @@ the parent folder:

```sh
dlt init pipedrive bigquery
```

## 7. Advanced: Using dlt init with branches, local folders or git repos
## 7. Advanced: Using dlt init with branches, local folders, or git repos

To find out more info about this command, use --help:

@@ -134,9 +134,9 @@ To deploy from a branch of the `verified-sources` repo, you can use the following command:

```sh
dlt init source destination --branch <branch_name>
```

To deploy from another repo, you could fork the verified-sources repo and then provide the new repo
url as below, replacing `dlt-hub` with your fork name:
To deploy from another repo, you could fork the verified-sources repo and then provide the new repo URL as below, replacing `dlt-hub` with your fork name:

```sh
dlt init pipedrive bigquery --location "https://github.com/dlt-hub/verified-sources"
```

49 changes: 22 additions & 27 deletions docs/website/docs/walkthroughs/add-incremental-configuration.md
@@ -7,12 +7,12 @@ slug: sql-incremental-configuration

# Add incremental configuration to SQL resources
Incremental loading is the act of loading only new or changed data and not old records that have already been loaded.
For example, a bank loading only the latest transactions or a company updating its database with new or modified user
For example, a bank loads only the latest transactions, or a company updates its database with new or modified user
information. In this article, we’ll discuss a few incremental loading strategies.

:::important
Processing data incrementally, or in batches, enhances efficiency, reduces costs, lowers latency, improves scalability,
and optimizes resource utilization.
:::

### Incremental loading strategies
@@ -28,25 +28,26 @@ In this guide, we will discuss various incremental loading methods using `dlt`,

## Code examples



### 1. Full load (replace)

A full load strategy completely overwrites the existing data with the new dataset. This is useful when you want to
refresh the entire table with the latest data.
A full load strategy completely overwrites the existing data with the new dataset. This is useful when you want to refresh the entire table with the latest data.

:::note
This strategy technically does not load only new data but instead reloads all data: old and new.
:::

Here’s a walkthrough:

1. The initial table, named "contact", in the SQL source looks like this:
1. The initial table, named "contact," in the SQL source looks like this:

| id | name | created_at |
| --- | --- | --- |
| 1 | Alice | 2024-07-01 |
| 2 | Bob | 2024-07-02 |

2. The python code illustrates the process of loading data from an SQL source into BigQuery using the `dlt` pipeline. Please note the `write_disposition = "replace` used below.
2. The Python code illustrates the process of loading data from an SQL source into BigQuery using the `dlt` pipeline. Please note the `write_disposition = "replace"` used below.

```py
def load_full_table_resource() -> None:
    ...
```

@@ -94,24 +95,22 @@

**What happened?**

After running the pipeline, the original data in the "contact" table (Alice and Bob) is completely replaced with the new
updated table with data “Charlie” and “Dave” added and “Bob” removed. This strategy is useful for scenarios where the entire
dataset needs to be refreshed/replaced with the latest information.
After running the pipeline, the original data in the "contact" table (Alice and Bob) is completely replaced with the new updated table with data “Charlie” and “Dave” added and “Bob” removed. This strategy is useful for scenarios where the entire dataset needs to be refreshed or replaced with the latest information.
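
The code block above is collapsed in this diff, so for reference here is a minimal sketch of a replace-style load. The `sql_database` import path, the pipeline and dataset names, and the credentials setup (expected in `.dlt/secrets.toml`) are illustrative assumptions rather than the guide's exact code:

```py
import dlt
from dlt.sources.sql_database import sql_database  # assumed import path for the SQL source

def load_full_table_resource() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="sql_to_bigquery_replace",  # illustrative name
        destination="bigquery",
        dataset_name="contacts_data",
    )
    # Connection credentials are read from .dlt/secrets.toml
    source = sql_database().with_resources("contact")
    # "replace" drops the previously loaded rows and reloads the full table
    info = pipeline.run(source, write_disposition="replace")
    print(info)

load_full_table_resource()
```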

### 2. Append new records based on incremental ID

This strategy appends only new records to the table based on an incremental ID. It is useful for scenarios where each new record has a unique, incrementing identifier.

Here’s a walkthrough:

1. The initial table, named "contact", in the SQL source looks like this:
1. The initial table, named "contact," in the SQL source looks like this:

| id | name | created_at |
| --- | --- | --- |
| 1 | Alice | 2024-07-01 |
| 2 | Bob | 2024-07-02 |

2. The python code demonstrates loading data from an SQL source into BigQuery using an incremental variable, `id`. This variable tracks new or updated records in the `dlt` pipeline. Please note the `write_disposition = "append` used below.
2. The Python code demonstrates loading data from an SQL source into BigQuery using an incremental variable, `id`. This variable tracks new or updated records in the `dlt` pipeline. Please note the `write_disposition = "append"` used below.

```py
def load_incremental_id_table_resource() -> None:
@@ -133,7 +132,7 @@
print(info)
```

3. After running the `dlt` pipeline, the data loaded into BigQuery "contact" table looks like:
3. After running the `dlt` pipeline, the data loaded into the BigQuery "contact" table looks like:

| Row | id | name | created_at | _dlt_load_id | _dlt_id |
| --- | --- | --- | --- | --- | --- |
@@ -161,20 +160,20 @@

In this scenario, the pipeline appends new records (Charlie and Dave) to the existing data (Alice and Bob) without affecting the pre-existing entries. This strategy is ideal when only new data needs to be added, preserving the historical data.
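
Again as a hedged sketch, the append-on-id variant differs only in the write disposition and an incremental cursor on `id` (same assumed `sql_database` source and credentials in `.dlt/secrets.toml`):

```py
import dlt
from dlt.sources.sql_database import sql_database  # assumed import path for the SQL source

def load_incremental_id_table_resource() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="sql_to_bigquery_append_id",  # illustrative name
        destination="bigquery",
        dataset_name="contacts_data",
    )
    source = sql_database().with_resources("contact")
    # Cursor on "id": later runs only extract rows whose id is greater than
    # the highest value seen so far.
    source.contact.apply_hints(incremental=dlt.sources.incremental("id"))
    info = pipeline.run(source, write_disposition="append")
    print(info)

load_incremental_id_table_resource()
```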

### 3. Append new records based on timestamp ("created_at")
### 3. Append new records based on timestamp ("created_at")

This strategy appends only new records to the table based on a date/timestamp field. It is useful for scenarios where records are created with a timestamp, and you want to load only those records created after a certain date.

Here’s a walkthrough:

1. The initial dataset, named "contact", in the SQL source looks like this:
1. The initial dataset, named "contact," in the SQL source looks like this:

| id | name | created_at |
| --- | --- | --- |
| 1 | Alice | 2024-07-01 00:00:00 |
| 2 | Bob | 2024-07-02 00:00:00 |

2. The python code illustrates the process of loading data from an SQL source into BigQuery using the `dlt` pipeline. Please note the `write_disposition = "append"`, with `created_at` being used as the incremental parameter.
2. The Python code illustrates the process of loading data from an SQL source into BigQuery using the `dlt` pipeline. Please note the `write_disposition = "append"`, with `created_at` being used as the incremental parameter.

```py
def load_incremental_timestamp_table_resource() -> None:
@@ -199,7 +198,7 @@
load_incremental_timestamp_table_resource()
```

3. After running the `dlt` pipeline, the data loaded into BigQuery "contact" table looks like:
3. After running the `dlt` pipeline, the data loaded into the BigQuery "contact" table looks like:

| Row | id | name | created_at | _dlt_load_id | _dlt_id |
| --- | --- | --- | --- | --- | --- |
@@ -225,13 +224,11 @@

**What happened?**

The pipeline adds new records (Charlie and Dave) that have a `created_at` timestamp after the specified initial value while
retaining the existing data (Alice and Bob). This approach is useful for loading data incrementally based on when it was created.
The pipeline adds new records (Charlie and Dave) that have a `created_at` timestamp after the specified initial value while retaining the existing data (Alice and Bob). This approach is useful for loading data incrementally based on when it was created.
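
The timestamp variant is the same sketch with the cursor moved to `created_at` and an explicit, illustrative starting value:

```py
import dlt
from datetime import datetime
from dlt.sources.sql_database import sql_database  # assumed import path for the SQL source

def load_incremental_timestamp_table_resource() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="sql_to_bigquery_append_ts",  # illustrative name
        destination="bigquery",
        dataset_name="contacts_data",
    )
    source = sql_database().with_resources("contact")
    # Cursor on "created_at"; rows at or before the initial value are skipped,
    # and later runs continue from the newest timestamp already loaded.
    # The initial value should match the column's type (naive datetime here).
    source.contact.apply_hints(
        incremental=dlt.sources.incremental("created_at", initial_value=datetime(2024, 7, 1))
    )
    info = pipeline.run(source, write_disposition="append")
    print(info)

load_incremental_timestamp_table_resource()
```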

### 4. Merge (Update/Insert) records based on timestamp ("last_modified_at") and ID
### 4. Merge (update/insert) records based on timestamp ("last_modified_at") and ID

This strategy merges records based on a composite key of ID and a timestamp field. It updates existing records and inserts
new ones as necessary.
This strategy merges records based on a composite key of ID and a timestamp field. It updates existing records and inserts new ones as necessary.

Here’s a walkthrough:

@@ -242,7 +239,7 @@
| 1 | Alice | 2024-07-01 00:00:00 |
| 2 | Bob | 2024-07-02 00:00:00 |

2. The Python code illustrates the process of loading data from an SQL source into BigQuery using the `dlt` pipeline Please note the `write_disposition = "merge"`, with `last_modified_at` being used as the incremental parameter.
2. The Python code illustrates the process of loading data from an SQL source into BigQuery using the `dlt` pipeline. Please note the `write_disposition = "merge"`, with `last_modified_at` being used as the incremental parameter.

```py
def load_merge_table_resource() -> None:
    ...
```

@@ -292,9 +289,7 @@

**What happened?**

The pipeline updates the record for Alice with the new data, including the updated `last_modified_at` timestamp, and adds a
new record for Hank. This method is beneficial when you need to ensure that records are both updated and inserted based on a
specific timestamp and ID.
The pipeline updates the record for Alice with the new data, including the updated `last_modified_at` timestamp, and adds a new record for Hank. This method is beneficial when you need to ensure that records are both updated and inserted based on a specific timestamp and ID.
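
Finally, a sketch of the merge variant: a primary-key hint plus an incremental cursor on `last_modified_at`, under the same illustrative assumptions:

```py
import dlt
from dlt.sources.sql_database import sql_database  # assumed import path for the SQL source

def load_merge_table_resource() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="sql_to_bigquery_merge",  # illustrative name
        destination="bigquery",
        dataset_name="contacts_data",
    )
    source = sql_database().with_resources("contact")
    # Only rows modified after the last seen last_modified_at are extracted;
    # the merge disposition then upserts them on the primary key "id".
    source.contact.apply_hints(
        primary_key="id",
        incremental=dlt.sources.incremental("last_modified_at"),
    )
    info = pipeline.run(source, write_disposition="merge")
    print(info)

load_merge_table_resource()
```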

The examples provided explain how to use `dlt` to achieve different incremental loading scenarios, highlighting the changes before and after running each pipeline.

The examples provided explain how to use `dlt` to achieve different incremental loading scenarios, highlighting the changes
before and after running each pipeline.