Merge branch 'current' into dbeatty/new-cross-db-macros
dbeatty10 authored Sep 26, 2022
2 parents da38078 + 93bb937 commit d48fb50
Showing 70 changed files with 1,089 additions and 200 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -10,3 +10,6 @@ website/build/
website/yarn.lock
website/node_modules
website/i18n/*

# Local Netlify folder
.netlify
2 changes: 2 additions & 0 deletions netlify.toml
@@ -0,0 +1,2 @@
[build]
functions = "functions"
@@ -27,7 +27,7 @@ So when presented with a near-real-time modeling request, I (and you as well!) h

Recently I was working on a JetBlue project and was presented with a legitimate use case: operational data. JetBlue’s Crewmembers need to make real-time decisions on when to close the door to a flight or rebook a flight. If you have ever been to an airport when there is a flight delay, you know how high the tension is in the room for airline employees to make the right decisions. They literally cannot do their jobs without real-time data.

If possible, the best thing to do is to query data as close to the source as possible. You don’t want to hit your production database unless you want to frighten and likely anger your DBA. Instead, the preferred approach is to replicate the source data to your analytics warehouse, which would provide a suitable environment for analytic queries. In JetBlue’s case, the data arrives in JSON blobs, which then need to be unnested, transformed, and joined before the data becomes useful for analysis. There was no way to just query from the source to get the information people required.
If possible, the best thing to do is to query data as close to the source as possible. You don’t want to hit your production database unless you want to frighten and likely anger your DBA. Instead, the preferred approach is to replicate the source data to your analytics warehouse, which would provide a suitable environment for analytic queries. In JetBlue’s case, the data arrives in <Term id="json" /> blobs, which then need to be unnested, transformed, and joined before the data becomes useful for analysis. There was no way to just query from the source to get the information people required.

TL;DR: If you need transformed, operational data to make in-the-moment decisions, then you probably need real-time data.

2 changes: 1 addition & 1 deletion website/blog/2021-11-29-open-source-community-growth.md
@@ -465,7 +465,7 @@ A lineage graph of the entire pipeline can now be viewed in Marquez, which shows

This is the simplest part, by far. Since we have a set of tables with clearly-defined measures and dimensions, getting everything working in a system like Apache Superset is straightforward.

Configuring the data source and adding each table to a Preset Workspace was easy. First, I connected my BigQuery database by uploading a JSON key for my service account.
Configuring the data source and adding each table to a Preset Workspace was easy. First, I connected my BigQuery database by uploading a <Term id="json" /> key for my service account.

Once the database connection was in place, I created datasets for each of my `*_daily_summary` tables by selecting the database/schema/table from a dropdown.

2 changes: 1 addition & 1 deletion website/blog/2022-04-14-add-ci-cd-to-bitbucket.md
@@ -183,7 +183,7 @@ Reading the file over, you can see that we:
3. Specify that this pipeline is a two-step process
4. Specify that in the first step called “Deploy to production”, we want to:
1. Use whatever pip cache is available, if any
2. Keep whatever JSON files are generated in this step in target/
2. Keep whatever <Term id="json" /> files are generated in this step in target/
3. Run the dbt setup by first installing dbt as defined in requirements.txt, then adding `profiles.yml` to the location dbt expects it in, and finally running `dbt deps` to install any dbt packages
4. Run `dbt seed`, `run`, and `snapshot`, all with `prod` as specified target
5. Specify that in the second step called “Upload artifacts for slim CI runs”, we want to use the Bitbucket “pipe” (pre-defined action) to authenticate with environment variables and upload all files that match the glob `target/*.json`.
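
Put together, the corresponding `bitbucket-pipelines.yml` might look roughly like the sketch below (step names follow the description above; the pipe name, version, branch, and file paths are assumptions):

```yaml
pipelines:
  branches:
    main:
      - step:
          name: Deploy to production
          caches:
            - pip                      # reuse the pip cache if one is available
          artifacts:
            - target/*.json            # keep the JSON artifacts generated in this step
          script:
            - pip install -r requirements.txt
            - mkdir -p ~/.dbt && cp profiles.yml ~/.dbt/profiles.yml
            - dbt deps
            - dbt seed --target prod
            - dbt run --target prod
            - dbt snapshot --target prod
      - step:
          name: Upload artifacts for slim CI runs
          script:
            # Bitbucket "pipe" that authenticates via repository variables
            # and uploads the artifacts (pipe name and version assumed)
            - pipe: atlassian/bitbucket-upload-file:0.3.2
              variables:
                BITBUCKET_USERNAME: $BITBUCKET_USERNAME
                BITBUCKET_APP_PASSWORD: $BITBUCKET_APP_PASSWORD
                FILENAME: 'target/*.json'
```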
@@ -53,7 +53,7 @@ dbt-cloud job run --job-id 43167

You probably agree that the latter example is more elegant and easier to read. `dbt-cloud` handles the request boilerplate (e.g., the API token in the header, the endpoint URL) so that you don’t need to worry about authentication or remember which endpoint to use. The CLI also implements additional functionality (e.g., `--wait`) for some endpoints; for example, `dbt-cloud job run --wait` issues the job trigger, waits until the job finishes, fails, or is cancelled, and then prints the job status response.
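
For instance, the trigger from the earlier example can be made blocking (a small sketch using the job ID shown above):

```bash
# Trigger job 43167, wait for it to finish, fail, or be cancelled,
# then print the job status response
dbt-cloud job run --job-id 43167 --wait
```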

In addition to CLI commands that interact with a single dbt Cloud API endpoint, there are composite helper commands that call one or more API endpoints and perform more complex operations. One example is the `dbt-cloud job export` / `dbt-cloud job import` pair: under the hood, the export command performs a `dbt-cloud job get` and writes the job metadata to a JSON file, and the import command reads job parameters from a JSON file and calls `dbt-cloud job create`. The export and import commands can be used in tandem to move dbt Cloud jobs between projects. Another example is `dbt-cloud job delete-all`, which fetches a list of all jobs using `dbt-cloud job list` and then iterates over the list, prompting the user to confirm each deletion. For each job the user agrees to delete, a `dbt-cloud job delete` is performed.
In addition to CLI commands that interact with a single dbt Cloud API endpoint, there are composite helper commands that call one or more API endpoints and perform more complex operations. One example is the `dbt-cloud job export` / `dbt-cloud job import` pair: under the hood, the export command performs a `dbt-cloud job get` and writes the job metadata to a <Term id="json" /> file, and the import command reads job parameters from a JSON file and calls `dbt-cloud job create`. The export and import commands can be used in tandem to move dbt Cloud jobs between projects. Another example is `dbt-cloud job delete-all`, which fetches a list of all jobs using `dbt-cloud job list` and then iterates over the list, prompting the user to confirm each deletion. For each job the user agrees to delete, a `dbt-cloud job delete` is performed.
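
A sketch of moving a job between projects with the export/import pair (the exact file-handling flags are an assumption — check `dbt-cloud job export --help` for the real interface):

```bash
# Export the job's metadata to a JSON file ...
dbt-cloud job export --job-id 43167 > job.json

# ... then recreate it, e.g. against another project or account
dbt-cloud job import < job.json
```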

To install the CLI in your Python environment, run `pip install dbt-cloud-cli` and you’re all set. You can use it locally in your development environment or, for example, in a GitHub Actions workflow.

2 changes: 1 addition & 1 deletion website/blog/2022-07-26-pre-commit-dbt.md
@@ -114,7 +114,7 @@ Adding periodic pre-commit checks can be done in 2 different ways, through CI (C

The example below will assume GitHub actions as the CI engine but similar behavior could be achieved in any other CI tool.

As described before, we need to run a `dbt docs generate` in order to create updated JSON artifacts used in the pre-commit hooks.
As described before, we need to run a `dbt docs generate` in order to create updated <Term id="json" /> artifacts used in the pre-commit hooks.

For that reason, our CI step will need to execute this command, which requires setting up a `profiles.yml` file that gives dbt the information needed to connect to the data warehouse. The profiles file will be different for each data warehouse ([example here](https://docs.getdbt.com/reference/warehouse-profiles/snowflake-profile)).

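A minimal sketch of such a workflow (the adapter, Python version, and secret names are illustrative):

```yaml
name: pre-commit checks
on: pull_request

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Generate dbt artifacts and run hooks
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}  # referenced by profiles.yml
        run: |
          pip install dbt-snowflake pre-commit
          dbt deps --profiles-dir .
          dbt docs generate --profiles-dir .
          pre-commit run --all-files
```
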
2 changes: 1 addition & 1 deletion website/blog/2022-08-31-august-product-update.md
@@ -55,7 +55,7 @@ You’ll hear more in [Tristan’s keynote](https://coalesce.getdbt.com/agenda/k
I just discovered the treasure trove of excellent resources from dbt Labs consulting partners, and want to start sharing more of them here. Here are a few you might have missed over the summer:

- **Reduce ETL costs:** I’ve only just seen [this blog](https://www.mighty.digital/blog/how-dbt-helped-us-reduce-our-etl-costs-significantly) from Mighty Digital, but found it to be a super practical (and concise) introductory guide to rethinking your ETL pipeline with dbt.
- **Explore data:** [Part two of a series on exploring data](https://vivanti.com/2022/07/28/exploring-data-with-dbt-part-2-extracting/) brought to you by Vivanti. This post focuses on working with JSON objects in dbt, but I also recommend the preceding post if you want to see how they spun up their stack.
- **Explore data:** [Part two of a series on exploring data](https://vivanti.com/2022/07/28/exploring-data-with-dbt-part-2-extracting/) brought to you by Vivanti. This post focuses on working with <Term id="json" /> objects in dbt, but I also recommend the preceding post if you want to see how they spun up their stack.
- **Track historical changes:** Snapshots are a pretty handy feature for tracking changes in dbt, but they’re often overlooked during initial onboarding. [Montreal Analytics explains how to set them up](https://blog.montrealanalytics.com/using-dbt-snapshots-with-dev-prod-environments-e5ed63b2c343) in dev/prod environments.
- **Learn dbt:** Have some new faces on the data team that might need an introduction to dbt? Our friends at GoDataDriven are hosting a [virtual dbt Learn Sept 12-14](https://www.tickettailor.com/events/dbtlabs/752537).

@@ -67,7 +67,7 @@ $ dbt run --vars '{"key": "value"}'
```

The `--vars` argument accepts a YAML dictionary as a string on the command line.
YAML is convenient because it does not require strict quoting as with JSON.
YAML is convenient because it does not require strict quoting as with <Term id="json" />.

Both of the following are valid and equivalent:

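A quick sketch of the two quoting styles (assuming standard dbt CLI behavior):

```shell
$ dbt run --vars '{"key": "value"}'   # JSON-style dictionary, keys and values quoted
$ dbt run --vars 'key: value'         # equivalent YAML-style dictionary, no inner quotes
```
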
2 changes: 1 addition & 1 deletion website/docs/docs/building-a-dbt-project/exposures.md
@@ -59,7 +59,7 @@ _Expected:_
- **depends_on**: list of refable nodes (`ref` + `source`)

_Optional:_
- **url**
- **url**: enables the link to **View this exposure** in the upper right corner of the generated documentation site
- **maturity**: one of `high`, `medium`, `low`
- **owner**: name

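A minimal exposure spec illustrating these properties (all names and the URL are illustrative):

```yaml
version: 2

exposures:
  - name: weekly_kpi_dashboard
    type: dashboard
    maturity: high                              # optional: high, medium, or low
    url: https://bi.example.com/dashboards/42   # optional: enables "View this exposure"
    description: Executive KPI dashboard
    depends_on:
      - ref('fct_orders')
      - source('app', 'users')
    owner:
      name: Data Team
```
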
2 changes: 1 addition & 1 deletion website/docs/docs/building-a-dbt-project/snapshots.md
@@ -326,7 +326,7 @@ If you apply business logic in a snapshot query, and this logic changes in the f

Basically – keep your query as simple as possible! Some reasonable exceptions to these recommendations include:
* Selecting specific columns if the table is wide.
* Doing light transformation to get data into a reasonable shape, for example, unpacking a JSON blob to flatten your source data into columns.
* Doing light transformation to get data into a reasonable shape, for example, unpacking a <Term id="json" /> blob to flatten your source data into columns.

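As an illustration of that kind of light reshaping, a snapshot that flattens a couple of fields out of a raw blob might look something like this (Snowflake-style JSON syntax; all names are made up):

```sql
{% snapshot orders_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select
    id,
    updated_at,
    -- unpack only the fields you need from the raw JSON payload
    payload:status::string       as order_status,
    payload:customer:id::integer as customer_id
from {{ source('app', 'raw_orders') }}

{% endsnapshot %}
```
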
## Snapshot meta-fields

@@ -48,7 +48,7 @@ clean-targets:
- Do you select tests using the old names for test types? (`test_type:schema`, `test_type:data`, `--schema`, `--data`)
- Do you have custom macro code that calls the (undocumented) global macros `column_list`, `column_list_for_create_table`, `incremental_upsert`?
- Do you have custom scripts that parse dbt JSON artifacts?
- Do you have custom scripts that parse dbt <Term id="json" /> artifacts?
- (BigQuery only) Do you use dbt's legacy capabilities around ingestion-time-partitioned tables?

If you believe your project might be affected, read more details in the migration guide [here](/guides/migration/versions/upgrading-to-v1.0).
@@ -114,7 +114,7 @@ more information on configuring a Snowflake OAuth connection in dbt Cloud, pleas

:::info Uploading a service account JSON keyfile

While the fields in a BigQuery connection can be specified manually, we recommend uploading a service account JSON keyfile to quickly and accurately configure a connection to BigQuery.
While the fields in a BigQuery connection can be specified manually, we recommend uploading a service account <Term id="json" /> keyfile to quickly and accurately configure a connection to BigQuery.

:::

@@ -21,7 +21,7 @@ Usage statistics are tracked once weekly, and include the following information:
- The number of developer and read only licenses utilized in each account
- The version of dbt Cloud installed in the on-premises environment

This information is sent as a JSON payload to usage.getdbt.com. A typical
This information is sent as a <Term id="json" /> payload to usage.getdbt.com. A typical
payload looks like:

```json
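{
  "developer_licenses": 25,
  "read_only_licenses": 10,
  "dbt_cloud_version": "1.1.60"
}
```

The field names and values above are illustrative; the exact payload schema may differ.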
@@ -0,0 +1,15 @@
---
title: "List Steps API endpoint deprecation warning"
id: "liststeps-endpoint-deprecation.md"
description: "List Steps API deprecation"
sidebar_label: "Deprecation: List Steps API endpoint"
tags: [Sept-15-2022]
---

On October 14th, 2022, dbt Labs is deprecating the [List Steps](https://docs.getdbt.com/dbt-cloud/api-v2#tag/Runs/operation/listSteps) API endpoint. From that date, any GET requests to this endpoint will fail. Please prepare to stop using the List Steps endpoint as soon as possible.

dbt Labs will continue to maintain the [Get Run](https://docs.getdbt.com/dbt-cloud/api-v2#tag/Runs/operation/getRunById) endpoint, which is a viable alternative depending on the use case.

You can fetch run steps for an individual run with a GET request to the following URL:

`https://cloud.getdbt.com/api/v2/accounts/{accountId}/runs/{runId}/?include_related=["run_steps"]`
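
For example, a rough equivalent with `curl` (the `Token` auth header and environment variable are assumptions; replace the placeholders with your account and run IDs):

```bash
curl -g \
  -H "Authorization: Token $DBT_CLOUD_API_TOKEN" \
  'https://cloud.getdbt.com/api/v2/accounts/{accountId}/runs/{runId}/?include_related=["run_steps"]'
# -g turns off curl's URL globbing so the literal braces and brackets pass through
```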
@@ -16,7 +16,8 @@ First, make sure to configure your sources to [snapshot freshness information](u

<Changelog>

- **v0.21.0:** Renamed `dbt source snapshot-freshness` to `dbt source freshness`. If using an older version of dbt, the command is `snapshot-freshness`.
To have dbt Cloud display data source freshness as a rendered user interface, you will still need to use the pre-v0.21 syntax of `dbt source snapshot-freshness`.

</Changelog>
