DBT project for Airbyte normalization #802
Merged
Commits (changes shown from 19 of 21 commits):
bad20b2: Start DBT module for running normalization transformations (ChristopheDuong)
cf7b921: Generates DBT models from JSON Schema catalog (ChristopheDuong)
bb5f6eb: Add Doc on how to run transformation of catalog (ChristopheDuong)
ec8f803: Make generated DBT SQL cross db compatible (ChristopheDuong)
7693d5c: Tweaks from review (ChristopheDuong)
f02d70d: turn on basic normalization for data warehouses (cgardens)
f5d1ec9: Merge remote-tracking branch 'origin/master' into normalization (ChristopheDuong)
8bbb4c9: Hook up dbt normalization (ChristopheDuong)
eebc286: Merge remote-tracking branch 'origin/cgardens/turn_on_normalization' … (ChristopheDuong)
9a2c8c9: Connect normalization pieces together (ChristopheDuong)
a44b52c: Exclude spotless from DBT folders (ChristopheDuong)
1ad010a: Add placeholder adapter functions for postgres flavor of sql (ChristopheDuong)
08b063c: Correct some snowflakes specifities (ChristopheDuong)
c9c6fe4: pull from master (sherifnada)
73ce71b: merge master part 2 (sherifnada)
56aa32e: Merge branch 'master' of github.com:airbytehq/airbyte into chris/dbt (sherifnada)
3a18856: Postgres normalization (#831) (sherifnada)
05d422f: Changes from code reviews (ChristopheDuong)
fb32f13: Handle Quoting in generated SQL so it can be configured as needed per… (ChristopheDuong)
e6cc9b1: Check if items is in dict before accessing it (ChristopheDuong)
9904b29: It works on snowflake!! (ChristopheDuong)
@@ -15,3 +15,5 @@ __pycache__
 .venv
 .mypy_cache

+# dbt
+profiles.yml
@@ -3,3 +3,4 @@
 !entrypoint.sh
 !setup.py
 !normalization
+!dbt-project-template
airbyte-integrations/bases/base-normalization/dbt-project-template/.gitignore (4 additions, 0 deletions)
@@ -0,0 +1,4 @@
build/
logs/
models/generated/
dbt_modules/
airbyte-integrations/bases/base-normalization/dbt-project-template/README.md (19 additions, 0 deletions)
@@ -0,0 +1,19 @@
## Installing DBT

1. Activate your venv and run `pip3 install dbt`
1. Copy `airbyte-normalization/sample_files/profiles.yml` over to `~/.dbt/profiles.yml`
1. Edit it to configure your profiles accordingly

## Running DBT

1. `cd airbyte-normalization`
1. You can now run DBT commands. To check that the setup is fine: `dbt debug`
1. To build the DBT tables in your warehouse: `dbt run`

## Running DBT from Airbyte generated config

1. You can also change directory to one of the `normalize` folders inside a workspace generated by Airbyte (for example, `cd /tmp/dev_root/workspace/1/0/normalize`).
1. You should find `profiles.yml` and a number of other DBT files/folders created there.
1. To check that everything is set up properly: `dbt debug --profiles-dir=$(pwd) --project-dir=$(pwd)`
1. You can also modify the `.sql` files and run `dbt run --profiles-dir=$(pwd) --project-dir=$(pwd)`; a minimal sanity-check model is sketched below.
1. You can inspect the compiled DBT `.sql` files before they are run against the destination engine in the `normalize/build/compiled` or `normalize/build/run` folders.
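As a quick aside (not part of this PR), one way to verify that the profile and project wiring are picked up before touching any generated models is to drop a throwaway model into the `models/` folder; the file name and contents below are purely hypothetical.

```sql
-- models/smoke_test.sql (hypothetical file, for illustration only)
-- If `dbt run --profiles-dir=$(pwd) --project-dir=$(pwd)` materializes this
-- as a view in the target schema, the profiles.yml and dbt_project.yml wiring
-- is working; the model can then be deleted.
select 1 as smoke_test_ok
```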
airbyte-integrations/bases/base-normalization/dbt-project-template/dbt_project.yml (40 additions, 0 deletions)
@@ -0,0 +1,40 @@
# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: 'airbyte'
version: '1.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: 'normalize'

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
docs-paths: ["docs"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]

target-path: "build"  # directory which will store compiled SQL files
clean-targets:        # directories to be removed by `dbt clean`
  - "build"
  - "dbt_modules"

# https://docs.getdbt.com/reference/project-configs/quoting/
quoting:
  database: true
  schema: true
  identifier: true

# You can define configurations for models in the `source-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!
models:
  airbyte:
    # Schema (dataset) defined in profiles.yml is concatenated with schema below for dbt's final output
    +schema: NORMALIZED
    +materialized: view
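For context, the `+schema` and `+materialized` keys above are project-wide defaults for every model under the `airbyte` package; an individual model can override them with an in-file `config()` block. A minimal sketch, with an invented file name and column:

```sql
-- models/generated/example_override.sql (hypothetical, for illustration only)
-- dbt_project.yml materializes models as views by default; this config()
-- call overrides the materialization to a table for just this one model.
{{ config(materialized='table') }}

select 1 as example_column
```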
...ntegrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/array.sql (34 additions, 0 deletions)
@@ -0,0 +1,34 @@
{#
Adapter Macros for the following functions:
- Bigquery: unnest() -> https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening-arrays-and-repeated-fields
- Snowflake: flatten() -> https://docs.snowflake.com/en/sql-reference/functions/flatten.html
- Redshift: -> https://blog.getdbt.com/how-to-unnest-arrays-in-redshift/
- postgres: unnest() -> https://www.postgresqltutorial.com/postgresql-array/
#}

{# flatten ------------------------------------------------- #}

{% macro unnest(array_col) -%}
{{ adapter.dispatch('unnest')(array_col) }}
{%- endmacro %}

{% macro default__unnest(array_col) -%}
unnest({{ adapter.quote_as_configured(array_col, 'identifier')|trim }})
{%- endmacro %}

{% macro bigquery__unnest(array_col) -%}
unnest({{ adapter.quote_as_configured(array_col, 'identifier')|trim }})
{%- endmacro %}

{% macro postgres__unnest(array_col) -%}
unnest({{ adapter.quote_as_configured(array_col, 'identifier')|trim }})
{%- endmacro %}

{% macro redshift__unnest(array_col) -%}
-- FIXME to implement as described here? https://blog.getdbt.com/how-to-unnest-arrays-in-redshift/
{%- endmacro %}

{% macro snowflake__unnest(array_col) -%}
-- TODO test this!! not so sure yet...
table(flatten({{ adapter.quote_as_configured(array_col, 'identifier')|trim }}))
{%- endmacro %}

Review comment on the snowflake__unnest TODO line: is it working?
...s/bases/base-normalization/dbt-project-template/macros/cross_db_utils/json_operations.sql (110 additions, 0 deletions)
@@ -0,0 +1,110 @@
{#
Adapter Macros for the following functions:
- Bigquery: JSON_EXTRACT(json_string_expr, json_path_format) -> https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions
- Snowflake: JSON_EXTRACT_PATH_TEXT( <column_identifier> , '<path_name>' ) -> https://docs.snowflake.com/en/sql-reference/functions/json_extract_path_text.html
- Redshift: json_extract_path_text('json_string', 'path_elem' [,'path_elem'[, …] ] [, null_if_invalid ] ) -> https://docs.aws.amazon.com/redshift/latest/dg/JSON_EXTRACT_PATH_TEXT.html
- Postgres: json_extract_path_text(<from_json>, 'path' [, 'path' [, ...]]) -> https://www.postgresql.org/docs/12/functions-json.html
#}

{# format_json_path -------------------------------------------------- #}
{% macro format_json_path(json_path_list) -%}
{{ adapter.dispatch('format_json_path')(json_path_list) }}
{%- endmacro %}

{% macro default__format_json_path(json_path_list) -%}
{{ '.' ~ json_path_list|join('.') }}
{%- endmacro %}

{% macro bigquery__format_json_path(json_path_list) -%}
{{ '"$.' ~ json_path_list|join('"."') ~ '"' }}
{%- endmacro %}

{% macro postgres__format_json_path(json_path_list) -%}
{{ "'" ~ json_path_list|join("','") ~ "'" }}
{%- endmacro %}

{% macro redshift__format_json_path(json_path_list) -%}
{{ "'" ~ json_path_list|join("','") ~ "'" }}
{%- endmacro %}

{% macro snowflake__format_json_path(json_path_list) -%}
{{ "'" ~ json_path_list|join('"."') ~ "'" }}
{%- endmacro %}

{# json_extract ------------------------------------------------- #}

{% macro json_extract(json_column, json_path_list) -%}
{{ adapter.dispatch('json_extract')(json_column, json_path_list) }}
{%- endmacro %}

{% macro default__json_extract(json_column, json_path_list) -%}
json_extract({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro bigquery__json_extract(json_column, json_path_list) -%}
json_extract({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro postgres__json_extract(json_column, json_path_list) -%}
jsonb_extract_path_text({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro redshift__json_extract(json_column, json_path_list) -%}
json_extract_path_text({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro snowflake__json_extract(json_column, json_path_list) -%}
json_extract_path_text({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{# json_extract_scalar ------------------------------------------------- #}

{% macro json_extract_scalar(json_column, json_path_list) -%}
{{ adapter.dispatch('json_extract_scalar')(json_column, json_path_list) }}
{%- endmacro %}

{% macro default__json_extract_scalar(json_column, json_path_list) -%}
json_extract_scalar({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro bigquery__json_extract_scalar(json_column, json_path_list) -%}
json_extract_scalar({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro postgres__json_extract_scalar(json_column, json_path_list) -%}
jsonb_extract_path_text({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro redshift__json_extract_scalar(json_column, json_path_list) -%}
json_extract_path_text({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro snowflake__json_extract_scalar(json_column, json_path_list) -%}
json_extract_path_text({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{# json_extract_array ------------------------------------------------- #}

{% macro json_extract_array(json_column, json_path_list) -%}
{{ adapter.dispatch('json_extract_array')(json_column, json_path_list) }}
{%- endmacro %}

{% macro default__json_extract_array(json_column, json_path_list) -%}
json_extract_array({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro bigquery__json_extract_array(json_column, json_path_list) -%}
json_extract_array({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro postgres__json_extract_array(json_column, json_path_list) -%}
jsonb_array_elements(jsonb_extract_path({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }}))
{%- endmacro %}

{% macro redshift__json_extract_array(json_column, json_path_list) -%}
json_extract_path_text({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro snowflake__json_extract_array(json_column, json_path_list) -%}
json_extract_path_text({{ adapter.quote_as_configured(json_column, 'identifier')|trim }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}
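As with the array macros, here is a short hypothetical usage sketch (the column name `json_blob`, the table name, and the path are invented). Tracing through `postgres__json_extract_scalar` and `postgres__format_json_path` above, on Postgres this should compile to roughly `jsonb_extract_path_text("json_blob", 'address','city')`.

```sql
-- Hypothetical model snippet (names are illustrative only):
-- pull the nested string value at address.city out of a raw JSON column.
select
    {{ json_extract_scalar('json_blob', ['address', 'city']) }} as city
from raw_table
```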
airbyte-integrations/bases/base-normalization/dbt-project-template/packages.yml (5 additions, 0 deletions)
@@ -0,0 +1,5 @@
# add dependencies. these will get pulled during the `dbt deps` process.

packages:
  - git: "https://github.com/fishtown-analytics/dbt-utils.git"
    revision: 0.6.2
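dbt-utils is a collection of helper macros maintained by Fishtown Analytics; pulling it in via `dbt deps` makes those macros callable from any model or macro in this project. Purely as an illustration of the kind of helper it provides (not something taken from this PR, and with invented table and column names), a model could build a hashed key with `dbt_utils.surrogate_key`:

```sql
-- Hypothetical example of calling a dbt-utils macro (not from this PR):
-- derive a deterministic hash key from two columns of a raw table.
select
    {{ dbt_utils.surrogate_key(['user_id', 'updated_at']) }} as unique_row_id,
    *
from some_raw_table
```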
airbyte-integrations/bases/base-normalization/normalization/destination_type.py (31 additions, 0 deletions)
@@ -0,0 +1,31 @@
"""
MIT License

Copyright (c) 2020 Airbyte

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""

from enum import Enum


class DestinationType(Enum):
    bigquery = "bigquery"
    postgres = "postgres"
    snowflake = "snowflake"
Review comment: can you do … instead?