Using dbt (data build tool) to convert Synthea synthetic data to OMOP Common Data Model in PostgreSQL
- dbt-core >= 1.0.0
- dbt-postgres >= 1.0.0
- sqlfluff >= 2.0.0
- sqlfluff-templater-dbt >= 2.0.0
- Install dbt with Postgres adapter, sqlfluff and sqlfluff-templater-dbt
pip install dbt-postgres sqlfluff sqlfluff-templater-dbt
- If the installation is successful, you should have dbt CLI accessible globally. To test run
dbt --version
-
Go to
~/.dbt/
(user home directory, i.e., not this repo directory), createprofiles.yml
with the example fromexample.profiles.yml
. Fill in database credentials, read instructions in Postgres Profile config.
dbt can write and read to one database connection at a time. dbt is not an ETL tool, as it does just the T. -
Test settings & DB connection, run
dbt debug
- Install packages for macro
dbt deps
-
Write code. Learn more from Official free online courses.
-
We use SQLFluff to make our SQL code clean and consistent.
Install SQLFluff and sqlfluff-templater-dbt with
pip install sqlfluff sqlfluff-templater-dbt
To lint SQL codes, run
sqlfluff lint .
To fix linting issues, run
sqlfluff fix .
or the following to force fix
sqlfluff fix -f .
- To test whether the code is good on our database or not, compile the model with
dbt compile
(optional: with --models like dbt run
below).
- To run models,
dbt run
- regular run- Model selection syntax (source). Specifying models can save you a lot of time by only running/testing the models that you think are relevant. However, there is a risk that you'll forget to specify an important upstream dependency so it's a good idea to understand the syntax thoroughly:
dbt run --models modelname
- will only run modelnamedbt run --models +modelname
- will run modelname and all parentsdbt run --models modelname+
- will run modelname and all childrendbt run --models +modelname+
- will run modelname, and all parents and childrendbt run --models @modelname
- will run modelname, all parents, all children, AND all parents of all childrendbt run --exclude modelname
- will run all models except modelname- Note that all of these work with folder selection syntax too:
dbt run --models folder
- will run all models in a folderdbt run --models folder.subfolder
- will run all models in the subfolderdbt run --models +folder.subfolder
- will run all models in the subfolder and all parents
- To run models with test, use
dbt build
(optional: with --models like dbt run
above).
- To generate doc, run
dbt docs generate
, serve bydbt docs serve
. Or, run them together as
dbt docs generate && dbt docs serve --port 8080
- Learn more about dbt in the docs, reference, best practices
- dbt project maturity levels repo, video
- GitLab's dbt repo