Tuva Seed Reader

This little application is to be used in conjunction with The Tuva Project DBT package when utilizing PostgesSQL for you datawarehouse. The Tuva Seed Reader allows you to override the load_seed macro with a PostgreSQL version that leverages COPY FROM PROGRAM. You guessed it the program that you are copying the data from is the Tuva Seed Reader. The same URI and PATTERN argumanets passed to the post-hook calls found in The Tuva Project dbt_project.yml file are used by the Tuva Seed Reader.

How Get it Working

You need to complete two task in order to get the Tuva Seed Reader working with your Tuva data project.

Install the tuva seed reader on you PostgresSQL server.
Override the load_macro macro in your DBT Project.

Installing Tuva Seed Reader

Windows does not have a global installtion for pipx so you need to run the PostgreSQL service under a local or domain account. When you connect you will need to connect at this service users so you can complete the following steps.

Connect to your PostgreSQL Server and complete these steps:

Install python 3.9 or greater. (I tested mainly with 3.12)
Connect to the server as the service user.
Install pipx.
```
py -m pip install --user pipx
```
Ensure pipx path.
```
pipx ensurepath
```

Install tuva seed reader.

pipx install https://github.com/tuva-health/tuva-seed-reader/archive/refs/heads/main.zip

Test you can run the Tuva Seed Reader
```
tuva-seed-reader
```

Override the `load_seed` Macro

In your DBT project folder locate your dbt_project.yml file. Open it and add the below dispatch. Change the placeholder 'your_project_name_here' to you project's macro name space. If you do not have, one use the name of your project. That is what I did and it seems to work nicely.

dispatch:
  - macro_namespace: the_tuva_project
    search_order: ['your_project_name_here','the_tuva_project']

In your DBT project folder, create a macros forlder if you do not already have one. In that folder create a new file and name it load_seed.sql. Cut and paste in the below code and save the file.

{% macro postgres__load_seed(uri,pattern,compression,headers,null_marker) %}

{% set sql %}
COPY {{ this }}
  FROM PROGRAM 'tuva-seed-reader {{ uri }} {{ pattern }}'
  WITH (FORMAT CSV{% if headers == True %}, HEADER MATCH{% endif %})
{% endset %}

{% if execute %}
{# debugging { log(sql, True)} #}
{% set results = run_query(sql) %}
{{ log("Loaded data from external s3 resource\n  loaded to: " ~ this ~ "\n  from: s3://" ~ uri ,True) }}
{# debugging { log(results, True) } #}
{% endif %}

{% endmacro %}

Testing

I would suggest running dbt seed or dbt build. Hopefully you will see all the terminology, reference, and valueset data get seeded properly. If you have, a failure and you want to see more in the logs you can add --log-level debug to the dbt command you used.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
tuva_seed_reader		tuva_seed_reader
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tuva Seed Reader

How Get it Working

Installing Tuva Seed Reader

Override the `load_seed` Macro

Testing

About

Releases

Packages

Contributors 2

Languages

License

tuva-health/tuva-seed-reader

Folders and files

Latest commit

History

Repository files navigation

Tuva Seed Reader

How Get it Working

Installing Tuva Seed Reader

Override the load_seed Macro

Testing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Override the `load_seed` Macro

Packages