Repository for the following blog post:

dbt (data build tool) Tutorial

Setup

Prerequisites

In addition to the tools, you also need to know what dbt is. You can learn about it here: dbt tutorial.

Clone the git repo as shown below:

git clone https://github.com/josephmachado/simple_dbt_project.git
cd simple_dbt_project

Demo on CodeSpaces

Here is a demo of how to run this on CodeSpaces:

Set up a Python virtual environment as shown below:

# remove any existing virtual environment
rm -rf myenv
# set up venv and install dependencies
python -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt

Run dbt

Run dbt commands as shown below:

dbt clean
dbt deps
dbt snapshot
dbt run 
dbt test
dbt docs generate
dbt docs serve

Go to http://localhost:8080 to see the dbt documentation. If you are running this on GitHub CodeSpaces, follow this section to expose port 8080 for access from your browser.

Press Ctrl + C to stop the documentation server.
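If you would rather drive the batch steps from a script, here is a minimal Python sketch. It is an illustration and not part of this repo: `DBT_STEPS` and `run_steps` are hypothetical names, and `dbt docs serve` is left out because it blocks until you stop the server.

```python
import subprocess

# Batch steps from the section above, in the same order.
# `dbt docs serve` is omitted because it keeps running until interrupted.
DBT_STEPS = [
    ["dbt", "clean"],
    ["dbt", "deps"],
    ["dbt", "snapshot"],
    ["dbt", "run"],
    ["dbt", "test"],
    ["dbt", "docs", "generate"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    `runner` is injectable so the function can be tried without dbt installed.
    """
    completed = []
    for cmd in steps:
        result = runner(list(cmd))
        if result.returncode != 0:
            break
        completed.append(list(cmd))
    return completed
```

Calling `run_steps(DBT_STEPS)` inside the activated virtual environment runs the real commands. A later step is skipped if an earlier one fails.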

Create snapshots

Let's test this by inserting some data into the source customer table to demonstrate dbt snapshots. Since we are using DuckDB and the base table is essentially the data in customers.csv, we append the new customer data to customers.csv as shown below:

# Remove header from ./raw_data/customer_new.csv
# and append it to ./raw_data/customers.csv
echo "" >> ./raw_data/customers.csv
tail -n +2 ./raw_data/customer_new.csv >> ./raw_data/customers.csv

# NOTE: Windows users need to do this manually or via powershell as
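
On systems without `tail` (for example Windows), the same append can be sketched in Python. This is an illustration, not part of the repo; `append_without_header` is a hypothetical helper that mirrors the two shell commands above.

```python
def append_without_header(src, dst):
    """Append every line of `src` except its first (header) line to `dst`.

    Mirrors the shell commands: first write a newline to `dst`, then
    append `src` from line 2 onwards. Returns the number of lines appended.
    """
    # newline="" keeps line endings exactly as they are in the files.
    with open(src, "r", encoding="utf-8", newline="") as f:
        lines = f.readlines()
    data_lines = lines[1:]  # drop the header row

    with open(dst, "a", encoding="utf-8", newline="") as out:
        out.write("\n")             # same as: echo "" >> dst
        out.writelines(data_lines)  # same as: tail -n +2 src >> dst
    return len(data_lines)
```

Usage from the project root would be `append_without_header("./raw_data/customer_new.csv", "./raw_data/customers.csv")`.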

Run snapshot and create models again.

dbt snapshot 
dbt run

# reset customers.csv by dropping its last 5 lines
# (negative line counts require GNU head)
head -n -5 ./raw_data/customers.csv > temp
cat temp > ./raw_data/customers.csv 
rm temp
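
`head -n -5` relies on GNU head; the BSD `head` shipped with macOS rejects negative line counts. A portable sketch in Python follows, where `drop_last_lines` is a hypothetical helper and not part of the repo:

```python
def drop_last_lines(path, n):
    """Rewrite `path` without its last `n` lines (like GNU `head -n -N`).

    Returns the number of lines actually removed.
    """
    # newline="" keeps line endings exactly as they are in the file.
    with open(path, "r", encoding="utf-8", newline="") as f:
        lines = f.readlines()

    kept = lines[:-n] if n > 0 else lines

    with open(path, "w", encoding="utf-8", newline="") as out:
        out.writelines(kept)
    return len(lines) - len(kept)
```

To reset the file as in the shell version, call `drop_last_lines("./raw_data/customers.csv", 5)`.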

Let's open a Python REPL and check our data, as shown below:

import duckdb
con = duckdb.connect("dbt.duckdb")
results = con.execute("select * from snapshots.customers_snapshot where customer_id = 82").fetchall()
for row in results:
    print(row)
# NOTE: You will see 2 rows printed
exit()
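
Two rows for one customer_id is what a snapshot produces when a tracked record changes: dbt closes the previous version by filling in `dbt_valid_to` and inserts a new version whose `dbt_valid_to` is null by default. The sketch below mimics that shape with Python's built-in sqlite3 instead of DuckDB so it runs without the project. The `state` column and all values are made up for illustration; they are not taken from this repo's data.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Simplified stand-in for a snapshot table. Real dbt snapshots also carry
# dbt_scd_id and dbt_updated_at, omitted here for brevity.
con.execute(
    """
    create table customers_snapshot (
        customer_id    integer,
        state          text,   -- hypothetical tracked column
        dbt_valid_from text,
        dbt_valid_to   text    -- null marks the current version
    )
    """
)
con.executemany(
    "insert into customers_snapshot values (?, ?, ?, ?)",
    [
        (82, "old_value", "2024-01-01", "2024-02-01"),  # superseded version
        (82, "new_value", "2024-02-01", None),          # current version
    ],
)

# Full history for the customer: one row per version.
history = con.execute(
    "select * from customers_snapshot where customer_id = 82 "
    "order by dbt_valid_from"
).fetchall()

# Only the version that is valid right now.
current = con.execute(
    "select * from customers_snapshot "
    "where customer_id = 82 and dbt_valid_to is null"
).fetchall()

for row in history:
    print(row)
print("current:", current)
```

On the real table, adding `and dbt_valid_to is null` to the query above should return only the current version, assuming the snapshot uses dbt's default columns.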

