Use the information below to log in to the course Snowflake instance:
- URL: https://efruuit-zh04827.snowflakecomputing.com/
- Username: Your username will be the email you use to login to CoRise, with non-alphanumeric characters removed.
Here are some examples of how to derive your username from your email. sourabh@corise.com ➡ sourabhcorisecom sourabh.bajaj@corise.com ➡ sourabhbajajcorisecom Sourabh.BAJAJ.1234@gmail.com ➡ SourabhBAJAJ1234gmailcom
- Password: ADD PASSWORD (you'll be prompted to change this, remember to save this password)
- Once logged into Snowflake, ensure you have access to the TRANSFORMER role (each student will be using their own custom schema)
We'll be setting up our GitHub fork. Below are the steps you will need to take. Please review the visual walk-through for more detail.
- Make sure you have a GitHub (https://github.com/) account.
- Go to the course project repo at https://github.com/lindsaymurphy4/bingeflix
- Create a fork of the main repo to create your own copy
- Copy the URL of the fork. It should be something like https://github.com/<GITHUB_USERNAME>/bingeflix
Next, you'll set up Gitpod using the repo fork you just created. Follow the steps below to set this up:
- Go to gitpod.io and click Dashboard (or Login) in the top right corner
- Click 'Continue with GitHub' to log in if you are not already logged in
- Enter your GitHub credentials
- Select the editor you'd like to use (VSCode in Browser is our default, but you'll see you can choose others if you prefer)
- Click Continue to open the workspace (this may take a few minutes)
- Explore the workspace
- You'll need to "Pin" your workspace so that it doesn't get Archived after 14 days, and always "Stop" your workspace so that you don't run out of hours. See video on CoRise week 0 for how to do so.
We'll go over virtual environments in more detail during week 1, but for now, you can follow these instructions to get dbt installed:
- Ensure you're in the root directory of the repo (/workspace/course_advanced_dbt)
- Run touch Pipfile to create a blank Pipfile (this should not have a file extension)
- Run open Pipfile. This will open the blank file in the editor window.
- Copy and paste the following into the Pipfile:
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"
[packages]
dbt-core = "==1.5.2"
dbt-snowflake = "==1.5.2"
sqlfluff = "==2.1.2"
sqlfluff-templater-dbt = "==2.1.2"
pre-commit = "==3.3.3"
[requires]
python_version = "3.11"
- Run
pip install pipenv
to install the pipenv tool - Run
pipenv install
to create the virtual environment and install the packages inside of the Pipfile (this may take a few moments, you should see installation logs in your terminal) - Run
pipenv shell
to activate the virtual environment - Run
dbt --version
to confirm that dbt has been installed properly - Important: you will need to activate the virtual environment each time you open Gitpod by running
pipenv shell
. dbt is installed inside of this virtual environment, so if it is not activated, you will not have access to run dbt.
- Run cd .. to move up from the project root folder to the /workspace directory
- Run ls -a to list the contents of this folder, including hidden folders–you should see a folder in the list called .dbt (if you don't, run mkdir .dbt to create it)
- Run cd .dbt to open this hidden folder
- Run touch profiles.yml to create a new profiles.yml file
- Run open profiles.yml to open the file you just created.
- Copy and paste the following credentials into the new profiles.yml file you just created:
advanced_dbt:
target: dev
outputs:
dev:
type: snowflake
account: efruuit-zh04827
user: <ADD YOUR SNOWFLAKE USER NAME>
password: <ADD YOUR SNOWFLAKE PASSWORD>
role: transformer
database: dev
warehouse: transforming
schema: dbt_<YOUR SNOWFLAKE USERNAME>
threads: 4
Run cd /workspace/advanced_dbt to navigate to the folder that contains the dbt_project.yml file
Validate that dbt can connect with Snowflake by running dbt debug
Awesome! You are now ready to start the course! 🚀🚀🚀
- Use UPPER case for all keywords
- Use trailing commas in SELECT statements
- Use Snowflake dialect
- Use consistent style in GROUP BY and ORDER BY (either names or numbers, not both)
- The primary key source column must have not_null and unique generic tests.
- All boolean columns must have an accepted_values schema test. The accepted values are true and false.
- Columns that contain category values must have an accepted_values schema test.
- Columns that should never be null must have a not_null schema test.
- Columns that should be unique must have a unique schema test.
- The primary key column must have not_null and unique schema tests.
- All boolean columns must have an accepted_values schema test. The accepted values are true and false.
- Columns that contain category values must have an accepted_values schema test.
- Columns that should never be null must have a not_null schema test.
- Columns that should be unique must have a unique schema test.
- Where possible, use schema tests from the dbt_utils or dbt_expectations packages to perform extra verification.