
SQL Case Studies

Documenting SQL case studies from Danny Ma's 8 Week SQL Challenge for learning and practice purposes.

Option 1: Docker, PostgreSQL, and SQLPad

Install Docker and Docker Compose, then start the containers:

$ docker compose up

SQLPad can be accessed at http://localhost:3000, or at the port specified in the compose.yml file.

Stop and remove the containers with:

$ docker compose down

Option 2: Python, Amazon Athena, and Jupyter Notebook

Virtual Environment

Create a virtual environment with a preferred tool (e.g., conda, virtualenv, pipenv, or poetry) and install the required packages listed under the tool.poetry.dependencies section of the pyproject.toml file.

For example, with Poetry and Conda, this can be done as follows:

$ yes | conda create --name sql_case_studies python=3.11
$ conda activate sql_case_studies
# Poetry detects and respects the activated conda environment
$ poetry install --without docs

Amazon Athena

The Athena class can be used to interact with Amazon Athena. To use this class, the principal whose credentials are used to access AWS must have the necessary Athena permissions, plus S3 permissions if a non-default bucket is used to store the query results (see below for more details).

These credentials can be encapsulated in a boto3 session instance and passed as the first argument to the constructor of the Athena class. The create_session utility function can be used to create the session instance.
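Concretely, the wiring looks roughly like the sketch below. The import path and the create_session arguments are assumptions; only the fact that the session is the first constructor argument comes from the description above.

from sql_case_studies.athena import Athena, create_session  # hypothetical import path

# A boto3 session carrying the principal's credentials; profile_name is an assumed argument
session = create_session(profile_name="profile-name")
# The session is passed as the first argument to the Athena constructor
athena = Athena(session)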

S3 Bucket

The parquet data files for the case studies must be stored in an S3 bucket. All DDL queries are stored in the sql directory under each case study directory; these must be adjusted to point to the correct S3 URLs. The data files can be uploaded to an S3 bucket using the AWS CLI or the console.

# Create a bucket
$ aws s3api create-bucket --bucket sql-case-studies --profile profile-name
# Upload all data files to the bucket
$ aws s3 cp data/ s3://sql-case-studies/ --recursive --profile profile-name 
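
Alternatively, the upload can be scripted with boto3; a minimal sketch, assuming the bucket name and local data/ layout from the CLI example above:

import boto3
from pathlib import Path

session = boto3.Session(profile_name="profile-name")
s3_client = session.client("s3")
for path in Path("data").rglob("*.parquet"):
    # Mirror the local directory layout under the bucket, as `aws s3 cp --recursive` does
    s3_client.upload_file(str(path), "sql-case-studies", path.relative_to("data").as_posix())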

Optionally, query results can be configured to be stored in a bucket other than the default one (i.e., aws-athena-query-results-accountid-region). The query result S3 URL can be stored as an environment variable, e.g. ATHENA_S3_OUTPUT=s3://bucket-name/path/to/output/, which can then be passed as the s3_output argument to the Athena class constructor. If the s3_output argument is not provided, the client creates the default bucket.

import os

# Read the query results location from the environment; empty string if unset
s3_output = os.getenv('ATHENA_S3_OUTPUT', '')
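
This value can then be forwarded to the constructor; a minimal sketch, reusing the session created earlier (only the s3_output keyword comes from the description above):

athena = Athena(session, s3_output=s3_output)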

Jupyter Notebook

Each case study folder includes a notebooks directory with Jupyter notebooks that can be used to run the SQL queries.
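
The notebooks presumably build on the Athena wrapper described above; a hedged sketch, where run_query stands in as a hypothetical name for the class's query-execution method and the table name is a placeholder:

query = """
SELECT *
FROM sales
LIMIT 10;
"""
result = athena.run_query(query)  # hypothetical method name; see the class for its actual API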
