In this project, I design a database of ice cream shops that sell products appropriate for people with dietary requirements.
I created this project to practice SQL querying and database design. I list below the skills I applied:
Skills applied:
- PostgreSQL
- Database design
- SQL Querying
- Web scraping
- Data Wrangling
- Data Visualization.
I used draw.io to design the schema.
I used Postbird Client to access my PostgreSQL database. Below a screenshot of the statements for creating the tables.
I used python for writing the INSERT statements (I saved myself from writing hundreds of lines of SQL code manually - see SQL here). Below is a screenshot of the cross reference table for the many-to-many relationship between flavours and dietary requirements (I web scraped this data).
For the reviews table I created random reviews with Python to have data to work with. Please see below a screenshot of the Database Table after the data was inserted:
I executed the following queries shown below. I thought of queries to respond to questions that explain the data:
- Which one is the flavor with the best reviews?
- How about the best rating of a specific group (or diet requirement)?
- Have people given reviews consistently throughout time? Where is the peak?
-- Nut-free Icecream with Best Average Rating
SELECT
icecream.name,
ROUND(AVG(review.rating),2) AS average_rating,
COUNT(*) AS review_count
FROM review
JOIN icecream ON icecream.id = review.icecream_id
JOIN icecream_dietr ON icecream_dietr.icecream_id = icecream.id
JOIN dietr ON dietr.id = icecream_dietr.dietr_id
WHERE dietr.name = 'nut_free'
GROUP BY icecream.name
ORDER BY average_rating DESC
;
-- Dietary requirements with highest Average Rating
SELECT
dietr.name,
ROUND(AVG(review.rating),2) AS average_rating,
COUNT(*) AS review_count
FROM dietr
JOIN icecream_dietr ON dietr.id = icecream_dietr.dietr_id
JOIN icecream ON icecream_dietr.icecream_id = icecream.id
JOIN review ON icecream.id = review.icecream_id
GROUP BY dietr.name
ORDER BY average_rating DESC
;
-- Icecream with Best Average Rating
SELECT
icecream.name,
ROUND(AVG(review.rating),2) AS average_rating,
COUNT(*) AS review_count
FROM review
JOIN icecream
ON icecream.id = review.icecream_id
GROUP BY icecream.name
ORDER BY average_rating DESC
;
-- Average Review Each Month
SELECT
TO_CHAR(DATE_TRUNC('month', date), 'YYYY Month') AS formatted_month,
ROUND(AVG(rating),2),
COUNT(*) AS review_count
FROM review
GROUP BY DATE_TRUNC('month', date)
ORDER BY DATE_TRUNC('month', date)
;
Please find below an image of the output of one of the queries in Postbird.
I used pandas and matplotlib to create visualizations of the previously written queries. Please see below two of the visualizations:
Monthly average ratings (see code here)
Nut-free gelato average ratings (see code here)
- Diagrams drawn with http://draw.io/
- Icecream flavours from https://gelatomessina.com/collections/classic-flavours - See my Web Scraper Here