Skip to content

Database design, creation, querying and data visualization. Db Schema | Postbird | Python SQL Queries Automation | pandas | matplotlib

Notifications You must be signed in to change notification settings

lmponcio/PostgreSQL-db-gelato

Repository files navigation

PostgreSQL Gelato Database

In this project, I design a database of ice cream shops that sell products appropriate for people with dietary requirements.

I created this project to practice SQL querying and database design. I list below the skills I applied:

Skills applied:

  • PostgreSQL
  • Database design
  • SQL Querying
  • Web scraping
  • Data Wrangling
  • Data Visualization.

Database Design

I used draw.io to design the schema.

Tables creation

I used Postbird Client to access my PostgreSQL database. Below a screenshot of the statements for creating the tables.

Filling tables

I used python for writing the INSERT statements (I saved myself from writing hundreds of lines of SQL code manually - see SQL here). Below is a screenshot of the cross reference table for the many-to-many relationship between flavours and dietary requirements (I web scraped this data).

For the reviews table I created random reviews with Python to have data to work with. Please see below a screenshot of the Database Table after the data was inserted:

Querying

I executed the following queries shown below. I thought of queries to respond to questions that explain the data:

  • Which one is the flavor with the best reviews?
  • How about the best rating of a specific group (or diet requirement)?
  • Have people given reviews consistently throughout time? Where is the peak?
-- Nut-free Icecream with Best Average Rating
SELECT 
	icecream.name, 
  ROUND(AVG(review.rating),2) AS average_rating,
  COUNT(*) AS review_count
FROM review
JOIN icecream ON icecream.id = review.icecream_id 
JOIN icecream_dietr ON icecream_dietr.icecream_id = icecream.id
JOIN dietr ON dietr.id = icecream_dietr.dietr_id
WHERE dietr.name = 'nut_free'
GROUP BY icecream.name
ORDER BY average_rating DESC
;

-- Dietary requirements with highest Average Rating
SELECT 
	dietr.name, 
  ROUND(AVG(review.rating),2) AS average_rating, 
  COUNT(*) AS review_count
FROM dietr
JOIN icecream_dietr ON dietr.id = icecream_dietr.dietr_id
JOIN icecream ON icecream_dietr.icecream_id = icecream.id 
JOIN review ON  icecream.id = review.icecream_id
GROUP BY dietr.name
ORDER BY average_rating DESC
;

-- Icecream with Best Average Rating
SELECT 
	icecream.name, 
  ROUND(AVG(review.rating),2) AS average_rating,
  COUNT(*) AS review_count
FROM review
JOIN icecream
ON icecream.id = review.icecream_id 
GROUP BY icecream.name
ORDER BY average_rating DESC
;

-- Average Review Each Month
SELECT 
	TO_CHAR(DATE_TRUNC('month', date), 'YYYY Month') AS formatted_month, 
  ROUND(AVG(rating),2), 
  COUNT(*) AS review_count
FROM review
GROUP BY DATE_TRUNC('month', date)
ORDER BY DATE_TRUNC('month', date)
;

Please find below an image of the output of one of the queries in Postbird.

Visualization

I used pandas and matplotlib to create visualizations of the previously written queries. Please see below two of the visualizations:

Monthly average ratings (see code here)

Nut-free gelato average ratings (see code here)

Bibliography

About

Database design, creation, querying and data visualization. Db Schema | Postbird | Python SQL Queries Automation | pandas | matplotlib

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages