-
Notifications
You must be signed in to change notification settings - Fork 0
Review from first #1
base: review
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review
By following requirements, this exercise work doesn't meet them.
Clean design and sound software engineering practices.
Structured repo so can be used as working repo in future.
Production-ready solution.
Schema designing need more critical thinking. For junior level, this is okay. for mid, acceptable, but for senior, sorry to say it's not there yet.
DBHOST=localhost | ||
DBUSER=demouser | ||
DBPASSWORD=demopassword | ||
DATABASE=financial_database | ||
TABLE=financial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ETL script refer to .env
file but none found in repository. I add this one my self
cols = '`,`'.join([str(i) for i in list_col]) | ||
|
||
''' CONNECT TO THE DATABASE ''' | ||
connection = pymysql.connect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that we need MySQL running on default port 3306
only. No flexibility for port connection
USE financial_database; | ||
|
||
DROP TABLE IF EXISTS financial; | ||
CREATE TABLE financial ( financial_id int unsigned not null auto_increment, segment varchar(128), country varchar(128), product varchar(128), discount_band varchar(128), units_sold float, manufacturing_price int, sale_price int, gross_sales float, discounts float, sales float, cogs float, profit float, date datetime, month_number int, month_name varchar(128), year int, primary key (financial_id) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About schema design
month_name varchar(128)
We would never have any english month name exceed 128 length.
product varchar(128)
Excel has 32,767 characters limit per cells. These will easily lead to truncation issue if data in xlsx is change.
manufacturing_price int, sale_price int
Price values as Integer ???
date datetime
Why date time type if data will never specify the time?
Mostly Top set 128-char limit for every varchar
fields. Some won't need that long, Some might exceed. So in point of foreseeing / scalable design, this design doesn't meet this scenario.
@@ -0,0 +1,9 @@ | |||
INSERT INTO mysql.user (User,Host,authentication_string,ssl_cipher,x509_issuer,x509_subject) VALUES('demouser','localhost',PASSWORD('demopassword'),'','',''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PASSWORD()
is compatible only for MySQL 5 or less. As the repository doesn't specific the version of MySQL (they released version 8) already. This simply breaks the execution.
@@ -0,0 +1,9 @@ | |||
INSERT INTO mysql.user (User,Host,authentication_string,ssl_cipher,x509_issuer,x509_subject) VALUES('demouser','localhost',PASSWORD('demopassword'),'','',''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's CREATE USER
for simply user creation, but Top chose to use INSERT into user table. This is not wrong but it looks like he thinks too complex for simply requirement but too simply for detailed requirement. IMO 🤔
for i, row in df['Sheet1'].iterrows(): | ||
sql = 'INSERT INTO `'+table+'` (`'+cols +'`) VALUES ('+'%s,'*(len(row)-1)+'%s)' | ||
# row represents the data points in DataFrame | ||
cursor.execute(sql, tuple(row)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's bulk statement compatibility or cursor.executemany
, why opt to INSERT one by one??
load_dotenv() # Load .env | ||
dbhost = os.getenv('DBHOST', "") | ||
dbuser = os.getenv('DBUSER', "") | ||
dbpassword = os.getenv('DBPASSWORD', "") | ||
database = os.getenv('DATABASE', "") | ||
table = os.getenv('TABLE', "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talking in developer world, this one needs encapsulation. It's okay to do this for single script, but what in real world, we never have only single magic script. Every scripts will have to change this part if connection have to be changed.
load_dotenv() # Load .env | ||
dbhost = os.getenv('DBHOST', "") | ||
dbuser = os.getenv('DBUSER', "") | ||
dbpassword = os.getenv('DBPASSWORD', "") | ||
database = os.getenv('DATABASE', "") | ||
table = os.getenv('TABLE', "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Environment variable for single table name ??
@@ -0,0 +1,58 @@ | |||
from dotenv import load_dotenv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About ETL Script
This script is fixed to one task. It will require some work to reuse this for other task.
cursor.execute(sql, tuple(row)) | ||
|
||
# The connection is not autocommitted by default, so we must commit to save our changes | ||
connection.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Txn will be commit but if this one failed it will never be rollback.
No description provided.