Skip to content

Latest commit

 

History

History
241 lines (162 loc) · 8.32 KB

question5.md

File metadata and controls

241 lines (162 loc) · 8.32 KB

Stars Badge Forks Badge Pull Requests Badge Issues Badge GitHub contributors Visitors

Don't forget to hit the ⭐ if you like this repo.

Special Topic Data Engineering (SECP3843): Alternative Assessment

Name: Lee Cai Xuan

Matric No.: A20EC0062

Dataset: Analytics Dataset

Question 5 (a)

Step 1 - Import data

Import Accounts and Transaction data

import pandas as pd

df_acc = pd.read_json('accounts.json')

df_acc = pd.DataFrame(df_acc)

# View the DataFrame as a table
df_acc
import pandas as pd

df_trans = pd.read_json('transactions.json')

df_trans = pd.DataFrame(df_trans)

# View the DataFrame as a table
df_trans

Step 2 - Remove unwanted elements for value in the column

df_acc['_id'] = df_acc['_id'].str['$oid']
df_acc['account_id'] = df_acc['account_id'].str['$numberInt']
df_acc['limit'] = df_acc['limit'].str['$numberInt']
df_acc

df_acc['products'] = df_acc['products'].astype(str).str.replace('[', '').str.replace(']', '')
df_acc

df_trans['_id'] = df_trans['_id'].str['$oid']
df_trans['account_id'] = df_trans['account_id'].str['$numberInt']
df_trans['transaction_count'] = df_trans['transaction_count'].str['$numberInt']
df_trans['bucket_start_date'] = df_trans['bucket_start_date'].str['$date']
df_trans['bucket_end_date'] = df_trans['bucket_end_date'].str['$date']
df_trans

df_trans['bucket_start_date'] = df_trans['bucket_start_date'].str['$numberLong']
df_trans['bucket_end_date'] = df_trans['bucket_end_date'].str['$numberLong']
df_trans

Step 3 - Remove unwanted column

columns_to_remove = ['transactions']
df_trans = df_trans.drop(columns_to_remove, axis=1)
df_trans

Step 4 - Concat both Accounts and Transactions Table

df= pd.concat([df_acc, df_trans])
df

Question 5 (b)

Creating dashboard using Power BI

Step 1 - Import data

Import all the Analytics JSON files into Power BI.

Result:

  • Accounts Table

  • Transactions Table

Step 2 - Merge table

Merge the accounts and transactions table based on the same account_id. Select inner join for merging both tables.

Choose the columns you want to show in the merged tables.

Step 3 - Remove errors and null value

To remove null values of each column, select 'remove errors', the null value will be removed.

Rename the column of the table.

Step 4 - Create graphs

  • Count of limit by transaction count

This graph shows that the account limit does not restrict the user to make transaction as we can see as the account limit increases, the transaction count increases too.

  • Sum of transaction.price by products

The graph below shows that the product 'investmentStock' has the highest transaction price compared to the others. The other products have almost the same amount of transaction price.

  • Count of product by transaction code

There are two types of transaction code which are sell and buy. Based on the bar chart below, the products that are sold is slightly higher that the products that are bought which is 50.02% higher than 49.98%.

  • Sum of transaction amount and total by products

Step 5 - Dashboard

Drag all the graphs created before in the canvas in Power BI. Add slicer to filter the graph results by products.

Dashboard result:

When the products is chosen from the slicer, all the graphs will interact and display the result based on the products.

Contribution 🛠️

Please create an Issue for any improvements, suggestions or errors in the content.

You can also contact me using Linkedin for any other queries or feedback.

Visitors