DatabasePS1

This is Problem Set 1 for the CS-UH 2214 Database Systems course at NYUAD.

Citation

The problem set is from CSE 544 - Principles of Database Systems, taught at the University of Washington.

What Is Here and How to Run

The .csv files in this folder are included only for easier reference.

2 Database Design and Integration

When creating the schema, I did not add the primary and foreign key constraints right away because of the long run time. I added them after the tables were populated.
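Loading first and adding keys afterwards means the bulk load does not maintain the primary-key index or check foreign keys row by row. A minimal sketch of the "add them afterwards" step is a small helper that generates the PostgreSQL ALTER TABLE statements to run once loading finishes. The helper name and the table and column names in the usage line (Authored, Author, Publication) are hypothetical stand-ins, not necessarily what createPubSchema.sql uses.

```python
def deferred_key_constraints(table, pk_cols, foreign_keys):
    """Build PostgreSQL ALTER TABLE statements to run after a bulk load.

    pk_cols: list of primary-key column names.
    foreign_keys: maps a local column to a (referenced_table, referenced_column) pair.
    """
    stmts = [f"ALTER TABLE {table} ADD PRIMARY KEY ({', '.join(pk_cols)});"]
    for col, (ref_table, ref_col) in foreign_keys.items():
        stmts.append(
            f"ALTER TABLE {table} ADD FOREIGN KEY ({col}) "
            f"REFERENCES {ref_table} ({ref_col});"
        )
    return stmts


if __name__ == "__main__":
    # Hypothetical example: a link table between authors and publications.
    for stmt in deferred_key_constraints(
        "Authored",
        ["id", "pubid"],
        {"id": ("Author", "id"), "pubid": ("Publication", "pubid")},
    ):
        print(stmt)
```

The referenced tables need their own primary keys in place before the foreign keys are added, so the statements for parent tables should run first.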

3 Transformation

I'm using three intermediate tables:
(1) merged: the publication types joined with what is already in the field table
(2) a_with_homepage: authors with a homepage, for easier updates of the author table
(3) a_without_homepage: authors without a homepage, for easier updates of the author table
I drop all of them at the end of the SQL script.
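The homepage tables follow a common pattern: materialize the rows you need in an intermediate table, drive one set-based UPDATE from it, then drop it. A minimal sketch of that pattern is below. It uses SQLite through Python's built-in sqlite3 module only so it runs without a PostgreSQL server; the raw_homepage table, its columns, the helper names, and the sample rows are hypothetical, and the real transform script may join different tables.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical tables and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT, homepage TEXT);
        CREATE TABLE raw_homepage (name TEXT, url TEXT);
        INSERT INTO author (name) VALUES ('Alice'), ('Bob');
        INSERT INTO raw_homepage VALUES ('Alice', 'https://alice.example.org');
    """)
    return conn


def update_homepages_via_temp_table(conn):
    """Fill author.homepage through an intermediate table, then drop it."""
    conn.executescript("""
        -- intermediate table: authors that have a homepage in the raw data
        CREATE TABLE a_with_homepage AS
            SELECT a.id AS id, r.url AS url
            FROM author AS a JOIN raw_homepage AS r ON a.name = r.name;

        -- one set-based UPDATE driven by the intermediate table
        UPDATE author
        SET homepage = (SELECT w.url FROM a_with_homepage AS w WHERE w.id = author.id)
        WHERE id IN (SELECT id FROM a_with_homepage);

        -- clean up at the end of the script
        DROP TABLE a_with_homepage;
    """)


if __name__ == "__main__":
    conn = build_demo_db()
    update_homepages_via_temp_table(conn)
    print(conn.execute("SELECT name, homepage FROM author ORDER BY id").fetchall())
```

Authors without a matching homepage keep a NULL homepage, which is the role a separate "without homepage" table can make explicit.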

4.2 Data Integration

Question 4.2.2

First run crawler.py. It creates two CSV files, 'journal_ranking.csv' and 'conference_ranking.csv', containing the keys, names, and rankings of the journals and conferences.
To load the data into PostgreSQL, change the [absolute paths] in 'integration.sql' (under '-- Question 2:') to the paths of the two CSV files above, then run 'integration.sql'.
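Server-side COPY needs an absolute path, which is why the paths must be edited by hand. A minimal sketch that builds the statement from a relative file name is below. The helper name copy_csv_statement, the target table names, and the assumption that each CSV has a header row are all hypothetical; integration.sql may use different tables or options.

```python
import os


def copy_csv_statement(table: str, csv_path: str, header: bool = True) -> str:
    """Build a PostgreSQL COPY ... FROM statement with an absolute file path."""
    abs_path = os.path.abspath(csv_path)
    # Double any single quotes so the path is a valid SQL string literal.
    quoted = abs_path.replace("'", "''")
    options = "FORMAT csv, HEADER true" if header else "FORMAT csv"
    return f"COPY {table} FROM '{quoted}' WITH ({options});"


if __name__ == "__main__":
    # Hypothetical table names matching the two files crawler.py writes.
    for name in ("journal_ranking", "conference_ranking"):
        print(copy_csv_statement(name, f"{name}.csv"))
```

Server-side COPY reads the file as the database server process and requires superuser rights or the pg_read_server_files role; psql's \copy reads the file on the client side instead.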

Question 4.2.5 Extra Credit

I'm using an external Python script for this one, sadly. How to run:
0. Navigate to the bottom of integration.sql, to the '-- Question 5' section.
1. Comment out section 2 and run section 1.
2. Run extra_credit_google_scholar.py.
3. Change the absolute path in section 2.
4. Comment out section 1 and run section 2.
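Commenting sections in and out by hand is easy to get wrong. A hypothetical alternative is to extract one section of integration.sql and run only that. The sketch below assumes each section begins with a marker line of the form `-- section N`; that marker format and the extract_section name are assumptions, not necessarily what integration.sql contains.

```python
import re

# Hypothetical marker format: a line such as "-- section 1" opens each section.
_MARKER = re.compile(r"^--[ \t]*section[ \t]*(\d+)[ \t]*\r?$", re.IGNORECASE | re.MULTILINE)


def extract_section(sql_text: str, section: int) -> str:
    """Return the SQL between the marker for `section` and the next marker (or end of text)."""
    markers = list(_MARKER.finditer(sql_text))
    for i, m in enumerate(markers):
        if int(m.group(1)) == section:
            end = markers[i + 1].start() if i + 1 < len(markers) else len(sql_text)
            return sql_text[m.end():end].strip()
    raise ValueError(f"no '-- section {section}' marker found")
```

The extracted text could be written to a file and run with `psql -f`, so the Python script can run between the two sections without editing integration.sql.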

4.3 Data Visualization

Before running vis.sql, change the absolute directory path at the bottom of the file to the absolute path of this assignment folder, then run it. It creates two CSV files for the visualization.
Then run vis.py, which creates the histograms.
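vis.py itself is not reproduced here. As a dependency-free illustration, the sketch below reads a two-column CSV of (value, frequency) rows and prints a text histogram. The column layout (a header row followed by integer value and count columns) is an assumption about vis1.csv and vis2.csv, and read_pairs and text_histogram are hypothetical names.

```python
import csv
import io


def read_pairs(f):
    """Read (value, frequency) rows from a CSV file object.

    Assumes a header row followed by two integer columns (hypothetical layout).
    """
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    return [(int(row[0]), int(row[1])) for row in reader if row]


def text_histogram(pairs, width=40):
    """Render one '#' bar per value, scaled so the largest count spans `width` characters."""
    top = max((c for _, c in pairs), default=0)
    lines = []
    for value, count in sorted(pairs):
        n = round(width * count / top) if top else 0
        if count and n == 0:
            n = 1  # keep tiny but nonzero counts visible
        lines.append(f"{value:>6} | {'#' * n} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = io.StringIO("k,count\n1,100\n2,50\n3,0\n")
    print(text_histogram(read_pairs(sample), width=10))
```

Publication-count distributions are usually heavy-tailed, so a plotting script would typically use a log scale on at least one axis; a linear text histogram like this one mainly helps sanity-check the CSV contents.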

Files in this repository:
ER.pdf
README.md
conference_ranking.csv
crawler.py
createPubSchema.sql
extra_credit_google_scholar.py
integration.sql
journal_ranking.csv
solution-analysis.sql
solution-raw.sql
transform.sql
vis.pdf
vis.py
vis.sql
vis1.csv
vis2.csv

Repository: SilvesterYu/CS-UH2214-Database-Systems-PS1
