first you need to have installed Python and Psycopg2 v2.7.5 to deal with PostgreSQL.
- download and install vagrant version 1.9.5 is the one that works without problems
- download and install virtual box version 5.2.
- download and install postgreSQL version 9.5
- Create an empty folder to hold your project folders and files; name this folder
log-analysis-dir
. Insidelog-analysis-dir
download the vagrant file (https://goo.gl/wLBxDA). Create another folder namedlog-analysis-project
and place it next to the vagrant file inside thelog-analysis-dir
this folder will share files with the virtual machine. - Until now, our project folder should look like this:
\
log-analysis-dir
--vagrant file --log-analysis-project
- Go to
log-analysis-project
folder orcd
to it in your terminal. Here, create the loganalysis.py file and download and unzip the newsdata.sql database file. Now your project folder should look like this:
-
//vagrant file
//log-analysis-project
- //------ newsdata.sql
- //------ loganalysis.py
vagrant up
and wait till the virtual machine is created.- Then
vagrant ssh
, which will make the terminal prompt turn into something like:vagrant@vagrant:
. - Then,
cd /vagrant
, which will take us to the directory shared between the virtual machine and our computer. - Here, we use
cd log-analysis-project
command - use the command
psql -d news -f newsdata.sql
and thenpsql -d news
to connect to the news database. Note: we can skip to the commandpsql -d news
directly once we have already connected to the database. - run this code to create views necessary to run our loganalysis python app:
psql -d news -f views.sql
- Now run our python file which contains our code using
python loganalysis.py
- We should see the output of these three SQL queries run against the news database:
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
AuthorsVa
: this view summs the number of views for each article belongs to every author the news database have. We make use of this view to sum all the views for each article that is written by each author.totals
: this view sums up the total views for all the articles wrtten by each author. This view actually answers the question number 2.ratio
: this view sums the logs of each day grouped by status( i.e. how many http requests did the website had at each day that were '200 OK' success response and how many that were '400 NOT FOUND' failure response).errorlogs
: this view sums up all th logs for each day no matter what its status were. We will use this sum to devide the number of failure responses for that day by it, to get the ratio of faliure for each day alone.
/log-analysis-dir
Go to log-analysis-dir
folder and git bash there, run these commands:
vagrant@vagrant:/vagrant/log-analysis$ psql -d news -f views.sql
CREATE VIEW
CREATE VIEW
CREATE VIEW
CREATE VIEW