The main purpose of this project is to make sure your programming environment is up and running.
From your terminal / command line, navigate to the directory where you'd like to store your work. Then clone the assignment's GitHub repository and cd into the new directory to begin working:
git clone https://github.com/scottfrees/cmps530-wp1.git
cd cmps530-wp1
Python projects rely on many third-party libraries. Because keeping track of them all is hard, most developers use environments to isolate projects from each other, so that one project's dependencies don't interfere with another's. We use Anaconda for this. You can either (1) create one environment and use it for all of your projects in this course, or (2) create a new environment for each project.
For example, if you create a single environment to reuse throughout the semester (the easiest option), run the following just once:
conda create --name cmps530 python=3.7
Then, before using the environment, make sure you activate it:
conda activate cmps530
You should have already installed Python 3 using Anaconda. Once Anaconda is installed, the conda command is available from your command prompt / terminal, and you can use it to install additional libraries.
In your project template file, you'll notice that most of what I provided is comments describing what you should do. However, I did include a small amount of code that draws a bar graph of the data using plotly. Your program won't run correctly until you install plotly, which you can do with the following command:
conda install plotly
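After installing plotly, you may want to confirm that the active environment is the one you expect. The short script below is a minimal sketch; check_environment is a hypothetical helper, not part of the assignment template, and it assumes plotly is the only extra dependency analysis.py needs.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the active interpreter looks ready for this project.

    Hypothetical helper: assumes plotly is the only third-party dependency.
    """
    return {
        # The course environment was created with python=3.7; any Python 3 is accepted here.
        "python3": sys.version_info.major == 3,
        # find_spec returns None (rather than raising) when a top-level package is missing.
        "plotly": importlib.util.find_spec("plotly") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it with python from the activated cmps530 environment. If plotly shows as MISSING, the most likely cause is that a different environment (or none) is active.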
The git repository contains .dvc files rather than the data itself, so you need to pull the data to get the actual data set associated with this project:
dvc pull
The project uses our standard AWS data repository, so make sure you've configured your dvc installation to pull from it.
Your primary task in this assignment is to compute the average, minimum, and maximum monthly rainfall in New Jersey from 1700 to 2000, and to create a bar chart to visualize it. You don't need to do much to create the bar chart, since I've provided most of that work, but computing the average, minimum, and maximum is all on you.
Please open analysis.py and read my comments carefully; they explain the program's requirements in detail.
As we will discuss many times throughout the semester, data is rarely in great shape. For this project, I've given you a file containing the average monthly rainfall in the state of New Jersey for every year from 1700 to 2000 (hint: I made this data up). Each year is written on a single line, which makes it fairly easy to calculate the average, maximum, and minimum, except that some lines contain words rather than numbers. You'll also call an add_to_plot function (we'll cover functions next week) that the script uses to eventually draw a bar graph.
The tricky part is that some lines contain -1, some are blank, and some say "unavailable". For whatever reason, no data is available for these years, and there was no agreed-upon standard way of representing "no data". For these years, call add_to_plot with 0 so that the bar graph still includes the missing year. However, do not include missing years in your calculation of the average, minimum, or maximum.
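The missing-value rules above can be sketched as two small functions. This is a minimal sketch under assumptions: each line of the data file holds a single value, with years in order starting at 1700, and the only "no data" markers are blank lines, -1, and the word unavailable (in any letter case). The names parse_rainfall and summarize are hypothetical, not from analysis.py, and the sketch returns a list of plot values instead of calling add_to_plot, whose exact signature is defined in the template.

```python
# Markers that mean "no data" for a year, per the assignment description.
MISSING_MARKERS = {"", "-1", "unavailable"}


def parse_rainfall(line):
    """Return the rainfall value on a line, or None if the year is missing."""
    text = line.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    # float() raises ValueError on any other text, so an unexpected marker
    # surfaces as an error instead of being silently dropped.
    value = float(text)
    # Also treat numeric forms of -1 (e.g. "-1.0") as missing.
    if value == -1:
        return None
    return value


def summarize(lines):
    """Return (plot_values, average, maximum, minimum) for the given lines.

    Missing years contribute 0 to plot_values but are excluded from the stats.
    """
    plot_values = []
    valid = []
    for line in lines:
        value = parse_rainfall(line)
        if value is None:
            plot_values.append(0)
        else:
            plot_values.append(value)
            valid.append(value)
    average = sum(valid) / len(valid)
    return plot_values, average, max(valid), min(valid)


sample = ["3.0\n", "unavailable\n", "\n", "-1\n", "5.5\n", "0.5\n"]
plot_values, average, maximum, minimum = summarize(sample)
print(plot_values)  # [3.0, 0, 0, 0, 5.5, 0.5]
print(average, maximum, minimum)  # 3.0 5.5 0.5
```

Keeping the per-line decision in its own function makes it easy to add another "no data" marker later without touching the statistics code.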
Checking that your analysis is accurate is sometimes challenging, and we'll discuss this a lot during the remainder of the semester. For this project, though, it's easy to know whether you are correct; your results should match these values:
Average Rainfall: 4.204456140350879
Maximum Rainfall: 9.57
Minimum Rainfall: 0.06
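Rather than comparing the printed numbers by eye, you could check them programmatically. The helper below is hypothetical (check_results is not part of the template). It compares with math.isclose instead of ==, on the assumption that tiny floating-point differences, such as those caused by summing the values in a different order, should still count as correct.

```python
import math

# Expected results, copied from the assignment description.
EXPECTED = {"average": 4.204456140350879, "maximum": 9.57, "minimum": 0.06}


def check_results(average, maximum, minimum, rel_tol=1e-9):
    """Return the names of any statistics that do not match the expected values."""
    actual = {"average": average, "maximum": maximum, "minimum": minimum}
    return [
        name
        for name, expected in EXPECTED.items()
        # isclose tolerates small rounding differences that == would reject.
        if not math.isclose(actual[name], expected, rel_tol=rel_tol)
    ]


print(check_results(4.204456140350879, 9.57, 0.06))  # []
print(check_results(4.2, 9.57, 0.06))  # ['average']
```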
Please submit only your analysis.py file to Canvas.
Consider the following questions:
- What changes would you recommend to make this data easier to work with?
- What changes would you recommend to make this data more independent and self-describing?
