🏗️ Workstation Setup

Before starting the Practical Data Scientist course, we must set up your python data analysis workstation. Follow these instructions to install and configure git, python, and jupyter, so that you can run the course notebooks! 🌞

Some operating systems make this step difficult. If you get stuck during the workstation setup, please reach out in the course channel on Discord, where the teachers (and other students!) can assist you. 🤗

1 Git

git is a crucial tool for programmers. It facilitates sharing and collaboration of code, files, and resources. All course contents are distributed with git, and found in the course repository hosted on GitHub.

If you are already comfortable with git, please clone the course repository and skip to the Python Environment section.

1.1 Install

  • Download & install the GitHub desktop client.
  • Skip the GitHub account registration
  • Enter your git information (name & email)

1.2 Cloning

Downloading resources with git is called "cloning". We are effectively copying the course contents from GitHub to your local machine.

  • On the Getting Started homepage, select Clone a repository from the internet...
  • Select the URL tab and enter JungleProgram/practical-data-scientist
  • Pick a destination (this is where the course repository will be copied)

Once cloned, you should see No local changes in the center of the window, and Current Repository: practical-data-scientist in the top left corner as such:

Github Desktop

This means you have successfully downloaded the course repository! 🎉
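For readers who prefer scripting over the desktop client, the cloning step can be sketched in python. This is a minimal sketch, not the course's official method: it assumes the git command-line tool is installed, and that the repository lives at the full GitHub URL below (inferred from the JungleProgram/practical-data-scientist shorthand). The function names and the destination folder are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: full URL inferred from the "JungleProgram/practical-data-scientist"
# shorthand entered in GitHub Desktop.
REPO_URL = "https://github.com/JungleProgram/practical-data-scientist.git"


def clone_command(destination: Path, url: str = REPO_URL) -> list[str]:
    """Build the `git clone` command that copies the repository to `destination`."""
    return ["git", "clone", url, str(destination)]


def clone_course(destination: Path) -> None:
    """Run the clone; raises CalledProcessError if git reports a failure."""
    subprocess.run(clone_command(destination), check=True)


# Print the command without running it (hypothetical destination folder).
print(" ".join(clone_command(Path("practical-data-scientist"))))
```

Calling `clone_course(Path("practical-data-scientist"))` would perform the actual download into a folder of that name.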

1.3 Pulling

Once in a while, the course repository will be updated on GitHub. You can "pull" those changes to your local repository by clicking the Fetch origin button on the top of the course repository window of the GitHub Desktop app.
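The same update can be sketched from python, assuming the git command-line tool is available. `git pull` fetches the remote changes and merges them into the local copy in one step. The function names and the repository path are hypothetical, chosen for illustration.

```python
import subprocess
from pathlib import Path


def pull_command(repo_dir: Path) -> list[str]:
    """Build a `git pull` command that runs inside `repo_dir` (via git's -C option)."""
    return ["git", "-C", str(repo_dir), "pull"]


def pull_course(repo_dir: Path) -> None:
    """Fetch and merge the latest course changes; raises if git reports a failure."""
    subprocess.run(pull_command(repo_dir), check=True)


# Print the command without running it (hypothetical local path).
print(" ".join(pull_command(Path("practical-data-scientist"))))
```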

2 Python Environment

Before we start doing fancy things with python 🐍, we've got to install it in a way that's easy to use and configure. Conda is a pre-packaged dependency manager for python, and will save us a lot of time with installs and imports.

ℹ️ If you are a seasoned developer and don't like conda, this pyenv+pipenv setup gives more fine-grained control over your python environments. Warning: this install requires Unix & bash experience.

2.1 Conda

2.2 Python Dependencies

This course uses a few libraries that aren't installed with conda by default. The Anaconda Navigator app allows us to manage our python environments and dependencies.

If you are comfortable with the terminal, you can install course dependencies directly with conda install folium spacy wordcloud -c conda-forge, and skip to the Jupyter Notebooks section.

We first have to add the conda-forge channel to conda.

  • Open the Anaconda-Navigator app and select the Environments tab.
  • Under Channels -> Add..., enter conda-forge
  • Select Channels -> Update channels
  • Select Update index...

Now that we have set up the conda-forge channel, we can search for and download packages. The course dependencies we need to install are folium, spacy, and wordcloud. For each:

  • Search for the package name in Search Packages
  • Select the package tickbox
  • Select Apply, then Apply again

You should see the status Installing packages on /opt/anaconda3 at the bottom of the window as such:

Conda Install

Once complete, you have successfully downloaded the course python dependencies! 🎉
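To double-check the install, a short python snippet can confirm that the three course dependencies are importable from the active environment. This is a minimal sketch using the standard library's `importlib.util.find_spec`. The helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# The three course dependencies named above.
COURSE_PACKAGES = ["folium", "spacy", "wordcloud"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the packages from `names` that python cannot find in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(COURSE_PACKAGES)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All course dependencies are importable.")
```

If a package is reported as missing, make sure the snippet runs in the same conda environment where the packages were installed.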

3 Jupyter Notebooks

Jupyter notebooks are a fundamental tool for data scientists. They are a code shell / text hybrid, hosted on a web server, which weaves data processing and analysis all in one interactive interface. This web server can be hosted in the cloud, or run locally.

Today, we'll set up a local jupyter notebook server on your machine. Since this is python, jupyter is managed by conda, and accessible directly in the Navigator!

3.1 Running the Course Notebook Server

If you are comfortable with the terminal, you can open the course jupyter notebook server directly by navigating to the course repository directory, and entering jupyter notebook.

  • Open the Anaconda-Navigator app and select the Home tab.
  • Launch the Jupyter Notebook application

A window should open in your browser as such:

Jupyter Notebook Server

You have started a jupyter notebook server! 🎉
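The terminal route mentioned above (moving into the course repository, then entering jupyter notebook) can also be sketched in python. This assumes the `jupyter` command is on your PATH, which is typically the case inside an activated conda environment. The function names and the repository path are hypothetical.

```python
import subprocess
from pathlib import Path


def notebook_launch_plan(repo_dir: Path) -> tuple[list[str], Path]:
    """Return the command and the working directory for starting the server."""
    repo_dir = repo_dir.expanduser().resolve()
    if not repo_dir.is_dir():
        raise FileNotFoundError(f"Course repository not found: {repo_dir}")
    return ["jupyter", "notebook"], repo_dir


def launch_notebook(repo_dir: Path) -> None:
    """Start the jupyter notebook server inside the course repository."""
    command, cwd = notebook_launch_plan(repo_dir)
    # Blocks until the server is stopped (Ctrl+C in the terminal).
    subprocess.run(command, cwd=cwd, check=True)
```

Calling `launch_notebook(Path("~/practical-data-scientist"))` would start the server there, with the path adjusted to wherever you cloned the repository.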

3.2 Opening & Running Course Notebooks

I won't write about the basics of notebooks here, since jupyter hosts a wonderful tutorial notebook on binder.

  • Open the jupyter tutorial on binder, and check out the "Notebook Basics" to learn how to navigate the server, open, and run notebooks.
  • In your local jupyter notebook server, navigate to the course repository (practical-data-scientist). You should find it where you cloned it with GitHub Desktop.
  • Under prepwork, open prepwork.ipynb

The prepwork notebook should open in a new tab in your browser as such:

Prepwork Notebook

Congratulations, you are now running local jupyter notebooks, and are ready to start the course! 🎉🎉

3.3 Editing & Saving Course Notebooks

Throughout the course, you might want to edit or add to some of the course notebooks and save those changes. However, this might lead to conflicts with the notebooks hosted on GitHub. Since no one likes to resolve merge conflicts, we recommend making a copy of a notebook before editing and saving it.

  • Open the notebook in question
  • Select File -> Make a Copy...

You should now have a new notebook named XXXXX-Copy1. This copy is safe to edit and save, and will not conflict with the notebooks hosted on GitHub.
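The copy step can also be scripted, for example to duplicate several notebooks at once. The sketch below mimics the -Copy1 naming shown above by picking the first unused -CopyN suffix. It is an illustration rather than jupyter's own implementation, and `copy_notebook` is a hypothetical helper name. The demo runs against an empty stand-in file in a temporary folder.

```python
import shutil
import tempfile
from pathlib import Path


def copy_notebook(notebook: Path) -> Path:
    """Copy `notebook` next to itself as `<name>-CopyN.ipynb` and return the new path."""
    n = 1
    while True:
        candidate = notebook.with_name(f"{notebook.stem}-Copy{n}{notebook.suffix}")
        if not candidate.exists():
            break
        n += 1
    shutil.copy2(notebook, candidate)
    return candidate


# Demo in a temporary folder, using an empty stand-in for a real notebook.
with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "prepwork.ipynb"
    original.write_text("{}")
    print(copy_notebook(original).name)  # prepwork-Copy1.ipynb
    print(copy_notebook(original).name)  # prepwork-Copy2.ipynb
```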

Resources

Core Resources

Additional Resources

GitHub Desktop Client

Conda

Jupyter