Instructor: Grishma Jena
Want to do Data Science but don’t know how to go about it? Have a dataset that you really want to analyze but aren’t sure where to start? This hands-on session teaches how to explore datasets, use Machine Learning algorithms and derive insights from predictive models using popular tools in Python.
We will use Jupyter to run Python code in this tutorial. A virtual environment can be used to manage and isolate the packages for our project. Please follow these instructions before the tutorial so that all the dependencies are ready and we can hit the ground running.
Pre-requisites
Option A: Using Jupyter on your local machine
Requires installation of packages but you will be able to use Jupyter and run code offline.
- Ensure that pip is installed and upgrade it. Pip should already be available if you are using Python 3 >= 3.4 downloaded from python.org. For further installation instructions check this.
- Optional: If you plan on using a virtual environment, ensure venv (Python 3) is installed. Create a virtual environment (for example, python3 -m venv env) and activate it (source env/bin/activate on macOS or Linux). Detailed instructions here.
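- Install the required packages using pip in the terminal:
  python3 -m pip install jupyter numpy pandas nltk lxml requests matplotlib scikit-learn graphviz wikipedia gensim wordcloud --user
  The package is installed as scikit-learn but imported as sklearn; the sklearn name on PyPI is deprecated. If you are working inside a virtual environment, drop the --user flag, since pip normally refuses it there.
  If you face problems installing NLTK, take a look at this.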
- Start Jupyter by running jupyter notebook in your terminal. It opens in your browser, by default at http://localhost:8888.
- Download the sample notebook ‘Introduction to Data Science test’ and open it in Jupyter. Execute the code by clicking on Cell -> Run Cells. Check out this video for a quick introduction to Jupyter.
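Once the steps above are done, a quick way to confirm that the installation worked is to check that each package can be located by Python. The following is a minimal sketch, not part of the tutorial material: the `REQUIRED` mapping and the `find_missing` helper are hypothetical names, and using `notebook` as the import name for the jupyter metapackage is an assumption.

```python
import importlib.util

# Maps each pip package name from the install command to the module name
# used to import it. Using "notebook" for the jupyter metapackage is an
# assumption of this sketch.
REQUIRED = {
    "jupyter": "notebook",
    "numpy": "numpy",
    "pandas": "pandas",
    "nltk": "nltk",
    "lxml": "lxml",
    "requests": "requests",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",  # installed as scikit-learn, imported as sklearn
    "graphviz": "graphviz",
    "wikipedia": "wikipedia",
    "gensim": "gensim",
    "wordcloud": "wordcloud",
}


def find_missing(packages):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same interpreter (and the same virtual environment, if you created one) that you will use for Jupyter. Any names it reports can be passed to python3 -m pip install again.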
Option B: Using Google Colaboratory on the cloud
Skips package installation but needs an internet connection, since the code runs in the cloud.
- Access Google Colab and familiarize yourself with running code in the browser.
- Read the 'Getting started' guide and take a look at the introductory video.
- Download the sample notebook ‘Introduction to Data Science test’, upload it to Colab and open it. Execute the code by clicking on Runtime -> Run all.
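Colab comes with many common libraries preinstalled, but the exact set changes over time, so a notebook cell may occasionally fail with an import error. The sketch below shows one way to install a package only when it is missing, using pip through the running interpreter. The helper names `pip_install_args` and `install_if_missing` and the `dry_run` parameter are hypothetical, and whether any particular package is absent on Colab is an assumption you would need to check yourself.

```python
import importlib.util
import subprocess
import sys


def pip_install_args(packages):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install_if_missing(import_name, pip_name, dry_run=False):
    """Install pip_name if import_name cannot be located.

    Returns None when the module is already available, otherwise the
    command that was run (or would be run when dry_run is True).
    """
    if importlib.util.find_spec(import_name) is not None:
        return None
    args = pip_install_args([pip_name])
    if not dry_run:
        subprocess.check_call(args)
    return args


if __name__ == "__main__":
    # dry_run=True only reports the command; set it to False to install.
    command = install_if_missing("wikipedia", "wikipedia", dry_run=True)
    print("Already installed" if command is None else " ".join(command))
```

Calling pip through `sys.executable` makes sure the package lands in the environment the notebook kernel is actually using, which also makes the same cell usable on a local machine.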
Feel free to contact me if you have any questions.