{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NumPy, Matplotlib and Sklearn Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We often use NumPy to handle high-dimensional arrays.\n", "\n", "Let's try some basic NumPy operations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import numpy as np\n", "\n", "a = np.array([[1,2,3], [2,3,4]])\n", "print(a.ndim, a.shape, a.size, a.dtype, type(a))\n", "\n", "b = np.zeros((3,4))\n", "c = np.ones((3,4))\n", "d = np.random.randn(2,3)  # samples from the standard normal distribution\n", "e = np.array([[1,2], [2,3], [3,4]])\n", "f = b*2 - c*3  # element-wise arithmetic\n", "g = 2*c*f  # element-wise product\n", "h = np.dot(a,e)  # matrix product: (2x3) times (3x2)\n", "i = d.mean()\n", "j = d.max(axis=1)  # maximum of each row\n", "k = a[-1][:2]  # first two elements of the last row\n", "\n", "# Print any of the variables a through k to inspect them" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "matplotlib.pyplot provides useful APIs for drawing graphs.\n", "\n", "Let's try some basic matplotlib.pyplot operations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "x = np.arange(2, 10, 0.2)\n", "\n", "# three curves in one call: red solid line, green dashed line, blue dots\n", "plt.plot(x, x**1.5*.5, 'r-', x, np.log(x)*5, 'g--', x, x, 'b.')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To draw the curves in separate subplots, try this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f(x):\n", "    return np.sin(np.pi*x)\n", "\n", "x1 = np.arange(0, 5, 0.1)\n", "x2 = np.arange(0, 5, 0.01)\n", "\n", "plt.subplot(211)  # top panel of a 2-row, 1-column grid\n", "plt.plot(x1, f(x1), 'go', x2, f(x2-1))\n", "\n", "plt.subplot(212)  # bottom panel\n", "plt.plot(x2, f(x2), 'r--')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How about displaying images?\n", "\n", "Let's display an image whose pixel values change gradually:\n", "\n", "Different pixel values represent different gray levels." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "img = np.arange(0, 1, 1/32/32) # define a 1D array of 32x32 = 1024 gradually increasing values\n", "img = img.reshape(32, 32) # reshape it into a 32x32 array representing a 32x32 image;\n", "                          # each element is the corresponding pixel of the image\n", "plt.imshow(img, cmap='gray')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Built on NumPy, Scikit-learn (sklearn) provides many tools for machine learning. It is a powerful machine learning library.\n", "\n", "Let's use it to classify MNIST digits:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "from sklearn.datasets import fetch_openml\n", "\n", "# download and load the MNIST data from https://www.openml.org/d/554\n", "# for this tutorial, the data have already been downloaded to './scikit_learn_data'\n", "# as_frame=False returns NumPy arrays (newer scikit-learn versions return a DataFrame by default)\n", "X, Y = fetch_openml('mnist_784', version=1, data_home='./scikit_learn_data', return_X_y=True, as_frame=False)\n", "\n", "# rescale pixel values from [0, 255] to [0, 1] for further processing\n", "X = X / 255.\n", "\n", "# display the first image of the dataset\n", "img1 = X[0].reshape(28, 28)\n", "plt.imshow(img1, cmap='gray')\n", "plt.show()\n", "\n", "# display the image after simple transformations (inverted, then transposed)\n", "img2 = 1 - img1\n", "plt.imshow(img2, cmap='gray')\n", "plt.show()\n", "\n", "img3 = img1.transpose()\n", "plt.imshow(img3, cmap='gray')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# split the data into training and test sets (for faster computation, use only 1/10 of the data)\n", "from sklearn.model_selection import train_test_split\n", "X_train, X_test, Y_train, Y_test = train_test_split(X[::10], Y[::10], test_size=1000)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### Q1:\n",
"Please use logistic regression (default parameters) in sklearn to classify the data above, and print the training accuracy and test accuracy." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# TODO: use logistic regression\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn import metrics\n", "\n", "\n", "\n", "\n", "\n", "print('Training accuracy: %0.2f%%' % (train_accuracy*100))\n", "print('Testing accuracy: %0.2f%%' % (test_accuracy*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Q2:\n", "Please use naive Bayes (Bernoulli, default parameters) in sklearn to classify the data above, and print the training accuracy and test accuracy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: use naive Bayes\n", "from sklearn.naive_bayes import BernoulliNB\n", "\n", "\n", "\n", "\n", "\n", "print('Training accuracy: %0.2f%%' % (train_accuracy*100))\n", "print('Testing accuracy: %0.2f%%' % (test_accuracy*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Q3:\n", "Please use a support vector machine (default parameters) in sklearn to classify the data above, and print the training accuracy and test accuracy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: use a support vector machine\n", "from sklearn.svm import LinearSVC\n", "\n", "\n", "\n", "\n", "\n", "print('Training accuracy: %0.2f%%' % (train_accuracy*100))\n", "print('Testing accuracy: %0.2f%%' % (test_accuracy*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Q4:\n", "Please adjust the parameters of the SVM to increase the test accuracy, and print the training accuracy and test accuracy." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: use the SVM with a different set of parameters\n", "\n", "\n", "\n", "\n", "\n", "print('Training accuracy: %0.2f%%' % (train_accuracy*100))\n", "print('Testing accuracy: %0.2f%%' % (test_accuracy*100))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }