# This file was generated from polynomial_regression.ipynb with nbconvert
# Source: https://github.com/groda/big_data
#!/usr/bin/env python
# coding: utf-8
# <a href="https://colab.research.google.com/github/groda/big_data/blob/master/polynomial_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
# <a href="https://github.com/groda/big_data"><div><img src="https://github.com/groda/big_data/blob/master/logo_bdb.png?raw=true" align=right width="90" alt="Logo Big Data for Beginners"></div></a>
# # Polynomial regression with NumPy and Matplotlib
#
# This tutorial provides a comprehensive, step-by-step example of polynomial regression using NumPy, with Matplotlib for visualizing the results.
# In[1]:
import numpy as np
import matplotlib.pyplot as plt
# Set a random seed so that the results are reproducible.
# In[2]:
np.random.seed(12)
# Set the number of data points `N` and the degree `D` of the polynomial that will be fitted to the data.
#
# **Note:** for `D = 1` this is _linear regression_.
# In[3]:
N = 100 # number of points
D = 2 # degree of polynomial
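# To illustrate the note above about `D = 1`, here is a minimal sketch of a degree-1 fit on a few noise-free points. The names `x_demo` and `y_demo` and the line $y = 3x + 2$ are made up for this example and are not part of the tutorial's data.

```python
import numpy as np

# Hypothetical noise-free data lying exactly on the line y = 3x + 2
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 3.0 * x_demo + 2.0

# A degree-1 fit returns two coefficients, highest power first:
# the slope followed by the intercept
slope, intercept = np.polyfit(x_demo, y_demo, 1)
print("slope: {:.3f}, intercept: {:.3f}".format(slope, intercept))
```

# With a degree of 1 the fitted "polynomial" is just a straight line, which is why this case coincides with ordinary linear regression.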
# ### Generate some random data
#
# To generate some random data that is suitable for polynomial regression we're going to use the following functions:
# - [`np.random.uniform`](https://docs.scipy.org/doc/numpy-1.16.0/reference/generated/numpy.random.uniform.html) draws samples from a uniform distribution.
# - [`np.random.sample`](https://docs.scipy.org/doc/numpy-1.16.0/reference/generated/numpy.random.sample.html) returns random floats in the half-open interval $[0.0, 1.0)$.
# - [`np.random.normal`](https://docs.scipy.org/doc/numpy-1.16.0/reference/generated/numpy.random.normal.html) draws random samples from a normal (Gaussian) distribution.
# - [`np.polyval`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyval.html) evaluates a polynomial (defined by its coefficients) at the given points.
#
# We first generate a set of $x$ values drawn at random from the half-open interval $[-N, N)$.
#
# The array `c` stores the coefficients of a random polynomial. To generate a random polynomial of degree $D$ we draw $D$ random coefficients for the lower-order terms and set the coefficient of the leading term to $1$, so `c` ends up with $D + 1$ entries, ordered from the highest power down to the constant term.
# In[4]:
x = np.random.uniform(-N, N, N) # N x values uniformly distributed in [-N, N), i.e. between -100 and +100
c = 200*np.random.sample((D,))-100 # random polynomial coefficients drawn from the interval [-100, 100)
c = np.insert(c, 0, 1) # first coeff. of the random poly is 1
y = np.polyval(c, x) # evaluate the random polynomial at x (c already contains the leading 1)
noise = np.random.normal(loc=-0.5*np.std(y), scale=0.5*np.std(y), size=N) # normally distributed noise; note that its mean (loc) is -0.5*std(y), not 0
y += noise # add noise to values
plt.scatter(x, y)
plt.show()
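# The coefficient handling above relies on two NumPy conventions that are easy to get wrong, so here is a small sketch of both. The arrays `tail`, `coeffs` and `points` are hypothetical values chosen for this example only.

```python
import numpy as np

# Hypothetical lower-order coefficients of x^2 - 3x + 2 = (x - 1)(x - 2)
tail = np.array([-3.0, 2.0])

# np.insert prepends the leading coefficient and returns a new array
coeffs = np.insert(tail, 0, 1)

# Adding a Python list to an array does NOT concatenate: it broadcasts,
# so this adds 1 to every element instead of prepending a coefficient
broadcast_sum = [1] + tail

# np.polyval expects the coefficients ordered from highest power to constant
points = np.array([0.0, 1.0, 2.0, 3.0])
values = np.polyval(coeffs, points)

# The same evaluation written out term by term
manual = coeffs[0] * points**2 + coeffs[1] * points + coeffs[2]

print("coeffs:       ", coeffs)
print("broadcast_sum:", broadcast_sum)
print("values:       ", values)
```

# The polynomial vanishes at its roots $x = 1$ and $x = 2$, which gives a quick sanity check on the coefficient order.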
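# Fit a polynomial of degree `D` to the data with `np.polyfit` and plot it over the data points.
# In[5]:
t = np.linspace(-N, N, num=N) # evenly spaced points for drawing the fitted curve
p = np.poly1d(np.polyfit(x, y, D)) # least-squares fit of degree D, wrapped as a callable polynomial
plt.plot(x, y, 'o', t, p(t), '-') # data as dots, fitted polynomial as a line
plt.show()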
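# `np.polyfit` performs a least-squares fit. As a rough sketch of what that means, the same coefficients can be obtained by solving a linear least-squares problem on a Vandermonde matrix. The data, the generator `rng` and the names `x_demo`, `y_demo`, `deg` and `V` are assumptions made for this example; this is an illustration of the idea, not a description of NumPy's internal implementation.

```python
import numpy as np

# Hypothetical noisy samples of 2x^2 - x + 0.5
rng = np.random.default_rng(0)
x_demo = rng.uniform(-1.0, 1.0, 50)
y_demo = 2.0 * x_demo**2 - x_demo + 0.5 + rng.normal(0.0, 0.1, 50)

deg = 2

# Vandermonde matrix with columns x^2, x^1, x^0 (highest power first)
V = np.vander(x_demo, deg + 1)

# Solve min ||V @ coef - y||^2 for the coefficient vector
coef_lstsq, *_ = np.linalg.lstsq(V, y_demo, rcond=None)

# np.polyfit should agree up to floating-point error
coef_polyfit = np.polyfit(x_demo, y_demo, deg)

print("lstsq:  ", coef_lstsq)
print("polyfit:", coef_polyfit)
print("equal:  ", np.allclose(coef_lstsq, coef_polyfit))
```

# Seen this way, polynomial regression is linear regression on the features $x^D, \dots, x, 1$: the model is nonlinear in $x$ but linear in the coefficients.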
# ### How well did we approximate?
#
# Print the coefficients of the fitted polynomial `p` next to the coefficients `c` of the original polynomial.
# In[6]:
np.set_printoptions(formatter={'float': '{: 12.6f}'.format})
print("Coefficients of p: {}".format(p.c))
print("Original coefficients: {}".format(c))
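# Comparing coefficients by eye is one way to judge the fit. As a complementary sketch, the quality of a fit can also be summarized with the root-mean-square error (RMSE) and the coefficient of determination $R^2$. The helper `r_squared`, the synthetic data and all `*_demo` names are hypothetical and only serve this example.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: x^2 - 4x + 7 plus zero-mean Gaussian noise (std 5)
rng = np.random.default_rng(1)
x_demo = np.linspace(-10.0, 10.0, 200)
true_coeffs = np.array([1.0, -4.0, 7.0])
y_demo = np.polyval(true_coeffs, x_demo) + rng.normal(0.0, 5.0, x_demo.size)

# Fit a degree-2 polynomial and evaluate it at the sample points
fit_coeffs = np.polyfit(x_demo, y_demo, 2)
y_fit = np.polyval(fit_coeffs, x_demo)

rmse = np.sqrt(np.mean((y_demo - y_fit) ** 2))
r2 = r_squared(y_demo, y_fit)

print("RMSE: {:.3f}".format(rmse))
print("R^2:  {:.4f}".format(r2))
```

# An RMSE close to the noise level and an $R^2$ close to $1$ suggest that the fitted polynomial explains most of the variation in the data.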
# Visualize the original polynomial `p_orig` and its approximation `p` obtained with polynomial regression.
# In[7]:
p_orig = np.poly1d(c)
plt.plot(x, y, 'o', t, p(t), '-', t, p_orig(t), '-')
plt.show()