- split the data we have into 2 portions.
- The 1st portion is going to be our usual training set. (70%)
- The 2nd portion is going to be our test set. (30%)
- If there is any sort of ordering to the data, it is better to shuffle the data randomly before splitting it into training/test sets.
- Learn parameter θ from training data.
- Use θ to compute the test set error (a numpy sketch follows this block):
- for linear regression :
- error = J(θ) evaluated on the test set (without regularization)
- for logistic regression:
- set err(h(x), y) = 1 if (h(x) ≥ 0.5 and y = 0) or (h(x) < 0.5 and y = 1); otherwise err = 0
- error = (1/m_test) ∑ err(h(x_test), y_test)
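A minimal numpy sketch of this split-and-evaluate procedure (the function names, the 70/30 default ratio, and the convention that `X` already carries a bias column are illustrative assumptions, not from the notes):

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.3, seed=0):
    """Shuffle first (in case the data has some ordering), then split 70/30."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_ratio)
    return X[idx[n_test:]], y[idx[n_test:]], X[idx[:n_test]], y[idx[:n_test]]

def linreg_cost(theta, X, y):
    """Unregularized J(theta) = (1 / 2m) * sum((h(x) - y)^2) for linear regression."""
    m = len(y)
    return ((X @ theta - y) ** 2).sum() / (2 * m)

def logreg_error(theta, X, y):
    """Misclassification error: err = 1 when the 0.5-thresholded prediction disagrees with y."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid hypothesis h(x)
    pred = (h >= 0.5).astype(int)
    return float(np.mean(pred != y))         # (1/m_test) * sum of err(h(x), y)
```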
-
Try several models with a different degree of polynomial, such as:
- d=1, h(x)=θ₀+θ₁x
- d=2, h(x)=θ₀+θ₁x+θ₂x²
- ...
- d=10, h(x)=θ₀+θ₁x+ ... +θ₁₀x¹⁰
-
split the data into 3 pieces.
- 1st part, training set (60%)
- 2nd part, cross validation (CV) set (20%)
- 3rd part, test set (20%)
-
Learn parameter θ from training data.
-
compute the CV set error for each model, and pick the model with the lowest error.
-
estimate the generalization error on the test set (a sketch of this degree-selection procedure follows).
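A sketch of the whole degree-selection loop, assuming a 1-D input `x`, a plain least-squares fit, and the squared-error cost from above (the helper names are my own):

```python
import numpy as np

def poly_features(x, d):
    """Map 1-D input x to [1, x, x^2, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)

def fit_linreg(X, y):
    """Unregularized least-squares fit of theta."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cost(theta, X, y):
    return ((X @ theta - y) ** 2).sum() / (2 * len(y))

def select_degree(x_tr, y_tr, x_cv, y_cv, x_te, y_te, max_d=10):
    # learn theta on the training set; pick d by the lowest CV error
    best = min(range(1, max_d + 1),
               key=lambda d: cost(fit_linreg(poly_features(x_tr, d), y_tr),
                                  poly_features(x_cv, d), y_cv))
    # report generalization error for the winning degree on the test set
    theta = fit_linreg(poly_features(x_tr, best), y_tr)
    return best, cost(theta, poly_features(x_te, best), y_te)
```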
![Bias vs. Variance](BiasVsVariance.png)
-
Bias (underfit)
- J(θ) of the training set will be high,
- J(θ) of the CV set will also be high.
- J_train ≈ J_cv
-
Variance (overfit)
- J(θ) of the training set will be low,
- J(θ) of the CV set will be high.
- J_cv >> J_train
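These two signatures can be folded into a rough diagnostic; the `gap_ratio` threshold below is an illustrative assumption, not a standard value:

```python
def diagnose(j_train, j_cv, gap_ratio=2.0):
    """Crude heuristic: a large J_cv / J_train gap suggests variance;
    both errors high and close together suggests bias."""
    if j_cv > gap_ratio * max(j_train, 1e-12):
        return "high variance (overfit): J_cv >> J_train"
    return "possible high bias (underfit) if both errors are high: J_train ≈ J_cv"
```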
λ | effect on θ | fitting result |
---|---|---|
small (e.g. 0) | no penalty | high variance (overfit) |
intermediate | penalized just right | good fit |
large (e.g. 100) | heavily penalized → θ ≈ 0 | high bias (underfit) |
- Try several different values of λ, e.g. λ = 0, 0.01, 0.02, 0.04, ..., 10.24 (start from no regularization, then double λ at each step).
- For each λ, minimize J_train(θ) with regularization to compute the parameter θ.
- Compute J_cv(θ) for each resulting θ (without regularization), and pick the θ with the lowest error (e.g. θ₄).
- See how θ₄ performs on the test set (without regularization; since θ has already been computed, regularization is no longer needed). A sketch of this λ-selection loop follows.
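A sketch of this λ-selection loop for linear regression, using the closed-form regularized normal equation (an assumption; any minimizer of the regularized J_train would do). `X` is assumed to carry a leading bias column, and θ₀ is left unpenalized:

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize the regularized J_train via the normal equation; theta_0 is not penalized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def cost(theta, X, y):
    """Unregularized J(theta), used for the CV and test evaluations."""
    return ((X @ theta - y) ** 2).sum() / (2 * len(y))

def select_lambda(X_tr, y_tr, X_cv, y_cv):
    lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]      # 0, 0.01, 0.02, ..., 10.24
    thetas = [fit_ridge(X_tr, y_tr, lam) for lam in lambdas]  # train WITH regularization
    j_cv = [cost(t, X_cv, y_cv) for t in thetas]              # evaluate WITHOUT it
    best = int(np.argmin(j_cv))
    return lambdas[best], thetas[best]
```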
How the training and CV set errors are affected by λ: J_train(θ) increases with λ, while J_cv(θ) is high at both extremes, small λ (overfit) and large λ (underfit), and lowest at some intermediate λ.
Plotting learning curves gives you a better sense of whether there is a bias problem, a variance problem, or a bit of both.
For a high-variance problem, as you provide more and more training examples, J_train and J_cv may converge toward each other.
- learn parameter θ from the training subset (i.e., X(1:n,:) and y(1:n))
- compute the training error on that same training subset
- compute the CV error over the entire cross-validation set
Use the learning curve to determine whether the problem is bias or variance: if J_train and J_cv eventually converge to each other (at a high error) as the number of examples m increases, it indicates bias, and increasing m will not fix high bias. A sketch of the learning-curve computation follows.
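A sketch of the learning-curve computation; `fit` and `cost` are placeholders for any trainer/cost pair, e.g. the `fit_ridge`/`cost` sketch above:

```python
def learning_curve(X_tr, y_tr, X_cv, y_cv, fit, cost):
    """Return J_train and J_cv as functions of the training-subset size n."""
    j_train, j_cv = [], []
    for n in range(1, len(y_tr) + 1):
        theta = fit(X_tr[:n], y_tr[:n])                  # learn from X(1:n,:), y(1:n)
        j_train.append(cost(theta, X_tr[:n], y_tr[:n]))  # error on the same subset
        j_cv.append(cost(theta, X_cv, y_cv))             # error on the entire CV set
    return j_train, j_cv
```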
Suppose you implemented regularized linear regression to predict housing prices,
but you find that your hypothesis makes unacceptably large errors in its predictions.
What should you try next?
- Get more training examples -> fix high variance
- Try smaller sets of features -> fix high variance
- Try getting additional features -> fix high bias
- Try adding polynomial features -> fix high bias
- Try decreasing λ -> fix high bias
- Try increasing λ -> fix high variance
neural network case:
"small" neural network (fewer parameters): more prone to underfitting.
e.g. a 2-3-1 network; it is computationally cheaper.
Fewer parameters means fewer hidden layers and fewer hidden units.
"large" neural network (more parameters): more prone to overfitting.
e.g. 2-10-1 or 2-5-5-5-1; it is computationally more expensive.
Using a large neural network and applying regularization (λ) to address overfitting is often more effective than using a smaller neural network (see the sketch below).
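A quick illustration of this advice using scikit-learn (assumed available): `alpha` is sklearn's L2 penalty, playing the role of λ, and the toy dataset and hidden-layer sizes are my own choices to mirror the 2-3-1 vs. 2-10-1 example:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy 2-D, 2-class dataset; input/output layer sizes are implied by the data,
# so hidden_layer_sizes=(3,) roughly mirrors a 2-3-1 network.
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

small = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=0)
# larger network plus L2 regularization to keep it from overfitting
large = MLPClassifier(hidden_layer_sizes=(10,), alpha=1.0, max_iter=2000, random_state=0)

for name, clf in (("small 2-3-1", small), ("large 2-10-1 + reg", large)):
    clf.fit(X_tr, y_tr)
    print(name, "test accuracy:", clf.score(X_te, y_te))
```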