# emmakopp1/MASH_Advanced_learning

Illustration of convergence results of advanced learning methods.
Link to the book containing the figures (Francis Bach, *Learning Theory from First Principles*): https://www.di.ens.fr/~fbach/ltfp_book.pdf

**Figure 3.1 (Linear least-squares regression; setting: ordinary least squares).** Polynomial regression with a varying number of observations.

**Figure 3.2 (Linear least-squares regression; setting: ordinary least squares).** Convergence rate for polynomial regression with error bars (obtained from 32 replications by adding/subtracting standard deviations), plotted in logarithmic scale, with fixed design (left plot). The large error bars for small $n$ in the right plot are due to the lower error bar being negative before taking the logarithm.

**Figure p. 90 (Optimization for machine learning; setting: ordinary least squares).** We consider two quadratic optimization problems in dimension $d = 1000$, with two different decays of the eigenvalues $(\lambda_k)$ of the Hessian matrix $H$: one in $1/k$ (blue) and one in $1/k^2$ (red). We plot the performance in function values, both in semi-logarithmic plots (left) and full-logarithmic plots (right). For slow decays (blue), the linear convergence kicks in, while for fast decays (red), the rates in $1/t$ dominate.

**Figure p. 108 (Optimization for machine learning; setting: SVM).** We consider a simple binary classification problem with linear predictors and features with $\ell_2$-norm bounded by $R$. We use the hinge loss with a squared $\ell_2$-regularizer $\mu \|\cdot\|_2^2$ and measure the excess training objective. We compare two values of $\mu$ and the two step-sizes $\gamma_t = 1/(R^2 t)$ and $\gamma_t = 1/(\mu t)$. For large enough $\mu$, the strongly convex step-size is better; this is not the case for small $\mu$.

**Figure 6.2 (Local averaging methods; setting: partition estimator).** Regressograms in $d = 1$ dimension, with three values of $|J|$ (the number of sets in the partition). We can observe both underfitting ($|J|$ too small) and overfitting ($|J|$ too large). Note that the target function $f^*$ is piecewise affine, and that on the affine parts the estimator is far from linear; that is, the estimator cannot take advantage of extra regularity.

**Figure 6.3 (Local averaging methods; setting: nearest neighbors).** $k$-nearest-neighbor regression in $d = 1$ dimension, with three values of $k$ (the number of neighbors). We can observe both underfitting ($k$ too large) and overfitting ($k$ too small).

**Figure 6.4 (Local averaging methods; setting: Nadaraya-Watson).** Nadaraya-Watson regression in $d = 1$ dimension, with three values of $h$ (the bandwidth), for the Gaussian kernel. We can observe both underfitting ($h$ too large) and overfitting ($h$ too small).

Minimal sketches of these experimental setups (with illustrative parameter choices) follow below.
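A minimal sketch of the Figures 3.1/3.2 setup: ordinary least squares on polynomial features with a varying number of observations $n$. The target function, polynomial degree, noise level, and sample sizes are illustrative assumptions, not necessarily those used in the book or in this repository.

```python
# Sketch: polynomial least-squares regression with a varying number of observations.
# target_function, degree, and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def target_function(x):
    # Assumed smooth target on [0, 1]; the book's exact target may differ.
    return np.sin(2 * np.pi * x)

def polynomial_ols(x_train, y_train, x_test, degree=5):
    # Build polynomial design matrices and solve the least-squares problem.
    Phi_train = np.vander(x_train, degree + 1, increasing=True)
    Phi_test = np.vander(x_test, degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    return Phi_test @ theta

x_test = np.linspace(0, 1, 200)
for n in (10, 50, 500):  # varying number of observations
    x_train = rng.uniform(0, 1, n)
    y_train = target_function(x_train) + 0.2 * rng.standard_normal(n)
    y_pred = polynomial_ols(x_train, y_train, x_test)
    mse = np.mean((y_pred - target_function(x_test)) ** 2)
    print(f"n = {n:4d}: test MSE to f* = {mse:.4f}")
```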
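A sketch of the quadratic optimization experiment behind the figure on p. 90: gradient descent on $f(\theta) = \tfrac{1}{2}\theta^\top H \theta$ in dimension $d = 1000$, with Hessian eigenvalues decaying as $1/k$ or $1/k^2$. The initialization and the $1/L$ step-size are assumptions made for illustration.

```python
# Sketch: gradient descent on two diagonal quadratics with eigenvalue decays
# 1/k (blue in the figure) and 1/k^2 (red). Initialization and step-size are assumed.
import numpy as np

d = 1000
iters = 1000

for label, decay in (("1/k", 1.0), ("1/k^2", 2.0)):
    k = np.arange(1, d + 1)
    eigenvalues = 1.0 / k ** decay          # spectrum of the Hessian H
    theta = np.ones(d)                      # assumed initial point
    step = 1.0 / eigenvalues.max()          # 1/L step-size for gradient descent
    values = []                             # f(theta_t), to be plotted in semi-log / log-log
    for _ in range(iters):
        values.append(0.5 * np.sum(eigenvalues * theta ** 2))
        theta = theta - step * eigenvalues * theta  # exact gradient step on the quadratic
    final_value = 0.5 * np.sum(eigenvalues * theta ** 2)
    print(f"decay {label:6s}: f(theta_0) = {values[0]:.3e}, f(theta_T) = {final_value:.3e}")
```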
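A sketch of the SVM experiment on p. 108: stochastic subgradient descent on the regularized hinge loss, comparing the step-sizes $\gamma_t = 1/(R^2 t)$ and $\gamma_t = 1/(\mu t)$. The synthetic data, the two values of $\mu$, and the $\tfrac{\mu}{2}\|\theta\|^2$ convention for the regularizer are assumptions; the book measures the excess objective (relative to the minimum), while this sketch only reports the training objective itself.

```python
# Sketch: stochastic subgradient descent on the l2-regularized hinge loss,
# comparing two step-size schedules. Data generation and mu values are assumed.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.standard_normal((n, d))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)  # enforce ||x_i|| <= R = 1
R = 1.0
w_star = rng.standard_normal(d)
y = np.sign(X @ w_star + 0.1 * rng.standard_normal(n))

def objective(theta, mu):
    # Average hinge loss plus (mu/2) * ||theta||^2 (convention assumed here).
    hinge = np.maximum(0.0, 1.0 - y * (X @ theta)).mean()
    return hinge + 0.5 * mu * np.dot(theta, theta)

def sgd(mu, step_rule, iters=20000):
    theta = np.zeros(d)
    for t in range(1, iters + 1):
        i = rng.integers(n)
        margin = y[i] * (X[i] @ theta)
        subgrad = (-y[i] * X[i] if margin < 1.0 else 0.0) + mu * theta
        gamma = 1.0 / (R ** 2 * t) if step_rule == "1/(R^2 t)" else 1.0 / (mu * t)
        theta -= gamma * subgrad
    return theta

for mu in (1e-1, 1e-4):  # "large" and "small" regularization (assumed values)
    for rule in ("1/(R^2 t)", "1/(mu t)"):
        theta = sgd(mu, rule)
        print(f"mu = {mu:g}, step {rule:10s}: training objective = {objective(theta, mu):.4f}")
```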
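A sketch of the three local averaging estimators of Figures 6.2 to 6.4 (regressogram, $k$-nearest neighbors, Nadaraya-Watson with a Gaussian kernel) in $d = 1$ dimension. The piecewise-affine target, noise level, and the particular values of $|J|$, $k$, and $h$ are illustrative assumptions.

```python
# Sketch: regressogram, k-NN regression, and Nadaraya-Watson regression in one dimension.
# Target function, noise level, and tuning parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def target_function(x):
    # Assumed piecewise-affine target on [0, 1].
    return np.where(x < 0.5, 2 * x, 2 - 2 * x)

n = 200
x_train = rng.uniform(0, 1, n)
y_train = target_function(x_train) + 0.1 * rng.standard_normal(n)
x_test = np.linspace(0, 1, 400)

def regressogram(x, J):
    # Average the responses within each of the |J| cells of a regular partition of [0, 1].
    bins = np.clip((x * J).astype(int), 0, J - 1)
    train_bins = np.clip((x_train * J).astype(int), 0, J - 1)
    means = np.array([y_train[train_bins == j].mean() if np.any(train_bins == j) else 0.0
                      for j in range(J)])
    return means[bins]

def knn(x, k):
    # Average the responses of the k nearest training points.
    dist = np.abs(x[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def nadaraya_watson(x, h):
    # Kernel-weighted average with a Gaussian kernel of bandwidth h.
    weights = np.exp(-0.5 * ((x[:, None] - x_train[None, :]) / h) ** 2)
    return (weights @ y_train) / weights.sum(axis=1)

for name, pred in (("regressogram, |J| = 10", regressogram(x_test, 10)),
                   ("k-NN, k = 10", knn(x_test, 10)),
                   ("Nadaraya-Watson, h = 0.05", nadaraya_watson(x_test, 0.05))):
    mse = np.mean((pred - target_function(x_test)) ** 2)
    print(f"{name:26s}: test MSE to f* = {mse:.4f}")
```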