src: [@Bubeck:2021vv]
Here's another attempt to explain the overparameterisation enigma found in deep learning:
- to fit (interpolate) $n$ data points of $d$ dimensions, one requires $nd$ parameters if one wants a smooth function (without the smoothness requirement, $n$ parameters should suffice)
- thus, overparameterisation is necessary to ensure smoothness (which implies robustness)
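to put rough numbers on this (my own back-of-the-envelope; CIFAR-10-sized data is just an assumed example, with $n = 5 \times 10^4$ images of $d = 32 \times 32 \times 3 = 3072$ dimensions):

$$n d = 5 \times 10^{4} \cdot 3072 \approx 1.5 \times 10^{8}$$

parameters are needed before a smooth interpolating fit is even possible, versus only $n = 5 \times 10^4$ for a non-smooth one.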
Their theorem/proof works in the contrapositive: for any function class smoothly parameterised by $p$ parameters, any function in the class that fits the $n$ noisy data points below the noise level must have a Lipschitz constant that grows as $p$ shrinks relative to $nd$.
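stated roughly (my paraphrase of the main bound in [@Bubeck:2021vv], suppressing constants, log factors, and the dependence on how far below the noise level one fits): with high probability over the sample, any such interpolating $f$ satisfies

$$\operatorname{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}},$$

so a Lipschitz constant of order one is only possible once $p \gtrsim nd$, matching the parameter count above.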
- note that this result doesn't say that you'll fit a smooth function if you're overparameterising -- just that if you don't have enough parameters, then there's no way you're fitting a smooth one.
- thus, in the context of scale (à la [[gpt3]]), this result says that scale is necessary to achieve robustness (though not sufficient).
- the noise level here is defined as the expected conditional variance $\sigma^2 := \mathbb{E}_{\mu}\left[\text{Var}(y \mid x)\right] > 0$. noise is necessary from a theoretical perspective, as it prevents a smooth classifier from fitting the data with perfect accuracy.
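a minimal sketch of this definition (my own toy construction, not from [@Bubeck:2021vv]): estimate $\sigma^2$ by Monte Carlo for an assumed linear-classifier distribution with label-flip probability $\eta$, where analytically $\text{Var}(y \mid x) = 4\eta(1-\eta)$ for every $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eta = 16, 0.1                      # toy dimension and flip probability (assumed)
w = rng.standard_normal(d)            # a fixed "true" linear classifier (assumed)

def sample_labels(x, reps=2000):
    """Repeatedly draw y | x: y = sign(w.x), flipped with probability eta."""
    clean = np.sign(w @ x)
    flips = rng.random(reps) < eta
    return np.where(flips, -clean, clean)

# outer expectation over x ~ mu (standard Gaussian here), inner variance over y | x
xs = rng.standard_normal((500, d))
sigma2 = np.mean([sample_labels(x).var() for x in xs])
print(f"estimated sigma^2 = {sigma2:.3f}")   # analytic value: 4*eta*(1-eta) = 0.36
```

with $\eta = 0.1$ this gives $\sigma^2 = 0.36 > 0$, so the theorem's noise assumption is satisfied.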