A small reimplementation of the weightwatcher
project. No affiliation whatsoever.
Why? For fun. Mostly. And to toy around with Chebyshev polynomials for matrices...
```python
# python demo.py
import pandas as pd
import torchvision.models as models

from weightwatcher_light import weightwatcher

if __name__ == "__main__":
    results = []
    for model_cls in [models.vgg11, models.vgg13, models.vgg16, models.vgg19]:
        print(f"======{model_cls.__name__}======")
        model = model_cls(pretrained=True).cuda()
        statistics = weightwatcher(model, (1, 3, 32, 32), verbose=True, debug=False)
        print(pd.DataFrame(statistics.pop("layers")).to_markdown())
        results.append(statistics)

    for n, r in zip(["VGG11", "VGG13", "VGG16", "VGG19"], results):
        print(n, r)
```
This gives us the following output for the VGG16 network:
Besides the much smaller number of features, the most significant difference is how we estimate the eigenvalues of convolution layers: per default, weightwatcher treats a convolution kernel of shape $k \times k \times N \times M$ as $k^2$ separate $N \times M$ matrices and analyzes each of their spectra independently. Here the convolution is instead treated as the linear operator it induces on a concrete input (which is why the demo passes an input shape), and its spectrum is estimated using only matrix-vector products with that operator.
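To illustrate treating a convolution layer as a linear operator, here is a minimal sketch in plain torch (not this repo's actual implementation): it estimates the largest singular value of a `Conv2d` layer by power iteration, touching the layer only through its forward map and its adjoint.

```python
# Sketch (assumed setup, not this repo's code): estimate the largest singular
# value of a Conv2d layer viewed as a linear operator on a fixed input shape.
import torch
import torch.nn.functional as F

conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
x_shape = (1, 3, 32, 32)  # the same kind of input shape the demo passes

def matvec(v):
    # forward map v -> W v of the convolution (bias excluded: it is affine, not linear)
    return F.conv2d(v, conv.weight, padding=1)

def rmatvec(u):
    # adjoint map u -> W^T u; for stride 1 this is a transposed convolution
    return F.conv_transpose2d(u, conv.weight, padding=1)

with torch.no_grad():
    v = torch.randn(x_shape)
    for _ in range(50):  # power iteration on W^T W
        v = rmatvec(matvec(v))
        v = v / v.norm()
    sigma_max = matvec(v).norm().item()  # largest singular value of the operator
print(sigma_max)
```

The same matvec/adjoint pair is all that stochastic spectral estimators need, so no weight matrix ever has to be materialized.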
Estimating powerlaws of the eigenvalues boils down to computing traces of matrix functions: for a random probe vector $z$ with $\mathbb{E}[z z^\top] = I$ we have

$$\operatorname{tr} f(A) = \mathbb{E}\left[z^\top f(A)\, z\right],$$

and hence we can estimate the trace using a finite number of samples. For a matrix function $f(A)$ we never form $f(A)$ explicitly; instead we expand $f$ in Chebyshev polynomials, $f(A) \approx \sum_k c_k T_k(A)$, so that each sample $z^\top f(A)\, z$ only requires matrix-vector products with $A$. The maximum-likelihood estimate of the powerlaw exponent is

$$\hat\alpha(\lambda_{\min}) = 1 + \frac{N_{\lambda_{\min}}}{\sum_{\lambda_i \ge \lambda_{\min}} \log\left(\lambda_i / \lambda_{\min}\right)},$$

and in case of the denominator we are interested in the function family

$$f_{\lambda_{\min}}(\lambda) = \log\left(\lambda / \lambda_{\min}\right) \cdot \mathbb{1}\left[\lambda \ge \lambda_{\min}\right]$$

for the set of possible lower bounds $\lambda_{\min}$. We apply a similar technique to estimate the number of eigenvalues larger than $\lambda_{\min}$: the count $N_{\lambda_{\min}}$ is the trace of the step function $\mathbb{1}[\lambda \ge \lambda_{\min}]$ applied to $A$.
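As a toy illustration of the trace estimator (an assumed setup, not this repo's code): the sketch below combines a Chebyshev expansion of a smooth test function with Hutchinson-style random sign probes, and compares the estimate to the exact trace computed from the eigenvalues.

```python
# Toy sketch: stochastic trace estimation of tr f(A) via a Chebyshev expansion.
# The test matrix, function f = exp, and degree 30 are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T / n  # symmetric positive semi-definite test matrix

# Rescale so the spectrum lies in [-1, 1], the natural Chebyshev domain.
A_scaled = A / np.linalg.eigvalsh(A).max()

# Chebyshev coefficients of f on [-1, 1].
f = np.exp
coeffs = np.polynomial.chebyshev.chebinterpolate(f, 30)

def chebyshev_quadform(A, z, coeffs):
    """z^T f(A) z via the recurrence T_{k+1}(A) z = 2 A T_k(A) z - T_{k-1}(A) z,
    touching A only through matrix-vector products."""
    t_prev, t_cur = z, A @ z
    acc = coeffs[0] * (z @ t_prev) + coeffs[1] * (z @ t_cur)
    for c in coeffs[2:]:
        t_prev, t_cur = t_cur, 2 * (A @ t_cur) - t_prev
        acc += c * (z @ t_cur)
    return acc

# Hutchinson estimator: average z^T f(A) z over random sign vectors z.
samples = [
    chebyshev_quadform(A_scaled, rng.choice([-1.0, 1.0], size=n), coeffs)
    for _ in range(400)
]
est = np.mean(samples)
exact = np.exp(np.linalg.eigvalsh(A_scaled)).sum()
print(est, exact)
```

For the non-smooth step function used in the eigenvalue counts, the expansion needs a higher degree and damping (e.g. Jackson coefficients) to suppress Gibbs oscillations.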
You can read more about these Matrix Chebyshev techniques in
- Di Napoli, E., Polizzi, E., & Saad, Y. (2016). Efficient estimation of eigenvalue counts in an interval. Numerical Linear Algebra with Applications.
- Adams, R. P., Pennington, J., Johnson, M. J., Smith, J., Ovadia, Y., Patton, B., & Saunderson, J. (2018). Estimating the Spectral Density of Large Implicit Matrices.
Estimating the exponent with the `powerlaw` package has similar issues. Honestly, the motivation for the KS-estimator is kind of heuristic; I think looking for alternatives is a good idea.
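To make the heuristic concrete, here is a small numpy sketch of the standard KS-based fit (in the spirit of Clauset, Shalizi & Newman; not this repo's implementation, and the minimum tail size is an arbitrary choice): for each candidate lower bound, fit the exponent by maximum likelihood, then keep the bound whose fitted tail minimizes the Kolmogorov-Smirnov distance.

```python
# Sketch of the KS-based powerlaw fit over candidate lower bounds x_min.
import numpy as np

def fit_powerlaw(samples, min_tail=100):
    """For each candidate x_min, fit alpha by maximum likelihood and keep the
    x_min whose fitted tail minimizes the Kolmogorov-Smirnov distance."""
    xs = np.sort(samples)
    best_ks, best_alpha, best_xmin = np.inf, None, None
    for x_min in xs[: len(xs) - min_tail]:
        tail = xs[xs >= x_min]
        num = len(tail)
        # Maximum-likelihood exponent for the tail above x_min.
        alpha = 1.0 + num / np.sum(np.log(tail / x_min))
        # CDF of the continuous powerlaw: F(x) = 1 - (x / x_min)^(1 - alpha).
        fitted = 1.0 - (tail / x_min) ** (1.0 - alpha)
        d_plus = np.max(np.arange(1, num + 1) / num - fitted)
        d_minus = np.max(fitted - np.arange(num) / num)
        ks = max(d_plus, d_minus)
        if ks < best_ks:
            best_ks, best_alpha, best_xmin = ks, alpha, x_min
    return best_ks, best_alpha, best_xmin

rng = np.random.default_rng(1)
# Pareto samples with density exponent 3 (numpy's pareto shape a = alpha - 1).
data = 1.0 + rng.pareto(2.0, size=2000)
ks, alpha, x_min = fit_powerlaw(data)
print(alpha, x_min, ks)
```

Note the heuristic nature: the KS distance is minimized over the same data used to fit `alpha`, and without a minimum tail size the search tends to pick tiny tails that fit trivially well.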