Link: https://www.cs.toronto.edu/~hinton/absps/transauto6.pdf
A discussion paper on apparent fundamental limitations of CNN layers and how to overcome them with a new type of layer called a "capsule". This is interesting in that it should yield a model that is interpretable by design, but it requires specialized training data: fairly easy to render and generate synthetically, yet very difficult to label on real-world pictures. The paper only offers a high-level, abstract discussion, without going into implementation details or providing concrete results or limitations.
Related (to-explore):
- Paper: V. Mazzia et al., Efficient-CapsNet (2021)
- Article: Kalman filter's use in vehicle position estimation
Link: https://www.youtube.com/watch?v=JSed7OBasXs
Provides a high-level view to get an appreciation for the work. Doesn't go into much technical detail, especially about how they efficiently update the graph edges as the system evolves and particles move around, so as to capture inter-particle interactions. The interview with Jonathan Godwin also stays high-level, but does point out that the model's weakness is simulating large rigid bodies: graph networks struggle to pass information between the two ends of a large rigid body quickly enough.
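The video doesn't cover the edge-update question, but the 2020 paper linked under Related below builds the interaction graph by connecting all particles within a fixed connectivity radius and recomputing the edges at every step. A minimal sketch of that idea, assuming a k-d tree for the neighbour query (the radius value and array shapes here are illustrative, not the paper's settings):

```python
import numpy as np
from scipy.spatial import cKDTree

def build_edges(positions, radius):
    # Connect every pair of particles closer than `radius`; the k-d tree keeps
    # the neighbour query near O(N log N) instead of the naive O(N^2) scan.
    tree = cKDTree(positions)
    pairs = tree.query_pairs(r=radius, output_type="ndarray")  # (E, 2) unique undirected pairs
    # Message passing needs directed sender -> receiver edges, so duplicate both ways.
    senders = np.concatenate([pairs[:, 0], pairs[:, 1]])
    receivers = np.concatenate([pairs[:, 1], pairs[:, 0]])
    return senders, receivers

positions = np.random.rand(1000, 3)                 # hypothetical particle positions
senders, receivers = build_edges(positions, 0.05)   # rebuilt from scratch every timestep
```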
Related (to-explore):
- Relevant paper (2020): https://arxiv.org/abs/2002.09405
- Original papers (2018) on Graph Networks from DeepMind: https://arxiv.org/abs/1806.01242, https://arxiv.org/abs/1806.01261
Link: https://www.youtube.com/watch?v=KuXjwB4LzSA
A video with great visualizations for building a mathematical appreciation of the subject. Also gives a teaser on how the FFT can be used to speed up this operation, with links to other great videos.
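Assuming the operation in question is convolution (the classic target of FFT speedups), a minimal sketch of the trick: zero-pad, multiply the spectra, and invert, dropping the cost from O(N*M) to roughly O(N log N).

```python
import numpy as np

def fft_convolve(x, k):
    # Convolution in the time domain is multiplication in the frequency domain:
    # pad both signals to the full output length, multiply spectra, invert.
    n = len(x) + len(k) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)

x, k = np.random.randn(4096), np.random.randn(512)
direct = np.convolve(x, k)        # O(N*M) direct convolution
fast = fft_convolve(x, k)         # O(N log N) via the FFT
print(np.allclose(direct, fast))  # True, up to floating-point error
```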
Link: https://arxiv.org/abs/1503.02531
Goal: Transfer knowledge from a larger model (say an ensemble) to a smaller model, effectively and efficiently.
Core idea: A "student" model learns from the "soft" labels generated by a larger "teacher" model, instead of the ground-truth ("hard") labels (a loss sketch follows the notes below).
- Having the student model see a richer view of the similarity structure through soft labels seems to help transfer inductive biases (think: generalizing assumptions) from the teacher to the student.
- Useful for compressing knowledge from an ensemble model to a single model
- The student model was observed to learn and generalize well on a fraction of the training data (e.g. 3% of the original train set) when trained with soft labels
- The main limitations are the computational cost of training the large teacher and generating the soft labels, plus the storage cost of keeping those soft labels
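A minimal PyTorch sketch of a distillation objective in the spirit of the paper: a temperature-softened KL term against the teacher, mixed with ordinary cross-entropy on the hard labels. The temperature T, the mixing weight alpha, and their values here are illustrative choices, not the paper's recommended settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.9):
    # Soft term: KL divergence between the teacher's and student's
    # temperature-softened distributions, scaled by T^2 so its gradient
    # magnitudes stay comparable to the hard term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft + (1 - alpha) * hard
```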
Further reading:
- S. Abnar et al., follow-up on Knowledge Distillation: https://arxiv.org/abs/2006.00555
Link: https://arxiv.org/abs/2010.11929
- Input image is split into a number of small patches of fixed size
- Each patch is projected into D dimensions using a trainable linear transformation
- Alternatively, the image could be passed through CNN layers first, and patches could be formed from the CNN feature maps
- The patch embeddings (+ position embeddings) are fed into the transformer encoder, prepended with a [CLS] token embedding (learnable, similar to BERT); see the sketch after these notes
- Model is pre-trained only on a classification task (unlike BERT, which uses two pre-training tasks)
- TODO Experiments section needs a better look
- Can the model handle images of different aspect ratios?
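A minimal sketch of the patch-embedding front end described above. Sizes follow the ViT-Base/16 configuration; the conv-as-per-patch-linear-projection trick and the zero-initialised parameters are implementation choices for illustration, not necessarily the paper's code.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into PxP patches, project each to D dims, prepend [CLS],
    and add learnable position embeddings."""
    def __init__(self, img_size=224, patch=16, in_ch=3, dim=768):
        super().__init__()
        # A Conv2d with kernel_size == stride == patch is exactly a trainable
        # linear projection applied to each flattened patch.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)       # (B, N, D) patch embeddings
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)         # prepend the [CLS] token
        return x + self.pos_embed              # add position embeddings

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
print(tokens.shape)                            # (2, 197, 768), ready for the encoder
```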
Further reading:
- H. Touvron et al., Distilled Vision Transformers: https://arxiv.org/abs/2012.12877
- M. Dehghani et al., Scaling to 22B parameters: https://arxiv.org/abs/2302.05442