One intriguing trait of LLMs is the presence of outlier features: features whose magnitudes are significantly larger than the rest. The OWL paper claims to preserve these outlier features, and the recent Quantizable Transformers paper finds that the outliers are related to the softmax function in attention. See the blog for more details.
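As a rough illustration of what "outlier feature" means in practice, here is a minimal sketch that flags hidden dimensions whose activations exceed a magnitude threshold (the cutoff of 6 is a common convention, e.g. in LLM.int8(); the function name and shapes are assumptions for illustration):

```python
import torch

def find_outlier_features(hidden_states: torch.Tensor, threshold: float = 6.0):
    """Return hidden dimensions whose activation magnitude exceeds `threshold`.

    hidden_states: (batch, seq_len, hidden_dim) activations from one layer.
    """
    # Maximum absolute activation observed per hidden dimension
    max_abs = hidden_states.abs().amax(dim=(0, 1))            # (hidden_dim,)
    outlier_dims = torch.nonzero(max_abs > threshold).flatten()
    return outlier_dims, max_abs[outlier_dims]

# Example with random activations; pronounced outliers only appear in trained LLMs.
h = torch.randn(2, 16, 768)
dims, mags = find_outlier_features(h)
```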
Increasing model size or data brings consistent performance improvements, even at very large scale, and this scaling behavior can be predicted by simple power-law curves.
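A minimal sketch of what "predicted by power-law curves" looks like: fit $L(N) = a N^{-\alpha} + c$ to a few (model size, loss) pairs and extrapolate. The numbers below are hypothetical, not results from any paper:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (parameter count, validation loss) pairs from runs of different sizes.
n_params = np.array([1e7, 1e8, 1e9, 1e10])
losses   = np.array([4.2, 3.5, 2.9, 2.5])

def power_law(n, a, alpha, c):
    # Standard scaling-law form: L(N) = a * N^(-alpha) + c
    return a * n ** (-alpha) + c

(a, alpha, c), _ = curve_fit(power_law, n_params, losses, p0=(10.0, 0.1, 1.5))
print(f"Predicted loss at 1e11 params: {power_law(1e11, a, alpha, c):.2f}")
```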
Layer pruning is a technique that removes entire layers from the model. It is a coarse-grained pruning method, which can be very effective in some cases.
- Dimensional Mismatch Problem: When pruning intermediate layers, the input and output dimensions of subsequent layers may no longer match.
- Current LLM Layer Pruning: Transformer blocks have exactly the same input and output dimensions because of the residual connection, so layer pruning is feasible for LLMs (see the sketch after this list). It does not suit cases where the next layer's new input no longer matches the input it originally expected, such as layer pruning in VGG.
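Here is a minimal sketch of layer pruning for a decoder-style LLM, dropping whole transformer blocks by index. The attribute path `model.model.layers` (LLaMA-style layout in Hugging Face Transformers) and the choice of which indices to drop are assumptions for illustration; adjust them for the actual model class:

```python
import torch.nn as nn

def prune_layers(model: nn.Module, drop_indices: set) -> nn.Module:
    """Remove whole transformer blocks by index.

    Because every block maps (batch, seq, hidden) -> (batch, seq, hidden)
    thanks to the residual connection, the remaining blocks still compose.
    """
    kept = nn.ModuleList(
        block for i, block in enumerate(model.model.layers)
        if i not in drop_indices
    )
    model.model.layers = kept
    model.config.num_hidden_layers = len(kept)  # keep the config consistent
    return model

# e.g. drop the last two blocks of a 32-layer model:
# model = prune_layers(model, drop_indices={30, 31})
```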
In contrast, width pruning is a fine-grained pruning method that removes individual channels or neurons within each layer.
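Below is a minimal sketch of width pruning applied to one feed-forward (MLP) pair of linear layers, keeping the intermediate neurons with the largest weight norms. The layer names `up_proj`/`down_proj` and the norm-based importance score are illustrative assumptions, not a specific method from the papers above:

```python
import torch
import torch.nn as nn

def width_prune_mlp(up_proj: nn.Linear, down_proj: nn.Linear, keep_ratio: float = 0.5):
    """Prune intermediate neurons of an MLP: up_proj (d -> m), down_proj (m -> d).

    Importance of neuron j = L2 norm of its row in up_proj plus the norm of
    the matching column in down_proj (a simple magnitude criterion).
    """
    importance = up_proj.weight.norm(dim=1) + down_proj.weight.norm(dim=0)
    k = max(1, int(keep_ratio * importance.numel()))
    keep = torch.topk(importance, k).indices.sort().values

    new_up = nn.Linear(up_proj.in_features, k, bias=up_proj.bias is not None)
    new_down = nn.Linear(k, down_proj.out_features, bias=down_proj.bias is not None)
    with torch.no_grad():
        new_up.weight.copy_(up_proj.weight[keep])        # keep selected rows
        new_down.weight.copy_(down_proj.weight[:, keep]) # keep matching columns
        if up_proj.bias is not None:
            new_up.bias.copy_(up_proj.bias[keep])
        if down_proj.bias is not None:
            new_down.bias.copy_(down_proj.bias)
    return new_up, new_down
```

Unlike layer pruning, this changes the intermediate width, so both sides of the MLP must be rebuilt together to keep the dimensions consistent.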