Useful Concepts

Outliers in LLMs

One intriguing trait of LLMs is the presence of outlier features, i.e., features whose magnitudes are significantly larger than the rest. The OWL paper claims to preserve outlier features when pruning. The recent Quantizable Transformers paper finds that these outliers are related to the softmax function in attention. See the blog for more details.
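As a minimal sketch (the synthetic activations, the injected outlier indices, and the 5x-of-the-mean threshold are illustrative assumptions, not values from OWL or the papers above), such features can be spotted by comparing per-feature activation magnitudes against the typical scale:

```python
import numpy as np

# Hypothetical hidden activations for one batch: (tokens, hidden_dim).
# In practice these would come from a forward hook on a transformer layer.
rng = np.random.default_rng(0)
acts = rng.normal(size=(512, 4096))
acts[:, [7, 123]] *= 40.0                        # inject two artificial outlier features

# Flag features whose mean absolute activation is far above the typical scale.
# The 5x-of-the-mean threshold is an illustrative choice, not a published value.
feature_mag = np.abs(acts).mean(axis=0)          # per-feature magnitude
threshold = 5.0 * feature_mag.mean()
outlier_idx = np.where(feature_mag > threshold)[0]
print("outlier feature indices:", outlier_idx)   # -> [  7 123]
```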

Scaling laws

Increasing model size or data brings consistent performance improvements, even at very large scale, and this scaling behavior can be predicted by simple power-law curves.
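For instance, Kaplan-style scaling laws are commonly written as the power laws below; the constants N_c, D_c, alpha_N, alpha_D are fitted empirically, and this sketch only illustrates the functional form:

```latex
% Test loss as a function of (non-embedding) parameter count N or dataset
% size D, with the other factors non-limiting (Kaplan et al., 2020 form).
% The constants below are fitted per model family, not universal values.
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}
```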

Layer or Depth pruning

Layer pruning is a technique that removes entire layers from the model. It is a coarse-grained pruning method, which can be very effective in some cases.

  • Dimensional Mismatch Problem: When intermediate layers are pruned, the input and output dimensions of subsequent layers may no longer match.
  • Layer Pruning in Current LLMs: Transformer blocks have exactly the same input and output dimension because of the residual connection, so layer pruning is feasible for LLMs (see the sketch after this list). This does not hold when the new input no longer matches what the following layer expects, as in VGG layer pruning.
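A minimal PyTorch sketch of why this works, using a toy block (which blocks to drop is the hard part of real depth pruning and is not modeled here):

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy transformer block: every block maps (batch, tokens, d) -> (batch, tokens, d),
    so removing whole blocks never breaks the dimensions of the next one."""
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual keeps the shape
        return x + self.mlp(self.ln2(x))                   # residual keeps the shape

d = 64
blocks = nn.ModuleList(Block(d) for _ in range(8))

# Depth pruning: simply drop whole blocks (here every other one; real methods
# pick layers by an importance or similarity criterion).
kept = nn.ModuleList(b for i, b in enumerate(blocks) if i % 2 == 0)

x = torch.randn(2, 16, d)
for b in kept:
    x = b(x)                      # still runs: 4 blocks instead of 8
print(x.shape)                    # torch.Size([2, 16, 64])
```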

In contrast, width pruning is a fine-grained pruning method that removes channels or neurons from each layer.
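For contrast, a minimal sketch of width pruning on a toy MLP (which neurons to keep is chosen arbitrarily here; real methods rank them by magnitude or importance scores). Unlike depth pruning, the following layer must be rebuilt because its input dimension changes:

```python
import torch
import torch.nn as nn

fc1, fc2 = nn.Linear(64, 256), nn.Linear(256, 64)
keep = torch.tensor([i for i in range(256) if i % 4 != 0])  # keep 192 of 256 neurons (illustrative)

# Shrink fc1's output: rows of the weight matrix correspond to output neurons.
pruned_fc1 = nn.Linear(64, len(keep))
pruned_fc1.weight.data = fc1.weight.data[keep]
pruned_fc1.bias.data = fc1.bias.data[keep]

# fc2 must be rebuilt too: columns of its weight matrix are input features,
# and those inputs just changed.
pruned_fc2 = nn.Linear(len(keep), 64)
pruned_fc2.weight.data = fc2.weight.data[:, keep]
pruned_fc2.bias.data = fc2.bias.data.clone()

x = torch.randn(2, 64)
print(pruned_fc2(pruned_fc1(x)).shape)  # torch.Size([2, 64])
```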