Skip to content

Latest commit

 

History

History
77 lines (54 loc) · 3.73 KB

README.md

File metadata and controls

77 lines (54 loc) · 3.73 KB

ExCP

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang

ICML 2024 Oral

Updates

  • 2024/06/21: Thanks to the contribution of Cbtor, there is an unofficial serial checkpoints of Pythia-410M checkpoints. You can use scripts/recon_ckpts.sh to reconstruct all checkpoints during the training process.
  • 2024/06/14: Training and compressing codes of Pythia-410M are released here.

Overview

We propose a novel Extreme Checkpoint Compression (ExCP) framework, which significantly reduces the required storage of training checkpoints while achieving nearly lossless performance. We first calculate the residuals of adjacent checkpoints to obtain the essential but sparse information for higher compression ratio. To further excavate the redundancy parameters in checkpoints, we then propose a weight-momentum joint shrinking method to utilize another important information during the model optimization, i.e., momentum. In particular, we exploit the information of both model and optimizer to discard as many parameters as possible while preserving critical information to ensure optimal performance. Furthermore, we utilize non-uniform quantization to further compress the storage of checkpoints.

framework

Figure 1: Overall framework of ExCP.

Requirements

pip install -r requirements.txt

Usage

Training and Compression

  1. Pick corresponding data from PILE dataset according to the lists in ./datalist, and save the data at [pos1]/[pos2].
python generate_subdataset.py --pile_path [PILE] --datalist ./datalist/datalist1.txt --save_path [pos1]
python generate_subdataset.py --pile_path [PILE] --datalist ./datalist/datalist2.txt --save_path [pos2]
  1. Run the scripts.
bash scripts/pretrain_phase1.sh [pos1]
bash scripts/pretrain_phase2.sh [pos2]

If you use the cached data, please uncomment the --data_cache_dir and --read_cached option.

  1. If you want to evaluate the model, you may use open-compass.

Results

Model Method Size hellaswag arc-e piqa C3 csl lambada Avg
Original model 4.53G 32.52 35.80 62.13 37.21 53.75 37.22 43.11
Pythia-410M Residual+7Zip 3.40G 32.52 35.80 62.13 37.21 53.75 37.22 43.11
ExCP (Ours) 0.06G 31.95 37.04 62.62 36.22 52.50 37.24 42.93

loss

Figure 2: Loss and checkpoint size curve of original Pythia-410M and compressed Pythia-410M.

Acknowledgements

We thank the following projects: transformers, stanford_alpaca, open-compass.

Citation

@inproceedings{liexcp,
  title={ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking},
  author={Li, Wenshuo and Chen, Xinghao and Shu, Han and Tang, Yehui and Wang, Yunhe},
  booktitle={Forty-first International Conference on Machine Learning}
}

License

This project is licensed under Apache License 2.0. Redistribution and use should follow this license.