This project provides a comprehensive pipeline for fusing layers in pre-trained causal language models using manifold learning techniques. Layer fusion aims to reduce the number of layers in a model by intelligently combining similar layers, potentially leading to more efficient models without significant loss in performance.
The pipeline involves:
- Model Evaluation: Assessing the model's performance on multiple-choice datasets.
- Activation Extraction: Capturing activations from each model layer for analysis.
- Manifold Learning: Applying diffusion kernels to embed high-dimensional activations into lower-dimensional spaces.
- Similarity Computation: Calculating similarities between layers based on mutual information and entropy estimates.
- Layer Fusion: Combining layers with high similarity to create a more compact model.
- Saving the Fused Model: Exporting the modified model for future use.
- Clone the Repository:

      git clone https://github.com/SempraETY/Pruning-via-Merging-Compressing-LLMs-via-Manifold-Alignment-Based-Layer-Merging.git
- Create a Virtual Environment (Optional but Recommended):

      conda create -n maniflod_compression python=3.9 -y
      conda activate maniflod_compression
- Install Dependencies:

      pip install torch transformers numpy pandas scikit-learn tqdm
The pipeline expects MMLU data in the directory given by `--data_dir` (default: `data`).
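For orientation, with the standard MMLU CSV release the data directory usually looks like the layout below. This dev/test split structure is an assumption based on the common MMLU distribution, not something this README states:

    data/
    ├── dev/
    │   ├── abstract_algebra_dev.csv
    │   └── ...
    └── test/
        ├── abstract_algebra_test.csv
        └── ...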
Run the main script, pipeline.py, with the command-line arguments below to perform layer fusion.
- `--ntrain`, `-k`: (int, default=5) Number of training examples to include in prompts.
- `--ngpu`, `-g`: (int, default=4) Number of GPUs to use.
- `--model_path`: (str, default="/data/yangzhao/point/baichuan/Meta-Llama-3-70B") Path to the pre-trained model.
- `--num_tasks`, `-n`: (int, default=57) Number of MMLU tasks to process.
- `--num_samples`, `-m`: (int, default=1) Number of samples per task.
- `--data_dir`, `-d`: (str, default="data") Directory containing the data.
- `--num_layer`, `-i`: (int, default=1) Number of layers to fuse.
Example:

    python pipeline.py --ntrain 10 --ngpu 2 --model_path "/path/to/model" --num_tasks 50 --num_samples 5 --data_dir "./data" --num_layer 2
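For reference, a minimal sketch of how these flags could be declared with argparse; names, short forms, and defaults are taken from the list above, though the actual pipeline.py may organize this differently:

```python
import argparse

def parse_args():
    # Flags mirror the documented options; defaults match the list above.
    p = argparse.ArgumentParser(description="Manifold-based layer fusion pipeline")
    p.add_argument("--ntrain", "-k", type=int, default=5,
                   help="Number of training examples to include in prompts")
    p.add_argument("--ngpu", "-g", type=int, default=4,
                   help="Number of GPUs to use")
    p.add_argument("--model_path", type=str,
                   default="/data/yangzhao/point/baichuan/Meta-Llama-3-70B",
                   help="Path to the pre-trained model")
    p.add_argument("--num_tasks", "-n", type=int, default=57,
                   help="Number of MMLU tasks to process")
    p.add_argument("--num_samples", "-m", type=int, default=1,
                   help="Number of samples per task")
    p.add_argument("--data_dir", "-d", type=str, default="data",
                   help="Directory containing the data")
    p.add_argument("--num_layer", "-i", type=int, default=1,
                   help="Number of layers to fuse")
    return p.parse_args()
```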
- Initialization:
  - Parses command-line arguments.
  - Sets up directories for embeddings, fusion information, and merged weights.
  - Configures logging and sets random seeds for reproducibility (sketched below).
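A minimal sketch of this setup step, assuming the directory names from the output layout at the end of this README and an arbitrary seed value:

```python
import logging
import os
import random

import numpy as np
import torch

def setup(output_root: str, seed: int = 42):
    # Create the three output directories used by later pipeline stages.
    for sub in ("embeddings", "fusion_info", "merged_weights"):
        os.makedirs(os.path.join(output_root, sub), exist_ok=True)

    # Write a run log into the fusion_info directory.
    logging.basicConfig(
        filename=os.path.join(output_root, "fusion_info", "experiment.log"),
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )

    # Seed every RNG the pipeline touches so runs are reproducible.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```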
- Model and Tokenizer Loading:
  - Loads the tokenizer and pre-trained causal language model from the specified `model_path` (see the sketch below).
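A sketch of this step using the Hugging Face transformers API; the dtype and device settings are illustrative assumptions (a 70B model needs several GPUs even at half precision):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model(model_path: str):
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,   # half precision to cut memory use
        device_map="auto",           # shard across available GPUs (requires accelerate)
        output_hidden_states=True,   # expose per-layer activations for extraction
    )
    model.eval()
    return tokenizer, model
```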
- Data Processing:
  - Iterates through each subject's test data.
  - For each selected sample, formats the prompt and extracts activations from all model layers (sketched below).
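One plausible shape for the extraction step; taking the last token's hidden state as each layer's representation is an assumption of this sketch, not a detail the README specifies:

```python
import torch

@torch.no_grad()
def extract_layer_activations(model, tokenizer, prompt: str):
    # Tokenize the formatted prompt and run a single forward pass.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model(**inputs, output_hidden_states=True)

    # hidden_states is a tuple: the embedding output plus one tensor per
    # layer, each of shape (batch, seq_len, hidden_size). Keep the last
    # token's activation from every transformer layer.
    return [h[0, -1, :].float().cpu().numpy() for h in outputs.hidden_states[1:]]
```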
- Manifold Learning:
  - Applies a diffusion kernel to the stacked activations of each layer to obtain lower-dimensional embeddings (see the sketch below).
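A self-contained sketch of a classic diffusion-map embedding, which is one standard way to apply a diffusion kernel; the median bandwidth heuristic and the number of components are assumptions:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def diffusion_embed(X: np.ndarray, n_components: int = 2, t: int = 1):
    """Diffusion-map embedding of the rows of X (n_samples, n_features)."""
    # Gaussian (diffusion) kernel on pairwise Euclidean distances.
    dists = squareform(pdist(X, metric="euclidean"))
    eps = np.median(dists) ** 2 + 1e-12
    K = np.exp(-dists ** 2 / eps)

    # Symmetric normalization so a Hermitian eigensolver can be used.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)

    # Sort by eigenvalue, recover the right eigenvectors of the Markov
    # matrix, drop the trivial top eigenvector, and scale by lambda^t.
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]
    return psi[:, 1:n_components + 1] * (vals[1:n_components + 1] ** t)
```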
- Similarity Computation:
  - Calculates a similarity matrix between layers based on normalized pointwise information bottleneck (NPIB) estimates (an illustrative stand-in is sketched below).
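The NPIB measure itself comes from the underlying paper. As an illustrative stand-in built from the same ingredients (mutual information and entropy), a histogram-based normalized mutual information between two layers' embedded signals could look like this:

```python
import numpy as np

def normalized_mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 16) -> float:
    """Histogram estimate of I(X;Y) / H(X,Y) for two 1-D signals."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    nz = pxy > 0
    h_xy = -np.sum(pxy[nz] * np.log(pxy[nz]))        # joint entropy H(X,Y)
    h_x = -np.sum(px[px > 0] * np.log(px[px > 0]))   # marginal entropy H(X)
    h_y = -np.sum(py[py > 0] * np.log(py[py > 0]))   # marginal entropy H(Y)

    mi = h_x + h_y - h_xy                            # mutual information I(X;Y)
    return float(mi / h_xy) if h_xy > 0 else 0.0
```

Applied pairwise across layers (for example, to each layer's leading diffusion coordinate), this yields a symmetric layer-similarity matrix.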
- Layer Fusion:
  - Identifies pairs of layers with high similarity.
  - Fuses the specified layers by blending their weights according to computed fusion ratios (see the sketch below).
  - Updates the model configuration and removes the fused layers.
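A minimal sketch of blending one decoder layer into another; the `model.model.layers` attribute path matches Llama-style architectures and is an assumption here:

```python
import torch

@torch.no_grad()
def fuse_layers(model, i: int, j: int, ratio: float):
    """Blend decoder layer j into layer i, then drop layer j.

    ratio is the weight given to layer i; (1 - ratio) goes to layer j.
    """
    layers = model.model.layers
    for (_, p_i), (_, p_j) in zip(layers[i].named_parameters(),
                                  layers[j].named_parameters()):
        p_i.copy_(ratio * p_i + (1.0 - ratio) * p_j)  # linear interpolation

    # Remove the absorbed layer and keep the config consistent.
    del layers[j]
    model.config.num_hidden_layers = len(layers)
```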
- Saving the Fused Model:
  - Saves the modified model's state dictionary and configuration to the designated directory (sketched below).
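The save step might look like this, matching the merged_weights/ layout shown below:

```python
import os
import torch

def save_fused_model(model, out_dir: str):
    os.makedirs(out_dir, exist_ok=True)
    # Write the weights and the updated config side by side.
    torch.save(model.state_dict(), os.path.join(out_dir, "pytorch_model.bin"))
    model.config.save_pretrained(out_dir)  # writes config.json
```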
- Embeddings Directory: Contains pickled files of embedded activations for each layer.
- Fusion Info Directory: Stores logs detailing the fusion process.
- Merged Weights Directory: Holds the fused model's configuration and the `pytorch_model.bin` file.

Note: To obtain the final weights, you also need to update the weight index file (`pytorch_model.bin.index.json` for sharded checkpoints). If the index is not updated, any weights missing from it will be randomly initialized when the model is loaded, causing problems. A sketch of this fix-up follows.
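A sketch of the index fix-up, assuming the merged checkpoint is written as a single shard (a multi-shard checkpoint would need a real shard split):

```python
import json
import os
import torch

def rebuild_weight_index(out_dir: str, shard_name: str = "pytorch_model.bin"):
    """Regenerate pytorch_model.bin.index.json so every remaining tensor
    maps to the single merged shard."""
    state_dict = torch.load(os.path.join(out_dir, shard_name), map_location="cpu")
    index = {
        "metadata": {"total_size": sum(t.numel() * t.element_size()
                                       for t in state_dict.values())},
        "weight_map": {name: shard_name for name in state_dict},
    }
    with open(os.path.join(out_dir, "pytorch_model.bin.index.json"), "w") as f:
        json.dump(index, f, indent=2)
```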
Example Structure:
EMNLP2024/
└── layer_fus/
└── al/
└── Meta-Llama-3-70B/
└── fused_2_layers/
└── iteration/
├── embeddings/
│ ├── layer_0_embedded.pkl
│ ├── layer_1_embedded.pkl
│ └── ...
├── fusion_info/
│ └── experiment.log
└── merged_weights/
├── config.json
└── pytorch_model.bin