- Due to licensing restrictions on LLaMA weights, the model cannot be used for commercial purposes. Please adhere strictly to LLaMA's usage policy.
- Considering the limitations of LLaMA's license, we cannot directly distribute the complete model weights. Here, we are only releasing the LoRA weights of CALM-7B.
Create the environment with Conda, then install the required packages with pip:
pip install -r requirements.txt
Before running, download the raw data to data/CRA_resample_0.045M.json and set the following paths:
export raw_data=/path_to/CRA_resample_0.045M.json
export conv_data=/path_to/CRA_resample_0.045M_conv.json
export data_name=CRA
export dev_data=/path_to/CRA-resample-dev3k.json
export train_data=/path_to/CRA-resample-train4w.json
python scripts/convert_to_conv_data.py \
--orig_data ${raw_data} \
--write_data ${conv_data} \
--dataset_name CRA
head -n 3000 ${conv_data} > ${dev_data}
tail -n +3001 ${conv_data} > ${train_data}
We designate the first 3000 entries as the validation set, while the remaining data serves as the training set.
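The split is purely line-based (the converted file is assumed to hold one JSON sample per line, which is what the head/tail commands imply). For reference, an equivalent sketch in Python, using the environment variables exported above:
# Line-based split equivalent to the head/tail commands above.
import os

conv_data = os.environ["conv_data"]
dev_data = os.environ["dev_data"]
train_data = os.environ["train_data"]

with open(conv_data, "r", encoding="utf-8") as fin:
    lines = fin.readlines()

with open(dev_data, "w", encoding="utf-8") as fout:
    fout.writelines(lines[:3000])    # first 3000 samples -> validation set
with open(train_data, "w", encoding="utf-8") as fout:
    fout.writelines(lines[3000:])    # remaining samples -> training set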
Training strategy
- LoRA + int8
The training launch script is train/scripts/run.sh. Modify the parameters in run.sh according to your specific requirements, then start training:
bash scripts/run_sft.sh
- model_name_or_path: The pretrained model (if it is a LLaMA model, it must be converted to the Hugging Face format beforehand so it can be loaded with from_pretrained; see the loading sketch after this list)
- train_file: Training data
- validation_file: Validation data
- output_dir: Path to the training logs and model saves
- cache_dir: Path where intermediate artifacts from data processing are cached
- cutoff_len: Maximum input sequence length (1024 or above is recommended for LLaMA models, 512 or above for Bloom models)
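If the base model has already been converted to the Hugging Face format, a quick way to confirm that model_name_or_path is usable is to load it with from_pretrained. A minimal sketch, assuming transformers is installed and the path below is a placeholder for your converted checkpoint:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "/path_to/llama-2-7b-hf"  # placeholder: a LLaMA checkpoint already converted to HF format
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
print(model.config.model_type)  # expect "llama" for a converted LLaMA checkpoint
The full training launch command looks like this: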
nohup torchrun --nproc_per_node 2 src/entry_point/sft_train.py \
--model_name_or_path ${model_name_or_path} \
--bf16 True \
--llama True \
--use_lora True \
--deepspeed configs/deepspeed_config_stage3.json \
--lora_config configs/lora_config_llama.json \
--train_file ${train_file} \
--validation_file ${validation_file} \
--per_device_train_batch_size 6 \
--per_device_eval_batch_size 6 \
--gradient_accumulation_steps 1 \
--num_train_epochs 5 \
--model_max_length ${cutoff_len} \
--save_strategy "steps" \
--save_total_limit 3 \
--learning_rate 3e-4 \
--weight_decay 0.00001 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 10 \
--evaluation_strategy "steps" \
--seed 1234 \
--gradient_checkpointing \
--cache_dir ${cache_dir} \
--output_dir ${output_dir} \
--overwrite_output_dir \
> ${log_dir}/train.log 2>&1 &
Parameters
- use_lora: Train with LoRA
- use_int8_training: Train with 8-bit (int8) quantization, which significantly reduces memory usage
- lora_config: Path to the LoRA parameter configuration. When training a Bloom model, change it to configs/lora_config_bloom.json (see the illustrative sketch after the note below)
- deepspeed: When training sequences are long, DeepSpeed stage 3 is recommended; it distributes model parameters across multiple GPUs, leaving room to load even longer sequences
Note: use_int8_training and deepspeed are mutually exclusive; you can choose only one of them.
The structure of the output_dir:
output_dir/
├── checkpoint-244/
│ ├── pytorch_model.bin
│ └── trainer_state.json
├── checkpoint-527/
│ ├── pytorch_model.bin
│ └── trainer_state.json
├── adapter_model.bin
├── print_log.txt
└── adapter_config.json
The top level of output_dir stores the final model produced by training (adapter_model.bin and adapter_config.json).
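Because adapter_model.bin and adapter_config.json follow the standard PEFT adapter layout, you can also apply the trained adapter to the base model at load time instead of merging. A minimal sketch, assuming transformers and peft are installed and that the paths below are placeholders:
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model_path = "/path_to/llama-2-7b-hf"   # placeholder: base model in Hugging Face format
adapter_path = "/path_to/output_dir"         # placeholder: directory containing adapter_model.bin and adapter_config.json

base_model = AutoModelForCausalLM.from_pretrained(base_model_path)
model = PeftModel.from_pretrained(base_model, adapter_path)  # attaches the LoRA weights to the base model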
To merge the LoRA weights into the pre-trained base model, run the following command:
model_name_or_path=model_path_to/llama-2-7b-chat-T/
lora_path=lora_path_to/checkpoint_2/3739
output_path=out_path_to/CRA__model_2/model_3739
CUDA_VISIBLE_DEVICES=0 python src/merge_llama_with_lora.py \
--model_name_or_path ${model_name_or_path} \
--output_path ${output_path} \
--lora_path ${lora_path} \
--llama
The merged weights are saved in the output_path directory and can then be loaded directly with from_pretrained.