
Platypus: Quick, Cheap, and Powerful Refinement of LLMs (https://platypus-llm.github.io)

Platypus

The Platypus models are a series of fine-tuned and merged variants based on the LLaMa and LLaMa-2 transformer architectures. Platypus takes advantage of LoRA and PEFT.

All models and the dataset are available via HuggingFace: garage-bAInd

Updates

8/21/23: If you're fine-tuning LLaMa-2 7B, please set bf16=True and fp16=False in the HF trainer. This only applies to LLaMa-2 7B; LLaMa-1 7B works as is. Additionally, if you are using 1 GPU, please set ddp_find_unused_parameters=False in the HF trainer. We will be updating the fine-tuning script to handle these changes automatically.

8/14/23: We have cleaned up our pipeline and added data refinement and similarity code. Within the next few days we'll have a script to reproduce our exact dataset from 11 open-source datasets.

8/13/23: An unquantized GPU chatbot of OpenOrca-Platypus2-13B, our most recent collab, is available via Hugging Face spaces, courtesy of OpenOrca: Chat now!

8/11/23: Our paper and project website have been released!
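
The 8/21/23 note above can be sketched as a small helper that collects the keyword overrides to pass to the HF trainer arguments. This is only an illustration: the function name trainer_overrides and the substring check used to detect LLaMa-2 7B are assumptions, not part of this repository.

```python
def trainer_overrides(base_model: str, num_gpus: int) -> dict:
    """Collect the overrides described in the 8/21/23 update.

    Hypothetical helper: the model-name check below is an assumed
    heuristic, not logic taken from finetune.py.
    """
    overrides = {}
    # LLaMa-2 7B should train in bf16 rather than fp16.
    if "llama-2-7b" in base_model.lower():
        overrides["bf16"] = True
        overrides["fp16"] = False
    # With a single GPU, disable the DDP unused-parameter search.
    if num_gpus == 1:
        overrides["ddp_find_unused_parameters"] = False
    return overrides


print(trainer_overrides("meta-llama/Llama-2-7b-hf", num_gpus=1))
```

The resulting dictionary could then be unpacked into transformers.TrainingArguments alongside the other settings.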

CLI

FastChat provides a simple setup for those interested in running the model. After downloading the model through HuggingFace, clone the FastChat repository:

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

Install the required packages:

pip3 install --upgrade pip  # enable PEP 660 support
pip3 install -e .

Finally, run the following:

python3 -m fastchat.serve.cli --model-path garage-bAInd/Platypus-30B --conv-template alpaca

Local Setup

This repository is multi-GPU friendly and provides code for model or data parallelism, depending on your computational resources.

  1. Install dependencies:

    pip install -r requirements.txt

  2. Be sure to use these exact requirements, or you may run into model saving or OOM issues.

Fine-tuning (finetune.py)

Run fine-tuning.sh.

Note: The script above uses torchrun for data parallelism. PyTorch is not listed in requirements.txt because, technically, you can run fine-tuning without it (after a few minor changes to the .py file). To use fine-tuning.sh, please install PyTorch. We recommend torchrun and PyTorch 2.0+ for speed and torch.compile. If you do not install PyTorch, or you use an alternative launcher such as accelerate launch, please take time to comment out any torch-related lines in the scripts.

Hyperparameters used to fine-tune Platypus:

| Hyperparameter | Value (13B / 70B) |
|---|---|
| learning rate | 4e-4 / 3e-4 |
| batch size | 16 |
| microbatch size | 1 |
| warmup steps | 100 |
| epochs | 1 |
| weight decay | 0. |
| lr scheduler | cosine |
| lora alpha | 16 |
| lora rank | 16 |
| lora dropout | 0.05 |
| lora target modules | gate_proj, up_proj, down_proj |
| cutoff length | 4096 |
| train on inputs | False |
| group by length | False |
| add eos token | False |

Example of how to calculate gradient accumulation steps using 2 GPUs: gradient_accumulation_steps = global_batch_size / micro_batch_size / num_gpus = 16 / 1 / 2 = 8.
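
As a quick check of that arithmetic, here is a minimal sketch. The function name is illustrative rather than taken from finetune.py, and the divisibility check is an added assumption.

```python
def gradient_accumulation_steps(global_batch_size: int,
                                micro_batch_size: int,
                                num_gpus: int) -> int:
    """Steps to accumulate so that the effective batch equals global_batch_size."""
    samples_per_step = micro_batch_size * num_gpus
    # Assumed safeguard: refuse configurations that do not divide evenly.
    if global_batch_size % samples_per_step != 0:
        raise ValueError("global_batch_size must be divisible by "
                         "micro_batch_size * num_gpus")
    return global_batch_size // samples_per_step


print(gradient_accumulation_steps(16, 1, 2))  # 8, matching the example above
```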

If your model cannot fit in the memory of each GPU, please use the alternative fine-tuning option below (or use accelerate, FSDP, etc.) to take advantage of model parallelism. A good alternative to torchrun is accelerate.

python finetune.py \
    --base_model meta-llama/Llama-2-70b-hf \
    --data_path ./final_data.json \
    --output_dir ./llama2-platypus-70b \
    --batch_size 16 \
    --micro_batch_size 1 \
    --num_epochs 1 \
    --learning_rate 0.0003 \
    --cutoff_len 4096 \
    --val_set_size 0 \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[gate_proj, down_proj, up_proj]' \
    --train_on_inputs False \
    --add_eos_token False \
    --group_by_length False \
    --prompt_template_name alpaca \
    --lr_scheduler 'cosine' \
    --warmup_steps 100

Merging

Once fine-tuning is complete, use merge.sh to merge the LoRA weights back into the base LLaMa model (or base model of your choice) for export to HuggingFace format.

While we are experimenting with better and alternative ways to merge (stay tuned!), our current merging process relies on the basic linear merge provided by PEFT. Before we fine-tune, we search for possible models to merge with and the datasets used to create them (to the best of our ability). The success of our LoRA merges stems from using the right data. Our most successful merges have little to no overlap in fine-tuning data. For example, GPlatty-30B is a merge of Platypus-30B and gpt4-alpaca-lora-30b. We saw a 2% jump in accuracy for GPlatty, and the datasets used to fine-tune the aforementioned two LoRA-based models had very low similarity scores. Please see our paper for additional information.

NOTE: If you encounter any errors while merging, please try uninstalling bitsandbytes and peft, then reinstalling with the newest versions (peft should always be installed from source).
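
For intuition about what merging LoRA weights into a base model means, the standard LoRA update adds the scaled low-rank product to each adapted weight matrix: W' = W + (lora_alpha / rank) * B A. The toy sketch below works this out on plain Python lists. It is not the code in merge.sh or PEFT, and the function and argument names are illustrative.

```python
def merge_lora(weight, lora_a, lora_b, lora_alpha, rank):
    """Return W + (lora_alpha / rank) * B @ A for nested-list matrices.

    weight: d_out x d_in, lora_b: d_out x rank, lora_a: rank x d_in.
    Toy illustration only; real merges operate on model tensors.
    """
    scale = lora_alpha / rank
    merged = []
    for i, row in enumerate(weight):
        merged_row = []
        for j, w in enumerate(row):
            # (B @ A)[i][j] is the low-rank update for this entry.
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(rank))
            merged_row.append(w + scale * delta)
        merged.append(merged_row)
    return merged


base = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x rank
A = [[0.5, 0.5]]     # rank x d_in
print(merge_lora(base, A, B, lora_alpha=2, rank=1))
```

With the hyperparameters listed earlier (lora alpha 16, lora rank 16), the scale factor is 1.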

Dataset Refinement

We used keyword search to find STEM and logic questions in the 11 open-source datasets that make up Open-Platypus. Then, to remove duplicates and redundancy, we performed a cosine similarity check of the questions using SentenceTransformers embeddings. Lastly, we ran a similarity check to remove any questions from our training set that are too similar to the test set.

You can access all of the related code in the data_pipeline folder of this repo.
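
The deduplication step can be sketched as a greedy filter: keep a question only if its embedding is not too close to one already kept. This is a simplified stand-in rather than the code in data_pipeline. The embeddings are hand-written toy vectors instead of SentenceTransformers output, and the 0.8 threshold is an assumed value.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_similar(questions, embeddings, threshold=0.8):
    """Greedily drop questions whose embedding is too similar to a kept one."""
    kept = []  # indices of questions retained so far
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [questions[i] for i in kept]


questions = ["What is 2 + 2?", "What is 2 plus 2?", "Define entropy."]
toy_embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]  # assumed toy vectors
print(filter_similar(questions, toy_embeddings))
```

The same comparison, run between training questions and test questions, would cover the final contamination check described above.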

Reproducing Benchmark Eval Results

Install LM Evaluation Harness:

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
git checkout b281b0921b636bc36ad05c0b0b0763bd6dd43463 # The commit used by the Open LLM Leaderboard
pip install -e .

Each task was evaluated on a single A100 80GB GPU for 13B, and 2 A100s for 70B.

ARC:

python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-13B,use_accelerate=True --tasks arc_challenge --batch_size 2 --no_cache --write_out --output_path results/Platypus-13B/arc_challenge_25shot.json --device cuda --num_fewshot 25

HellaSwag:

python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-13B,use_accelerate=True --tasks hellaswag --batch_size 2 --no_cache --write_out --output_path results/Platypus-13B/hellaswag_10shot.json --device cuda --num_fewshot 10

MMLU:

python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-13B,use_accelerate=True --tasks hendrycksTest-* --batch_size 2 --no_cache --write_out --output_path results/Platypus-13B/mmlu_5shot.json --device cuda --num_fewshot 5

TruthfulQA:

python main.py --model hf-causal-experimental --model_args pretrained=garage-bAInd/Platypus-13B,use_accelerate=True --tasks truthfulqa_mc --batch_size 2 --no_cache --write_out --output_path results/Platypus-13B/truthfulqa_0shot.json --device cuda

Inference for Adapters (inference.py)

This is a basic example script for running inference directly using fine-tuned adapters and/or local data. The current version reads data from a CSV file. You can easily edit this to pull from HF or use a JSON file. Please make any necessary edits before using this script (it assumes Alpaca formatting).
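
To illustrate what Alpaca formatting of CSV rows looks like, here is a minimal sketch that is separate from inference.py. The column names instruction and input are assumptions about the CSV layout, and the template text follows the commonly used Stanford Alpaca prompt, which may differ in detail from the template shipped in this repo.

```python
import csv
import io

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompts(csv_file):
    """Yield one Alpaca-style prompt per CSV row (column names are assumed)."""
    for row in csv.DictReader(csv_file):
        instruction = row["instruction"]
        extra = (row.get("input") or "").strip()
        if extra:
            yield PROMPT_WITH_INPUT.format(instruction=instruction, input=extra)
        else:
            yield PROMPT_NO_INPUT.format(instruction=instruction)


sample = io.StringIO("instruction,input\nAdd the numbers.,2 and 3\nName a mammal.,\n")
for prompt in build_prompts(sample):
    print(prompt)
```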

BibTeX

@article{platypus2023,
    title={Platypus: Quick, Cheap, and Powerful Refinement of LLMs},
    author={Ariel N. Lee and Cole J. Hunter and Nataniel Ruiz},
    journal={arXiv preprint arXiv:2308.07317},
    year={2023}
}