feat: add gemma7b support (#971)
Co-authored-by: Evan Smothers <ebs@fb.com>
Optimox and ebsmothers authored May 31, 2024
1 parent 0c4056a commit 135cf2e
Showing 13 changed files with 429 additions and 6 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -48,7 +48,7 @@ torchtune currently supports the following models.
| [Llama2](https://llama.meta.com/llama2/) | 7B, 13B, 70B [[models](torchtune/models/llama2/_model_builders.py), [configs](recipes/configs/llama2/)] |
| [Code-Llama2](https://huggingface.co/codellama) | 7B, 13B, 70B [[model](torchtune/models/code_llama2/_model_builders.py), [configs](recipes/configs/code_llama2/)] |
| [Mistral](https://huggingface.co/mistralai) | 7B [[model](torchtune/models/mistral/_model_builders.py), [configs](recipes/configs/mistral/)] |
| [Gemma](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b) | 2B [[model](torchtune/models/gemma/_model_builders.py), [configs](recipes/configs/gemma/)] |
| [Gemma](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b) | 2B, 7B [[model](torchtune/models/gemma/_model_builders.py), [configs](recipes/configs/gemma/)] |
| [Microsoft Phi3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) | Mini [[model](torchtune/models/phi3/), [configs](recipes/configs/phi3/)]

We'll be adding a number of new models in the coming weeks, including support for 70B versions and MoEs.
3 changes: 2 additions & 1 deletion docs/source/api_ref_models.rst
@@ -91,7 +91,7 @@ Pre-trained models can be downloaded from the Hugging Face Hub with the following command:
gemma
-----

All models from the `Gemma family <https://blog.google/technology/developers/gemma-open-models/>`_.
Models of size 2B and 7B from the `Gemma family <https://blog.google/technology/developers/gemma-open-models/>`_.

Pre-trained models can be downloaded from the Hugging Face Hub with the following command:

@@ -104,3 +104,4 @@ Pre-trained models can be downloaded from the Hugging Face Hub with the following command:
:nosignatures:

gemma.gemma_2b
gemma.gemma_7b
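
The two builders listed above map to plain Python functions under torchtune.models.gemma. A minimal sketch of instantiating them, assuming the new 7B builder follows the same zero-argument convention as the existing 2B one:

from torchtune.models.gemma import gemma_2b, gemma_7b

model_2b = gemma_2b()  # Gemma 2B transformer decoder (pre-existing)
model_7b = gemma_7b()  # Gemma 7B transformer decoder (added by this commit)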
2 changes: 1 addition & 1 deletion recipes/configs/gemma/2B_qlora_single_device.yaml
@@ -5,7 +5,7 @@
# this run:
# tune download google/gemma-2b --hf-token <HF_TOKEN> --output-dir /tmp/gemma --ignore-patterns ""
#
# To launch on 4 devices, run the following command from root:
# To launch on a single device, run the following command from root:
# tune run lora_finetune_single_device --config gemma/2B_qlora_single_device
#
# You can add specific overrides through the command line. For example
76 changes: 76 additions & 0 deletions recipes/configs/gemma/7B_full.yaml
@@ -0,0 +1,76 @@
# Config for multi-device full finetuning in full_finetune_distributed.py
# using a gemma 7B model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download google/gemma-7b --hf-token <HF_TOKEN> --output-dir /tmp/gemma-7b --ignore-patterns "gemma-7b.gguf"
#
# To launch on 4 devices, run the following command from root:
# tune run --nnodes 1 --nproc_per_node 4 full_finetune_distributed --config gemma/7B_full
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run --nnodes 1 --nproc_per_node 4 full_finetune_distributed --config gemma/7B_full checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only when the model is being fine-tuned on 2+ GPUs.


# Tokenizer
tokenizer:
_component_: torchtune.models.gemma.gemma_tokenizer
path: /tmp/gemma-7b/tokenizer.model

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_dataset
train_on_input: True
seed: null
shuffle: True

# Model Arguments
model:
_component_: torchtune.models.gemma.gemma_7b

checkpointer:
_component_: torchtune.utils.FullModelHFCheckpointer
checkpoint_dir: /tmp/gemma-7b/
checkpoint_files: [
model-00001-of-00004.safetensors,
model-00002-of-00004.safetensors,
model-00003-of-00004.safetensors,
model-00004-of-00004.safetensors,
]
recipe_checkpoint: null
output_dir: /tmp/gemma
model_type: GEMMA
resume_from_checkpoint: False

# Fine-tuning arguments
batch_size: 1
epochs: 1
optimizer:
_component_: torch.optim.AdamW
lr: 2e-5
loss:
_component_: torch.nn.CrossEntropyLoss
max_steps_per_epoch: null
gradient_accumulation_steps: 1

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True
memory_efficient_fsdp_wrap: False

# Reduced precision
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.utils.metric_logging.DiskLogger
log_dir: ${output_dir}
output_dir: /tmp/alpaca-gemma-finetune
log_every_n_steps: 1
log_peak_memory_stats: False
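
Each _component_ entry in the config above is resolved to a Python callable when the recipe runs. A rough sketch of that mapping, assuming the OmegaConf-based loading and the config.instantiate helper that torchtune recipes use:

from omegaconf import OmegaConf
from torchtune import config

cfg = OmegaConf.load("recipes/configs/gemma/7B_full.yaml")
model = config.instantiate(cfg.model)                              # -> torchtune.models.gemma.gemma_7b()
optimizer = config.instantiate(cfg.optimizer, model.parameters())  # -> torch.optim.AdamW(params, lr=2e-5)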
85 changes: 85 additions & 0 deletions recipes/configs/gemma/7B_lora.yaml
@@ -0,0 +1,85 @@
# Config for multi-device LoRA finetuning in lora_finetune_distributed.py
# using a gemma 7B model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download google/gemma-7b --hf-token <HF_TOKEN> --output-dir /tmp/gemma-7b --ignore-patterns "gemma-7b.gguf"
#
# To launch on 4 devices, run the following command from root:
# tune run --nnodes 1 --nproc_per_node 4 lora_finetune_distributed --config gemma/7B_lora
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run --nnodes 1 --nproc_per_node 4 lora_finetune_distributed --config gemma/7B_lora checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only when the model is being fine-tuned on 2+ GPUs.


# Tokenizer
tokenizer:
_component_: torchtune.models.gemma.gemma_tokenizer
path: /tmp/gemma-7b/tokenizer.model

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_dataset
train_on_input: True
seed: null
shuffle: True

# Model Arguments
model:
_component_: torchtune.models.gemma.lora_gemma_7b
lora_attn_modules: ['q_proj', 'k_proj', 'v_proj']
apply_lora_to_mlp: True
lora_rank: 64
lora_alpha: 16

checkpointer:
_component_: torchtune.utils.FullModelHFCheckpointer
checkpoint_dir: /tmp/gemma-7b/
checkpoint_files: [
model-00001-of-00004.safetensors,
model-00002-of-00004.safetensors,
model-00003-of-00004.safetensors,
model-00004-of-00004.safetensors,
]
recipe_checkpoint: null
output_dir: /tmp/gemma-7b/
model_type: GEMMA
resume_from_checkpoint: False

optimizer:
_component_: torch.optim.AdamW
lr: 2e-5

lr_scheduler:
_component_: torchtune.modules.get_cosine_schedule_with_warmup
num_warmup_steps: 100

loss:
_component_: torch.nn.CrossEntropyLoss

# Fine-tuning arguments
batch_size: 4
epochs: 3
max_steps_per_epoch: null
gradient_accumulation_steps: 1

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True

# Reduced precision
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.utils.metric_logging.DiskLogger
log_dir: ${output_dir}
output_dir: /tmp/alpaca-gemma-lora
log_every_n_steps: 1
log_peak_memory_stats: False
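
For reference, the model block above is equivalent to calling the new LoRA builder directly with the same hyperparameters. The keyword names are taken from the YAML; treat the snippet as a sketch rather than a prescribed API:

from torchtune.models.gemma import lora_gemma_7b

model = lora_gemma_7b(
    lora_attn_modules=["q_proj", "k_proj", "v_proj"],  # attach LoRA to the attention projections
    apply_lora_to_mlp=True,                            # also adapt the MLP layers
    lora_rank=64,
    lora_alpha=16,
)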
92 changes: 92 additions & 0 deletions recipes/configs/gemma/7B_lora_single_device.yaml
@@ -0,0 +1,92 @@
# Config for single-device LoRA finetuning in lora_finetune_single_device.py
# using a gemma 7B model
#
# This config assumes that you've run the following command before launching
# this run (torchtune does not use gguf so you can ignore it to save time and space):
# tune download google/gemma-7b --hf-token <HF_TOKEN> --output-dir /tmp/gemma-7b --ignore-patterns "gemma-7b.gguf"
#
# To launch on a single device, run the following command from root:
# tune run lora_finetune_single_device --config gemma/7B_lora_single_device
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run lora_finetune_single_device --config gemma/7B_lora_single_device checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on a single device.

# Tokenizer
tokenizer:
_component_: torchtune.models.gemma.gemma_tokenizer
path: /tmp/gemma-7b/tokenizer.model

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_dataset
train_on_input: True
seed: null
shuffle: True

# Model Arguments
model:
_component_: torchtune.models.gemma.lora_gemma_7b
lora_attn_modules: ['q_proj', 'k_proj', 'v_proj']
apply_lora_to_mlp: True
lora_rank: 8
lora_alpha: 16

checkpointer:
_component_: torchtune.utils.FullModelHFCheckpointer
checkpoint_dir: /tmp/gemma-7b/
checkpoint_files: [
model-00001-of-00004.safetensors,
model-00002-of-00004.safetensors,
model-00003-of-00004.safetensors,
model-00004-of-00004.safetensors,
]
recipe_checkpoint: null
output_dir: /tmp/gemma-7b/
model_type: GEMMA
resume_from_checkpoint: False

optimizer:
_component_: torch.optim.AdamW
lr: 5e-5

lr_scheduler:
_component_: torchtune.modules.get_cosine_schedule_with_warmup
num_warmup_steps: 100

loss:
_component_: torch.nn.CrossEntropyLoss

# Fine-tuning arguments
batch_size: 8
epochs: 1
max_steps_per_epoch: null
gradient_accumulation_steps: 2
compile: False

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True

# Reduced precision
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.utils.metric_logging.DiskLogger
log_dir: ${output_dir}
output_dir: /tmp/alpaca-gemma-lora
log_every_n_steps: 1
log_peak_memory_stats: False

# Showcase the usage of the PyTorch profiler
# Set enabled to False as it's only needed for debugging training
profiler:
_component_: torchtune.utils.profiler
enabled: False
output_dir: /tmp/alpaca-gemma-finetune/torchtune_perf_tracing.json
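
The tokenizer block shared by these configs points at the SentencePiece model downloaded alongside the checkpoint. A minimal sketch of using it directly; the path keyword mirrors the YAML key, and the encode/decode calls follow the usual torchtune SentencePiece-style interface, which should be treated as an assumption:

from torchtune.models.gemma import gemma_tokenizer

tokenizer = gemma_tokenizer(path="/tmp/gemma-7b/tokenizer.model")
token_ids = tokenizer.encode("Hello, Gemma!", add_bos=True, add_eos=True)
text = tokenizer.decode(token_ids)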
92 changes: 92 additions & 0 deletions recipes/configs/gemma/7B_qlora_single_device.yaml
@@ -0,0 +1,92 @@
# Config for single-device QLoRA finetuning in lora_finetune_single_device.py
# using a gemma 7B model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download google/gemma-7b --hf-token <HF_TOKEN> --output-dir /tmp/gemma-7b --ignore-patterns "gemma-7b.gguf"
#
# To launch on a single device, run the following command from root:
# tune run lora_finetune_single_device --config gemma/7B_qlora_single_device
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run lora_finetune_single_device --config gemma/7B_qlora_single_device checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on a single device.

# Tokenizer
tokenizer:
_component_: torchtune.models.gemma.gemma_tokenizer
path: /tmp/gemma-7b/tokenizer.model

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_dataset
train_on_input: True
seed: null
shuffle: True

# Model Arguments
model:
_component_: torchtune.models.gemma.qlora_gemma_7b
lora_attn_modules: ['q_proj', 'k_proj', 'v_proj']
apply_lora_to_mlp: True
lora_rank: 64
lora_alpha: 16

checkpointer:
_component_: torchtune.utils.FullModelHFCheckpointer
checkpoint_dir: /tmp/gemma-7b/
checkpoint_files: [
model-00001-of-00004.safetensors,
model-00002-of-00004.safetensors,
model-00003-of-00004.safetensors,
model-00004-of-00004.safetensors,
]
recipe_checkpoint: null
output_dir: /tmp/gemma-7b/
model_type: GEMMA
resume_from_checkpoint: False

optimizer:
_component_: torch.optim.AdamW
lr: 2e-5

lr_scheduler:
_component_: torchtune.modules.get_cosine_schedule_with_warmup
num_warmup_steps: 100

loss:
_component_: torch.nn.CrossEntropyLoss

# Fine-tuning arguments
batch_size: 4
epochs: 3
max_steps_per_epoch: null
gradient_accumulation_steps: 4
compile: False

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True

# Reduced precision
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.utils.metric_logging.DiskLogger
log_dir: ${output_dir}
output_dir: /tmp/alpaca-gemma-lora
log_every_n_steps: 1
log_peak_memory_stats: False

# Showcase the usage of the PyTorch profiler
# Set enabled to False as it's only needed for debugging training
profiler:
_component_: torchtune.utils.profiler
enabled: False
output_dir: /tmp/alpaca-gemma-finetune/torchtune_perf_tracing.json
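
The QLoRA config reuses the LoRA hyperparameters but swaps in the quantized-base builder named in the model block. In torchtune the QLoRA builders are typically thin wrappers over their LoRA counterparts with the base weights quantized, so the call signature below is assumed to match lora_gemma_7b; this is a sketch, not a guaranteed API:

from torchtune.models.gemma import qlora_gemma_7b

model = qlora_gemma_7b(
    lora_attn_modules=["q_proj", "k_proj", "v_proj"],
    apply_lora_to_mlp=True,
    lora_rank=64,
    lora_alpha=16,
)
# Only the low-rank adapter weights are intended to be trained; the frozen base
# weights are kept in a quantized format to reduce single-device memory use.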
(Remaining changed files not shown.)