Fix: Higher vram usage for mistral and sample_packing (axolotl-ai-cloud#691)

* Fix: Higher vram usage for mistral and sample_packing

* chore: update comment

* chore: lint
NanoCode012 authored on Oct 6, 2023 · commit afef662 (1 parent: 0e47dd5)
Showing 2 changed files with 6 additions and 5 deletions.
examples/mistral/qlora.yml: 4 additions & 4 deletions (the paired removed/added lines differ only in trailing whitespace, per the `chore: lint` commit)

```diff
@@ -36,10 +36,10 @@ lora_target_modules:
   - k_proj
   - o_proj
 
-wandb_project:
-wandb_entity:
+wandb_project:
+wandb_entity:
 wandb_watch:
-wandb_run_id:
+wandb_run_id:
 wandb_log_model:
 
 gradient_accumulation_steps: 4
@@ -76,4 +76,4 @@ fsdp_config:
 special_tokens:
   bos_token: "<s>"
   eos_token: "</s>"
-  unk_token: "<unk>"
+  unk_token: "<unk>"
```
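For reference, these are the axolotl config flags that exercise the code path changed below (all three appear as `cfg` attributes in the diff); the values here are illustrative, not taken from this example file:

```yaml
# Illustrative values only, not from examples/mistral/qlora.yml.
is_mistral_derived_model: true
flash_attention: true
# With sample_packing on, the fixed code below keeps the tokenizer's
# default padding side instead of forcing "left".
sample_packing: true
```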
src/axolotl/utils/models.py: 2 additions & 1 deletion

```diff
@@ -81,7 +81,8 @@ def load_tokenizer(cfg):
         tokenizer.add_special_tokens({"pad_token": "[PAD]"})
     os.environ["TOKENIZERS_PARALLELISM"] = "false"
 
-    if cfg.is_mistral_derived_model:
+    # Mistral's official FA implementation requires left padding
+    if cfg.is_mistral_derived_model and cfg.flash_attention and not cfg.sample_packing:
         tokenizer.padding_side = "left"
 
     if cfg.special_tokens:
```
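To see the new guard in isolation, here is a minimal runnable sketch; `Cfg` and `should_pad_left` are hypothetical stand-ins for axolotl's config object and inline condition, not part of the library:

```python
# Minimal sketch of the condition this commit introduces in load_tokenizer.
# Cfg and should_pad_left are hypothetical names for illustration only.
from dataclasses import dataclass


@dataclass
class Cfg:
    is_mistral_derived_model: bool
    flash_attention: bool
    sample_packing: bool


def should_pad_left(cfg: Cfg) -> bool:
    # Mistral's official flash-attention implementation requires left
    # padding, but it is skipped when sample packing is enabled.
    return (
        cfg.is_mistral_derived_model
        and cfg.flash_attention
        and not cfg.sample_packing
    )


# Before this fix, any Mistral-derived model padded left; the packed
# case below is the combination that caused the higher VRAM usage.
assert should_pad_left(Cfg(True, True, False)) is True
assert should_pad_left(Cfg(True, True, True)) is False
assert should_pad_left(Cfg(False, True, False)) is False
```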
