Fix drop_long_seq bug due to truncation in prompt tokenization strategies when using `chat_template` (#1867)
chiwanpark authored Aug 26, 2024
1 parent 6819c12 commit 2dac1ed
Showing 1 changed file with 2 additions and 1 deletion.
src/axolotl/prompt_strategies/chat_template.py (2 additions, 1 deletion)
@@ -350,7 +350,8 @@ def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):
         ),
         "roles": ds_cfg.get("roles"),
         "drop_system_message": ds_cfg.get("drop_system_message", False),
-        "max_length": cfg.sequence_len,
+        # add one so that sequences exceeding the `sequence_len` limit can be detected
+        "max_length": cfg.sequence_len + 1,
     }

     strategy_params = {
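The fix rests on a simple observation: if tokenization truncates every sample to exactly `sequence_len`, an over-long sample becomes indistinguishable from one that is exactly `sequence_len` tokens, so a drop-long-sequences filter can never fire. Truncating to `sequence_len + 1` preserves the "too long" signal. A minimal sketch of that idea (the `truncate` and `drop_long_seq` helpers here are hypothetical stand-ins, not Axolotl's actual implementation):

```python
# Sketch: why max_length = sequence_len + 1 makes over-long samples detectable.
# Token IDs are faked as integer lists; only the lengths matter.

def truncate(tokens, max_length):
    """Cap a token sequence at max_length, as a tokenizer's truncation would."""
    return tokens[:max_length]

def drop_long_seq(samples, sequence_len):
    """Keep only samples that fit within sequence_len tokens."""
    return [s for s in samples if len(s) <= sequence_len]

sequence_len = 4
samples = [list(range(3)), list(range(4)), list(range(6))]  # lengths 3, 4, 6

# Buggy behavior: truncating at sequence_len caps the 6-token sample to 4,
# so the filter sees lengths 3, 4, 4 and drops nothing.
buggy = [truncate(s, sequence_len) for s in samples]
print(len(drop_long_seq(buggy, sequence_len)))  # 3

# Fixed behavior: truncating at sequence_len + 1 leaves the over-long sample
# at 5 tokens, so the filter sees lengths 3, 4, 5 and drops it.
fixed = [truncate(s, sequence_len + 1) for s in samples]
print(len(drop_long_seq(fixed, sequence_len)))  # 2
```

The one extra token is only a sentinel; samples that survive the filter are still at most `sequence_len` tokens long.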
