
Feat: Add rope scaling #343

Merged 2 commits into axolotl-ai-cloud:main from feat/rope_scaling on Aug 12, 2023

Conversation

NanoCode012 (Collaborator) commented Aug 5, 2023

Closes #342

Note:

  • This only applies to the llama arch for now; it should also support neox. Should I add a specific if-condition for that? (See the sketch after this list.)

  • The docs state NOT to increase max_position_embeddings ourselves for this. For example, if we want 8k, we just change the rope factor.

  • We leave validation to transformers.

  • Test
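A minimal sketch of the wiring described above, assuming a hypothetical `cfg.rope_scaling` field and the standard Hugging Face config API; the function and attribute names here are illustrative, not necessarily the exact code in this PR:

```python
# Sketch only: cfg / load_model_config are illustrative names, not the PR's exact code.
from transformers import AutoConfig

def load_model_config(cfg):
    model_config = AutoConfig.from_pretrained(cfg.base_model)

    # Only wired up for the llama architecture for now; neox would need its own branch.
    if cfg.rope_scaling and getattr(model_config, "model_type", None) == "llama":
        # e.g. {"type": "linear", "factor": 4.0}; validation is left to transformers,
        # and max_position_embeddings is deliberately left untouched.
        model_config.rope_scaling = cfg.rope_scaling

    return model_config
```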

winglian (Collaborator) left a comment


lgtm

winglian (Collaborator) commented Aug 6, 2023

  • The docs state NOT to increase max_position_embeddings ourselves for this. For example, if we want 8k, we just change the rope factor.

has everyone else been doing it incorrectly? https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/config.json#L14

NanoCode012 (Collaborator, Author) commented Aug 6, 2023

  • The docs state NOT to increase max_position_embeddings ourselves for this. For example, if we want 8k, we just change the rope factor.

has everyone else been doing it incorrectly? https://huggingface.co/togethercomputer/LLaMA-2-7B-32K/blob/main/config.json#L14

The docs say not to set it ourselves: https://github.com/gante/transformers/blob/30409af6e1b2b5efb6d9932b3e3b4ce20cfdb30e/src/transformers/models/llama/configuration_llama.py#L80-L87

At the same time, we also change this if seq_len > the model's context, so I'm not sure whether we should add a config validation check to make sure that does not happen.

Regarding Together's model, they have their own custom modeling code for llama, since they require trust_remote_code. It's not exactly the same, so I can't say.
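For reference, the check that axolotl defers to lives in the linked configuration_llama.py. Roughly paraphrased (not copied verbatim), it looks like this:

```python
# Rough paraphrase of the rope_scaling validation in transformers' LlamaConfig
# (see the configuration_llama.py link above); not a verbatim copy.
def validate_rope_scaling(rope_scaling):
    if rope_scaling is None:
        return
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        raise ValueError("`rope_scaling` must be a dict with the fields `type` and `factor`")
    scaling_type = rope_scaling.get("type")
    scaling_factor = rope_scaling.get("factor")
    if scaling_type not in ("linear", "dynamic"):
        raise ValueError("`rope_scaling`'s type must be one of ['linear', 'dynamic']")
    if not isinstance(scaling_factor, float) or scaling_factor <= 1.0:
        raise ValueError("`rope_scaling`'s factor must be a float > 1")
```

An unknown type (like the `burp` value tested further down) would fail this check, which is the error-as-expected behavior noted below.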

NanoCode012 (Collaborator, Author) commented

This brought up another question: what about dataset packing? Do we just pass a 4k dataset to a "16k" model? If we do not set a 16k seq_len, the packed dataset would be at most 4k? Do we need to add a condition to not change max_position_embeddings if rope_scaling is set?
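To make the numbers concrete (the values are hypothetical, and treating `sequence_len` as axolotl's packing/truncation length is an assumption about the config):

```python
# Hypothetical numbers, not taken from the PR.
max_position_embeddings = 4096   # base llama-2 context, left untouched per the docs
rope_factor = 4.0                # rope_scaling with type linear, factor 4.0
effective_context = int(max_position_embeddings * rope_factor)   # 16384 tokens

# If sequence_len stays at 4096, sample packing still caps each packed example at
# 4096 tokens, so the extra scaled context would go unused during training.
sequence_len = 4096
```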

NanoCode012 (Collaborator, Author) commented

Tested that it runs:

  • with rope scaling
  • with rope scaling, type: burp (errors as expected)
  • without rope scaling

NanoCode012 marked this pull request as ready for review August 11, 2023 17:49
NanoCode012 merged commit b521206 into axolotl-ai-cloud:main Aug 12, 2023
3 checks passed
NanoCode012 deleted the feat/rope_scaling branch August 12, 2023 15:50
mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023
* Feat: Add rope scaling

* fix: move rope config
Linked issue closed by this pull request: [Feature] Add config for RoPE scaling (#342)