Add NTK-Aware interpolation "by parts" correction #1
This PR adds the new and improved "by parts" correction to the NTK-aware interpolation method.
This corrected method improves on previous methods fourfold.
The `scale` parameter should be used the same way as in linear interpolation (e.g. `scale=2` extends a 2048-token base context to 4096).

`extrapolation_factor` and `ntk_factor` are used for validation purposes and should not be changed unless necessary.

Edit: `max_position_embeddings` was previously assumed to be the original pretrained model context size (2048 for LLaMA models), and changing it would break the code. Fixed: added an `original_max_position_embeddings` parameter to avoid any confusion.

Comparison of the new corrected NTK-Aware method to the previous non-corrected NTK-Aware method. Note that the new `scale` factor is still called `alpha` in this graph.

Now all that is left is to validate this by finetuning!
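For readers who want to see how the parameters in this description could interact, here is a minimal sketch of a "by parts" blend of RoPE inverse frequencies. It is an illustration under stated assumptions, not necessarily the code in this PR: the function names, the NTK-aware base change `base * scale ** (dim / (dim - 2))`, and the rotation thresholds `(1.25, 0.75)` and `(16.0, 2.0)` are all assumptions chosen for the example. Only `scale`, `ntk_factor`, `extrapolation_factor`, and `original_max_position_embeddings` come from the description itself.

```python
import math


def rotation_index(num_rotations, dim, base, original_ctx):
    """Pair index whose RoPE component makes `num_rotations` turns over the original context."""
    return dim * math.log(original_ctx / (num_rotations * 2 * math.pi)) / (2 * math.log(base))


def keep_mask(rotations, factor, dim, base, original_ctx):
    """1.0 up to the low index, ramping linearly to 0.0 at the high index, times `factor`."""
    low = max(math.floor(rotation_index(rotations[0], dim, base, original_ctx)), 0)
    high = min(math.ceil(rotation_index(rotations[1], dim, base, original_ctx)), dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp
    ramp = [min(1.0, max(0.0, (i - low) / (high - low))) for i in range(dim // 2)]
    return [(1.0 - r) * factor for r in ramp]


def by_parts_inv_freq(
    dim,
    base=10000.0,
    scale=1.0,
    original_max_position_embeddings=2048,
    ntk_factor=1.0,
    extrapolation_factor=1.0,
    ntk_rotations=(1.25, 0.75),            # assumed thresholds (hypothetical)
    extrapolation_rotations=(16.0, 2.0),   # assumed thresholds (hypothetical)
):
    exponents = [2 * i / dim for i in range(dim // 2)]
    ntk_base = base * scale ** (dim / (dim - 2))  # assumed NTK-aware base change
    inv_base = [base ** -e for e in exponents]      # unscaled RoPE frequencies
    inv_linear = [f / scale for f in inv_base]      # linear interpolation
    inv_ntk = [ntk_base ** -e for e in exponents]   # NTK-aware scaling

    ctx = original_max_position_embeddings
    # Step 1: NTK-scaled values for high-frequency pairs, linear for low-frequency pairs.
    mask = keep_mask(ntk_rotations, ntk_factor, dim, base, ctx)
    blended = [lin * (1 - k) + ntk * k for lin, ntk, k in zip(inv_linear, inv_ntk, mask)]
    # Step 2: leave the highest-frequency pairs unscaled.
    mask = keep_mask(extrapolation_rotations, extrapolation_factor, dim, base, ctx)
    return [b * (1 - k) + raw * k for b, raw, k in zip(blended, inv_base, mask)]


# scale=2.0 with a 2048-token original context targets a 4096-token window.
freqs = by_parts_inv_freq(dim=128, scale=2.0)
```

In this sketch, setting `ntk_factor` or `extrapolation_factor` to 0 disables the corresponding blend step, which is one way such factors could serve for validation. With `scale=1.0` the three frequency sets coincide and the result reduces to ordinary RoPE.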