
Support for long-contexts with LoRA #385

Closed
wants to merge 6 commits into from

Conversation

SanjuCSudhakaran

This PR does the following:

  1. Enables long-context support for LoRA.
  2. Handles the RoPE changes required for LoRA. The corresponding vllm-hpu-extension changes can be found at https://github.com/HabanaAI/vllm-hpu-extension

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>

vivekgoe left a comment


@SanjuCSudhakaran I've added review comments; please check.

self.model.embedding_modules,
self.model.embedding_padding_modules)
self.model.embedding_padding_modules,
max_position_embeddings=self.model.config.


Why are we not initializing it exactly as the upstream code does? Refer to
https://github.com/vllm-project/vllm/blob/dbfa8d31d5e7627a84671c6068ecc8fa58acd1d1/vllm/worker/model_runner.py#L1083
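For context on the question above, here is one way to derive the value passed as max_position_embeddings so that it works for both text-only and multimodal models. This is a minimal sketch under stated assumptions: the helper name is hypothetical, the text_config fallback is an assumption about how multimodal configs nest the attribute, and the code is not taken from this PR or from the linked upstream file.

```python
from types import SimpleNamespace
from typing import Any


def resolve_max_position_embeddings(config: Any) -> int:
    """Return max_position_embeddings for plain and multimodal configs.

    Hypothetical helper: text-only configs are assumed to carry the
    attribute directly, while multimodal configs are assumed to nest it
    under ``text_config``.
    """
    if hasattr(config, "max_position_embeddings"):
        return config.max_position_embeddings
    return config.text_config.max_position_embeddings


# Stand-ins for real model configs (illustration only).
llm_config = SimpleNamespace(max_position_embeddings=4096)
vlm_config = SimpleNamespace(
    text_config=SimpleNamespace(max_position_embeddings=8192))

print(resolve_max_position_embeddings(llm_config))  # 4096
print(resolve_max_position_embeddings(vlm_config))  # 8192
```

The resolved value would then be passed as the max_position_embeddings keyword when constructing the LoRA manager, keeping the model-specific lookup out of the constructor call.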

long_lora_offsets = torch.zeros(len(index_mapping_indices),
device=get_device(),
dtype=torch.long)
long_lora_offsets_list = []


Follow the existing notation: annotate it with the type List[int].
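Below is a minimal sketch of the list-then-tensor pattern, using the List[int] annotation requested above. It is written in plain Python so it runs without torch. The function name and the offsets_by_lora_id mapping are illustrative assumptions. The idea is that building the offsets on the host and converting them to a tensor in a single call avoids element-by-element writes into a preallocated device tensor. That is a likely, but unconfirmed, motivation for this change.

```python
from typing import Dict, List


def build_long_lora_offsets(index_mapping_indices: List[int],
                            offsets_by_lora_id: Dict[int, int]) -> List[int]:
    """Collect one rotary-cache offset per token in a plain Python list.

    Tokens whose LoRA id has no registered offset (including id 0, assumed
    here to mean "no LoRA") get offset 0.
    """
    long_lora_offsets_list: List[int] = []
    for lora_id in index_mapping_indices:
        long_lora_offsets_list.append(offsets_by_lora_id.get(lora_id, 0))
    return long_lora_offsets_list


# In the real code path, the finished list would be converted to a
# tensor once, instead of filling a device tensor index by index.
offsets = build_long_lora_offsets([0, 1, 1, 2], {1: 4096, 2: 8192})
print(offsets)  # [0, 4096, 4096, 8192]
```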


long_lora_offsets = torch.tensor(long_lora_offsets_list,


Functionally, this change is also fine for other backends (e.g., CUDA). @kzawora-intel, is it okay to keep it as is, or do we need to make it HPU-specific to keep it upstream-friendly?

key,
offsets=self.punica_wrapper.long_lora_indices,
)
offsets = self.punica_wrapper.long_lora_indices.reshape_as(positions)


Is this change ok for other backends (e.g. CUDA)?
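The sketch below illustrates why reshape_as(positions) matters. Suppose positions arrive in a 2D [batch, seq_len] layout (an assumption about the HPU path) while long_lora_indices is a flat per-token vector. In that case, the offsets must be reshaped before the element-wise positions + offsets shift used to index the rotary cache. When positions are already flat, reshaping to the same 1D shape leaves the offsets unchanged, which bears on the question above. This is a plain-Python sketch with hypothetical helper names that mirrors Tensor.reshape_as for the 2D case.

```python
from typing import List


def reshape_as_2d(flat: List[int], like: List[List[int]]) -> List[List[int]]:
    """Reshape a flat list to the [rows][cols] shape of ``like``.

    Mirrors Tensor.reshape_as for this 2D case: the element count must
    match, and elements are laid out row by row.
    """
    rows = len(like)
    cols = len(like[0]) if rows else 0
    if len(flat) != rows * cols:
        raise ValueError("element count mismatch")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def apply_offsets(positions: List[List[int]],
                  flat_offsets: List[int]) -> List[List[int]]:
    """Shift each position by its per-token offset (positions + offsets)."""
    offsets = reshape_as_2d(flat_offsets, positions)
    return [[p + o for p, o in zip(pos_row, off_row)]
            for pos_row, off_row in zip(positions, offsets)]


# Two sequences of three tokens; the second uses a LoRA whose offset is 100.
positions = [[0, 1, 2], [0, 1, 2]]
flat_offsets = [0, 0, 0, 100, 100, 100]
print(apply_offsets(positions, flat_offsets))  # [[0, 1, 2], [100, 101, 102]]
```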

return (type(source_layer) is LinearScalingRotaryEmbedding
or type(source_layer) is RotaryEmbedding)
if current_platform.is_hpu():
return (type(source_layer) is HpuLinearScalingRotaryEmbedding


Is there any functional impact if we use the default code?
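One possible functional impact of the default check comes from how it matches types. It uses exact-type comparison (type(x) is C), which rejects subclasses. The sketch below uses stand-in classes to show this. It assumes HpuLinearScalingRotaryEmbedding inherits from LinearScalingRotaryEmbedding; that hierarchy is hypothetical here and not confirmed by this PR. Under that assumption, the default check would refuse to wrap the HPU layer with LoRA, while an isinstance check would accept it.

```python
class RotaryEmbedding:
    """Stand-in for the base rotary embedding layer."""


class LinearScalingRotaryEmbedding(RotaryEmbedding):
    """Stand-in for the linear-scaling variant."""


class HpuLinearScalingRotaryEmbedding(LinearScalingRotaryEmbedding):
    """Hypothetical HPU subclass (assumed to inherit from the class above)."""


def default_can_replace(source_layer: object) -> bool:
    # Exact-type match: subclasses such as the HPU variant are rejected.
    return (type(source_layer) is LinearScalingRotaryEmbedding
            or type(source_layer) is RotaryEmbedding)


def isinstance_can_replace(source_layer: object) -> bool:
    # isinstance also accepts every subclass of RotaryEmbedding.
    return isinstance(source_layer, RotaryEmbedding)


layer = HpuLinearScalingRotaryEmbedding()
print(default_can_replace(layer))     # False
print(isinstance_can_replace(layer))  # True
```

An isinstance check would also accept every other RotaryEmbedding subclass, including variants LoRA may not support. That may be why the default code uses exact types. An explicit platform branch, as in the diff, avoids widening the match on non-HPU backends.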

SanjuCSudhakaran (Author)

These changes are no longer required, since the latest vllm-fork has already refactored RoPE to handle this.

PR #404

SanjuCSudhakaran deleted the lora-long-context branch on October 28, 2024 10:18
3 participants