Support for long-contexts with LoRA #385

SanjuCSudhakaran · 2024-10-11T13:20:13Z

This PR does the following:

Enable long-context support for LoRA.
Handle the RoPE changes required for LoRA. The corresponding vllm-hpu-extension changes can be found at https://github.com/HabanaAI/vllm-hpu-extension

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>

vivekgoe

@SanjuCSudhakaran added review comments, please check.

vivekgoe · 2024-10-17T08:12:07Z

vllm/worker/hpu_model_runner.py

                    self.model.embedding_modules,
-                    self.model.embedding_padding_modules)
+                    self.model.embedding_padding_modules,
+                    max_position_embeddings=self.model.config.


Why are we not initializing it exactly as the upstream code does? Refer to
https://github.com/vllm-project/vllm/blob/dbfa8d31d5e7627a84671c6068ecc8fa58acd1d1/vllm/worker/model_runner.py#L1083

vivekgoe · 2024-10-17T09:18:38Z

vllm/lora/punica.py

-        long_lora_offsets = torch.zeros(len(index_mapping_indices),
-                                        device=get_device(),
-                                        dtype=torch.long)
+    long_lora_offsets_list = []


Follow the notation used elsewhere and add the type annotation List[int].
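A minimal sketch of what the requested annotation could look like, assuming the offsets are gathered per token in a plain Python list before the single `torch.tensor(...)` conversion shown in the diff. The helper name `collect_long_lora_offsets` and the `offsets_by_lora_id` dictionary are hypothetical stand-ins for illustration, not the code in this PR.

```python
from typing import Dict, List


def collect_long_lora_offsets(
    index_mapping_indices: List[int],
    offsets_by_lora_id: Dict[int, int],
) -> List[int]:
    """Collect one offset per token; IDs without an entry get offset 0.

    Hypothetical helper: the names and the lookup table are assumptions
    made for this sketch.
    """
    # Annotated as requested in the review comment.
    long_lora_offsets_list: List[int] = []
    for lora_id in index_mapping_indices:
        long_lora_offsets_list.append(offsets_by_lora_id.get(lora_id, 0))
    return long_lora_offsets_list


offsets = collect_long_lora_offsets([1, 1, 0, 2], {1: 4096, 2: 8192})
print(offsets)  # [4096, 4096, 0, 8192]
```

The resulting list would then be converted to a tensor once, rather than writing into a preallocated tensor element by element.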

vivekgoe · 2024-10-17T09:21:59Z

vllm/lora/punica.py


+    long_lora_offsets = torch.tensor(long_lora_offsets_list,


Functionally, this change is also fine for other backends (e.g. CUDA). @kzawora-intel, is it OK to keep it, or do we need to make it HPU-specific to be more upstreaming-friendly?

vivekgoe · 2024-10-17T09:23:39Z

vllm/lora/layers.py

-            key,
-            offsets=self.punica_wrapper.long_lora_indices,
-        )
+        offsets = self.punica_wrapper.long_lora_indices.reshape_as(positions)


Is this change ok for other backends (e.g. CUDA)?

vivekgoe · 2024-10-17T09:24:43Z

vllm/lora/layers.py

-        return (type(source_layer) is LinearScalingRotaryEmbedding
-                or type(source_layer) is RotaryEmbedding)
+        if current_platform.is_hpu():
+            return (type(source_layer) is HpuLinearScalingRotaryEmbedding


Is there any functional impact of using the default code?
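For context on the question, here is a small sketch of how an exact-type check such as `type(x) is Cls` behaves in Python. The classes below are empty stand-ins, and the subclass relationship between them is an assumption made for illustration only; the thread does not state how `HpuLinearScalingRotaryEmbedding` relates to the upstream classes.

```python
class RotaryEmbedding:
    """Stand-in for the upstream class (assumption)."""


class LinearScalingRotaryEmbedding(RotaryEmbedding):
    """Stand-in for the upstream class (assumption)."""


class HpuLinearScalingRotaryEmbedding(LinearScalingRotaryEmbedding):
    """Stand-in; the inheritance shown here is hypothetical."""


def default_check(source_layer: object) -> bool:
    # Same shape as the removed lines: exact type match only.
    return (type(source_layer) is LinearScalingRotaryEmbedding
            or type(source_layer) is RotaryEmbedding)


def isinstance_check(source_layer: object) -> bool:
    # Alternative that also accepts subclasses.
    return isinstance(source_layer, RotaryEmbedding)


hpu_layer = HpuLinearScalingRotaryEmbedding()
print(default_check(hpu_layer))     # False
print(isinstance_check(hpu_layer))  # True
```

An exact-type comparison rejects every class other than the ones listed, subclasses included, so whether the default code matches an HPU layer depends on which concrete class is instantiated on that platform.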

SanjuCSudhakaran · 2024-10-25T05:06:27Z

These changes are no longer required since the latest vllm-fork already refactored RoPE to handle this.

PR #404

SanjuCSudhakaran and others added 5 commits October 16, 2024 12:25

Support long contexts with LoRA

ae40450

Add workaround for bug in multimodal check

4df797a

Disable HPUGraphs for long-context tests with LoRA

0778324

supports lora-long-context tests

0baa2b2

supports tests of long context lora

e19f189

SanjuCSudhakaran force-pushed the lora-long-context branch from 752fff5 to e19f189 Compare October 16, 2024 09:32

Handle long contexts using HpuLinearScalingRotaryEmbedding

b796b8c

Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>

vivekgoe requested changes Oct 17, 2024

SanjuCSudhakaran closed this Oct 25, 2024

SanjuCSudhakaran deleted the lora-long-context branch October 28, 2024 10:18
