2 changes: 2 additions & 0 deletions vllm/model_executor/layers/rotary_embedding.py
@@ -22,6 +22,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""Rotary Positional Embeddings."""
import functools
import math
from typing import Any, Dict, List, Optional, Tuple, Union

@@ -1404,6 +1405,7 @@ def get_next_input_positions(
]

@staticmethod
@functools.lru_cache(maxsize=1024)
Member:
Is there a particular reason why we set this to 1024?

@imkero (Contributor) replied on May 5, 2025:

> Is there a particular reason why we set this to 1024?

I guess this is expected to be large enough to catch as many output mrope positions as possible.

@imkero (Contributor) added on May 5, 2025:

In fact we are caching a lot of 3x1 tensors like [[32],[32],[32]], [[33],[33],[33]].

I will take a closer look to see if we can find a better way to do this.

def get_next_input_positions_tensor(
mrope_position_delta: int,
context_len: int,
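To make the thread above concrete, here is a minimal, self-contained sketch of how a `functools.lru_cache(maxsize=1024)`-decorated position helper behaves during decode. It is not the vLLM implementation: the `seq_len` parameter and the `torch.arange(...).expand(3, -1)` body are assumptions inferred from the visible signature, and the helper is written at module level rather than as a static method. The point is only to illustrate that each decode step produces a distinct 3x1 entry keyed by `(mrope_position_delta, context_len, seq_len)`, which is the behavior described in the comments.

```python
import functools

import torch


@functools.lru_cache(maxsize=1024)
def get_next_input_positions_tensor(mrope_position_delta: int,
                                    context_len: int,
                                    seq_len: int) -> torch.Tensor:
    # Assumed body: one row of positions, repeated for the 3 mrope sections
    # (temporal/height/width all advance together for generated text tokens).
    return torch.arange(mrope_position_delta + context_len,
                        mrope_position_delta + seq_len).expand(3, -1)


if __name__ == "__main__":
    # During decode each step asks for exactly one new position
    # (seq_len == context_len + 1), so every call yields a new 3x1 tensor
    # such as [[32], [32], [32]], then [[33], [33], [33]], and so on.
    delta = 0
    for context_len in range(32, 36):
        positions = get_next_input_positions_tensor(delta, context_len,
                                                     context_len + 1)
        print(positions.tolist())

    # One cache entry per distinct (delta, context_len, seq_len) tuple;
    # with maxsize=1024 the least recently used entries are evicted once
    # the cache holds 1024 entries.
    print(get_next_input_positions_tensor.cache_info())
```

Each cached value is a small integer tensor, so the memory cost of a full cache stays modest; the trade-off is that a hit only occurs when the exact same `(mrope_position_delta, context_len, seq_len)` tuple recurs, which is presumably why the reviewers are asking whether 1024 is the right bound and whether caching these tiny tensors is worthwhile at all.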