@ksmusz
Contributor

@ksmusz ksmusz commented Sep 16, 2025

Introduces dynamic swap buckets to the defragmenter, together with a defragmenter warmup.

Currently, at most 32 blocks can be swapped in one iteration of the defragmenter. This change introduces a bucketing system that selects the smallest swap bucket able to hold the number of blocks that actually need to be swapped in the current defragmenter iteration. Bucket sizes range from 8 swaps up to 512 swaps in a single defragmenter run.
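The bucket selection described above can be illustrated with a minimal sketch. It is not the code from this PR: the names `SWAP_BUCKETS`, `pick_swap_bucket`, and `pad_swaps` are hypothetical, the power-of-two spacing between 8 and 512 is an assumption (the description only states the range), and padding with a no-op `(0, 0)` pair is likewise assumed for illustration.

```python
from bisect import bisect_left

# Hypothetical bucket sizes. The PR states only the 8..512 range;
# the power-of-two spacing is an assumption made for this sketch.
SWAP_BUCKETS = [8, 16, 32, 64, 128, 256, 512]


def pick_swap_bucket(num_swaps, buckets=SWAP_BUCKETS):
    """Return the smallest bucket that fits num_swaps.

    Requests larger than the biggest bucket are capped at that bucket,
    and a request for zero or fewer swaps returns 0 (nothing to do).
    """
    if num_swaps <= 0:
        return 0
    idx = bisect_left(buckets, num_swaps)
    return buckets[min(idx, len(buckets) - 1)]


def pad_swaps(swaps, buckets=SWAP_BUCKETS, pad=(0, 0)):
    """Truncate or pad a list of (src, dst) pairs to its bucket size.

    The pad pair (0, 0) stands in for a no-op swap; this is an assumption,
    not something the PR description specifies.
    """
    bucket = pick_swap_bucket(len(swaps), buckets)
    trimmed = list(swaps[:bucket])
    return trimmed + [pad] * (bucket - len(trimmed))
```

With these assumed sizes, 9 pending swaps would land in the 16-swap bucket, while 1000 pending swaps would be capped at 512 and the remainder left for later iterations.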

Because the number of possible swap bucket sizes has grown from one to several, a defragmenter warmup has been added. With the warmup in place, no additional graph compilations related to the defragmenter were observed during inference.
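The reason a warmup helps can be shown with a small, self-contained simulation. It is only a sketch under stated assumptions: `FakeSwapKernel` is a hypothetical stand-in for a shape-specialized compiled graph (it counts a "compilation" the first time it sees a given input length), `warmup_defragmenter` is a hypothetical helper, and the bucket list is assumed to be powers of two between 8 and 512.

```python
# Hypothetical bucket sizes; only the 8..512 range comes from the PR text.
SWAP_BUCKETS = [8, 16, 32, 64, 128, 256, 512]


class FakeSwapKernel:
    """Stand-in for a compiled swap graph that is specialized per input size."""

    def __init__(self):
        self.compiled_sizes = set()
        self.compilations = 0

    def __call__(self, swaps):
        size = len(swaps)
        # A previously unseen input size triggers a (simulated) compilation.
        if size not in self.compiled_sizes:
            self.compiled_sizes.add(size)
            self.compilations += 1
        return size


def warmup_defragmenter(kernel, buckets=SWAP_BUCKETS):
    """Run the kernel once per bucket size with dummy no-op swaps."""
    for bucket in buckets:
        kernel([(0, 0)] * bucket)


kernel = FakeSwapKernel()
warmup_defragmenter(kernel)
compilations_after_warmup = kernel.compilations

# Later calls with already warmed-up sizes do not compile again.
kernel([(1, 2)] * 32)
kernel([(3, 4)] * 512)
```

In this simulation every bucket size is compiled exactly once during warmup, so later calls that are padded to a bucket size reuse an existing graph instead of compiling during inference.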

Signed-off-by: Krzysztof Smusz <ksmusz@habana.ai>
@ksmusz ksmusz marked this pull request as ready for review September 17, 2025 14:13
@ksmusz
Contributor Author

ksmusz commented Sep 18, 2025

/run-gaudi-tests

@ksmusz
Contributor Author

ksmusz commented Sep 18, 2025

/run-gaudi-tests

@ksmusz
Contributor Author

ksmusz commented Sep 19, 2025

/run-gaudi-tests

mswiniarsk added a commit that referenced this pull request Sep 19, 2025
cherry-pick #183

Signed-off-by: Krzysztof Smusz <ksmusz@habana.ai>
Co-authored-by: Marcin Swiniarski <marcin.swiniarski@intel.com>
@ksmusz
Contributor Author

ksmusz commented Sep 19, 2025

/run-gaudi-tests

@ksmusz
Contributor Author

ksmusz commented Sep 22, 2025

/run-gaudi-tests

@ksmusz
Contributor Author

ksmusz commented Sep 23, 2025

/run-gaudi-tests

@ksmusz
Contributor Author

ksmusz commented Sep 23, 2025

/run-gaudi-tests

@ksmusz ksmusz requested a review from vivekgoe as a code owner September 24, 2025 06:25
@ksmusz
Contributor Author

ksmusz commented Sep 24, 2025

/run-gaudi-tests

@ksmusz
Contributor Author

ksmusz commented Sep 24, 2025

/run-gaudi-tests

@mswiniarsk mswiniarsk merged commit 60808d7 into vllm-project:main Sep 25, 2025
9 checks passed
iboiko-habana pushed a commit to iboiko-habana/vllm-gaudi that referenced this pull request Oct 2, 2025
Signed-off-by: Krzysztof Smusz <ksmusz@habana.ai>
Co-authored-by: Marcin Swiniarski <marcin.swiniarski@intel.com>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>