[Core] Optimize SPMD architecture with delta + serialization optimization #7109
Merged
d41f4c5
wip
rkooo567 5741a83
fix original arch issue
rkooo567 d31d73f
should work now.
rkooo567 36e786d
working
rkooo567 71e40c1
.
rkooo567 7e69242
pickle
rkooo567 0de9f23
msgpack optimization
rkooo567 64faf75
Merge branch 'main' into serialization-opt
rkooo567 de4e43e
ip
rkooo567 dc7c445
.
rkooo567 700e4a3
Merge branch 'main' into serialization-opt
rkooo567 a906a9d
msgspec migration done
rkooo567 4af6699
ip. preemption and chunked prefill not working yet.
rkooo567 1e6196b
working e2e
rkooo567 0ea6e41
Merge branch 'main-before-server' into spmd-and-pp
rkooo567 35e9637
working finally
rkooo567 912b88b
.
rkooo567 5bab192
working
rkooo567 eb2cb14
working
rkooo567 007fe86
fix a test failure.
rkooo567 ce64b8d
.
rkooo567 e8e29e1
fixed
rkooo567 751bdb1
addressed code review.
rkooo567 d91aa78
lint
rkooo567 06774d1
Merge branch 'main' into spmd-and-pp
rkooo567 1af8dc2
ip
rkooo567 6e6ac92
all working
rkooo567 fa0d077
lint
rkooo567 b5a88ec
done
rkooo567 d2e14ca
code review.
rkooo567 8be3c8e
addressed code review.
rkooo567 c42c6c5
Merge branch 'main' into spmd-and-pp
rkooo567 c55c8f6
Merge branch 'main' into spmd-and-pp
rkooo567 2ba99e2
lint fix
rkooo567 e2c850b
Merge branch 'main' into spmd-and-pp
rkooo567 41ec6d1
Merge branch 'main' into spmd-and-pp
rkooo567 925c928
fix lint
rkooo567 9d3dee5
Merge branch 'main' into spmd-and-pp
rkooo567 d041e9c
Addressed code review.
rkooo567 c4b3682
Merge branch 'main' into spmd-and-pp
rkooo567 f938e00
fix pydantic not compatible with msgspec.Struct.
rkooo567 32cb984
addressed
rkooo567 5a4f27e
Merge branch 'main' into spmd-and-pp
rkooo567 c921877
fixed
rkooo567 ae1fb21
temporarily use dataclass
rkooo567 c3abcc5
Merge branch 'main' into spmd-and-pp
rkooo567 3e1325e
Addressed code review.
rkooo567 652c258
lint
rkooo567
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
import msgspec | ||
|
||
from vllm.executor.msgspec_utils import decode_hook, encode_hook | ||
from vllm.sequence import ExecuteModelRequest | ||
|
||
from ..spec_decode.utils import create_batch | ||
|
||
|
||
def test_msgspec_serialization(): | ||
num_lookahead_slots = 4 | ||
seq_group_metadata_list, _, _ = create_batch(16, num_lookahead_slots) | ||
execute_model_req = ExecuteModelRequest( | ||
seq_group_metadata_list=seq_group_metadata_list, | ||
num_lookahead_slots=num_lookahead_slots, | ||
running_queue_size=4) | ||
|
||
encoder = msgspec.msgpack.Encoder(enc_hook=encode_hook) | ||
decoder = msgspec.msgpack.Decoder(ExecuteModelRequest, | ||
dec_hook=decode_hook) | ||
req = decoder.decode(encoder.encode(execute_model_req)) | ||
expected = execute_model_req.seq_group_metadata_list | ||
actual = req.seq_group_metadata_list | ||
assert (len(expected) == len(actual)) | ||
expected = expected[0] | ||
actual = actual[0] | ||
|
||
assert expected.block_tables == actual.block_tables | ||
assert expected.is_prompt == actual.is_prompt | ||
assert expected.request_id == actual.request_id | ||
assert (expected.seq_data[0].prompt_token_ids == | ||
actual.seq_data[0].prompt_token_ids) | ||
assert (expected.seq_data[0].output_token_ids == | ||
actual.seq_data[0].output_token_ids) |
I remember trying this library once, but its serialization scope was quite limited: some classes could not be serialized with it. Did you run into this when using it in Ray?
We used this library in our internal fork before. For this PR, I had to implement a custom reduce for the `array` type. Also, a Union of two types of the same kind is not supported (e.g., Union[OrderedDict, Dict]). But I think we can work around these limitations.
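For readers unfamiliar with the hook mechanism, here is a minimal sketch of what a custom encode/decode pair for `array` could look like. It is not the code from this PR: the typecode constant is a hypothetical choice, and the two functions only mirror the callback shapes that the test above passes as `enc_hook` and `dec_hook`. The sketch uses the standard library only, so the hooks are called directly instead of through msgspec.

```python
from array import array
from typing import Any, Type

# Hypothetical typecode for token IDs; the real project may use another one.
TOKEN_ID_TYPECODE = "l"


def encode_hook(obj: Any) -> Any:
    """enc_hook-style callback: map an unsupported object to a supported one."""
    if isinstance(obj, array):
        if obj.typecode != TOKEN_ID_TYPECODE:
            raise TypeError(f"unexpected array typecode: {obj.typecode!r}")
        # Raw bytes are natively serializable, so ship the buffer as-is.
        return obj.tobytes()
    raise NotImplementedError(f"cannot encode objects of type {type(obj)}")


def decode_hook(type_: Type, obj: Any) -> Any:
    """dec_hook-style callback: rebuild the annotated type from decoded data."""
    if type_ is array:
        restored = array(TOKEN_ID_TYPECODE)
        restored.frombytes(obj)
        return restored
    raise NotImplementedError(f"cannot decode objects of type {type_}")


# Round trip without any serializer in between, just to exercise the hooks.
token_ids = array(TOKEN_ID_TYPECODE, [1, 2, 3, 50256])
payload = encode_hook(token_ids)
restored = decode_hook(array, payload)
```

Sending the raw buffer avoids converting every element to a Python integer, which is presumably the point of using `array` for token IDs in the first place. The trade-off is that both sides must agree on the typecode and run on platforms with the same item size and byte order.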