
[WIP] [Speculative Decoding] Support draft model on different tensor-parallel size than target model (Extended) #5856

Draft · wants to merge 3 commits into main

Conversation

wooyeonlee0 (Contributor)

This PR adds support for spec_draft_tp values larger than 1; spec_draft_tp=1 was enabled in #5414.

Note: to exercise this code in CI, I have temporarily removed unrelated test cases.

What I've done in this PR:

  • Changed config.py to allow draft tp values larger than 1, but not larger than the target model's tp value (see the sketch below).
  • Added a test case with spec_draft_tp=2 to test_integration_dist_tp4.py.

This further resolves #4632.
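
For reference, here is a minimal sketch of the relaxed check; the function and argument names are illustrative, not the exact ones in config.py:

    def verify_draft_tp(draft_tp: int, target_tp: int) -> None:
        # Hypothetical stand-in for the validation this PR changes in config.py.
        if draft_tp < 1:
            raise ValueError(f"spec_draft_tp must be >= 1, got {draft_tp}")
        # Previously only draft_tp == 1 (or a value equal to the target tp)
        # was accepted; this PR allows any value up to the target model's tp.
        if draft_tp > target_tp:
            raise ValueError(
                f"spec_draft_tp ({draft_tp}) cannot exceed the target "
                f"model's tensor_parallel_size ({target_tp})")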

@zifeitong (Contributor) commented Jun 27, 2024

The timeout seems to be Ray-related:

[2024-06-26T07:36:39Z]                 if self.engine_use_ray:
[2024-06-26T07:36:39Z] >                   await self.engine.add_request.remote(  # type: ignore
[2024-06-26T07:36:39Z]                         **new_request)
[2024-06-26T07:36:39Z] E                       asyncio.exceptions.CancelledError

I'm trying to get the e2e tests running on the mp backend, but it's not so straightforward.
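
For context, vLLM selects the executor through the distributed_executor_backend engine argument, so an mp-backend run of a speculative-decoding setup might look roughly like this (the models and sizes here are illustrative, not the ones used by the tests):

    from vllm import LLM, SamplingParams

    # Illustrative models and sizes; the relevant part is
    # distributed_executor_backend="mp", which picks the multiprocessing
    # executor instead of Ray.
    llm = LLM(
        model="meta-llama/Llama-2-13b-hf",
        tensor_parallel_size=4,
        speculative_model="JackFram/llama-68m",
        num_speculative_tokens=5,
        speculative_draft_tensor_parallel_size=2,
        distributed_executor_backend="mp",
    )
    outputs = llm.generate(["Hello, my name is"],
                           SamplingParams(max_tokens=16))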

@wooyeonlee0 (Contributor, Author)

> The timeout seems to be Ray-related: [traceback quoted above]
> I'm trying to get the e2e tests running on the mp backend, but it's not so straightforward.

These e2e spec decode tests seem to run on the Ray backend, since they use AsyncLLM.
I had thought they used the mp backend because it's the default option.
Please let me know when you have more information. Thank you!

@cadedaniel (Collaborator)

What is the latest status? Is there a hang in CI? BTW, I suspect it's due to the non-driver workers being different from the driver worker (but I haven't looked).

@wooyeonlee0 (Contributor, Author)

> What is the latest status? Is there a hang in CI? BTW, I suspect it's due to the non-driver workers being different from the driver worker (but I haven't looked).

Sorry, I've been busy with other work.
I can now take another look at this issue; I'll try to handle it this week or next.

Regarding the CI question: I tried to debug the problem by running the distributed tests in CI, but found that it takes a long time (about 4 hours) for the distributed tests to launch, presumably because of resource queueing (even though I removed all other tests to get results faster).

I'll work on it some more and share my findings.

Thanks! 👍

@cadedaniel (Collaborator)

No problem. You should also follow PR #6032: it adds an SPMD worker, which removes the need to communicate from rank 0 to the other ranks.
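
For intuition, here is a hypothetical sketch of the two worker styles; none of the names below are vLLM's actual classes or functions:

    def prepare_inputs(scheduler_outputs):
        # Stand-in for building model inputs from the scheduler's output.
        return {"batch": scheduler_outputs}

    def run_model(inputs):
        # Stand-in for one forward pass.
        return f"ran {inputs['batch']}"

    def broadcast(obj, src):
        # Stand-in for a collective such as torch.distributed.broadcast_object_list.
        pass

    def receive_broadcast(src):
        # Stand-in for the receiving side of that broadcast.
        return {"batch": None}

    class DriverWorker:
        """Rank 0 builds the inputs and broadcasts them to the other ranks
        on every step: an extra control-plane hop per iteration."""
        def execute_step(self, rank, scheduler_outputs):
            if rank == 0:
                inputs = prepare_inputs(scheduler_outputs)
                broadcast(inputs, src=0)
            else:
                inputs = receive_broadcast(src=0)
            return run_model(inputs)

    class SPMDWorker:
        """Every rank runs the same program and builds its own inputs,
        so the rank-0 broadcast disappears (the idea behind the SPMD worker)."""
        def execute_step(self, rank, scheduler_outputs):
            inputs = prepare_inputs(scheduler_outputs)
            return run_model(inputs)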

@cadedaniel (Collaborator)

@wooyeonlee0, take a look at this as well: #6556

Successfully merging this pull request may close this issue:
[Performance] [Speculative decoding]: Support draft model on different tensor-parallel size than target model