[Speculative Decoding] MLPSpeculator Tensor Parallel support (1/2) #6050

Merged: 5 commits, Jul 2, 2024

Commits on Jul 1, 2024

  1. [Speculative Decoding] MLPSpeculator Tensor Parallel support (1/2)

    This PR adds support for a draft worker with TP==1 and a target worker
    with TP>1. Support for a draft worker with TP>1 will come in a second PR.
    
    This PR makes use of vllm-project#5414 to wrap the `MLPSpeculatorWorker` with a
    `SmallerTPProposerWorker`.
    
    Adds a test case for
    `ibm-granite/granite-3b-code-instruct{-accelerator}` to `test_draft_model_tp_lt_target_model_tp2`.
    sirejdua-db committed Jul 1, 2024
    18d2861
  2. Fix format.sh issues

    sirejdua-db committed Jul 1, 2024
    aa867d0
  3. format.sh

    sirejdua-db committed Jul 1, 2024
    0e6eb04
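
The first commit above describes wrapping a draft worker that runs with TP==1 so it can sit alongside a target worker with TP>1. The following is a minimal sketch of that idea in plain Python, not the code from this PR: `ToyDraftWorker`, `SmallerTPWrapper`, and `propose_on_all_ranks` are hypothetical names, and the assumption that non-participating ranks simply return `None` is made only for illustration.

```python
from typing import List, Optional


class ToyDraftWorker:
    """Hypothetical stand-in for a draft worker; not vLLM's MLPSpeculatorWorker."""

    def propose(self, last_token: int, k: int) -> List[int]:
        # Dummy "speculation": the next k integers after last_token.
        return [last_token + i for i in range(1, k + 1)]


class SmallerTPWrapper:
    """Hypothetical wrapper: only ranks inside the draft TP group run the draft worker."""

    def __init__(self, worker: ToyDraftWorker, rank: int,
                 draft_tp: int, target_tp: int) -> None:
        if not 1 <= draft_tp <= target_tp:
            raise ValueError("draft_tp must be between 1 and target_tp")
        if not 0 <= rank < target_tp:
            raise ValueError("rank must be in [0, target_tp)")
        self.worker = worker
        # Assumption: the first draft_tp ranks form the draft group.
        self.participates = rank < draft_tp

    def propose(self, last_token: int, k: int) -> Optional[List[int]]:
        if not self.participates:
            # Ranks outside the draft group do no draft work.
            return None
        return self.worker.propose(last_token, k)


def propose_on_all_ranks(draft_tp: int, target_tp: int,
                         last_token: int, k: int) -> List[Optional[List[int]]]:
    """Simulate one proposal step on every target rank, in rank order."""
    return [
        SmallerTPWrapper(ToyDraftWorker(), rank, draft_tp, target_tp)
        .propose(last_token, k)
        for rank in range(target_tp)
    ]
```

Calling `propose_on_all_ranks(1, 2, 10, 3)` simulates the case this PR covers (draft TP==1, target TP==2): only rank 0 produces draft tokens, while rank 1 skips the draft step.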

Commits on Jul 2, 2024

  1. fix test config

    sirejdua-db committed Jul 2, 2024
    68afacf
  2. remove erroneous change to format.sh

    Sirej Dua authored and Sirej Dua committed Jul 2, 2024
    ac269e9