[v0.2.2] Release Tracker #1551

Closed
3 tasks done
WoosukKwon opened this issue Nov 2, 2023 · 7 comments · Fixed by #1689
Labels
release Related to new version release

Comments

@WoosukKwon
Collaborator

WoosukKwon commented Nov 2, 2023

ETA: Nov 17th (Fri) - 19th (Sun) (originally Nov 3rd (Fri) - Nov 6th (Mon)).

Major changes

  • Extensive refactoring for better tensor parallelism & quantization support
  • Changes in scheduler: from 1D flattened input tensor to 2D tensor
  • Bump up to PyTorch v2.1 + CUDA 12.1
  • New models: Yi, ChatGLM, Phi
  • Added LogitsProcessor API
  • Preliminary support for SqueezeLLM

PRs to be merged before the release

@WoosukKwon
Collaborator Author

@zhuohan123 @simon-mo Please feel free to add if you have any PR that should be merged for the next release.

@WoosukKwon WoosukKwon added the release Related to new version release label Nov 2, 2023
@WoosukKwon WoosukKwon pinned this issue Nov 2, 2023
@simon-mo
Collaborator

simon-mo commented Nov 2, 2023

I can release the corresponding docker image as well! Hopefully we can get logits processor in as well?

@esmeetu
Collaborator

esmeetu commented Nov 6, 2023

It seems that there is something wrong with batch processing on the current main branch.
When I start an API server serving the https://huggingface.co/WizardLM/WizardCoder-1B-V1.0 model and test HumanEval with batch requests (164 concurrent requests) and greedy sampling, I get less than 10% on the main branch, whereas 0.2.1-post1 gives me 23.17%.
Related issue: #1570
I have no idea what causes this and hope it will be addressed in the upcoming v0.2.2. Thanks!

@miko7879

miko7879 commented Nov 6, 2023

Will v0.2.2 work with CUDA 11.8?

@esmeetu
Collaborator

esmeetu commented Nov 7, 2023

> It seems that there is something wrong with batch processing on the current main branch.
> When I start an API server serving the https://huggingface.co/WizardLM/WizardCoder-1B-V1.0 model and test HumanEval with batch requests (164 concurrent requests) and greedy sampling, I get less than 10% on the main branch, whereas 0.2.1-post1 gives me 23.17%.
> Related issue: #1570
> I have no idea what causes this and hope it will be addressed in the upcoming v0.2.2. Thanks!

#1546 fixes this.

@shuaiwang2022

When is the v0.2.2 version scheduled to be released?

@WoosukKwon
Collaborator Author

@IncrementalLearning It's in progress. We are planning to release it as soon as possible. The ETA is Nov 17th (Fri) - 19th (Sun).

@WoosukKwon WoosukKwon linked a pull request Nov 16, 2023 that will close this issue
@WoosukKwon WoosukKwon unpinned this issue Nov 19, 2023

5 participants