
How to add a model that is not in vllm #567

Open
shizhediao opened this issue Aug 4, 2024 · 5 comments

Comments

@shizhediao

Hi,

I am evaluating a new model that is not supported by vllm. How can I generate responses with this model, since vllm seems to be the only supported backend for generate()?

Thank you!

@shizhediao
Author

BTW, I am evaluating on BFCL. I am a bit confused about the difference between the eval folder and BFCL.

@HuanzhiMao
Collaborator

HuanzhiMao commented Aug 4, 2024

Everything related to BFCL is inside the berkeley-function-call-leaderboard folder. The eval folder is for the Gorilla and RAFT papers, which have nothing to do with BFCL. Sorry about the confusion.

cc @ShishirPatil, we need to give these folders better names

@HuanzhiMao
Collaborator

> Hi,
>
> I am evaluating a new model that is not supported by vllm. How can I generate responses with this model, since vllm seems to be the only supported backend for generate()?
>
> Thank you!

We only support vllm for model response generation. If you want to use something other than vllm, you need to create a new model_handler class and override the default inference() method.
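
For reference, here is a rough, self-contained sketch of what a non-vllm handler could look like, using Hugging Face transformers for generation. The class name and the inference() shape (taking an already-formatted prompt and returning the raw completion) are assumptions for illustration; in the repo you would subclass the existing model_handler class and match its actual signature:

```python
# Rough sketch only: in the repo you would subclass the existing
# model_handler base class; this standalone version just shows the shape
# of an inference() override that uses Hugging Face transformers
# instead of vllm. Names and signatures here are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer


class MyLocalModelHandler:
    def __init__(self, model_name, temperature=0.7, max_tokens=1024):
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, device_map="auto"
        )

    def inference(self, prompt: str) -> str:
        # Assumes the prompt has already been formatted upstream
        # (e.g. by something like self.process_input in the real handler).
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        output_ids = self.model.generate(
            **inputs,
            max_new_tokens=self.max_tokens,
            do_sample=self.temperature > 0,
            temperature=self.temperature,
        )
        # Decode only the newly generated tokens, not the prompt.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True)
```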

@shizhediao
Author

shizhediao commented Aug 5, 2024

Hi Huanzhi,

Thank you so much for your prompt reply! Do you think it is feasible for us to override the default inference() method without vllm? Are there any potential issues? Will the inference speed be unacceptable?

Thank you!

@HuanzhiMao
Collaborator

The inference() method (code here) takes in the dataset questions, processes them (self.process_input), and calls self._batch_generate to get the responses. _batch_generate takes in the finalized/modified questions and sets up vllm to run the response generation (code here). So if you want to replace vllm, you only need to change the _batch_generate function. It should be fairly straightforward to do that.
We don't care about the inference speed; as long as all the entries are eventually generated, it's fine. It's just that, if you want to have that model up on the leaderboard, the cost and latency columns for that model would be N/A (because it's not using vllm).
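
As one possible example, if the new model is already served behind an OpenAI-compatible endpoint (TGI, llama.cpp server, etc.), a replacement _batch_generate could be as simple as the sketch below. The function signature here is an assumption (the real _batch_generate likely takes extra arguments such as sampling parameters or output paths), so adapt it to match the repo's code:

```python
# Hedged sketch: a _batch_generate replacement that calls an
# OpenAI-compatible completions endpoint instead of spinning up vllm.
# The function signature is assumed; match it to the real method in the repo.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


def _batch_generate(questions, model_name, temperature=0.7, max_tokens=1024):
    """Return one completion string per already-formatted prompt."""
    responses = []
    for prompt in questions:
        completion = client.completions.create(
            model=model_name,
            prompt=prompt,
            temperature=temperature,
            max_tokens=max_tokens,
        )
        responses.append(completion.choices[0].text)
    return responses
```

Generating sequentially like this (or with a local transformers loop) will be slower than vllm's batched inference, but as noted above that only affects how long the generation step takes, not the scores.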
