anyone tried batch inference? #20
padding_side="left" does the trick.
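For context, a minimal sketch of that setting, assuming a standard Hugging Face tokenizer (the checkpoint name is a placeholder): decoder-only models generate from the end of the sequence, so padding has to go on the left, or generation continues from pad tokens.

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; use whichever tokenizer your model actually needs.
tokenizer = AutoTokenizer.from_pretrained("some/causal-lm-checkpoint")

# Put padding on the left so the model sees real tokens at the end
# of every sequence in the batch.
tokenizer.padding_side = "left"

# LLaMA-style tokenizers ship without a pad token; 0 matches the
# pad-token choice mentioned later in this thread.
tokenizer.pad_token_id = 0
```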
I am getting the following error when trying batched inference. Did you need any trick?
@benob You could do something like this:

```python
def evaluate(instructions, input=None):
    prompts = [generate_prompt(instruction) for instruction in instructions]
    encodings = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
    generation_outputs = model.generate(
        **encodings,
        generation_config=generation_config,
        max_new_tokens=256,
    )
    return tokenizer.batch_decode(generation_outputs)
```
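A brief usage sketch of the snippet above; `generate_prompt`, `model`, `tokenizer`, and `generation_config` are assumed to be defined elsewhere in the script:

```python
instructions = [
    "Explain what LoRA is in one sentence.",
    "Give three reasons to batch inference requests.",
]

for text in evaluate(instructions):
    print(text)
```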
I created a Gradio app for this: https://github.com/deep-diver/Alpaca-LoRA-Serve
Thanks, the problem came from elsewhere. Note that I had to use
Hello, I have tried batch decoding, and I set
I had the same issue, using batch decoding and beam search with multiple beams.
Hello @deep-diver, I tried batch decoding with your settings, which helps performance a lot. But I found a strange phenomenon: suppose you have four pieces of content; the results you get by generating them one at a time differ from the results you get by batch-decoding them all at once. I asked a detailed question in the Hugging Face discussion forum; I'll copy it here later.
```
(tensor([[ 1, 9508]], device='cuda:0'),
(tensor([[ 1, 9508],

" promptly and efficiently.\nThe Company shall not be liable to the Customer for any loss or damage suffered by the Customer as a result of any delay in the delivery of the Goods (even if caused by the Company's negligence) unless the Customer has given written notice to the Company of the delay within 7 days of the date when the Goods were due to be delivered.\nThe Company shall not be liable to the Customer for any loss or damage suffered by the Customer as a result of any delay in the delivery of the Goods (even if caused by the Company's negligence) unless the"

[' promptly and efficiently.\nThe Company is committed to ensuring that there is no modern slavery or human trafficking in its supply chains or in any part of its business. The Company recognises that it has a responsibility to be proactive in ensuring that modern slavery is not taking place within its business or in its supply chains.\nThe Company is committed to ensuring that there is no modern slavery or human trafficking in its supply chains or in any part of its business.\nThe Company is committed to ensuring that there is no modern slavery or human trafficking in',
```
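As an aside, a minimal repro sketch for that comparison, assuming greedy decoding and left padding (`model` and `tokenizer` are placeholders). With right padding, or without the attention mask, batched results can legitimately diverge from one-at-a-time generation:

```python
prompts = ["prompt one", "prompt two", "prompt three", "prompt four"]

# Generate one prompt at a time.
single = []
for p in prompts:
    enc = tokenizer(p, return_tensors="pt").to("cuda")
    out = model.generate(**enc, max_new_tokens=64, do_sample=False)
    single.append(tokenizer.decode(out[0], skip_special_tokens=True))

# Generate the whole batch at once.
enc = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
out = model.generate(**enc, max_new_tokens=64, do_sample=False)
batched = tokenizer.batch_decode(out, skip_special_tokens=True)

# With correct left padding and masking, these should match.
for s, b in zip(single, batched):
    print(s == b)
```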
I was using this a few days ago and it was working fine. But now, when generating with batch_size > 1, I get this error:
Has anyone had the same error and figured out how to fix it? (I suspect a version update in the peft or transformers library.)
Marking this; I'm seeing the same issue on my side...
When I set the pad token to 0 and padding=True, the generated text for the padded prompt always shows
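A guess rather than a confirmed fix, but worth checking when padded prompts generate garbage: make sure the attention mask reaches `generate` along with the pad token id, so padded positions are actually masked out. A sketch with placeholder names:

```python
enc = tokenizer(prompts, return_tensors="pt", padding=True)
print(enc["attention_mask"])  # padded positions should be 0 here

outputs = model.generate(
    input_ids=enc["input_ids"].to("cuda"),
    attention_mask=enc["attention_mask"].to("cuda"),
    pad_token_id=0,  # matches the pad token set above
    max_new_tokens=64,
)
```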