
When I set load_in_8bit=true, some errors occurred.... #606

Open
hychaochao opened this issue Nov 17, 2023 · 0 comments
Whether generating or fine-tuning, as soon as I set load_in_8bit=True the model no longer generates normally: it outputs a string of question marks, as in the screenshot below:
[image: generated output consisting of repeated question marks]
I also printed out the output vector, shown here:
[image: printed output tensor]
It looks as though nothing was generated properly at all, but when I set load_in_8bit=False, generation and fine-tuning both work normally.

I have installed bitsandbytes and accelerate correctly, and no errors are reported when testing them. I've been stuck on this problem for a week, so I'm asking for help. Thank you!
Below is my generate.py code:

from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

tokenizer = LlamaTokenizer.from_pretrained("llama1")
model = LlamaForCausalLM.from_pretrained(
    "llama1",
    load_in_8bit=True,  # setting this to False makes generation work
    device_map="auto",
)
# Apply the Alpaca-LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora")
def alpaca_talk(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
    )
    input_ids = inputs["input_ids"].cuda()
    generation_config = GenerationConfig(
        temperature=0.9,
        top_p=0.75,
    )
    print("Generating...")
    generation_output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=256,
    )
    for s in generation_output.sequences:
        print(tokenizer.decode(s))

for input_text in [
    """Below is an instruction that describes a task. Write a response that appropriately completes the request.
    ### Instruction:
    What steps should I ....?
    ### Response:
    """
]:
    alpaca_talk(input_text)
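As an aside, the inline prompt string above can be factored into a small helper so the template and the instruction text stay separate. This is just a sketch of the standard Alpaca instruction template; `build_alpaca_prompt` and `ALPACA_TEMPLATE` are names I made up, not part of any repo:

```python
# Standard Alpaca instruction template (no-input variant), matching the
# prompt used in generate.py above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{instruction}\n"
    "### Response:\n"
)

def build_alpaca_prompt(instruction: str) -> str:
    """Fill the Alpaca template with a single user instruction."""
    return ALPACA_TEMPLATE.format(instruction=instruction)
```

With this, the loop at the end becomes `alpaca_talk(build_alpaca_prompt("What steps should I ....?"))`, and the template text is written once.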
