convert_to_ds_params.py doesn't generate tokenizer #4

Open
tammypi opened this issue Jun 16, 2023 · 3 comments

@tammypi

tammypi commented Jun 16, 2023

convert_to_ds_params.py only generates the llama-7b folder and the .pt files inside it; it does not generate a tokenizer.
However, the tokenizer_path parameter of tokenize_dataset.py requires a tokenizer.
How can I get the tokenizer?

@chaoyi-wu
Owner

chaoyi-wu commented Jun 26, 2023

You can download the tokenizer from here. That page also provides the model files produced by running convert_to_ds_params.py.

@jingyeyang95

jingyeyang95 commented Jun 30, 2023

I ran into a similar issue to @tammypi's when I tried to run finetune_pp_peft.py. The conversion script only generates .pt files (e.g. layer_00-model_states.pt). As a result, when I ran
python finetune_pp_peft.py --model_path ../llama-7b/, it reported that no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack was found in directory ../llama-7b/.

As a workaround, I used src/transformers/models/llama/convert_llama_weights_to_hf.py to convert the model into HF format, and finetune_pp_peft.py then ran without any problem. Do you think it's a good idea to use convert_llama_weights_to_hf.py from the transformers package instead of your script? What is the difference? Thanks!
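
For anyone hitting the same error, a quick way to see which kind of checkpoint a directory holds is sketched below. It is only an illustration: the helper name inspect_model_dir is hypothetical, the weight file names are the four listed in the error message above, the layer_*-model_states.pt pattern is inferred from the single example file name in this thread, and tokenizer.model is assumed to be the tokenizer file name. A sharded HF checkpoint would not be detected by this check.

```python
from pathlib import Path

# File names copied from the error message quoted in this thread.
HF_WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)

# Assumption: the LLaMA SentencePiece tokenizer is stored as tokenizer.model.
TOKENIZER_FILE = "tokenizer.model"


def inspect_model_dir(model_dir):
    """Report which kinds of checkpoint files a directory contains."""
    root = Path(model_dir)
    # A missing directory is treated as empty rather than raising.
    names = {p.name for p in root.iterdir() if p.is_file()} if root.is_dir() else set()
    return {
        # Single-file weights that from_pretrained-style loaders look for.
        "hf_weights": sorted(n for n in names if n in HF_WEIGHT_FILES),
        # Per-layer state files such as layer_00-model_states.pt.
        "layer_states": sorted(
            n for n in names
            if n.startswith("layer_") and n.endswith("-model_states.pt")
        ),
        "has_tokenizer": TOKENIZER_FILE in names,
    }
```

Calling inspect_model_dir("../llama-7b/") on a folder with only per-layer .pt files would return an empty hf_weights list, which corresponds to the situation described above.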

@chaoyi-wu
Owner

chaoyi-wu commented Jul 5, 2023

Sorry for the mistake. I meant to reference convert_llama_weights_to_hf.py in this project but added convert_to_ds_params.py by mistake. Thanks for raising the issue; I have fixed this bug.
