convert-pth-to-ggml.py failed with RuntimeError #35

Closed
KevinXuxuxu opened this issue Mar 12, 2023 · 7 comments
Labels
model Model specific

Comments

@KevinXuxuxu

Hi there, I downloaded my LLaMA weights through BitTorrent and tried to convert the 7B model to ggml FP16 format:

$ python convert-pth-to-ggml.py models/7B/ 1
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': 32000}
n_parts =  1
Processing part  0
Traceback (most recent call last):
  File "/Users/fzxu/Documents/code/llama.cpp/convert-pth-to-ggml.py", line 89, in <module>
    model = torch.load(fname_model, map_location="cpu")
  File "/opt/anaconda3/envs/llama.cpp/lib/python3.10/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/anaconda3/envs/llama.cpp/lib/python3.10/site-packages/torch/serialization.py", line 1049, in _load
    result = unpickler.load()
  File "/opt/anaconda3/envs/llama.cpp/lib/python3.10/site-packages/torch/serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/opt/anaconda3/envs/llama.cpp/lib/python3.10/site-packages/torch/serialization.py", line 997, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
RuntimeError: PytorchStreamReader failed reading file data/27: invalid header or archive is corrupted

Does this mean my downloaded model weights are corrupted, or am I missing something?
I have filed a request with Meta, and hopefully I can try again with data from the official download source.

@G2G2G2G

G2G2G2G commented Mar 12, 2023

What is the "data/27" file? Is that within your models/7B folder? You downloaded the wrong thing.

@KevinXuxuxu
Author

KevinXuxuxu commented Mar 12, 2023

Here's the file structure of my downloaded model:

$ ls ./models 
7B                      tokenizer.model         tokenizer_checklist.chk
$ ls ./models/7B
checklist.chk       consolidated.00.pth params.json

There isn't a directory called data, and this looks normal to me.
As for data/27, it appears to be a record inside the .pth file, which seems to be a zip archive (a guess based on the PyTorch serialization code: https://github.com/pytorch/pytorch/blob/master/torch/serialization.py#L1112)
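
One way to test the guess that the .pth file is a zip archive with a damaged record is Python's standard zipfile module. This is a minimal sketch, not part of convert-pth-to-ggml.py; the helper name first_bad_record is made up for illustration, and the full name of the record inside the archive is not known from this thread.

```python
import zipfile


def first_bad_record(path):
    """Return the name of the first damaged record in a zip archive, or None.

    Raises zipfile.BadZipFile if the file cannot be opened as a zip archive.
    """
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and checks its header and CRC-32.
        # It returns the first failing member's name, or None if all pass.
        return archive.testzip()
```

Calling first_bad_record("models/7B/consolidated.00.pth") reads the whole file, so it can take a while on multi-gigabyte checkpoints. A non-None result, or a BadZipFile exception, would point to a damaged download rather than a problem in the conversion script.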

@prusnak
Collaborator

prusnak commented Mar 12, 2023

@KevinXuxuxu Can you post the hashes of the downloaded files?

on Linux:

sha256sum ./models/7B/*

on macOS:

shasum -a 256 ./models/7B/*

My hashes are:

7935c843a25ae265d60bf4543b90bfd91c4911b728412b5c1d5cff42a3cd5645  ./models/7B/checklist.chk
700df0d3013b703a806d2ae7f1bfb8e59814e3d06ae78be0c66368a50059f33d  ./models/7B/consolidated.00.pth
7e89e242ddc0dd6f060b43ca219ce8b3e8f08959a72cb3c0855df8bb04d46265  ./models/7B/params.json
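
The same comparison can be scripted so that it works on any platform. This is a minimal sketch that assumes the three hashes listed above are the reference values; sha256_of and mismatched are illustrative helper names, not part of llama.cpp.

```python
import hashlib
import os

# Reference SHA-256 hashes copied from the list above.
EXPECTED = {
    "checklist.chk": "7935c843a25ae265d60bf4543b90bfd91c4911b728412b5c1d5cff42a3cd5645",
    "consolidated.00.pth": "700df0d3013b703a806d2ae7f1bfb8e59814e3d06ae78be0c66368a50059f33d",
    "params.json": "7e89e242ddc0dd6f060b43ca219ce8b3e8f08959a72cb3c0855df8bb04d46265",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched(model_dir, expected=EXPECTED):
    """Return the file names that are missing or whose hash differs."""
    bad = []
    for name, want in expected.items():
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path) or sha256_of(path) != want:
            bad.append(name)
    return bad
```

With this sketch, mismatched("./models/7B") returns an empty list when all three files match, and the names of the offending files otherwise.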

@KevinXuxuxu
Author

@prusnak Thanks for providing the shasum for my validation!

$ shasum -a 256 ./models/7B/*
7935c843a25ae265d60bf4543b90bfd91c4911b728412b5c1d5cff42a3cd5645  ./models/7B/checklist.chk
008cfbd68936367b15a311494c8c8259c4902dbb461896ae767084372cdfa3fc  ./models/7B/consolidated.00.pth
7e89e242ddc0dd6f060b43ca219ce8b3e8f08959a72cb3c0855df8bb04d46265  ./models/7B/params.json

Indeed, the hash of my consolidated.00.pth does not match yours, so the file is corrupted. May I ask how you got the data? From the official Meta download or BitTorrent?
Closing this issue while I try to get a correct version of the model weights.
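
Since checklist.chk is identical in both listings, it may offer a way to check the weights without someone else's hashes. The sketch below ASSUMES checklist.chk uses the md5sum text format (one "<md5 hex digest>  <file name>" pair per line); this thread does not confirm that, so inspect the file first. The helper names are illustrative.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 of a file in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checklist(text):
    """Parse md5sum-style lines into {file name: digest}. Format is assumed."""
    entries = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # md5sum marks binary mode with a leading "*" on the file name.
        entries[name.lstrip("*")] = digest.lower()
    return entries


def failed_entries(model_dir, checklist_name="checklist.chk"):
    """Return listed files that are missing or whose MD5 does not match."""
    with open(os.path.join(model_dir, checklist_name)) as f:
        entries = parse_checklist(f.read())
    failed = []
    for name, digest in entries.items():
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path) or md5_of(path) != digest:
            failed.append(name)
    return failed
```

If the format assumption holds, failed_entries("./models/7B") would list consolidated.00.pth for a damaged download like the one above.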

@prettydeep

@prusnak Can you provide hashes for the 13B files?

@KevinXuxuxu
Author

For anyone who has doubts about their data, try https://github.com/cocktailpeanut/dalai, which downloads the weights for you; they seem to come from a reliable source.

@gjmulder gjmulder added the model Model specific label Mar 15, 2023
@tanishhshahh

tanishhshahh commented Mar 17, 2023

Here's the file structure of my downloaded model:

$ ls ./models 
7B                      tokenizer.model         tokenizer_checklist.chk
$ ls ./models/7B
checklist.chk       consolidated.00.pth params.json

There isn't a directory called data and this looks normal to me. As for the data/27 file, it seems to be some file structure within the pth file which seems to be zipped (making some guess by checking the pytorch serialization code: https://github.com/pytorch/pytorch/blob/master/torch/serialization.py#L1112)

Can you please provide a link to download the LLaMA files?
