Llama.from_pretrained should work with HF_HUB_OFFLINE=1 #1801

Open
davidgilbertson opened this issue Oct 16, 2024 · 0 comments

Is your feature request related to a problem? Please describe.
Even with a model already downloaded, Llama.from_pretrained attempts a call to the Hugging Face Hub, which increases the load time.

From a quick scan of the logic here, it seems that the code just wants to check that the filename provided exists in the repo provided.

Describe the solution you'd like
If the code skipped that check, assumed that the file existed, and called hf_hub_download directly, that function would raise its own error if it couldn't find the file in the given repo.
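A minimal sketch of that idea, not the library's actual code: `resolve_model_path` is a hypothetical helper, and the `download` argument stands in for `huggingface_hub.hf_hub_download` so the example stays self-contained. It assumes `filename` is an exact file name rather than a pattern.

```python
from typing import Callable


def resolve_model_path(repo_id: str, filename: str, download: Callable[..., str]) -> str:
    """Return the local path of `filename` without listing the repo first."""
    # No up-front "is this file in the repo?" request is made here.
    # The download function is trusted to raise its own error when the
    # file is missing from the repo (or from the cache when offline).
    return download(repo_id=repo_id, filename=filename)


def fake_download(repo_id: str, filename: str) -> str:
    """Stand-in for hf_hub_download, used only to demonstrate the call."""
    if filename != "model.gguf":
        raise FileNotFoundError(f"{filename} not found in {repo_id}")
    return f"/cache/{repo_id}/{filename}"


example_path = resolve_model_path("some-org/some-repo", "model.gguf", fake_download)
```

In real use the third argument would be `hf_hub_download` itself, and whatever exception it raises would reach the caller unchanged.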

The error may not be quite as focused, but init would run in a third of the time.

On my machine:

  • loading from cache takes 400ms
  • loading from cache with this additional check of available files in the repo takes 1,200ms
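For anyone wanting to reproduce the comparison, here is a rough timing sketch. `best_time_ms` is a hypothetical helper, and the two constants simply restate the figures quoted above; they are not new measurements.

```python
import time
from typing import Callable


def best_time_ms(fn: Callable[[], object], repeats: int = 3) -> float:
    """Run `fn` several times and return the fastest wall-clock time in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return min(timings)


# Figures reported in this issue, in milliseconds.
cache_only_ms = 400.0
with_repo_check_ms = 1200.0

# The repo check makes loading three times slower, i.e. it adds 800 ms.
slowdown = with_repo_check_ms / cache_only_ms
extra_ms = with_repo_check_ms - cache_only_ms
```

Passing a zero-argument callable that performs each kind of load to `best_time_ms` would give numbers comparable to the two bullets, though results will vary with disk caching and network latency.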

Describe alternatives you've considered
The workaround is to use from_pretrained once to download the appropriate file (if I want to do it all in Python), then get the cached file location and pass that as model_path to Llama directly, without using from_pretrained.
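A sketch of that workaround, assuming the file is already in the local Hugging Face cache. `load_from_cache` is a hypothetical helper name. It relies on `hf_hub_download` accepting `local_files_only=True` to resolve the cached path without a network request, and on `Llama` accepting `model_path`.

```python
def load_from_cache(repo_id: str, filename: str, **llama_kwargs):
    """Load a GGUF model that is already cached, without from_pretrained."""
    # Imports are deferred so that defining this helper does not require
    # either package (or the native llama.cpp library) to be loaded yet.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    # With local_files_only=True the file is looked up in the local cache,
    # and an error is raised if it is not there, instead of downloading it.
    model_path = hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        local_files_only=True,
    )

    # Pass the resolved path straight to the constructor, which skips the
    # repo file listing that from_pretrained performs.
    return Llama(model_path=model_path, **llama_kwargs)
```

Any extra keyword arguments (for example `n_ctx`) are forwarded to `Llama` unchanged. The first download still has to happen once with network access, for instance via from_pretrained as described above.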

Additional context
For work with HF models, I have HF_HUB_OFFLINE=1 set by default, only turning it off when I need a new model (because a few HF operations like to make checks for model info that require network requests, even with cache primed). It would be great if this was compatible with llama-cpp-python.

Side note: I just started using this today and was delighted with how easy it was to install, with CUDA support, from a single pip command. Nice work.
