`Llama.from_pretrained` should work with `HF_HUB_OFFLINE=1`

**Is your feature request related to a problem? Please describe.**
Even when the model is already downloaded, `Llama.from_pretrained` still makes a call to the Hugging Face Hub, which increases the load time.

From a quick scan of the logic [here](https://github.com/abetlen/llama-cpp-python/blob/7c4aead82d349469bbbe7d8c0f4678825873c039/llama_cpp/llama.py#L2268-L2299), it seems that the code only wants to check that the provided filename exists in the provided repo.


**Describe the solution you'd like**
If that check were skipped and `hf_hub_download` were called directly on the assumption that the file exists, that function would itself raise an error if it couldn't find the file in the given repo.

The error might not be quite as focused, but init would run in about a third of the time.

On my machine:
 - loading from cache takes 400ms
 - loading from cache with this additional check of available files in the repo takes 1,200ms
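A minimal sketch of the idea described above, not the actual `llama-cpp-python` code: resolve the file by calling the downloader directly, with no repo listing beforehand. The helper name `download_without_listing` and its `download` parameter are hypothetical (the parameter exists only so the sketch can be tried with a stand-in), and it assumes `filename` is an exact file name rather than a pattern.

```python
from typing import Callable, Optional


def download_without_listing(
    repo_id: str,
    filename: str,
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Return a local path for `filename` without listing the repo first.

    Hypothetical helper for illustration only.
    """
    if download is None:
        # Imported lazily so the sketch can be exercised with a stand-in.
        from huggingface_hub import hf_hub_download

        download = hf_hub_download
    try:
        # No prior repo listing: the downloader itself reports a missing file.
        return download(repo_id=repo_id, filename=filename)
    except Exception as exc:
        # Re-raise with the repo and filename so the message stays focused.
        raise ValueError(
            f"Could not resolve {filename!r} in {repo_id!r}: {exc}"
        ) from exc
```

With the cache primed, `hf_hub_download` returns the cached path, so this path involves no separate listing request.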


**Describe alternatives you've considered**
The workaround is to use `from_pretrained` to download the appropriate file (if I want to do it all in Python), then get the cached file location and pass that as `model_path` to `Llama` without using `from_pretrained`.
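For reference, the second half of that workaround might look roughly like the sketch below. It uses `try_to_load_from_cache` from `huggingface_hub` to look up the cached file without a network request. The helper name `cached_model_path` and its `lookup` parameter are hypothetical and only there for illustration.

```python
from typing import Callable, Optional


def cached_model_path(
    repo_id: str,
    filename: str,
    lookup: Optional[Callable[..., object]] = None,
) -> Optional[str]:
    """Return the cached path of `filename`, or None if it is not cached.

    Hypothetical helper for illustration only.
    """
    if lookup is None:
        # Imported lazily so the sketch can be exercised with a stand-in.
        from huggingface_hub import try_to_load_from_cache

        lookup = try_to_load_from_cache
    result = lookup(repo_id=repo_id, filename=filename)
    # The lookup returns a path string when the file is cached; any other
    # value (None or a "known missing" sentinel) is treated as not cached.
    return result if isinstance(result, str) else None
```

If the result is not `None`, it can be passed straight to `Llama(model_path=...)`, bypassing `from_pretrained` entirely.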

**Additional context**
For work with HF models, I have `HF_HUB_OFFLINE=1` set by default, only turning it off when I need a new model (because a few HF operations like to make checks for model info that require network requests, even with the cache primed). It would be great if `llama-cpp-python` were compatible with this setup.

Side note: I just started using this today and was delighted with how easy it was to install, with CUDA support, from a single pip command. Nice work.

