
common: llama_load_model_from_url using --model-url #6098

Merged: 53 commits into master from hp/download-model-from-hf on Mar 17, 2024

Conversation

@phymbert (Collaborator) commented on Mar 16, 2024:

Motivation

Since GGUF is officially supported on Hugging Face, it can be useful to start the CLI directly from an HTTP URL.

We considered this at some point in #4735, but it raised a lot of security concerns because the core library would have had to execute commands.
We settled on #5501 instead, which should be good enough for most cases, but it cannot work when no shell is available (for example, in a Docker distroless container).

Changes

  • llama_load_model_from_url in common first downloads the file if it does not exist locally or the remote copy is newer, then calls llama_load_model_from_file (see the usage sketch below)
  • introduce --model-url, which triggers the download to the --model path in llama_init_from_gpt_params
  • enable this feature with -DLLAMA_CURL=ON in the CMake and Make toolchains
  • server tests exercise this parameter in embeddings.feature
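
For orientation, here is a minimal usage sketch of the new entry point when the project is built with -DLLAMA_CURL=ON. The URL and local path are placeholders, and the signature is inferred from the description above, so check common/common.h for the exact one that was merged:

    #include "common.h"
    #include "llama.h"

    int main() {
        llama_model_params mparams = llama_model_default_params();

        // Downloads to the local path only when the file is missing or the
        // remote copy has changed (ETag / Last-Modified), then loads it with
        // llama_load_model_from_file as usual.
        llama_model * model = llama_load_model_from_url(
            "https://huggingface.co/org/repo/resolve/main/model.gguf", // placeholder URL
            "/tmp/model.gguf",                                         // the --model path
            mparams);
        if (model == nullptr) {
            return 1;
        }

        llama_free_model(model);
        return 0;
    }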

Attention points

  • common will be dynamically linked against libcurl.
  • The function can later be moved to the llama API if there is positive feedback from the community.
  • It saves ${model_path}.etag and ${model_path}.lastModified files alongside the model path (see the sketch after this list).
  • It is not possible to pass a custom CA; system CAs are always used in this first implementation, on both POSIX and Windows platforms.
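
A simplified sketch of that sidecar scheme, assuming a plain libcurl HEAD request (the merged code in common/common.cpp also handles Last-Modified, the download itself, and more error paths; whitespace trimming of header values is omitted for brevity):

    #include <curl/curl.h>
    #include <fstream>
    #include <iterator>
    #include <string>

    // Header callback: capture the ETag value from the response headers.
    static size_t capture_etag(char * buf, size_t size, size_t nitems, void * userdata) {
        std::string header(buf, size * nitems);
        if (header.rfind("etag:", 0) == 0 || header.rfind("ETag:", 0) == 0) {
            *static_cast<std::string *>(userdata) = header.substr(header.find(':') + 1);
        }
        return size * nitems;
    }

    // True when the remote ETag differs from the one saved next to the model,
    // i.e. when a (re-)download is needed.
    static bool model_needs_download(const std::string & url, const std::string & model_path) {
        std::string remote_etag;

        CURL * curl = curl_easy_init();
        if (curl == nullptr) {
            return false;
        }
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);          // HEAD request only
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);  // Hugging Face redirects to a CDN
        curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, capture_etag);
        curl_easy_setopt(curl, CURLOPT_HEADERDATA, &remote_etag);
        const CURLcode res = curl_easy_perform(curl);
        curl_easy_cleanup(curl);
        if (res != CURLE_OK) {
            return false; // offline: keep whatever is already on disk
        }

        std::ifstream etag_file(model_path + ".etag");
        std::string saved_etag((std::istreambuf_iterator<char>(etag_file)),
                                std::istreambuf_iterator<char>());
        return saved_etag != remote_etag;
    }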

Task

  • download is triggered only if the file has changed, based on the ETag and Last-Modified HTTP headers
  • CI build is passing
  • Server CI build is passing

References

@phymbert added the labels "demo" (Demonstrate some concept or idea, not intended to be merged) and "need feedback" (Testing and feedback with results are needed) on Mar 16, 2024
@phymbert requested a review from ggerganov on March 16, 2024 at 11:35
@phymbert (Collaborator, Author) commented:

@ggerganov Georgi, if you approve the proposal with the libcurl dependency, I can continue further and support ETag headers to avoid downloading the file every time.

@ggerganov (Owner) left a comment:

Yes, this can work. The curl dependency is optional and the implementation is isolated in common. Looks great.

Resolved review thread (outdated): common/common.cpp
@Artefact2 (Collaborator) commented:
Supporting split GGUF files (for models above 50GB) would be nice too. Ideally you'd merge files with copy_file_range() when available to avoid extra disk I/O.
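
For reference, a rough sketch of what such a merge step could look like (copy_file_range() is Linux-specific, glibc 2.27+; this illustrates the suggestion and is not code from this PR):

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Append the contents of `src` onto the end of `dst` entirely in the kernel,
    // avoiding a bounce through a user-space buffer.
    static bool append_file(const char * dst, const char * src) {
        int dst_fd = open(dst, O_WRONLY);
        int src_fd = open(src, O_RDONLY);
        if (dst_fd < 0 || src_fd < 0) {
            return false;
        }

        // copy_file_range() rejects O_APPEND descriptors, so seek to the end instead.
        lseek(dst_fd, 0, SEEK_END);

        struct stat st {};
        fstat(src_fd, &st);

        off_t bytes_left = st.st_size;
        while (bytes_left > 0) {
            // With NULL offset pointers, both file offsets advance automatically.
            ssize_t n = copy_file_range(src_fd, nullptr, dst_fd, nullptr, bytes_left, 0);
            if (n <= 0) {
                break; // real code would fall back to read()/write() here
            }
            bytes_left -= n;
        }
        close(src_fd);
        close(dst_fd);
        return bytes_left == 0;
    }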

@phymbert (Collaborator, Author) replied:

Supporting split GGUF files (for models above 50GB) would be nice too. Ideally you'd merge files with copy_file_range() when available to avoid extra disk I/O.

Yes, I am working on this in a separate branch:

@phymbert (Collaborator, Author) commented:

Can someone help with the error in the Windows build?

D:\a\llama.cpp\llama.cpp\common\common.cpp(906,52): error C1061: compiler limit: blocks nested too deeply [D:\a\llama.cpp\llama.cpp\build\common\common.vcxproj]

Also, @ggerganov, please double-check my Makefile changes, as I am not used to them.

@dranger003 (Contributor) replied:

Can someone help with the error in the Windows build?

D:\a\llama.cpp\llama.cpp\common\common.cpp(906,52): error C1061: compiler limit: blocks nested too deeply [D:\a\llama.cpp\llama.cpp\build\common\common.vcxproj]

Also, @ggerganov, please double-check my Makefile changes, as I am not used to them.

I submitted PR #6101 to fix this issue.
If you are in a hurry, you can try the workaround in #6096.
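
For context on C1061: MSVC caps block nesting at roughly 128 levels, so a long chain of nested conditionals in a single function can exceed the limit. A toy illustration of the usual remedy, flattening the nesting with early returns (assumed here to reflect the spirit of the fix; see #6101 for the actual change):

    // Before: every check adds a nesting level, and MSVC counts them all.
    //   if (a) { if (b) { if (c) { /* ... many more ... */ } } }

    // After: early returns keep the nesting depth constant.
    static bool do_step(bool a, bool b, bool c) {
        if (!a) { return false; }
        if (!b) { return false; }
        if (!c) { return false; }
        // ... actual work ...
        return true;
    }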

Resolved review thread (outdated): Makefile
@phymbert (Collaborator, Author) replied:

I submitted PR #6101 to fix this issue. If you are in a hurry, you can try the workaround in #6096.

Thanks a lot. No, let's wait for your PR to be merged first.

@phymbert changed the title from "proposal: common: llama_load_model_from_url" to "common: llama_load_model_from_url" on Mar 16, 2024
@phymbert removed the "demo" and "need feedback" labels on Mar 16, 2024
@phymbert changed the title from "common: llama_load_model_from_url" to "common: llama_load_model_from_url using --model-url" on Mar 16, 2024
Resolved review threads (outdated): examples/server/README.md, common/common.cpp (8 threads), examples/main/README.md
@phymbert (Collaborator, Author) commented:

Need to fix the Windows server CI tests; they are not passing at the moment.

@phymbert (Collaborator, Author) commented:

It finally works in both the Windows and Linux CI tests. Happy to merge once CI passes.

@phymbert (Collaborator, Author) commented:

@ggerganov Georgi, I suffered a little on Windows, but this implementation works on both platforms. Would you please do another review, as I changed the logic a bit? And sorry, I still need to find a good linter for CLion.

@phymbert merged commit d01b3c4 into master on Mar 17, 2024
54 of 63 checks passed
@phymbert deleted the hp/download-model-from-hf branch on March 17, 2024 at 18:12
Comment on lines +79 to +83
    try:
        psutil.Process(pid).kill()
    except psutil.NoSuchProcess:
        return False
    return True
@cebtenzzre (Collaborator) commented on Mar 18, 2024:

@phymbert This calls TerminateProcess(handle, SIGTERM) on Windows and os.kill(pid, signal.SIGKILL) on Unix. os.kill(pid, signal.SIGTERM) also calls TerminateProcess on Windows. psutil really seems like overkill here.

@phymbert (Collaborator, Author) replied:

Again, feel free to open a PR.

hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Development

Successfully merging this pull request may close these issues:

  • Download model gguf file from hf
7 participants