common: llama_load_model_from_url using --model-url #6098
Conversation
@ggerganov Georgi, if you approve the proposal with the libcurl dependency I can continue further to support …
Yes, this can work. The `curl` dependency is optional, the implementation is isolated in `common`. Looks great
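To illustrate that isolation: the curl-backed code path can be compiled out entirely behind a preprocessor guard. A minimal sketch, assuming a hypothetical `LLAMA_USE_CURL` define wired to the `-DLLAMA_CURL=ON` option mentioned in the PR description; this is not the exact code of this PR:

```cpp
#include <cstdio>

#ifdef LLAMA_USE_CURL
#include <curl/curl.h>

static bool llama_download_file(const char * url, const char * path) {
    CURL * curl = curl_easy_init();
    if (curl == NULL) {
        return false;
    }
    // ... set URL/write options and perform the transfer here ...
    curl_easy_cleanup(curl);
    return true;
}
#else
// Built without libcurl: the feature is compiled out, and the stub reports
// the missing capability instead of failing at link time.
static bool llama_download_file(const char * /*url*/, const char * /*path*/) {
    fprintf(stderr, "error: built without libcurl, cannot download the model\n");
    return false;
}
#endif
```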
Supporting split GGUF files (for models above 50GB) would be nice too. Ideally you'd merge files with …
Yes, I am working on this in a separate branch: …
Can someone help with the error in the Windows build?
Also @ggerganov, please double-check my …
I submitted PR #6101 to fix this issue.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
# Conflicts:
#	common/common.cpp
…o be coherent with the make toolchain
Need to fix the Windows server CI tests, they're not passing at the moment.
…RLSSLOPT_NATIVE_CA
…e global curl function, use a write callback.
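Those last two commit subjects correspond to two libcurl details worth spelling out: on Windows, letting libcurl `fwrite` into a `FILE *` handed over via `CURLOPT_WRITEDATA` can cross C-runtime boundaries, so an explicit write callback is the safer pattern, and `CURLSSLOPT_NATIVE_CA` makes libcurl use the OS certificate store instead of a CA bundle file. A hedged sketch (function names here are illustrative, not taken from the PR):

```cpp
#include <cstdio>
#include <curl/curl.h>

// Explicit write callback: fwrite runs in this binary's C runtime, avoiding
// the Windows pitfall of libcurl calling fwrite on a FILE * it was handed.
static size_t write_cb(char * data, size_t size, size_t nmemb, void * fd) {
    return fwrite(data, size, nmemb, (FILE *) fd);
}

static void setup_transfer(CURL * curl, const char * url, FILE * fout) {
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, fout);
#if defined(_WIN32)
    // Use the Windows certificate store (requires libcurl >= 7.71.0).
    curl_easy_setopt(curl, CURLOPT_SSL_OPTIONS, CURLSSLOPT_NATIVE_CA);
#endif
}
```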
It finally works in both the Windows and Linux CI tests. Happy to merge it when CI passes.
@ggerganov Georgi, I suffered a little on Windows, but this implementation works on both platforms. Would you please do another review, as I changed the logic a little bit - and sorry, but I still need to find a good linter for CLion.
    try:
        # Terminate the server process; psutil raises if the PID is already gone.
        psutil.Process(pid).kill()
    except psutil.NoSuchProcess:
        return False
    return True
@phymbert This calls `TerminateProcess(handle, SIGTERM)` on Windows and `os.kill(pid, signal.SIGKILL)` on Unix. `os.kill(pid, signal.SIGTERM)` also calls TerminateProcess on Windows. psutil really seems like overkill here.
Again, feel free to issue a PR
* common: llama_load_model_from_url with libcurl dependency Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Motivation
Since GGUF is officially supported on Hugging Face, it can be useful to start the CLI directly from an HTTP URL.
We considered this at some point in #4735, but it raises a lot of security concerns because the core library would have to execute commands.
So we settled on #5501, which should be good enough for most cases, but it cannot work when no shell is available (for example, a Docker distroless container).

Changes
- `llama_load_model_from_url` in `common`, to first download the file if it does not exist locally or the remote is newer, then call `llama_load_model_from_file` (flow sketched below)
- `--model-url`, which will trigger the download to `--model` in `llama_init_from_gpt_params`
- `-DLLAMA_CURL=ON` in the cmake and make toolchains
- tests in `embeddings.feature`
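To make the flow above concrete, a hedged sketch of how the pieces fit together; `file_exists`, `remote_is_newer`, and `llama_download_file` are hypothetical helpers standing in for the real freshness check and transfer, and error handling is elided:

```cpp
#include <llama.h>

// Hypothetical helpers, not part of this PR's API surface.
bool file_exists(const char * path);
bool remote_is_newer(const char * url, const char * path);
bool llama_download_file(const char * url, const char * path);

// --model-url triggers a download into the --model path, then the regular
// file-based loader takes over.
struct llama_model * llama_load_model_from_url(
        const char * model_url,
        const char * path_model,
        struct llama_model_params params) {
    if (!file_exists(path_model) || remote_is_newer(model_url, path_model)) {
        if (!llama_download_file(model_url, path_model)) {
            return NULL;
        }
    }
    return llama_load_model_from_file(path_model, params);
}
```

A run would then pass both flags, e.g. `--model-url <url> --model model.gguf`, with the download happening inside `llama_init_from_gpt_params` before the usual load.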
Attention points
- `common` will be dynamically linked to libcurl. It could be moved into the `llama` API later if there is positive feedback from the community.
- `${model_path}.etag` or `${model_path}.lastModified` files are saved along with the `model_path` (see the sketch below).
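A sketch of how those side files might be read and written; only the `${model_path}.etag` / `${model_path}.lastModified` naming comes from the text above, the helper names are illustrative:

```cpp
#include <fstream>
#include <string>

// Persist the validator returned by the server next to the model file,
// e.g. model.gguf -> model.gguf.etag or model.gguf.lastModified.
static void write_validator(const std::string & model_path,
                            const std::string & suffix,
                            const std::string & value) {
    std::ofstream out(model_path + "." + suffix);
    out << value;
}

// Returns the stored value, or an empty string if no file exists yet.
static std::string read_validator(const std::string & model_path,
                                  const std::string & suffix) {
    std::ifstream in(model_path + "." + suffix);
    std::string value;
    std::getline(in, value);
    return value;
}
```

On a later run the stored etag can be replayed in an `If-None-Match` request header (or the stored date in `If-Modified-Since`), so the server can answer `304 Not Modified` and the download is skipped.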
Task
- Handle the `etag` and `last-modified` HTTP headers

References