
common: llama_load_model_from_url split support #6192

Merged
15 commits merged into master on Mar 23, 2024

Conversation

Collaborator
@phymbert commented Mar 21, 2024

Context

Since split model loading was added in #6187, this change allows loading a model from a URL that points to a split GGUF.

Changes

  • the password is hidden in the URL if present
  • if the first downloaded file contains split metadata, it triggers downloading of the additional GGUF shards
  • fix: llama_split_prefix strncpy now includes the string terminator
  • fix: header names must not be treated as case-sensitive
  • support HF params in the server
  • add server tests for split and HF params; closes server: add tests with --split and --model-url #6223
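The shard-URL derivation behind the second bullet can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: it assumes gguf-split's `-%05d-of-%05d.gguf` naming convention, and `split_url` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Derive the URL of shard i from the URL of the first shard, by rewriting
// the "-00001-of-" tag that gguf-split's naming convention puts in the name.
static std::string split_url(const std::string & first_url, int i_split) {
    const std::string first_tag = "-00001-of-";
    const size_t pos = first_url.rfind(first_tag);
    if (pos == std::string::npos) {
        return first_url; // not a split URL, leave unchanged
    }
    char tag[16];
    std::snprintf(tag, sizeof(tag), "-%05d-of-", i_split);
    std::string url = first_url;
    url.replace(pos, first_tag.size(), tag);
    return url;
}
```

With the example repo below, shard 4 of 6 would resolve to `...-00004-of-00006.gguf` from the first shard's URL.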

Example:

server \
    --hf-repo phymbert/models \
    --hf-file ggml-model-q4_0-split-00001-of-00006.gguf \
    --model models/phi-2-00001-of-00006.gguf \
    --log-format text
Logs:

llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00001-of-00006.gguf to models/phi-2-00001-of-00006.gguf (server_etag:"efa3b09b3237f9f7e787318ffdeda19b-22", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1180  100  1180    0     0  11069      0 --:--:-- --:--:-- --:--:-- 11069
100  325M  100  325M    0     0  77.0M      0  0:00:04  0:00:04 --:--:-- 80.1M
llama_download_file: file etag saved models/phi-2-00001-of-00006.gguf.etag: "efa3b09b3237f9f7e787318ffdeda19b-22"
llama_download_file: file last modified saved models/phi-2-00001-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00004-of-00006.gguf to models/phi-2-00004-of-00006.gguf (server_etag:"ab60eea3e28dde79ba54afd3182ff67a-17", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00006-of-00006.gguf to models/phi-2-00006-of-00006.gguf (server_etag:"1d87d7907283b29bb89cac624462e115-3", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00003-of-00006.gguf to models/phi-2-00003-of-00006.gguf (server_etag:"809c7fe772e320d4bc33c58bba273712-19", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00002-of-00006.gguf to models/phi-2-00002-of-00006.gguf (server_etag:"ca5d12e3bf4275fa5e5b62a0081baca0-18", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00005-of-00006.gguf to models/phi-2-00005-of-00006.gguf (server_etag:"edfc023521950a3c794f24b7282c931b-25", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1184  100  1184    0     0  10995      0 --:--:-- --:--:-- --:--:-- 10995
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1184  100  1184    0     0   9578      0 --:--:-- --:--:-- --:--:--  9578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1186  100  1186    0     0   9532      0 --:--:-- --:--:-- --:--:--  9532
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1180  100  1180    0     0  11169      0 --:--:-- --:--:-- --:--:-- 11169
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1182  100  1182    0     0  11280      0 --:--:-- --:--:-- --:--:-- 11280
100 31.6M  100 31.6M    0     0  16.6M      0  0:00:01  0:00:01 --:--:-- 18.1M
llama_download_file: file etag saved models/phi-2-00006-of-00006.gguf.etag: "1d87d7907283b29bb89cac624462e115-3"
llama_download_file: file last modified saved models/phi-2-00006-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
100  253M  100  253M    0     0  19.6M      0  0:00:12  0:00:12 --:--:-- 14.4M
llama_download_file: file etag saved models/phi-2-00004-of-00006.gguf.etag: "ab60eea3e28dde79ba54afd3182ff67a-17"
llama_download_file: file last modified saved models/phi-2-00004-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
100  281M  100  281M    0     0  20.1M      0  0:00:13  0:00:13 --:--:-- 23.5M
llama_download_file: file etag saved models/phi-2-00003-of-00006.gguf.etag: "809c7fe772e320d4bc33c58bba273712-19"
llama_download_file: file last modified saved models/phi-2-00003-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
100  267M  100  267M    0     0  18.8M      0  0:00:14  0:00:14 --:--:-- 22.2M
llama_download_file: file etag saved models/phi-2-00002-of-00006.gguf.etag: "ca5d12e3bf4275fa5e5b62a0081baca0-18"
llama_download_file: file last modified saved models/phi-2-00002-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
100  367M  100  367M    0     0  24.5M      0  0:00:14  0:00:14 --:--:-- 39.0M
llama_download_file: file etag saved models/phi-2-00005-of-00006.gguf.etag: "edfc023521950a3c794f24b7282c931b-25"
llama_download_file: file last modified saved models/phi-2-00005-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
llama_model_loader: additional 5 GGUFs metadata loaded.
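The URL password masking listed in the changes might look like the following minimal sketch. `hide_password_in_url` is a hypothetical helper name for illustration (the PR ultimately implements this as a lambda inside `llama_download_file`).

```cpp
#include <string>

// Mask the password in "scheme://user:password@host/..." so credentials
// never appear in download logs. Illustrative sketch only.
static std::string hide_password_in_url(const std::string & url) {
    const size_t scheme = url.find("://");
    const size_t at     = url.find('@');
    if (scheme == std::string::npos || at == std::string::npos || at < scheme) {
        return url; // no userinfo component
    }
    const size_t colon = url.find(':', scheme + 3);
    if (colon == std::string::npos || colon > at) {
        return url; // a user name but no password
    }
    // keep "scheme://user:", replace the password, keep "@host/..."
    return url.substr(0, colon + 1) + "********" + url.substr(at);
}
```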

@phymbert phymbert requested a review from ggerganov March 21, 2024 07:42
@phymbert phymbert force-pushed the hp/split/load-model-from-url branch from 3c68a4d to a4a6d95 Compare March 22, 2024 07:45
Base automatically changed from hp/split/load-model to master March 22, 2024 18:00
@phymbert phymbert marked this pull request as draft March 22, 2024 20:36
llama: llama_split_prefix fix strncpy does not include string termination

common: llama_load_model_from_url:
 - fix header name case sensitive
 - support downloading additional split in parallel
 - hide password in url
@phymbert phymbert force-pushed the hp/split/load-model-from-url branch from a4a6d95 to ddb13ed Compare March 23, 2024 07:56
@phymbert phymbert requested review from slaren, ngxson and ggerganov and removed request for ggerganov March 23, 2024 07:58
@phymbert phymbert marked this pull request as ready for review March 23, 2024 08:01

// Wait for all downloads to complete
for (auto & f : futures_download) {
    if (!f.get()) {
Collaborator Author

Here, if one download fails, the others will continue in the background. I think that is acceptable for a first version?

Collaborator

Yes, I think so; in the worst case the user can re-run to re-download the missing part. It reminds me of how Docker downloads multiple layers.

In the future we could let developers modify this behavior by adding a new llama_download_params to specify:

  • stop_immediate_on_fail ==> true to stop as soon as one part fails
  • n_parallel ==> maximum number of downloads that can run in parallel
  • force_redownload ==> skip the etag check
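The futures-based parallel download under discussion can be sketched as below. `download_one` and `download_all` are hypothetical stand-ins for the real curl-based `llama_download_file` logic, shown only to illustrate the "collect all futures, report failure but let the rest finish" behavior.

```cpp
#include <future>
#include <string>
#include <vector>

// Stand-in for the real curl download of one shard; here it just
// "succeeds" for any non-empty URL.
static bool download_one(const std::string & url) {
    return !url.empty();
}

// Launch one async download per shard, then wait on every future.
// A failed shard does not cancel the others, matching the PR's behavior.
static bool download_all(const std::vector<std::string> & urls) {
    std::vector<std::future<bool>> futures_download;
    futures_download.reserve(urls.size());
    for (const auto & url : urls) {
        futures_download.push_back(std::async(std::launch::async, download_one, url));
    }
    bool ok = true;
    for (auto & f : futures_download) {
        if (!f.get()) {
            ok = false; // remember the failure, keep draining the rest
        }
    }
    return ok;
}
```

A `stop_immediate_on_fail` option would change the second loop to return early instead of draining every future.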

@@ -1844,14 +1855,14 @@ struct llama_model * llama_load_model_from_url(
curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
Collaborator Author
@phymbert Mar 23, 2024
Here the progress of each split is flushed to stderr concurrently; this can be improved later, but the final result is good enough for a first version.

}

curl_easy_cleanup(curl);

if (n_split > 1) {
Collaborator Author

With the current gguf-split implementation, the first file includes tensor data. In the future the first split should probably contain only metadata, so that the tensor data downloads can be triggered in parallel earlier.

@phymbert phymbert changed the title common: llama_load_model_from_url support split common: llama_load_model_from_url split support Mar 23, 2024
common/common.cpp (outdated review comment, resolved)
@phymbert
Copy link
Collaborator Author

@ggerganov The Ubuntu Release server tests are disabled on PRs; I think that was not the intention in ?

https://github.com/ggerganov/llama.cpp/pull/6128/files#diff-d8795f257f69c6bfe09365f90ea91930899b6a62aa2eee7f32d480863a8fddc9L34

Can I enable it again?

@ggerganov
Copy link
Owner

Can I enable it again?

Yes

@arki05 arki05 mentioned this pull request Mar 23, 2024
common/common.cpp (outdated review comment, resolved)
llama.cpp (outdated review comment, resolved)
common/common.cpp (outdated review comment, resolved)
phymbert and others added 3 commits March 23, 2024 17:51
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@phymbert phymbert merged commit f482bb2 into master Mar 23, 2024
39 of 57 checks passed
@phymbert phymbert deleted the hp/split/load-model-from-url branch March 23, 2024 17:07
@phymbert phymbert restored the hp/split/load-model-from-url branch March 23, 2024 17:24
@phymbert phymbert deleted the hp/split/load-model-from-url branch March 23, 2024 18:07
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* llama: llama_split_prefix fix strncpy does not include string termination
common: llama_load_model_from_url:
 - fix header name case sensitive
 - support downloading additional split in parallel
 - hide password in url

* common: EOL EOF

* common: remove redundant LLAMA_CURL_MAX_PATH_LENGTH definition

* common: change max url max length

* common: minor comment

* server: support HF URL options

* llama: llama_model_loader fix log

* common: use a constant for max url length

* common: clean up curl if file cannot be loaded in gguf

* server: tests: add split tests, and HF options params

* common: move llama_download_hide_password_in_url inside llama_download_file as a lambda

* server: tests: enable back Release test on PR

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024

Successfully merging this pull request may close these issues.

server: add tests with --split and --model-url
3 participants