
common: llama_load_model_from_url split support #6192

Merged
15 commits merged into master on Mar 23, 2024

Conversation

Collaborator
@phymbert commented Mar 21, 2024

Context

Since split model loading was added in #6187, this change allows loading a model from a URL that points to a split GGUF.

Changes

  • the password is hidden in the URL if present
  • if the first downloaded file contains split metadata, it triggers downloading of the additional GGUF shards
  • fix: llama_split_prefix strncpy now includes the string terminator
  • fix: header names must not be treated as case-sensitive
  • support HF params in the server
  • add server tests for split and HF params; closes server: add tests with --split and --model-url #6223
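The shard-URL derivation behind the second bullet can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: it assumes gguf-split's `-%05d-of-%05d.gguf` naming convention, and `split_url` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Derive the URL of shard i from the URL of the first shard, by rewriting
// the "-00001-of-" tag that gguf-split's naming convention puts in the name.
static std::string split_url(const std::string & first_url, int i_split) {
    const std::string first_tag = "-00001-of-";
    const size_t pos = first_url.rfind(first_tag);
    if (pos == std::string::npos) {
        return first_url; // not a split URL, leave unchanged
    }
    char tag[16];
    std::snprintf(tag, sizeof(tag), "-%05d-of-", i_split);
    std::string url = first_url;
    url.replace(pos, first_tag.size(), tag);
    return url;
}
```

With the example repo below, shard 4 of 6 would resolve to `...-00004-of-00006.gguf` from the first shard's URL.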

Example:

server \
    --hf-repo phymbert/models \
    --hf-file ggml-model-q4_0-split-00001-of-00006.gguf \
    --model models/phi-2-00001-of-00006.gguf \
    --log-format text
Logs:

llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00001-of-00006.gguf to models/phi-2-00001-of-00006.gguf (server_etag:"efa3b09b3237f9f7e787318ffdeda19b-22", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1180  100  1180    0     0  11069      0 --:--:-- --:--:-- --:--:-- 11069
100  325M  100  325M    0     0  77.0M      0  0:00:04  0:00:04 --:--:-- 80.1M
llama_download_file: file etag saved models/phi-2-00001-of-00006.gguf.etag: "efa3b09b3237f9f7e787318ffdeda19b-22"
llama_download_file: file last modified saved models/phi-2-00001-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00004-of-00006.gguf to models/phi-2-00004-of-00006.gguf (server_etag:"ab60eea3e28dde79ba54afd3182ff67a-17", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00006-of-00006.gguf to models/phi-2-00006-of-00006.gguf (server_etag:"1d87d7907283b29bb89cac624462e115-3", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00003-of-00006.gguf to models/phi-2-00003-of-00006.gguf (server_etag:"809c7fe772e320d4bc33c58bba273712-19", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00002-of-00006.gguf to models/phi-2-00002-of-00006.gguf (server_etag:"ca5d12e3bf4275fa5e5b62a0081baca0-18", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
llama_download_file: downloading from https://huggingface.co/phymbert/models/resolve/main/ggml-model-q4_0-split-00005-of-00006.gguf to models/phi-2-00005-of-00006.gguf (server_etag:"edfc023521950a3c794f24b7282c931b-25", server_last_modified:Thu, 21 Mar 2024 07:20:40 GMT)...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1184  100  1184    0     0  10995      0 --:--:-- --:--:-- --:--:-- 10995
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1184  100  1184    0     0   9578      0 --:--:-- --:--:-- --:--:--  9578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1186  100  1186    0     0   9532      0 --:--:-- --:--:-- --:--:--  9532
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1180  100  1180    0     0  11169      0 --:--:-- --:--:-- --:--:-- 11169
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1182  100  1182    0     0  11280      0 --:--:-- --:--:-- --:--:-- 11280
100 31.6M  100 31.6M    0     0  16.6M      0  0:00:01  0:00:01 --:--:-- 18.1M
llama_download_file: file etag saved models/phi-2-00006-of-00006.gguf.etag: "1d87d7907283b29bb89cac624462e115-3"
llama_download_file: file last modified saved models/phi-2-00006-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
100  253M  100  253M    0     0  19.6M      0  0:00:12  0:00:12 --:--:-- 14.4M
llama_download_file: file etag saved models/phi-2-00004-of-00006.gguf.etag: "ab60eea3e28dde79ba54afd3182ff67a-17"
llama_download_file: file last modified saved models/phi-2-00004-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
100  281M  100  281M    0     0  20.1M      0  0:00:13  0:00:13 --:--:-- 23.5M
llama_download_file: file etag saved models/phi-2-00003-of-00006.gguf.etag: "809c7fe772e320d4bc33c58bba273712-19"
llama_download_file: file last modified saved models/phi-2-00003-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
100  267M  100  267M    0     0  18.8M      0  0:00:14  0:00:14 --:--:-- 22.2M
llama_download_file: file etag saved models/phi-2-00002-of-00006.gguf.etag: "ca5d12e3bf4275fa5e5b62a0081baca0-18"
llama_download_file: file last modified saved models/phi-2-00002-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
100  367M  100  367M    0     0  24.5M      0  0:00:14  0:00:14 --:--:-- 39.0M
llama_download_file: file etag saved models/phi-2-00005-of-00006.gguf.etag: "edfc023521950a3c794f24b7282c931b-25"
llama_download_file: file last modified saved models/phi-2-00005-of-00006.gguf.lastModified: Thu, 21 Mar 2024 07:20:40 GMT
llama_model_loader: additional 5 GGUFs metadata loaded.
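The URL password masking listed in the changes might look like the following minimal sketch. `hide_password_in_url` is a hypothetical helper name for illustration (the PR ultimately implements this as a lambda inside `llama_download_file`).

```cpp
#include <string>

// Mask the password in "scheme://user:password@host/..." so credentials
// never appear in download logs. Illustrative sketch only.
static std::string hide_password_in_url(const std::string & url) {
    const size_t scheme = url.find("://");
    const size_t at     = url.find('@');
    if (scheme == std::string::npos || at == std::string::npos || at < scheme) {
        return url; // no userinfo component
    }
    const size_t colon = url.find(':', scheme + 3);
    if (colon == std::string::npos || colon > at) {
        return url; // a user name but no password
    }
    // keep "scheme://user:", replace the password, keep "@host/..."
    return url.substr(0, colon + 1) + "********" + url.substr(at);
}
```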

@phymbert phymbert requested a review from ggerganov March 21, 2024 07:42
@phymbert phymbert force-pushed the hp/split/load-model-from-url branch from 3c68a4d to a4a6d95 Compare March 22, 2024 07:45
Base automatically changed from hp/split/load-model to master March 22, 2024 18:00
@phymbert phymbert marked this pull request as draft March 22, 2024 20:36
llama: llama_split_prefix fix strncpy does not include string termination

common: llama_load_model_from_url:
 - fix header name case sensitive
 - support downloading additional split in parallel
 - hide password in url
@phymbert phymbert force-pushed the hp/split/load-model-from-url branch from a4a6d95 to ddb13ed Compare March 23, 2024 07:56
@phymbert phymbert requested review from slaren, ngxson and ggerganov and removed request for ggerganov March 23, 2024 07:58
@phymbert phymbert marked this pull request as ready for review March 23, 2024 08:01

// Wait for all downloads to complete
for (auto & f : futures_download) {
    if (!f.get()) {
Collaborator Author

Here, if one download fails, the others will continue in the background. I think that is acceptable for a first version?

Collaborator

Yes, I think so; in the worst case the user can re-run to re-download the missing part. It reminds me of how Docker downloads multiple layers.

In the future we could let developers modify this behavior by adding a new llama_download_params to specify:

  • stop_immediate_on_fail ==> true to stop as soon as one part fails
  • n_parallel ==> maximum number of downloads that can run in parallel
  • force_redownload ==> skip the etag check
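The futures-based parallel download under discussion can be sketched as below. `download_one` and `download_all` are hypothetical stand-ins for the real curl-based `llama_download_file` logic, shown only to illustrate the "collect all futures, report failure but let the rest finish" behavior.

```cpp
#include <future>
#include <string>
#include <vector>

// Stand-in for the real curl download of one shard; here it just
// "succeeds" for any non-empty URL.
static bool download_one(const std::string & url) {
    return !url.empty();
}

// Launch one async download per shard, then wait on every future.
// A failed shard does not cancel the others, matching the PR's behavior.
static bool download_all(const std::vector<std::string> & urls) {
    std::vector<std::future<bool>> futures_download;
    futures_download.reserve(urls.size());
    for (const auto & url : urls) {
        futures_download.push_back(std::async(std::launch::async, download_one, url));
    }
    bool ok = true;
    for (auto & f : futures_download) {
        if (!f.get()) {
            ok = false; // remember the failure, keep draining the rest
        }
    }
    return ok;
}
```

A `stop_immediate_on_fail` option would change the second loop to return early instead of draining every future.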

@@ -1844,14 +1855,14 @@ struct llama_model * llama_load_model_from_url(
curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
Collaborator Author
@phymbert Mar 23, 2024
Here the progress of each split is flushed to stderr concurrently; this can be improved later, but the final result is good enough for a first version.

}

curl_easy_cleanup(curl);

if (n_split > 1) {
Collaborator Author

With the current gguf-split implementation, the first file includes tensor data. In the future the first split should probably contain only metadata, so that the tensor data downloads can be triggered in parallel earlier.

@phymbert phymbert changed the title common: llama_load_model_from_url support split common: llama_load_model_from_url split support Mar 23, 2024
common/common.cpp (outdated review comment, resolved)
@phymbert
Copy link
Collaborator Author

@ggerganov The Ubuntu Release server tests are disabled on PRs; I think that was not the intention in ?

https://github.com/ggerganov/llama.cpp/pull/6128/files#diff-d8795f257f69c6bfe09365f90ea91930899b6a62aa2eee7f32d480863a8fddc9L34

Can I enable it again?

@ggerganov
Copy link
Owner

Can I enable it again?

Yes

@arki05 arki05 mentioned this pull request Mar 23, 2024
common/common.cpp (outdated review comment, resolved)
llama.cpp (outdated review comment, resolved)
common/common.cpp (outdated review comment, resolved)
phymbert and others added 3 commits March 23, 2024 17:51
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@phymbert phymbert merged commit f482bb2 into master Mar 23, 2024
39 of 57 checks passed
@phymbert phymbert deleted the hp/split/load-model-from-url branch March 23, 2024 17:07
@phymbert phymbert restored the hp/split/load-model-from-url branch March 23, 2024 17:24
@phymbert phymbert deleted the hp/split/load-model-from-url branch March 23, 2024 18:07
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* llama: llama_split_prefix fix strncpy does not include string termination
common: llama_load_model_from_url:
 - fix header name case sensitive
 - support downloading additional split in parallel
 - hide password in url

* common: EOL EOF

* common: remove redundant LLAMA_CURL_MAX_PATH_LENGTH definition

* common: change max url max length

* common: minor comment

* server: support HF URL options

* llama: llama_model_loader fix log

* common: use a constant for max url length

* common: clean up curl if file cannot be loaded in gguf

* server: tests: add split tests, and HF options params

* common: move llama_download_hide_password_in_url inside llama_download_file as a lambda

* server: tests: enable back Release test on PR

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* spacing

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 3, 2024
tybalex pushed a commit to rubra-ai/tools.cpp that referenced this pull request Apr 17, 2024

Successfully merging this pull request may close these issues.

server: add tests with --split and --model-url
3 participants