common: llama_load_model_from_url using --model-url #6098

Merged
53 commits merged on Mar 17, 2024

Changes from 6 commits

Commits (53)
3221ab0
common: introduce llama_load_model_from_url to download model from hf…
phymbert Mar 16, 2024
a0ebdfc
common: llama_load_model_from_url witch to libcurl dependency
phymbert Mar 16, 2024
42b25da
common: PR feedback, rename the definition to LLAMA_USE_CURL
phymbert Mar 16, 2024
7e78285
common: LLAMA_USE_CURL in make toolchain
phymbert Mar 16, 2024
df0d822
ci: compile the server with curl, add make option curl example in def…
phymbert Mar 16, 2024
80bec98
llama_load_model_from_url: try to make the windows build passing
phymbert Mar 16, 2024
2c3a00e
Update Makefile
phymbert Mar 16, 2024
4135d4a
llama_load_model_from_url: typo
phymbert Mar 16, 2024
5d99f32
llama_load_model_from_url: download the file only if modified based o…
phymbert Mar 16, 2024
921e4af
ci: build, fix the default build to use LLAMA_CURL
phymbert Mar 16, 2024
6633689
llama_load_model_from_url: cleanup code
phymbert Mar 16, 2024
1430e89
Merge branch 'master' into hp/download-model-from-hf
phymbert Mar 16, 2024
e84206d
Update examples/server/README.md
phymbert Mar 16, 2024
4bc47b7
Update common/common.cpp
phymbert Mar 16, 2024
8751bd0
Update common/common.cpp
phymbert Mar 16, 2024
f53bfd5
Update common/common.cpp
phymbert Mar 16, 2024
b088122
Update common/common.cpp
phymbert Mar 16, 2024
f22456d
Update common/common.cpp
phymbert Mar 16, 2024
9565ae3
Update common/common.cpp
phymbert Mar 16, 2024
330e28d
Update common/common.cpp
phymbert Mar 16, 2024
89ab37a
Update common/common.cpp
phymbert Mar 16, 2024
be561a7
Update common/common.cpp
phymbert Mar 16, 2024
eb9e52a
Update common/common.cpp
phymbert Mar 16, 2024
b0b49e0
Update examples/main/README.md
phymbert Mar 16, 2024
545fef6
llama_load_model_from_url: fix compilation warning, clearer logging
phymbert Mar 16, 2024
4fadb07
server: tests: add `--model-url` tests
phymbert Mar 16, 2024
124c474
llama_load_model_from_url: coherent clearer logging
phymbert Mar 16, 2024
064dc07
common: CMakeLists.txt fix typo in logging when lib curl is not found
phymbert Mar 16, 2024
838178a
ci: tests: windows tests add libcurl
phymbert Mar 16, 2024
176f039
ci: tests: windows tests add libcurl
phymbert Mar 16, 2024
5df5605
ci: build: add libcurl in default make toolchain step
phymbert Mar 16, 2024
78812c6
llama_load_model_from_url: PR feedback, use snprintf instead of strnc…
phymbert Mar 16, 2024
1ad5a45
ci: build: add libcurl in default make toolchain step for tests
phymbert Mar 16, 2024
22b3bb3
common: fix windows build caused by double windows.h import
phymbert Mar 16, 2024
e6848ab
build: move the make build with env LLAMA_CURL to a dedicated place
phymbert Mar 16, 2024
d81acb6
build: introduce cmake option LLAMA_CURL to trigger libcurl linking t…
phymbert Mar 16, 2024
dbd9691
build: move the make build with env LLAMA_CURL to a dedicated place
phymbert Mar 16, 2024
9da4eec
llama_load_model_from_url: minor spacing and log message changes
phymbert Mar 16, 2024
89d3483
ci: build: fix ubuntu-focal-make-curl
phymbert Mar 16, 2024
13d8817
ci: build: try to fix the windows build
phymbert Mar 16, 2024
1ddaf71
common: remove old dependency to openssl
phymbert Mar 16, 2024
73b4b44
common: fix build
phymbert Mar 16, 2024
a3ed3d4
common: fix windows build
phymbert Mar 17, 2024
5e66ec8
common: fix windows tests
phymbert Mar 17, 2024
9ca4acc
common: fix windows tests
phymbert Mar 17, 2024
c1b002e
common: llama_load_model_from_url windows set CURLOPT_SSL_OPTIONS, CU…
phymbert Mar 17, 2024
cff7faa
ci: tests: print server logs in case of scenario failure
phymbert Mar 17, 2024
4fe431d
common: llama_load_model_from_url: make it working on windows: disabl…
phymbert Mar 17, 2024
47a9e5d
ci: tests: increase timeout for windows
phymbert Mar 17, 2024
31272c6
common: fix typo
phymbert Mar 17, 2024
f902ab6
common: llama_load_model_from_url use a temporary file for downloading
phymbert Mar 17, 2024
b24f30f
common: llama_load_model_from_url delete previous file before downloa…
phymbert Mar 17, 2024
fcf327f
ci: tests: fix behavior on windows
phymbert Mar 17, 2024
1 change: 1 addition & 0 deletions .github/workflows/build.yml
@@ -39,6 +39,7 @@ jobs:
id: make_build
env:
LLAMA_FATAL_WARNINGS: 1
LLAMA_USE_CURL: 1
run: |
CC=gcc-8 make -j $(nproc)

3 changes: 2 additions & 1 deletion .github/workflows/server.yml
@@ -57,7 +57,8 @@ jobs:
cmake \
python3-pip \
wget \
language-pack-en
language-pack-en \
libcurl4-openssl-dev

- name: Build
id: cmake_build
5 changes: 5 additions & 0 deletions Makefile
@@ -595,6 +595,11 @@ include scripts/get-flags.mk
CUDA_CXXFLAGS := $(BASE_CXXFLAGS) $(GF_CXXFLAGS) -Wno-pedantic
endif

ifdef LLAMA_USE_CURL
override CXXFLAGS := $(CXXFLAGS) -DLLAMA_USE_CURL
override LDFLAGS := $(LDFLAGS) -lcurl
endif

#
# Print build information
#
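Usage note (not part of the diff): this makes curl support opt-in for make builds, e.g. `LLAMA_USE_CURL=1 make`, which adds `-DLLAMA_USE_CURL` to the compile flags and links `-lcurl`; the CI change in build.yml above sets the same variable.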
10 changes: 10 additions & 0 deletions common/CMakeLists.txt
@@ -47,6 +47,16 @@ if (BUILD_SHARED_LIBS)
set_target_properties(${TARGET} PROPERTIES POSITION_INDEPENDENT_CODE ON)
endif()

# Check for curl
find_package(CURL QUIET)
if (CURL_FOUND)
add_definitions(-DLLAMA_USE_CURL)
include_directories(${CURL_INCLUDE_DIRS})
link_libraries(${CURL_LIBRARIES})
else()
message(INFO "libcurl not found. Building without model download support.")
endif ()


set(TARGET common)

84 changes: 83 additions & 1 deletion common/common.cpp
@@ -16,6 +16,9 @@
#include <unordered_set>
#include <vector>
#include <cinttypes>
#ifdef LLAMA_USE_CURL
#include <curl/curl.h>
#endif

#if defined(__APPLE__) && defined(__MACH__)
#include <sys/types.h>
@@ -531,6 +534,12 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) {
break;
}
params.model = argv[i];
} else if (arg == "-mu" || arg == "--model-url") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.model_url = argv[i];
} else if (arg == "-md" || arg == "--model-draft") {
if (++i >= argc) {
invalid_param = true;
@@ -1131,6 +1140,8 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
printf(" layer range to apply the control vector(s) to, start and end inclusive\n");
printf(" -m FNAME, --model FNAME\n");
printf(" model path (default: %s)\n", params.model.c_str());
printf(" -mu MODEL_URL, --model-url MODEL_URL\n");
printf(" model download url (default: %s)\n", params.model_url.c_str());
printf(" -md FNAME, --model-draft FNAME\n");
printf(" draft model for speculative decoding\n");
printf(" -ld LOGDIR, --logdir LOGDIR\n");
@@ -1376,10 +1387,81 @@ void llama_batch_add(
batch.n_tokens++;
}

#ifdef LLAMA_USE_CURL
struct llama_model * llama_load_model_from_url(const char * model_url, const char * path_model,
struct llama_model_params params) {
// Initialize libcurl
curl_global_init(CURL_GLOBAL_DEFAULT);
auto curl = curl_easy_init();

if (!curl) {
curl_global_cleanup();
fprintf(stderr, "%s: error initializing lib curl\n", __func__);
return nullptr;
}

// Set the URL
curl_easy_setopt(curl, CURLOPT_URL, model_url);
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);

// Set the output file
auto outfile = fopen(path_model, "wb");
if (!outfile) {
curl_easy_cleanup(curl);
curl_global_cleanup();
fprintf(stderr, "%s: error opening local file for writing: %s\n", __func__, path_model);
return nullptr;
}
curl_easy_setopt(curl, CURLOPT_WRITEDATA, outfile);

// start the download
fprintf(stdout, "%s: downloading model from %s to %s ...\n", __func__, model_url, path_model);
auto res = curl_easy_perform(curl);
if (res != CURLE_OK) {
fclose(outfile);
curl_easy_cleanup(curl);
curl_global_cleanup();
fprintf(stderr, "%s: curl_easy_perform() failed: %s\n", __func__, curl_easy_strerror(res));
return nullptr;
}

int http_code = 0;
curl_easy_getinfo (curl, CURLINFO_RESPONSE_CODE, &http_code);
if (http_code < 200 || http_code >= 400) {
fclose(outfile);
curl_easy_cleanup(curl);
curl_global_cleanup();
fprintf(stderr, "%s: invalid http status code failed: %d\n", __func__, http_code);
return nullptr;
}

// Clean up
fclose(outfile);
curl_easy_cleanup(curl);
curl_global_cleanup();

return llama_load_model_from_file(path_model, params);
}
#else

struct llama_model *llama_load_model_from_url(const char * /*model_url*/, const char * /*path_model*/,
struct llama_model_params /*params*/) {
fprintf(stderr, "%s: llama.cpp built without curl support, downloading from an url not supported.\n", __func__);
return nullptr;
}

#endif

std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_params(gpt_params & params) {
auto mparams = llama_model_params_from_gpt_params(params);

llama_model * model = llama_load_model_from_file(params.model.c_str(), mparams);
llama_model * model = nullptr;
if (!params.model_url.empty()) {
model = llama_load_model_from_url(params.model_url.c_str(), params.model.c_str(), mparams);
} else {
model = llama_load_model_from_file(params.model.c_str(), mparams);
}
if (model == NULL) {
fprintf(stderr, "%s: error: failed to load model '%s'\n", __func__, params.model.c_str());
return std::make_tuple(nullptr, nullptr);
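For context (not part of the diff): a minimal sketch of how the new helper could be called directly, assuming the declaration added to common.h in this PR and the existing llama.cpp model API (`llama_model_default_params`, `llama_free_model`); the URL is the example used in the PR's README changes and the local filename is a placeholder.

#include "common.h"
#include "llama.h"

int main() {
    llama_model_params mparams = llama_model_default_params();

    // placeholder local path; downloads the file, then loads it via llama_load_model_from_file
    llama_model * model = llama_load_model_from_url(
        "https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf",
        "phi-2-q4_0.gguf",
        mparams);

    if (model == nullptr) {
        // download failed, or llama.cpp was built without libcurl
        return 1;
    }

    // ... use the model ...
    llama_free_model(model);
    return 0;
}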
10 changes: 10 additions & 0 deletions common/common.h
@@ -17,6 +17,12 @@
#include <unordered_map>
#include <tuple>

#ifdef HAVE_OPENSSL
#include <openssl/ssl.h>
#include <openssl/bio.h>
#include <openssl/err.h>
#endif

#ifdef _WIN32
#define DIRECTORY_SEPARATOR '\\'
#else
@@ -89,6 +95,7 @@ struct gpt_params {
struct llama_sampling_params sparams;

std::string model = "models/7B/ggml-model-f16.gguf"; // model path
std::string model_url = ""; // model path
std::string model_draft = ""; // draft model for speculative decoding
std::string model_alias = "unknown"; // model alias
std::string prompt = "";
@@ -191,6 +198,9 @@ std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_par
struct llama_model_params llama_model_params_from_gpt_params (const gpt_params & params);
struct llama_context_params llama_context_params_from_gpt_params(const gpt_params & params);

struct llama_model * llama_load_model_from_url(const char * model_url, const char * path_model,
struct llama_model_params params);

// Batch utils

void llama_batch_clear(struct llama_batch & batch);
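Similarly (not part of the diff), a sketch of the higher-level path used by the examples: per the common.cpp change above, `llama_init_from_gpt_params` downloads to `params.model` when `params.model_url` is set, then loads the local file. Paths are illustrative.

#include "common.h"
#include "llama.h"

#include <tuple>

int main() {
    gpt_params params;
    params.model_url = "https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf";
    params.model     = "models/phi-2-q4_0.gguf"; // local file the download is written to

    llama_model *   model = nullptr;
    llama_context * ctx   = nullptr;
    std::tie(model, ctx) = llama_init_from_gpt_params(params);
    if (model == nullptr || ctx == nullptr) {
        return 1;
    }

    // ... run inference with ctx ...
    llama_free(ctx);
    llama_free_model(model);
    return 0;
}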
1 change: 1 addition & 0 deletions examples/main/README.md
@@ -67,6 +67,7 @@ main.exe -m models\7B\ggml-model.bin --ignore-eos -n -1 --random-prompt
In this section, we cover the most commonly used options for running the `main` program with the LLaMA models:

- `-m FNAME, --model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.bin`).
- `-mu MODEL_URL --model MODEL_URL`: Specify a remote http url to download the file (e.g https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf).
- `-i, --interactive`: Run the program in interactive mode, allowing you to provide input directly and receive real-time responses.
- `-ins, --instruct`: Run the program in instruction mode, which is particularly useful when working with Alpaca models.
- `-n N, --n-predict N`: Set the number of tokens to predict when generating text. Adjusting this value can influence the length of the generated text.
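Usage illustration (not part of the diff): combining the new flag with `-m` to choose where the downloaded file is stored might look like `main -mu https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf -m models/phi-2-q4_0.gguf -p "Once upon a time"`; the local filename and prompt are placeholders.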
1 change: 1 addition & 0 deletions examples/server/README.md
@@ -20,6 +20,7 @@ The project is under active development, and we are [looking for feedback and co
- `-tb N, --threads-batch N`: Set the number of threads to use during batch and prompt processing. If not specified, the number of threads will be set to the number of threads used for generation.
- `--threads-http N`: number of threads in the http server pool to process requests (default: `max(std::thread::hardware_concurrency() - 1, --parallel N + 2)`)
- `-m FNAME`, `--model FNAME`: Specify the path to the LLaMA model file (e.g., `models/7B/ggml-model.gguf`).
- `-mu MODEL_URL --model MODEL_URL`: Specify a remote http url to download the file (e.g https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf).
- `-a ALIAS`, `--alias ALIAS`: Set an alias for the model. The alias will be returned in API responses.
- `-c N`, `--ctx-size N`: Set the size of the prompt context. The default is 512, but LLaMA models were built with a context of 2048, which will provide better results for longer input/inference. The size may differ in other models, for example, baichuan models were build with a context of 4096.
- `-ngl N`, `--n-gpu-layers N`: When compiled with appropriate support (currently CLBlast or cuBLAS), this option allows offloading some layers to the GPU for computation. Generally results in increased performance.
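Usage illustration (not part of the diff): the server would presumably be launched the same way, e.g. `server -mu https://huggingface.co/ggml-org/models/resolve/main/phi-2/ggml-model-q4_0.gguf -m models/phi-2-q4_0.gguf`, with `-m` giving the local path the download is written to.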
8 changes: 8 additions & 0 deletions examples/server/server.cpp
@@ -2195,6 +2195,8 @@ static void server_print_usage(const char * argv0, const gpt_params & params, co
}
printf(" -m FNAME, --model FNAME\n");
printf(" model path (default: %s)\n", params.model.c_str());
printf(" -mu MODEL_URL, --model-url MODEL_URL\n");
printf(" model download url (default: %s)\n", params.model_url.c_str());
printf(" -a ALIAS, --alias ALIAS\n");
printf(" set an alias for the model, will be added as `model` field in completion response\n");
printf(" --lora FNAME apply LoRA adapter (implies --no-mmap)\n");
@@ -2317,6 +2319,12 @@ static void server_params_parse(int argc, char ** argv, server_params & sparams,
break;
}
params.model = argv[i];
} else if (arg == "-mu" || arg == "--model-url") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.model_url = argv[i];
} else if (arg == "-a" || arg == "--alias") {
if (++i >= argc) {
invalid_param = true;