
Auto docker v2 - dockerised Open Llama 3B image w/OpenBLAS enabled server #310


Merged
merged 6 commits into abetlen:main on Jun 2, 2023

Conversation

@gjmulder (Contributor) commented Jun 2, 2023

  • Updated the simple Dockerfiles to install the llama-cpp-python package instead of building from source
  • Rearranged the directory structure so that docker build ... commands work correctly
  • "Open Llama 3B in-a-box" (tested on Linux only, sorry!):
$ cd docker/open_llama

docker/open_llama$ ./open_llama_in_a_box.sh 
Making request to https://huggingface.co/api/models...
Making request to https://huggingface.co/api/models/SlyEcho/open_llama_3b_ggml...
Downloading https://huggingface.co/SlyEcho/open_llama_3b_ggml/resolve/main/open-llama-3b-q5_1.bin to SlyEcho_open_llama_3b_ggml_open-llama-3b-q5_1.bin...
.....................................................................................................................................................................................................................................................
Download complete.
magic: 0x67676a74, version: 0x0003, file: SlyEcho_open_llama_3b_ggml_open-llama-3b-q5_1.bin
-rw-rw-r-- 1 user user 2.4G Jun  2 08:27 SlyEcho_open_llama_3b_ggml_open-llama-3b-q5_1.bin
lrwxrwxrwx 1 user user   49 Jun  2 08:27 model.bin -> SlyEcho_open_llama_3b_ggml_open-llama-3b-q5_1.bin
Sending build context to Docker daemon  2.571GB
Step 1/15 : ARG IMAGE=python:3-slim-bullseye
Step 2/15 : FROM ${IMAGE}
 ---> 56dd444d312f
Step 3/15 : ARG IMAGE
 ---> Using cache
 ---> cdb938acb842
Step 4/15 : RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends     python3     python3-pip     ninja-build     build-essential
 ---> Using cache
 ---> 6c7ca0d898b1
Step 5/15 : RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette
 ---> Using cache
 ---> 759f5492c138
Step 6/15 : RUN echo "Image: ${IMAGE}" &&     if [ "${IMAGE}" = "python:3-slim-bullseye" ] ; then     echo "OpenBLAS install:" &&     apt-get install -y --no-install-recommends libopenblas-dev &&     LLAMA_OPENBLAS=1 pip install llama-cpp-python --verbose; else     echo "CuBLAS install:" &&     LLAMA_CUBLAS=1 pip install llama-cpp-python --verbose; fi
 ---> Using cache
 ---> 0fc3144b59db
Step 7/15 : RUN rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 07c6050f0a73
Step 8/15 : WORKDIR /app
 ---> Using cache
 ---> 8688f680685e
Step 9/15 : RUN echo "Installing model...this can take some time..."
 ---> Using cache
 ---> 08cce76429ac
Step 10/15 : COPY ./model.bin /app/model.bin
 ---> Using cache
 ---> ac1aa642e96b
Step 11/15 : COPY ./start_server.sh /app/start_server.sh
 ---> Using cache
 ---> 4a1072e4a28d
Step 12/15 : RUN chmod +x /app/start_server.sh
 ---> Using cache
 ---> c1c0cd531462
Step 13/15 : ENV HOST=0.0.0.0
 ---> Using cache
 ---> 6d592c55ca52
Step 14/15 : EXPOSE 8000
 ---> Using cache
 ---> 7bf2466dcdb6
Step 15/15 : CMD ["/bin/sh", "/app/start_server.sh"]
 ---> Using cache
 ---> afc4cb89b07a
Successfully built afc4cb89b07a
Successfully tagged open_llama_3b:latest
REPOSITORY      TAG                        IMAGE ID       CREATED          SIZE
open_llama_3b   latest                     afc4cb89b07a   31 minutes ago   3.34GB

To start the docker container run:
docker run -t -p 8000:8000 open_llama_3b
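
For reference, the Dockerfile behind the fifteen build steps above looks roughly like this (reconstructed from the log; line continuations and exact formatting in the actual docker/ file may differ):

# Reconstructed from the build log above; formatting is approximate.
ARG IMAGE=python:3-slim-bullseye
FROM ${IMAGE}
# Re-declare ARG after FROM so later RUN steps can see it.
ARG IMAGE
RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends \
    python3 python3-pip ninja-build build-essential
RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette
# OpenBLAS-enabled build on the CPU base image, CuBLAS otherwise.
RUN echo "Image: ${IMAGE}" && \
    if [ "${IMAGE}" = "python:3-slim-bullseye" ]; then \
        echo "OpenBLAS install:" && \
        apt-get install -y --no-install-recommends libopenblas-dev && \
        LLAMA_OPENBLAS=1 pip install llama-cpp-python --verbose; \
    else \
        echo "CuBLAS install:" && \
        LLAMA_CUBLAS=1 pip install llama-cpp-python --verbose; \
    fi
RUN rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN echo "Installing model...this can take some time..."
COPY ./model.bin /app/model.bin
COPY ./start_server.sh /app/start_server.sh
RUN chmod +x /app/start_server.sh
ENV HOST=0.0.0.0
EXPOSE 8000
CMD ["/bin/sh", "/app/start_server.sh"]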

docker/open_llama$ ./start.sh
llama.cpp: loading model from /app/model.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 3200
llama_model_load_internal: n_mult     = 216
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 26
llama_model_load_internal: n_rot      = 100
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 8640
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 3B
llama_model_load_internal: ggml ctx size =    0.06 MB
llama_model_load_internal: mem required  = 3219.39 MB (+  682.00 MB per state)
.................................................................................................
llama_init_from_file: kv self size  =  650.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
INFO:     Started server process [7]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

CONTAINER ID   IMAGE           COMMAND                  CREATED          STATUS         PORTS                                       NAMES
236a378c0550   open_llama_3b   "/bin/sh /app/start_…"   10 seconds ago   Up 9 seconds   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp   stoic_poincare

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   127    0     0  100   127      0    105  0:00:01  0:00:01 --:--:--   105
llama_print_timings:        load time =  1113.28 ms
llama_print_timings:      sample time =     2.02 ms /     3 runs   (    0.67 ms per token)
llama_print_timings: prompt eval time =  1113.21 ms /    24 tokens (   46.38 ms per token)
llama_print_timings:        eval time =   151.68 ms /     2 runs   (   75.84 ms per token)
llama_print_timings:       total time =  1316.52 ms
INFO:     172.17.0.1:60518 - "POST /v1/completions HTTP/1.1" 200 OK
100   397  100   270  100   127    204     96  0:00:01  0:00:01 --:--:--   300
{"id":"cmpl-0bdcdeb4-e8c7-40ea-a442-c724aa8baab3","object":"text_completion","created":1685703180,"model":"/app/model.bin","choices":[{"text":"Paris","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":24,"completion_tokens":3,"total_tokens":27}}

open_llama_3b is working!!
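
The exact request the test sent isn't shown in the log; against the server's OpenAI-compatible /v1/completions endpoint, a request of roughly this shape (hypothetical prompt and parameters) yields a completion like the one above:

# Hypothetical prompt/parameters; the actual test request isn't in the log.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Q: What is the capital of France? A:", "max_tokens": 16, "stop": ["\n"]}'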

@abetlen merged commit 3977eea into abetlen:main on Jun 2, 2023
@SlyEcho commented Jun 7, 2023

@gjmulder I want to update the model to the final version. I hope that doesn't cause issues for you.

@gjmulder (Contributor, Author) commented Jun 7, 2023

@SlyEcho if you replace the 3B model, it should just work 🤞 as my 🤗 API search terms are:

  1. Author: SlyEcho
  2. Search: open_llama_3b_ggml

I then substring-match a filename in the repo that includes the string q5_1 (roughly as sketched below).

The main ask is that you don't delete the repo, as then the search will return nothing to download.
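
A minimal sketch of that lookup (hypothetical, not the actual open_llama_in_a_box.sh source; it assumes the Hugging Face API's current JSON field names):

# Hypothetical sketch of the search-and-download logic described above.
# Assumes /api/models returns "id" for repos and the per-repo endpoint
# lists files under "rfilename".
REPO=$(curl -s 'https://huggingface.co/api/models?author=SlyEcho&search=open_llama_3b_ggml' \
  | grep -o '"id":"[^"]*"' | head -1 | cut -d'"' -f4)
FILE=$(curl -s "https://huggingface.co/api/models/${REPO}" \
  | grep -o '"rfilename":"[^"]*q5_1[^"]*"' | head -1 | cut -d'"' -f4)
curl -L -o model.bin "https://huggingface.co/${REPO}/resolve/main/${FILE}"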

@SlyEcho commented Jun 7, 2023

It's updated now; I am adding 7B and 13B as well.

Labels: build, model (Model specific issue)