
Auto docker v2 - dockerised Open Llama 3B image w/OpenBLAS enabled server #310


Merged
merged 6 commits into abetlen:main on Jun 2, 2023

Conversation

@gjmulder (Contributor) commented Jun 2, 2023

  • Updated the simple Dockerfiles to install the llama-cpp-python package instead of building from source
  • Rearranged the directory structure so that docker build ... commands work correctly
  • "Open Llama 3B in-a-box" (tested on Linux only, sorry!):
$ cd docker/open_llama

docker/open_llama$ ./open_llama_in_a_box.sh 
Making request to https://huggingface.co/api/models...
Making request to https://huggingface.co/api/models/SlyEcho/open_llama_3b_ggml...
Downloading https://huggingface.co/SlyEcho/open_llama_3b_ggml/resolve/main/open-llama-3b-q5_1.bin to SlyEcho_open_llama_3b_ggml_open-llama-3b-q5_1.bin...
.....................................................................................................................................................................................................................................................
Download complete.
magic: 0x67676a74, version: 0x0003, file: SlyEcho_open_llama_3b_ggml_open-llama-3b-q5_1.bin
-rw-rw-r-- 1 user user 2.4G Jun  2 08:27 SlyEcho_open_llama_3b_ggml_open-llama-3b-q5_1.bin
lrwxrwxrwx 1 user user   49 Jun  2 08:27 model.bin -> SlyEcho_open_llama_3b_ggml_open-llama-3b-q5_1.bin
Sending build context to Docker daemon  2.571GB
Step 1/15 : ARG IMAGE=python:3-slim-bullseye
Step 2/15 : FROM ${IMAGE}
 ---> 56dd444d312f
Step 3/15 : ARG IMAGE
 ---> Using cache
 ---> cdb938acb842
Step 4/15 : RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends     python3     python3-pip     ninja-build     build-essential
 ---> Using cache
 ---> 6c7ca0d898b1
Step 5/15 : RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette
 ---> Using cache
 ---> 759f5492c138
Step 6/15 : RUN echo "Image: ${IMAGE}" &&     if [ "${IMAGE}" = "python:3-slim-bullseye" ] ; then     echo "OpenBLAS install:" &&     apt-get install -y --no-install-recommends libopenblas-dev &&     LLAMA_OPENBLAS=1 pip install llama-cpp-python --verbose; else     echo "CuBLAS install:" &&     LLAMA_CUBLAS=1 pip install llama-cpp-python --verbose; fi
 ---> Using cache
 ---> 0fc3144b59db
Step 7/15 : RUN rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 07c6050f0a73
Step 8/15 : WORKDIR /app
 ---> Using cache
 ---> 8688f680685e
Step 9/15 : RUN echo "Installing model...this can take some time..."
 ---> Using cache
 ---> 08cce76429ac
Step 10/15 : COPY ./model.bin /app/model.bin
 ---> Using cache
 ---> ac1aa642e96b
Step 11/15 : COPY ./start_server.sh /app/start_server.sh
 ---> Using cache
 ---> 4a1072e4a28d
Step 12/15 : RUN chmod +x /app/start_server.sh
 ---> Using cache
 ---> c1c0cd531462
Step 13/15 : ENV HOST=0.0.0.0
 ---> Using cache
 ---> 6d592c55ca52
Step 14/15 : EXPOSE 8000
 ---> Using cache
 ---> 7bf2466dcdb6
Step 15/15 : CMD ["/bin/sh", "/app/start_server.sh"]
 ---> Using cache
 ---> afc4cb89b07a
Successfully built afc4cb89b07a
Successfully tagged open_llama_3b:latest
REPOSITORY      TAG                        IMAGE ID       CREATED          SIZE
open_llama_3b   latest                     afc4cb89b07a   31 minutes ago   3.34GB

To start the docker container run:
docker run -t -p 8000:8000 open_llama_3b
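
For reference, the Dockerfile behind the fifteen build steps above looks roughly like this (reconstructed from the log; line continuations and exact formatting in the actual docker/ file may differ):

# Reconstructed from the build log above; formatting is approximate.
ARG IMAGE=python:3-slim-bullseye
FROM ${IMAGE}
# Re-declare ARG after FROM so later RUN steps can see it.
ARG IMAGE
RUN apt-get update && apt-get upgrade -y && apt-get install -y --no-install-recommends \
    python3 python3-pip ninja-build build-essential
RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette
# OpenBLAS-enabled build on the CPU base image, CuBLAS otherwise.
RUN echo "Image: ${IMAGE}" && \
    if [ "${IMAGE}" = "python:3-slim-bullseye" ]; then \
        echo "OpenBLAS install:" && \
        apt-get install -y --no-install-recommends libopenblas-dev && \
        LLAMA_OPENBLAS=1 pip install llama-cpp-python --verbose; \
    else \
        echo "CuBLAS install:" && \
        LLAMA_CUBLAS=1 pip install llama-cpp-python --verbose; \
    fi
RUN rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN echo "Installing model...this can take some time..."
COPY ./model.bin /app/model.bin
COPY ./start_server.sh /app/start_server.sh
RUN chmod +x /app/start_server.sh
ENV HOST=0.0.0.0
EXPOSE 8000
CMD ["/bin/sh", "/app/start_server.sh"]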

docker/open_llama$ ./start.sh
llama.cpp: loading model from /app/model.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 3200
llama_model_load_internal: n_mult     = 216
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 26
llama_model_load_internal: n_rot      = 100
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 8640
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 3B
llama_model_load_internal: ggml ctx size =    0.06 MB
llama_model_load_internal: mem required  = 3219.39 MB (+  682.00 MB per state)
.................................................................................................
llama_init_from_file: kv self size  =  650.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
INFO:     Started server process [7]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

CONTAINER ID   IMAGE           COMMAND                  CREATED          STATUS         PORTS                                       NAMES
236a378c0550   open_llama_3b   "/bin/sh /app/start_…"   10 seconds ago   Up 9 seconds   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp   stoic_poincare

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   127    0     0  100   127      0    105  0:00:01  0:00:01 --:--:--   105
llama_print_timings:        load time =  1113.28 ms
llama_print_timings:      sample time =     2.02 ms /     3 runs   (    0.67 ms per token)
llama_print_timings: prompt eval time =  1113.21 ms /    24 tokens (   46.38 ms per token)
llama_print_timings:        eval time =   151.68 ms /     2 runs   (   75.84 ms per token)
llama_print_timings:       total time =  1316.52 ms
INFO:     172.17.0.1:60518 - "POST /v1/completions HTTP/1.1" 200 OK
100   397  100   270  100   127    204     96  0:00:01  0:00:01 --:--:--   300
{"id":"cmpl-0bdcdeb4-e8c7-40ea-a442-c724aa8baab3","object":"text_completion","created":1685703180,"model":"/app/model.bin","choices":[{"text":"Paris","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":24,"completion_tokens":3,"total_tokens":27}}

open_llama_3b is working!!
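
The exact request the test sent isn't shown in the log; against the server's OpenAI-compatible /v1/completions endpoint, a request of roughly this shape (hypothetical prompt and parameters) yields a completion like the one above:

# Hypothetical prompt/parameters; the actual test request isn't in the log.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Q: What is the capital of France? A:", "max_tokens": 16, "stop": ["\n"]}'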

@abetlen merged commit 3977eea into abetlen:main on Jun 2, 2023
@SlyEcho commented Jun 7, 2023

@gjmulder I want to update the model to the final version. I hope that doesn't cause issues for you.

@gjmulder (Contributor, Author) commented Jun 7, 2023

@SlyEcho if you replace the 3B model, it should just work 🤞 as my 🤗 API search terms are:

  1. Author: SlyEcho
  2. Search: open_llama_3b_ggml

I then substring-match a filename in the repo that includes the string q5_1 (roughly as sketched below).

The main ask is that you don't delete the repo, as then the search will return nothing to download.
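
A minimal sketch of that lookup (hypothetical, not the actual open_llama_in_a_box.sh source; it assumes the Hugging Face API's current JSON field names):

# Hypothetical sketch of the search-and-download logic described above.
# Assumes /api/models returns "id" for repos and the per-repo endpoint
# lists files under "rfilename".
REPO=$(curl -s 'https://huggingface.co/api/models?author=SlyEcho&search=open_llama_3b_ggml' \
  | grep -o '"id":"[^"]*"' | head -1 | cut -d'"' -f4)
FILE=$(curl -s "https://huggingface.co/api/models/${REPO}" \
  | grep -o '"rfilename":"[^"]*q5_1[^"]*"' | head -1 | cut -d'"' -f4)
curl -L -o model.bin "https://huggingface.co/${REPO}/resolve/main/${FILE}"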

@SlyEcho commented Jun 7, 2023

It's updated now; I am adding 7B and 13B as well.

Labels: build, model (Model specific issue)