Using WSL-based Docker, running the llama.cpp container with the quantized Chinese-alpaca-plus model loaded, the terminal keeps outputting carriage returns after Chinese input #1649

Closed
4 tasks done
qingfengfenga opened this issue May 30, 2023 · 4 comments

Comments

@qingfengfenga
Contributor

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Normal conversation in Chinese

The same operation works normally in WSL directly; I don't know what is missing in the WSL-based Docker container.

Current Behavior

The Docker container environment has been configured to support Chinese character sets, the user input configuration file has been modified, and the bash terminal itself supports Chinese input and display.


Using WSL-based Docker, running the llama.cpp container with the quantized Chinese-alpaca-plus model loaded, the terminal keeps outputting carriage returns after Chinese input.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

Docker container based on WSL

$ git log | head -1
commit 7552ac586380f202b75b18aa216ecfefbd438d94

$ lscpu | egrep "Intel|Flags"
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Core(TM) i5-10400 CPU @ 2.90GHz
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities

$ python3 --version
Python 3.10.6

$ pip list | egrep "torch|numpy|sentencepiece"
numpy         1.24.0
sentencepiece 0.1.98

$ make --version | head -1
GNU Make 4.3

$ md5sum ./models/llama/7B/chinese-alpaca-plus-pth/7B/ggml-model-q4_0.bin
2be976125e14ef0def15f3992155b3f4  ./models/llama/7B/chinese-alpaca-plus-pth/7B/ggml-model-q4_0.bin

  • Operating System, e.g. for Linux:
$ uname -a
Linux 5d8db86af909 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.10.6

$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Failure Information (for bugs)

Using WSL-based Docker, running the llama.cpp container with the quantized Chinese-alpaca-plus model loaded, the terminal keeps outputting carriage returns after Chinese input.


$ docker exec -it stoic_margulis bash
root@5d8db86af909:/app# ls
BLIS.md         README.md     convert-lora-to-ggml.py  flake.lock       ggml-opencl.h  llama.cpp  models      quantize-stats    vdot
CMakeLists.txt  SHA256SUMS    convert-pth-to-ggml.py   flake.nix        ggml.c         llama.h    perplexity  requirements.txt
LICENSE         build-info.h  convert.py               ggml-cuda.cu     ggml.h         llama.o    pocs        scripts
Makefile        build.zig     embedding                ggml-cuda.h      ggml.o         main       prompts     spm-headers
Package.swift   common.o      examples                 ggml-opencl.cpp  llama-util.h   media      quantize    tests
root@5d8db86af909:/app# ls models/
llama
root@5d8db86af909:/app# ./main -m ./models/llama/7B/chinese-alpaca-plus-pth/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.1
main: build = 1 (7552ac5)
main: seed  = 1685435292
llama.cpp: loading model from ./models/llama/7B/chinese-alpaca-plus-pth/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 49954
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5486.61 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  = 1024.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.200000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 256, n_keep = 21


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Below is an instruction that describes a task. Write a response that appropriately completes the request.
> hi
Hello! How can I assist you today?
>
>
>
>
>
>
>
>
>
>
>
>
>
>

llama_print_timings:        load time = 78487.39 ms
llama_print_timings:      sample time =    87.52 ms /   134 runs   (    0.65 ms per token)
llama_print_timings: prompt eval time =  4995.84 ms /    41 tokens (  121.85 ms per token)
llama_print_timings:        eval time = 26214.38 ms /   133 runs   (  197.10 ms per token)
llama_print_timings:       total time = 145959.58 ms
root@5d8db86af909:/app#

Steps to Reproduce

Docs:

  1. Get the chinese-alpaca-plus model after merging the LoRA weights.
  2. Convert the merged model to FP16 format.
  3. Quantize it using the script provided by llama.cpp.
  4. Start the llama.cpp container and mount the quantized model into it.
  5. Due to script compatibility issues, the container's default script is not used; the ./main command is invoked directly instead (a rough sketch of steps 2-5 follows this list).
  6. Enter English to test whether the model responds normally.
  7. Enter Chinese to test whether the model responds normally.
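
For reference, here is a rough sketch of what steps 2-5 can look like with the paths used above. The exact convert.py and quantize arguments are assumptions and may differ between llama.cpp revisions, so treat this as an illustration rather than the exact commands that were run:

    # Step 2: convert the merged PyTorch checkpoint to FP16 GGML
    python3 convert.py ./models/llama/7B/chinese-alpaca-plus-pth/7B/ --outtype f16

    # Step 3: quantize the FP16 model to Q4_0
    ./quantize ./models/llama/7B/chinese-alpaca-plus-pth/7B/ggml-model-f16.bin \
               ./models/llama/7B/chinese-alpaca-plus-pth/7B/ggml-model-q4_0.bin q4_0

    # Steps 4-5: with the model directory mounted into the running container,
    # bypass the default entrypoint script and call ./main directly (as in the session log above)
    docker exec -it stoic_margulis bash
    ./main -m ./models/llama/7B/chinese-alpaca-plus-pth/7B/ggml-model-q4_0.bin \
        --color -f ./prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.1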
@DannyDaemonic
Copy link
Contributor

Does this only happen with the Docker image? It doesn't happen if you compile it yourself or use one of the official binaries?

Either way, I believe I've seen this before. Either the Docker image isn't configured to use the Chinese locale, or it doesn't contain it. I think the first is more likely. Can you try setting LC_ALL to "zh-CN.UTF-8" and see if it fixes the problem? You can set the environment variable at runtime using the -e option with docker run:

    docker run -e LC_ALL=zh-CN.UTF-8 -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
    docker run -e LC_ALL=zh-CN.UTF-8 -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
    docker run -e LC_ALL=zh-CN.UTF-8 -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
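
One generic way to check whether the image actually ships a Chinese locale (this is a general check, not specific to this image) is to list the locales available inside the running container:

    # list available locales inside the container from the log above
    docker exec -it stoic_margulis bash -c "locale -a | grep -iE 'zh|utf'"

If neither zh_CN.UTF-8 nor C.UTF-8 shows up, setting LC_ALL alone will not be enough and the locale has to be installed or generated first.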

@qingfengfenga
Contributor Author

Thank you very much. After installing the corresponding language pack, I tested the C.utf8 and zh_CN.utf8 character sets, and they both work properly.
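
(For reference, on an Ubuntu-based image, "installing the corresponding language pack" and generating the locale might look roughly like the following; this is an assumption about the general approach, not necessarily the exact commands used here:)

    # inside the container (Ubuntu base assumed)
    apt-get update && apt-get install -y locales
    locale-gen zh_CN.UTF-8
    export LC_ALL=zh_CN.UTF-8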

@DannyDaemonic
Contributor

You aren't the first with this issue. Are you able to provide the steps you followed to fix this, so I can include them in the documentation?

@qingfengfenga
Contributor Author

Of course. I am currently organizing and testing, and I will put together a document on loading Chinese models with llama.cpp in a container environment, using Chinese-alpaca-plus as an example.
