Run on CPU without AVX2 #315

Open
ZanMax opened this issue Apr 14, 2024 · 3 comments

Comments

@ZanMax

ZanMax commented Apr 14, 2024

Hello,
I have a server with an Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz and 5x WX9100 GPUs, and I want to run Mistral 7B on each GPU.
When I tried, I got the error "Illegal instruction (core dumped)".
Is it possible to run exllama on a CPU without AVX2?

@turboderp
Owner

Are you on the latest version?

@ZanMax
Author

ZanMax commented Apr 15, 2024

Steps:

git clone https://github.com/turboderp/exllama
cd exllama
pip install -r requirements.txt
python test_benchmark_inference.py -d <path_to_model_files> -p -ppl

Result:

python test_benchmark_inference.py -d /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/ -p -ppl
Successfully preprocessed all matching files.
-- Perplexity:
-- - Dataset: datasets/wikitext2_val_sample.jsonl
-- - Chunks: 100
-- - Chunk size: 2048 -> 2048
-- - Chunk overlap: 0
-- - Min. chunk size: 50
-- - Key: text
-- Tokenizer: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/tokenizer.model
-- Model config: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/config.json
-- Model: /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/model.safetensors
-- Sequence length: 2048
-- Tuning:
-- --sdp_thd: 8
-- --matmul_recons_thd: 8
-- --fused_mlp_thd: 2
-- --rmsnorm_no_half2
-- --rope_no_half2
-- --matmul_no_half2
-- --silu_no_half2
-- Options: ['perf', 'perplexity']
** Time, Load model: 21.56 seconds
** Time, Load tokenizer: 0.02 seconds
-- Groupsize (inferred): 128
-- Act-order (inferred): yes
** VRAM, Model: [cuda:0] 3,877.87 MB - [cuda:1] 0.00 MB - [cuda:2] 0.00 MB - [cuda:3] 0.00 MB - [cuda:4] 0.00 MB
** VRAM, Cache: [cuda:0] 256.00 MB - [cuda:1] 0.00 MB - [cuda:2] 0.00 MB - [cuda:3] 0.00 MB - [cuda:4] 0.00 MB
-- Warmup pass 1...
Illegal instruction (core dumped)

As far as I know, "Illegal instruction (core dumped)" points to a problem with AVX2 instructions. When I tried the GGUF format with llama.cpp, I got the same "Illegal instruction (core dumped)" error.
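
One way to check the AVX2 assumption is to look at the instruction-set flags the kernel reports for the CPU. The Xeon E5-2620 is a Sandy Bridge part, which has AVX but not AVX2. The following is a minimal sketch, assuming Linux and a readable /proc/cpuinfo; the function names `parse_cpu_flags` and `report` are hypothetical and not part of exllama.

```python
from pathlib import Path

# Flags worth checking; prebuilt numeric libraries often assume some of these.
WANTED = ("avx", "avx2", "fma", "f16c")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text, wanted=WANTED):
    """Map each wanted flag name to True/False depending on whether it is present."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        for name, present in report(path.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

A missing `avx2` here only confirms that the CPU lacks the extension; it does not by itself show which library executed the unsupported instruction.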

@ZanMax
Author

ZanMax commented Apr 18, 2024

Maybe this gives more information about the error:

gdb --args python3 test_benchmark_inference.py -d /home/dev/models/Mistral-7B-Instruct-v0.2-GPTQ/ -p -ppl

#0 0x00007fff4e89540e in rocblas_hgemm () from /home/dev/workspace/numpy_no_avx2/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#1 0x00007fff86e491dd in hipblasHgemm () from /home/dev/workspace/numpy_no_avx2/venv/lib/python3.10/site-packages/torch/lib/libhipblas.so
#2 0x00007ffe8ba50855 in q4_matmul_recons_cuda(ExLlamaTuning*, __half const*, int, Q4Matrix*, __half*, void*, bool) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#3 0x00007ffe8ba364e8 in q4_matmul(at::Tensor, unsigned long, at::Tensor) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#4 0x00007ffe8ba4e423 in pybind11::cpp_function::initialize<void (&)(at::Tensor, unsigned long, at::Tensor), void, at::Tensor, unsigned long, at::Tensor, pybind11::name, pybind11::scope, pybind11::sibling, char [10]>(void (&)(at::Tensor, unsigned long, at::Tensor), void (*)(at::Tensor, unsigned long, at::Tensor), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [10])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#5 0x00007ffe8ba4aa4d in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/dev/.cache/torch_extensions/py310_cpu/exllama_ext/exllama_ext.so
#6 0x00005555556ae10e in ?? ()
#7 0x00005555556a4a7b in _PyObject_MakeTpCall ()
#8 0x000055555569d096 in _PyEval_EvalFrameDefault ()
#9 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#10 0x000055555569ccfa in _PyEval_EvalFrameDefault ()
#11 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#12 0x000055555569745c in _PyEval_EvalFrameDefault ()
#13 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#14 0x000055555569745c in _PyEval_EvalFrameDefault ()
#15 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#16 0x000055555569745c in _PyEval_EvalFrameDefault ()
#17 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#18 0x000055555569745c in _PyEval_EvalFrameDefault ()
#19 0x00005555556bc7f1 in ?? ()
#20 0x000055555569853c in _PyEval_EvalFrameDefault ()
#21 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#22 0x000055555569726d in _PyEval_EvalFrameDefault ()
#23 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#24 0x000055555569726d in _PyEval_EvalFrameDefault ()
#25 0x00005555556ae9fc in _PyFunction_Vectorcall ()
#26 0x000055555569726d in _PyEval_EvalFrameDefault ()
#27 0x00005555556939c6 in ?? ()
#28 0x0000555555789256 in PyEval_EvalCode ()
#29 0x00005555557b4108 in ?? ()
#30 0x00005555557ad9cb in ?? ()
#31 0x00005555557b3e55 in ?? ()
#32 0x00005555557b3338 in _PyRun_SimpleFileObject ()
#33 0x00005555557b2f83 in _PyRun_AnyFileObject ()
#34 0x00005555557a5a5e in Py_RunMain ()
#35 0x000055555577c02d in Py_BytesMain ()
#36 0x00007ffff7c7ed90 in __libc_start_call_main (main=main@entry=0x55555577bff0, argc=argc@entry=6, argv=argv@entry=0x7fffffffe328) at ../sysdeps/nptl/libc_start_call_main.h:58
#37 0x00007ffff7c7ee40 in __libc_start_main_impl (main=0x55555577bff0, argc=6, argv=0x7fffffffe328, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe318) at ../csu/libc-start.c:392
#38 0x000055555577bf25 in _start ()
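
The backtrace places the fault inside rocblas_hgemm in the librocblas.so bundled with PyTorch, reached through q4_matmul_recons_cuda. To see whether the crash reproduces without exllama, one could try a plain half-precision matrix multiply on the GPU. This is only a sketch: the function name `probe_half_matmul` is hypothetical, and it is an assumption that PyTorch routes this fp16 matmul through the same rocBLAS code path as the frame above.

```python
def probe_half_matmul(device_index=0, size=256):
    """Run one fp16 matmul on the given GPU and return a short status string.

    If the process dies with "Illegal instruction" before returning, the
    problem would appear to sit below exllama (PyTorch / rocBLAS).
    """
    try:
        import torch
    except ImportError:
        return "torch-missing"

    # On ROCm builds of PyTorch, torch.cuda is the interface to HIP devices.
    if not torch.cuda.is_available():
        return "no-gpu"

    device = torch.device(f"cuda:{device_index}")
    a = torch.randn(size, size, dtype=torch.float16, device=device)
    b = torch.randn(size, size, dtype=torch.float16, device=device)
    c = a @ b
    # Wait for the GPU so any failure surfaces here rather than later.
    torch.cuda.synchronize(device)
    return "ok" if c.shape == (size, size) else "unexpected-shape"


if __name__ == "__main__":
    print(probe_half_matmul())
```

If this small script also ends in "Illegal instruction (core dumped)", that would suggest the fault lies in the PyTorch/ROCm stack rather than in the exllama extension; if it prints "ok", the exllama-specific call path would be the next thing to examine.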
