
feat: support internvl #9403

Open · qlylangyu wants to merge 1 commit into master

Conversation

@qlylangyu qlylangyu commented Sep 10, 2024

@github-actions github-actions bot added the examples and python (python script changes) labels on Sep 10, 2024
@ngxson
Collaborator

ngxson commented Sep 10, 2024

I'm not very familiar with vision models, but I wonder if there is a particular reason to duplicate clip.cpp instead of reusing llava/clip.cpp?

@James4Ever0

James4Ever0 commented Jan 7, 2025

Your code is incomplete and does not compile. Do you have any updates since the last commit?

Procedure:

cd /tmp
git clone https://github.com/qlylangyu/llama.cpp llama.cpp-internvl
cd llama.cpp-internvl
git checkout internvl
# edit the file examples/CMakeLists.txt and add the line "add_subdirectory(internvl)"
mkdir build
cd build 
cmake ..
make llama-internvl-cli

Error log:

[ 93%] Building CXX object examples/internvl/CMakeFiles/llama-internvl-cli.dir/internvl-cli.cpp.o
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘const char* sample(llama_sampling_context*, llama_context*, int*)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:52:28: error: ‘llama_sampling_sample’ was not declared in this scope; did you mean ‘llama_sampler_sample’?
   52 |     const llama_token id = llama_sampling_sample(ctx_sampling, ctx_llama, NULL);
      |                            ^~~~~~~~~~~~~~~~~~~~~
      |                            llama_sampler_sample
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:53:5: error: ‘llama_sampling_accept’ was not declared in this scope; did you mean ‘llama_sampler_accept’?
   53 |     llama_sampling_accept(ctx_sampling, ctx_llama, id, true);
      |     ^~~~~~~~~~~~~~~~~~~~~
      |     llama_sampler_accept
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘void print_usage(int, char**, const gpt_params&)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:122:5: error: ‘gpt_params_print_usage’ was not declared in this scope
  122 |     gpt_params_print_usage(argc, argv, params);
      |     ^~~~~~~~~~~~~~~~~~~~~~
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘internvl_image_embed* load_image(internvl_context*, gpt_params*, const string&)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:138:94: error: ‘struct gpt_params’ has no member named ‘n_threads’
  138 |         embed = internvl_image_embed_make_with_prompt_base64(ctx_internvl->ctx_clip, params->n_threads, prompt);
      |                                                                                              ^~~~~~~~~
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:145:89: error: ‘struct gpt_params’ has no member named ‘n_threads’
  145 |         embed = internvl_image_embed_make_with_filename(ctx_internvl->ctx_clip, params->n_threads, fname.c_str());
      |                                                                                         ^~~~~~~~~
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘void process_prompt(internvl_context*, internvl_image_embed*, gpt_params*, const string&)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:161:26: error: ‘llama_should_add_bos_token’ was not declared in this scope; did you mean ‘llama_add_bos_token’?
  161 |     const bool add_bos = llama_should_add_bos_token(llama_get_model(ctx_internvl->ctx_llama));
      |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                          llama_add_bos_token
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:185:52: error: ‘llama_sampling_init’ was not declared in this scope; did you mean ‘llama_sampling_context’?
  185 |     struct llama_sampling_context * ctx_sampling = llama_sampling_init(params->sparams);
      |                                                    ^~~~~~~~~~~~~~~~~~~
      |                                                    llama_sampling_context
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:205:5: error: ‘llama_sampling_free’ was not declared in this scope; did you mean ‘llama_sampler_free’?
  205 |     llama_sampling_free(ctx_sampling);
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_sampler_free
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘int main(int, char**)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:279:10: error: ‘gpt_params_parse’ was not declared in this scope; did you mean ‘gpt_params’?
  279 |     if (!gpt_params_parse(argc, argv, params)) {
      |          ^~~~~~~~~~~~~~~~
      |          gpt_params
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:344:9: error: ‘llama_print_timings’ was not declared in this scope; did you mean ‘llama_print_system_info’?
  344 |         llama_print_timings(ctx_internvl->ctx_llama);
      |         ^~~~~~~~~~~~~~~~~~~
      |         llama_print_system_info
make[3]: *** [examples/internvl/CMakeFiles/llama-internvl-cli.dir/build.make:76: examples/internvl/CMakeFiles/llama-internvl-cli.dir/internvl-cli.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:3540: examples/internvl/CMakeFiles/llama-internvl-cli.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:3547: examples/internvl/CMakeFiles/llama-internvl-cli.dir/rule] Error 2
make: *** [Makefile:1414: llama-internvl-cli] Error 2
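
All of these errors come from the renamed sampling and common APIs in current llama.cpp. For reference, here is a minimal sketch of the replacement calls (my assumption about the intended mapping, not code from this PR); note that llama_sampler_sample already accepts the sampled token internally, so the old llama_sampling_accept call has no separate one-to-one replacement:

#include "llama.h"

// build a trivial sampler chain; the removed llama_sampling_init(params->sparams)
// has no drop-in equivalent here, this greedy chain is only a placeholder
static llama_sampler * make_greedy_sampler() {
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());
    return smpl; // release later with llama_sampler_free(smpl)
}

// sample (and implicitly accept) a token from the logits of the last evaluated
// token (idx == -1), replacing llama_sampling_sample + llama_sampling_accept
static llama_token sample_next_token(llama_sampler * smpl, llama_context * ctx) {
    return llama_sampler_sample(smpl, ctx, -1);
}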

@James4Ever0

James4Ever0 commented Jan 8, 2025

It looks like your code is very similar to the files under examples/llava. Unless there is a specific reason to copy such a large amount of code from there, you should reuse it or refactor it first.

Anyway, I will check the overall model architecture and make a working version instead.


I have tried to load the model using llama-llava-cli, but it failed.

./llama-llava-cli \
    -m ./InternVL-gguf/internlm2-1.8B-chat-q4_k.gguf \
    --mmproj ./InternVL-gguf/InternViT-300M-448px-f16.gguf \
    -t 4 \
    --image ./example.jpeg \
    -p "<image>\nWhat is in this image?" 

Output:

key clip.has_text_encoder not found in file
terminate called after throwing an instance of 'std::runtime_error'
  what():  Missing required key: clip.has_text_encoder
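
The mmproj file is not in the format that llava's clip.cpp expects, so it aborts on the first missing clip.* metadata key. A quick way to see which keys the InternViT GGUF actually carries is to dump its metadata with the gguf API. A small debugging sketch of my own, assuming a revision where the gguf functions are still declared in ggml.h:

#include <cstdio>

#include "ggml.h"

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]); return 1; }
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) { fprintf(stderr, "failed to open %s\n", argv[1]); return 1; }
    const int n_kv = gguf_get_n_kv(ctx);
    for (int i = 0; i < n_kv; i++) {
        // clip.cpp expects keys such as clip.has_text_encoder and clip.vision.* here
        printf("%s\n", gguf_get_key(ctx, i));
    }
    gguf_free(ctx);
    return 0;
}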

@James4Ever0

I have made every attempt to get your code to work, but I still get this core dump:

internvl_image_embed_make_with_filename: image loaded in     0.03 ms
internvl_image_embed_make_with_bytes: image encoded in     1.35 ms
encode_image_with_clip: image process in    11.08 ms
encode_image_with_clip: image embedding created: 256 tokens
encode_image_with_clip: image preprocessed in    11.13 ms by CLIP (    0.04 ms per image patch)
encode_image_with_clip: image encoded in 85708.52 ms by CLIP (  334.80 ms per image patch)
internvl_image_embed_make_with_filename: image encoded in 85721.07 ms
Segmentation fault (core dumped)

@James4Ever0

A gdb backtrace gives the following result:

#0  0x00007ffff7d0975e in llama_decode_internal (lctx=..., batch_all=...)
    at /tmp/llama.cpp-internvl/src/llama.cpp:16080
#1  0x00007ffff7d17f15 in llama_decode (ctx=0x5555557cc960, batch=...)
    at /tmp/llama.cpp-internvl/src/llama.cpp:20053
#2  0x00005555555c2072 in internvl_eval_image_embed (ctx_llama=0x5555557cc960, image_embed=0x555555837220, 
    n_batch=2048, n_past=0x7fffffffcdf4)
    at /tmp/llama.cpp-internvl/examples/internvl/internvl.cpp:268
#3  0x00005555555b85bd in process_prompt (ctx_internvl=0x555555785c70, image_embed=0x555555837220, 
    params=0x7fffffffcfd0, prompt="<image>\nWhat is in this image?")
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:194
#4  0x00005555555b9375 in main (argc=13, argv=0x7fffffffe2c8)
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:362
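
Frame #2 points at internvl_eval_image_embed feeding the image embedding to llama_decode. For comparison, here is a minimal sketch of how the llava example evaluates an embedding in chunks of n_batch, with the pos/seq_id/logits arrays filled explicitly (my reconstruction, assuming the fork's llama_batch layout matches upstream from around this time):

#include <algorithm>
#include <vector>

#include "llama.h"

// feed n_tokens rows of an image embedding (n_embd floats each) to the model
static bool eval_image_embed_sketch(llama_context * ctx, float * embd, int n_tokens, int n_batch, int * n_past) {
    const int n_embd = llama_n_embd(llama_get_model(ctx));
    for (int i = 0; i < n_tokens; i += n_batch) {
        const int n_eval = std::min(n_batch, n_tokens - i);

        std::vector<llama_pos>      pos(n_eval);
        std::vector<int32_t>        n_seq_id(n_eval, 1);
        std::vector<llama_seq_id>   seq_id_0(1, 0);
        std::vector<llama_seq_id *> seq_ids(n_eval + 1, seq_id_0.data());
        std::vector<int8_t>         logits(n_eval, 0); // no logits needed for image tokens
        seq_ids[n_eval] = nullptr;
        for (int j = 0; j < n_eval; j++) {
            pos[j] = *n_past + j;
        }

        llama_batch batch = {
            /*n_tokens =*/ n_eval,
            /*token    =*/ nullptr,
            /*embd     =*/ embd + (size_t) i * n_embd,
            /*pos      =*/ pos.data(),
            /*n_seq_id =*/ n_seq_id.data(),
            /*seq_id   =*/ seq_ids.data(),
            /*logits   =*/ logits.data(),
        };
        if (llama_decode(ctx, batch)) {
            return false; // decode failed
        }
        *n_past += n_eval;
    }
    return true;
}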

@James4Ever0

James4Ever0 commented Jan 10, 2025

After applying that patch, I found that the sampling process is wrong.

In src/llama-sampling.cpp, the value of cur_p.selected is -1 after llama_sampler_apply(smpl, &cur_p).

Backtrace:

encode_image_with_clip: image embedding created: 256 tokens
encode_image_with_clip: image preprocessed in    41.33 ms by CLIP (    0.16 ms per image patch)
encode_image_with_clip: image encoded in 273212.53 ms by CLIP ( 1067.24 ms per image patch)
internvl_image_embed_make_with_filename: image encoded in 273258.34 ms
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
/tmp/llama.cpp-internvl/src/llama-sampling.cpp:239: GGML_ASSERT(cur_p.selected >= 0 && cur_p.selected < (int32_t) cur_p.size) failed
[Detaching after fork from child process 2352572]

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff724526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff72288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff78ca6a4 in ggml_abort (
    file=0x7ffff7e85870 "/tmp/llama.cpp-internvl/src/llama-sampling.cpp", line=239, 
    fmt=0x7ffff7e85854 "GGML_ASSERT(%s) failed")
    at /tmp/llama.cpp-internvl/ggml/src/ggml.c:284
#6  0x00007ffff7debcbb in llama_sampler_sample (smpl=0x555555806560, ctx=0x5555557cc960, idx=-1)
    at /tmp/llama.cpp-internvl/src/llama-sampling.cpp:239
#7  0x00005555555b77c1 in sample (smpl=0x555555806560, ctx=0x5555557cc960, n_past=0x7fffffffcdf4)
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:56
#8  0x00005555555b86a7 in process_prompt (ctx_internvl=0x555555785c70, image_embed=0x555555837220, 
    params=0x7fffffffcfd0, prompt="<image>\nWhat is in this image?")
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:209
#9  0x00005555555b9375 in main (argc=13, argv=0x7fffffffe2c8)
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:362
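
Reading the assert at llama-sampling.cpp:239, cur_p.selected is only set by a terminal "selector" sampler, so my guess (an assumption from the assert, not a confirmed diagnosis) is that the sampler chain here contains only filters and never ends with a token-picking sampler such as dist or greedy. A sketch of a chain that does select a token:

#include "llama.h"

// a filter-only chain (top-k, temperature, ...) leaves cur_p.selected at -1;
// ending the chain with a dist or greedy sampler is what actually picks the token
static llama_sampler * make_sampler_chain() {
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(smpl, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(smpl, llama_sampler_init_dist(LLAMA_DEFAULT_SEED)); // the selector
    return smpl;
}

Separately, the idx=-1 in frame #6 assumes the last token passed to llama_decode had its logits flag set; that is worth confirming on the fork's side as well.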

@James4Ever0

James4Ever0 commented Jan 14, 2025

By cross-referencing llava-cli.cpp, it finally returns something reasonable. Anyway, I will post the refactored code patch shortly, after all these shenanigans.


Creating such a patch against the original codebase is not easy; there are significant differences. I have decided to release my changes to the forked version, along with generated diff files for further work.

Now you can view the release here.

@James4Ever0

@ggerganov

@ngxson
Collaborator

ngxson commented Jan 28, 2025

I will have a look at a later stage of my refactoring: #11292

@James4Ever0

A C++ formatter like clang-format, astyle, or Uncrustify would be required for the code refactoring.

Labels: examples, python (python script changes)
Projects: None yet

4 participants