
feat: support internvl #9403

Open · qlylangyu wants to merge 1 commit into master

Conversation

@qlylangyu qlylangyu commented Sep 10, 2024

@github-actions github-actions bot added the examples and python (python script changes) labels on Sep 10, 2024
@ngxson
Collaborator

ngxson commented Sep 10, 2024

I'm not very familiar with vision models, but I wonder if there is a particular reason to duplicate clip.cpp instead of reusing llava/clip.cpp?

@James4Ever0

James4Ever0 commented Jan 7, 2025

Your code is incomplete and does not compile. Do you have any updates since the last commit?

Procedure:

cd /tmp
git clone https://github.com/qlylangyu/llama.cpp llama.cpp-internvl
cd llama.cpp-internvl
git checkout internvl
# edit the file examples/CMakeLists.txt and add the line "add_subdirectory(internvl)"
mkdir build
cd build 
cmake ..
make llama-internvl-cli

Error log:

[ 93%] Building CXX object examples/internvl/CMakeFiles/llama-internvl-cli.dir/internvl-cli.cpp.o
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘const char* sample(llama_sampling_context*, llama_context*, int*)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:52:28: error: ‘llama_sampling_sample’ was not declared in this scope; did you mean ‘llama_sampler_sample’?
   52 |     const llama_token id = llama_sampling_sample(ctx_sampling, ctx_llama, NULL);
      |                            ^~~~~~~~~~~~~~~~~~~~~
      |                            llama_sampler_sample
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:53:5: error: ‘llama_sampling_accept’ was not declared in this scope; did you mean ‘llama_sampler_accept’?
   53 |     llama_sampling_accept(ctx_sampling, ctx_llama, id, true);
      |     ^~~~~~~~~~~~~~~~~~~~~
      |     llama_sampler_accept
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘void print_usage(int, char**, const gpt_params&)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:122:5: error: ‘gpt_params_print_usage’ was not declared in this scope
  122 |     gpt_params_print_usage(argc, argv, params);
      |     ^~~~~~~~~~~~~~~~~~~~~~
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘internvl_image_embed* load_image(internvl_context*, gpt_params*, const string&)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:138:94: error: ‘struct gpt_params’ has no member named ‘n_threads’
  138 |         embed = internvl_image_embed_make_with_prompt_base64(ctx_internvl->ctx_clip, params->n_threads, prompt);
      |                                                                                              ^~~~~~~~~
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:145:89: error: ‘struct gpt_params’ has no member named ‘n_threads’
  145 |         embed = internvl_image_embed_make_with_filename(ctx_internvl->ctx_clip, params->n_threads, fname.c_str());
      |                                                                                         ^~~~~~~~~
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘void process_prompt(internvl_context*, internvl_image_embed*, gpt_params*, const string&)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:161:26: error: ‘llama_should_add_bos_token’ was not declared in this scope; did you mean ‘llama_add_bos_token’?
  161 |     const bool add_bos = llama_should_add_bos_token(llama_get_model(ctx_internvl->ctx_llama));
      |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                          llama_add_bos_token
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:185:52: error: ‘llama_sampling_init’ was not declared in this scope; did you mean ‘llama_sampling_context’?
  185 |     struct llama_sampling_context * ctx_sampling = llama_sampling_init(params->sparams);
      |                                                    ^~~~~~~~~~~~~~~~~~~
      |                                                    llama_sampling_context
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:205:5: error: ‘llama_sampling_free’ was not declared in this scope; did you mean ‘llama_sampler_free’?
  205 |     llama_sampling_free(ctx_sampling);
      |     ^~~~~~~~~~~~~~~~~~~
      |     llama_sampler_free
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp: In function ‘int main(int, char**)’:
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:279:10: error: ‘gpt_params_parse’ was not declared in this scope; did you mean ‘gpt_params’?
  279 |     if (!gpt_params_parse(argc, argv, params)) {
      |          ^~~~~~~~~~~~~~~~
      |          gpt_params
/tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:344:9: error: ‘llama_print_timings’ was not declared in this scope; did you mean ‘llama_print_system_info’?
  344 |         llama_print_timings(ctx_internvl->ctx_llama);
      |         ^~~~~~~~~~~~~~~~~~~
      |         llama_print_system_info
make[3]: *** [examples/internvl/CMakeFiles/llama-internvl-cli.dir/build.make:76: examples/internvl/CMakeFiles/llama-internvl-cli.dir/internvl-cli.cpp.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:3540: examples/internvl/CMakeFiles/llama-internvl-cli.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:3547: examples/internvl/CMakeFiles/llama-internvl-cli.dir/rule] Error 2
make: *** [Makefile:1414: llama-internvl-cli] Error 2
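
All of these errors come from the renamed sampling and common APIs in current llama.cpp. For reference, here is a minimal sketch of the replacement calls (my assumption about the intended mapping, not code from this PR); note that llama_sampler_sample already accepts the sampled token internally, so the old llama_sampling_accept call has no separate one-to-one replacement:

#include "llama.h"

// build a trivial sampler chain; the removed llama_sampling_init(params->sparams)
// has no drop-in equivalent here, this greedy chain is only a placeholder
static llama_sampler * make_greedy_sampler() {
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_greedy());
    return smpl; // release later with llama_sampler_free(smpl)
}

// sample (and implicitly accept) a token from the logits of the last evaluated
// token (idx == -1), replacing llama_sampling_sample + llama_sampling_accept
static llama_token sample_next_token(llama_sampler * smpl, llama_context * ctx) {
    return llama_sampler_sample(smpl, ctx, -1);
}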

@James4Ever0

James4Ever0 commented Jan 8, 2025

It looks like your code is very similar to the files under examples/llava. Unless there is a specific reason to copy such a large amount of code from there, you should reuse it or refactor it first.

Anyway, I will check the overall model architecture and make a working version instead.


I have tried to load the model using llama-llava-cli, but it failed.

./llama-llava-cli \
    -m ./InternVL-gguf/internlm2-1.8B-chat-q4_k.gguf \
    --mmproj ./InternVL-gguf/InternViT-300M-448px-f16.gguf \
    -t 4 \
    --image ./example.jpeg \
    -p "<image>\nWhat is in this image?" 

Output:

key clip.has_text_encoder not found in file
terminate called after throwing an instance of 'std::runtime_error'
  what():  Missing required key: clip.has_text_encoder
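
The mmproj file is not in the format that llava's clip.cpp expects, so it aborts on the first missing clip.* metadata key. A quick way to see which keys the InternViT GGUF actually carries is to dump its metadata with the gguf API. A small debugging sketch of my own, assuming a revision where the gguf functions are still declared in ggml.h:

#include <cstdio>

#include "ggml.h"

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]); return 1; }
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) { fprintf(stderr, "failed to open %s\n", argv[1]); return 1; }
    const int n_kv = gguf_get_n_kv(ctx);
    for (int i = 0; i < n_kv; i++) {
        // clip.cpp expects keys such as clip.has_text_encoder and clip.vision.* here
        printf("%s\n", gguf_get_key(ctx, i));
    }
    gguf_free(ctx);
    return 0;
}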

@James4Ever0

I have made every attempt to get your code to work, but I still get this core dump:

internvl_image_embed_make_with_filename: image loaded in     0.03 ms
internvl_image_embed_make_with_bytes: image encoded in     1.35 ms
encode_image_with_clip: image process in    11.08 ms
encode_image_with_clip: image embedding created: 256 tokens
encode_image_with_clip: image preprocessed in    11.13 ms by CLIP (    0.04 ms per image patch)
encode_image_with_clip: image encoded in 85708.52 ms by CLIP (  334.80 ms per image patch)
internvl_image_embed_make_with_filename: image encoded in 85721.07 ms
Segmentation fault (core dumped)

@James4Ever0

A gdb backtrace gives the following result:

#0  0x00007ffff7d0975e in llama_decode_internal (lctx=..., batch_all=...)
    at /tmp/llama.cpp-internvl/src/llama.cpp:16080
#1  0x00007ffff7d17f15 in llama_decode (ctx=0x5555557cc960, batch=...)
    at /tmp/llama.cpp-internvl/src/llama.cpp:20053
#2  0x00005555555c2072 in internvl_eval_image_embed (ctx_llama=0x5555557cc960, image_embed=0x555555837220, 
    n_batch=2048, n_past=0x7fffffffcdf4)
    at /tmp/llama.cpp-internvl/examples/internvl/internvl.cpp:268
#3  0x00005555555b85bd in process_prompt (ctx_internvl=0x555555785c70, image_embed=0x555555837220, 
    params=0x7fffffffcfd0, prompt="<image>\nWhat is in this image?")
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:194
#4  0x00005555555b9375 in main (argc=13, argv=0x7fffffffe2c8)
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:362
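
Frame #2 points at internvl_eval_image_embed feeding the image embedding to llama_decode. For comparison, here is a minimal sketch of how the llava example evaluates an embedding in chunks of n_batch, with the pos/seq_id/logits arrays filled explicitly (my reconstruction, assuming the fork's llama_batch layout matches upstream from around this time):

#include <algorithm>
#include <vector>

#include "llama.h"

// feed n_tokens rows of an image embedding (n_embd floats each) to the model
static bool eval_image_embed_sketch(llama_context * ctx, float * embd, int n_tokens, int n_batch, int * n_past) {
    const int n_embd = llama_n_embd(llama_get_model(ctx));
    for (int i = 0; i < n_tokens; i += n_batch) {
        const int n_eval = std::min(n_batch, n_tokens - i);

        std::vector<llama_pos>      pos(n_eval);
        std::vector<int32_t>        n_seq_id(n_eval, 1);
        std::vector<llama_seq_id>   seq_id_0(1, 0);
        std::vector<llama_seq_id *> seq_ids(n_eval + 1, seq_id_0.data());
        std::vector<int8_t>         logits(n_eval, 0); // no logits needed for image tokens
        seq_ids[n_eval] = nullptr;
        for (int j = 0; j < n_eval; j++) {
            pos[j] = *n_past + j;
        }

        llama_batch batch = {
            /*n_tokens =*/ n_eval,
            /*token    =*/ nullptr,
            /*embd     =*/ embd + (size_t) i * n_embd,
            /*pos      =*/ pos.data(),
            /*n_seq_id =*/ n_seq_id.data(),
            /*seq_id   =*/ seq_ids.data(),
            /*logits   =*/ logits.data(),
        };
        if (llama_decode(ctx, batch)) {
            return false; // decode failed
        }
        *n_past += n_eval;
    }
    return true;
}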

@James4Ever0

James4Ever0 commented Jan 10, 2025

After applying that patch, I found that the sampling process is wrong.

In src/llama-sampling.cpp, the value of cur_p.selected is -1 after llama_sampler_apply(smpl, &cur_p).

Backtrace:

encode_image_with_clip: image embedding created: 256 tokens
encode_image_with_clip: image preprocessed in    41.33 ms by CLIP (    0.16 ms per image patch)
encode_image_with_clip: image encoded in 273212.53 ms by CLIP ( 1067.24 ms per image patch)
internvl_image_embed_make_with_filename: image encoded in 273258.34 ms
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically
/tmp/llama.cpp-internvl/src/llama-sampling.cpp:239: GGML_ASSERT(cur_p.selected >= 0 && cur_p.selected < (int32_t) cur_p.size) failed
[Detaching after fork from child process 2352572]

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff724526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff72288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff78ca6a4 in ggml_abort (
    file=0x7ffff7e85870 "/tmp/llama.cpp-internvl/src/llama-sampling.cpp", line=239, 
    fmt=0x7ffff7e85854 "GGML_ASSERT(%s) failed")
    at /tmp/llama.cpp-internvl/ggml/src/ggml.c:284
#6  0x00007ffff7debcbb in llama_sampler_sample (smpl=0x555555806560, ctx=0x5555557cc960, idx=-1)
    at /tmp/llama.cpp-internvl/src/llama-sampling.cpp:239
#7  0x00005555555b77c1 in sample (smpl=0x555555806560, ctx=0x5555557cc960, n_past=0x7fffffffcdf4)
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:56
#8  0x00005555555b86a7 in process_prompt (ctx_internvl=0x555555785c70, image_embed=0x555555837220, 
    params=0x7fffffffcfd0, prompt="<image>\nWhat is in this image?")
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:209
#9  0x00005555555b9375 in main (argc=13, argv=0x7fffffffe2c8)
    at /tmp/llama.cpp-internvl/examples/internvl/internvl-cli.cpp:362
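
Reading the assert at llama-sampling.cpp:239, cur_p.selected is only set by a terminal "selector" sampler, so my guess (an assumption from the assert, not a confirmed diagnosis) is that the sampler chain here contains only filters and never ends with a token-picking sampler such as dist or greedy. A sketch of a chain that does select a token:

#include "llama.h"

// a filter-only chain (top-k, temperature, ...) leaves cur_p.selected at -1;
// ending the chain with a dist or greedy sampler is what actually picks the token
static llama_sampler * make_sampler_chain() {
    llama_sampler * smpl = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(smpl, llama_sampler_init_top_k(40));
    llama_sampler_chain_add(smpl, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(smpl, llama_sampler_init_dist(LLAMA_DEFAULT_SEED)); // the selector
    return smpl;
}

Separately, the idx=-1 in frame #6 assumes the last token passed to llama_decode had its logits flag set; that is worth confirming on the fork's side as well.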

@James4Ever0

James4Ever0 commented Jan 14, 2025

By cross-referencing llava-cli.cpp, it finally returns something reasonable. Anyway, I will post the refactored code patch shortly, after all these shenanigans.


Creating such a patch against the original codebase is not easy; there are significant differences. I have decided to release my changes to the forked version, along with generated diff files for further work.

Now you can view the release here.

@James4Ever0

@ggerganov

@ngxson
Collaborator

ngxson commented Jan 28, 2025

I will have a look at a later stage of my refactoring: #11292

@James4Ever0

A C++ formatter like clang-format, astyle, or Uncrustify would be required for the code refactoring.

Labels: examples, python (python script changes)
Projects: None yet

4 participants