
support MiniCPM-V-2.5 #7599

Merged Aug 9, 2024 (67 commits; diff shown is from 29 commits)

Commits
7a49a6f
init
tc-mb May 23, 2024
c536fa6
rename
tc-mb May 23, 2024
2b91903
add run android for termux in readme
tc-mb May 23, 2024
0480d5f
add android readme
tc-mb May 23, 2024
ec1cea7
add instructions in readme
tc-mb May 23, 2024
a491f45
change name in readme
tc-mb May 23, 2024
7573b63
Update README.md
iceflame89 May 23, 2024
94dcaba
fixed line
harvestingmoon May 23, 2024
b31f51f
Merge pull request #1 from harvestingmoon/minicpm-v2.5
tc-mb May 24, 2024
629420e
add result in readme
tc-mb May 24, 2024
b48708a
random pos_embed
tc-mb May 26, 2024
d9fbc1d
add positions index
tc-mb May 26, 2024
18fe620
change for ollama
tc-mb May 26, 2024
2997a68
change for ollama
tc-mb May 26, 2024
8541e99
better pos_embed in clip
tc-mb May 26, 2024
d8974b8
support ollama
tc-mb May 27, 2024
e73a0c7
updata cmakelist
tc-mb May 28, 2024
6366d62
updata cmakelist
tc-mb May 28, 2024
056d178
rename wrapper
tc-mb May 28, 2024
3c306f1
clear code
tc-mb May 28, 2024
9495504
replace and organize code
tc-mb May 28, 2024
b37ab0b
add link
tc-mb May 28, 2024
8767ce2
Merge branch 'prepare-PR-of-minicpm-v2.5' into prepare-PR
tc-mb May 28, 2024
8bd47ce
Merge pull request #7 from OpenBMB/prepare-PR
tc-mb May 28, 2024
28d4a7f
Merge pull request #8 from OpenBMB/master
tc-mb May 28, 2024
02eb445
sync master
tc-mb May 28, 2024
07f48f9
fix warnings
tc-mb May 28, 2024
c38d152
fix warnings
tc-mb May 28, 2024
88f5e6a
fix bug in bicubic resize when need resize iamge smaller
tc-mb May 30, 2024
a913ca4
receive review comments and modify
tc-mb May 31, 2024
a95a6d9
receive review comments and modify
tc-mb Jun 2, 2024
c390dd4
Merge branch 'ggerganov:master' into prepare-PR-of-minicpm-v2.5
tc-mb Jun 4, 2024
efe4c61
put all code into llava dir
tc-mb Jun 4, 2024
ee5b850
Merge pull request #11 from OpenBMB/pr_add_all_in_llava
tc-mb Jun 4, 2024
77beb4d
Merge branch 'prepare-PR-of-minicpm-v2.5' into master
tc-mb Jun 24, 2024
cb8cfb9
Merge pull request #15 from OpenBMB/master
tc-mb Jun 24, 2024
8f03505
fix quality problem in pr code
tc-mb Jun 25, 2024
e68c8bc
change n_layer
tc-mb Jun 25, 2024
4c67d7c
add space in "-1"
tc-mb Jun 25, 2024
977941d
imitate reshape bug of python code
tc-mb Jul 4, 2024
3e6348b
fix bug in clip
tc-mb Jul 7, 2024
c5b6851
fix issues for merging
tc-mb Jul 17, 2024
5959b14
fix llama-minicpmv-cli in cmake file
tc-mb Jul 19, 2024
292a469
change pr readme
tc-mb Jul 20, 2024
be8b5b2
fix code review
tc-mb Jul 22, 2024
4c75583
remove in line 33 directory in the /cmakelists.txt (not in example, i…
tc-mb Jul 22, 2024
62fa15b
fix cmakefile
tc-mb Jul 23, 2024
dad4abe
add warn
tc-mb Jul 23, 2024
3642be9
fix KEY_HAS_MINICPMV_PROJ
tc-mb Jul 23, 2024
fcde997
remove load_image_size into clip_ctx
tc-mb Jul 23, 2024
6fd0937
remove the extern "C", MINICPMV_API
tc-mb Jul 23, 2024
107e1ed
fix uhd code for review comment
tc-mb Jul 25, 2024
72b9629
delete minicpmv-wrapper in pr
tc-mb Jul 25, 2024
f3d400d
remove uhd_image_embed
tc-mb Jul 26, 2024
65f7455
Modify 2 notes
tc-mb Jul 26, 2024
6e29913
clip : style changes
ggerganov Aug 6, 2024
f33071d
Merge pull request #19 from ggerganov/prepare-PR-of-minicpm-v2.5-gg
tc-mb Aug 6, 2024
f04c6e2
del common.h in clip
tc-mb Aug 6, 2024
5ec4de7
Merge branch 'master' into prepare-PR-of-minicpm-v2.5
tc-mb Aug 6, 2024
5ab9577
fix Type-Check error
tc-mb Aug 7, 2024
28230d0
fix Type-Check error
tc-mb Aug 7, 2024
e3eff2a
fix Type-Check error
tc-mb Aug 7, 2024
0eb0bfa
fix Type-Check error
tc-mb Aug 7, 2024
712fd7c
fix makefile error
tc-mb Aug 7, 2024
616f3ea
fix ubuntu-make error
tc-mb Aug 7, 2024
2d14c81
try fix clip
tc-mb Aug 8, 2024
069631e
try fix 1
tc-mb Aug 9, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -60,6 +60,8 @@ models-mnt
/libllama.so
/llama-bench
/llava-cli
/minicpmv-cli
/openbmb
/lookahead
/lookup
/lookup-create
9 changes: 8 additions & 1 deletion Makefile
@@ -1,7 +1,7 @@
# Define the default target now so that it is always the first target
BUILD_TARGETS = \
main quantize quantize-stats perplexity imatrix embedding vdot q8dot train-text-from-scratch convert-llama2c-to-ggml \
simple batched batched-bench save-load-state server gguf gguf-split eval-callback llama-bench libllava.a llava-cli baby-llama beam-search \
simple batched batched-bench save-load-state server gguf gguf-split eval-callback llama-bench libllava.a llava-cli minicpmv-cli baby-llama beam-search \
retrieval speculative infill tokenize benchmark-matmult parallel finetune export-lora lookahead lookup passkey gritlm tests/test-c.o

# Binaries only useful for tests
@@ -862,6 +862,13 @@ llava-cli: examples/llava/llava-cli.cpp examples/llava/clip.h examples/llava/cli
$(CXX) $(CXXFLAGS) -c examples/llava/llava.cpp -o $(call GET_OBJ_FILE, examples/llava/llava.cpp)
$(CXX) $(CXXFLAGS) $(filter-out %.h $< examples/llava/clip.cpp examples/llava/llava.cpp,$^) $(call GET_OBJ_FILE, $<) $(call GET_OBJ_FILE, examples/llava/clip.cpp) $(call GET_OBJ_FILE, examples/llava/llava.cpp) -o $@ $(LDFLAGS)

minicpmv-cli: examples/minicpmv/minicpmv-cli.cpp examples/minicpmv/clip.h examples/minicpmv/clip.cpp examples/minicpmv/minicpmv.h examples/minicpmv/minicpmv.cpp examples/minicpmv/minicpmv_wrapper.h examples/minicpmv/minicpmv_wrapper.cpp ggml.o llama.o $(COMMON_DEPS) $(OBJS)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) -c examples/minicpmv/clip.cpp -o $(call GET_OBJ_FILE, examples/minicpmv/clip.cpp) -Wno-cast-qual
$(CXX) $(CXXFLAGS) -c examples/minicpmv/minicpmv.cpp -o $(call GET_OBJ_FILE, examples/minicpmv/minicpmv.cpp)
$(CXX) $(CXXFLAGS) -c examples/minicpmv/minicpmv_wrapper.cpp -o $(call GET_OBJ_FILE, examples/minicpmv/minicpmv_wrapper.cpp)
$(CXX) $(CXXFLAGS) $(filter-out %.h $< examples/minicpmv/clip.cpp examples/minicpmv/minicpmv.cpp examples/minicpmv/minicpmv_wrapper.cpp,$^) $(call GET_OBJ_FILE, $<) $(call GET_OBJ_FILE, examples/minicpmv/clip.cpp) $(call GET_OBJ_FILE, examples/minicpmv/minicpmv.cpp) $(call GET_OBJ_FILE, examples/minicpmv/minicpmv_wrapper.cpp) -o $@ $(LDFLAGS)

baby-llama: examples/baby-llama/baby-llama.cpp ggml.o llama.o $(COMMON_DEPS) train.o $(OBJS)
$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
38 changes: 0 additions & 38 deletions convert-hf-to-gguf.py
@@ -675,44 +675,6 @@ def set_gguf_parameters(self):
self.gguf_writer.add_parallel_residual(self.hparams.get("use_parallel_residual", True))
self.gguf_writer.add_layer_norm_eps(self.hparams["layer_norm_eps"])

def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
del bid # unused

n_head = self.hparams.get("n_head", self.hparams.get("num_attention_heads"))
n_embed = self.hparams.get("hidden_size", self.hparams.get("n_embed"))

tensors: list[tuple[str, Tensor]] = []

if re.match(r"gpt_neox\.layers\.\d+\.attention\.query_key_value\.weight", name):
# Map bloom-style qkv_linear to gpt-style qkv_linear
# bloom: https://github.com/huggingface/transformers/blob/main/src/transformers/models/bloom/modeling_bloom.py#L238-L252 # noqa
# gpt-2: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py#L312 # noqa
qkv_weights = data_torch.reshape((n_head, 3, n_embed // n_head, n_embed))
data_torch = torch.cat(
(
qkv_weights[:, 0, :, :].reshape((-1, n_embed)),
qkv_weights[:, 1, :, :].reshape((-1, n_embed)),
qkv_weights[:, 2, :, :].reshape((-1, n_embed)),
),
dim=0,
)
logger.info("re-format attention.linear_qkv.weight")
elif re.match(r"gpt_neox\.layers\.\d+\.attention\.query_key_value\.bias", name):
qkv_bias = data_torch.reshape((n_head, 3, n_embed // n_head))
data_torch = torch.cat(
(
qkv_bias[:, 0, :].reshape((n_embed,)),
qkv_bias[:, 1, :].reshape((n_embed,)),
qkv_bias[:, 2, :].reshape((n_embed,)),
),
dim=0,
)
logger.info("re-format attention.linear_qkv.bias")

tensors.append((self.map_tensor_name(name), data_torch))

return tensors


@Model.register("BloomForCausalLM")
class BloomModel(Model):
1 change: 1 addition & 0 deletions examples/CMakeLists.txt
@@ -26,6 +26,7 @@ else()
add_subdirectory(infill)
add_subdirectory(llama-bench)
add_subdirectory(llava)
add_subdirectory(minicpmv)
if (LLAMA_SYCL)
add_subdirectory(sycl)
endif()
42 changes: 42 additions & 0 deletions examples/minicpmv/CMakeLists.txt
@@ -0,0 +1,42 @@
add_library(minicpmv OBJECT
minicpmv.cpp
minicpmv.h
clip.cpp
clip.h
)

target_link_libraries(minicpmv PRIVATE ggml llama ${CMAKE_THREAD_LIBS_INIT})

target_include_directories(minicpmv PUBLIC .)
target_include_directories(minicpmv PUBLIC ../..)
target_include_directories(minicpmv PUBLIC ../../common)

target_compile_features(minicpmv PRIVATE cxx_std_11)

add_library(minicpmv_static STATIC $<TARGET_OBJECTS:minicpmv>)
if (BUILD_SHARED_LIBS)
set_target_properties(minicpmv PROPERTIES POSITION_INDEPENDENT_CODE ON)
target_compile_definitions(minicpmv PRIVATE LLAMA_SHARED LLAMA_BUILD)
add_library(minicpmv_shared SHARED $<TARGET_OBJECTS:minicpmv>)
target_link_libraries(minicpmv_shared PRIVATE ggml llama ${CMAKE_THREAD_LIBS_INIT})
install(TARGETS minicpmv_shared LIBRARY)
endif()

if (NOT MSVC)
target_compile_options(minicpmv PRIVATE -Wno-cast-qual) # stb_image.h
endif()

if(TARGET BUILD_INFO)
add_dependencies(minicpmv BUILD_INFO)
endif()

set(TARGET minicpmv-cli)
add_executable(minicpmv-cli minicpmv-cli.cpp)
install(TARGETS minicpmv-cli RUNTIME)
target_link_libraries(minicpmv-cli PRIVATE common minicpmv_wrapper minicpmv ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(minicpmv PRIVATE cxx_std_11)

add_library(minicpmv_wrapper OBJECT
minicpmv_wrapper.cpp
)
target_link_libraries(minicpmv_wrapper PRIVATE minicpmv ${CMAKE_THREAD_LIBS_INIT})
104 changes: 104 additions & 0 deletions examples/minicpmv/README.md
@@ -0,0 +1,104 @@
## MiniCPM-Llama3-V 2.5

### Usage

Download the [MiniCPM-Llama3-V-2_5](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5) PyTorch model from Hugging Face into a `MiniCPM-Llama3-V-2_5` folder.

Clone llama.cpp and check out the `minicpm-v2.5` branch:
```bash
git clone -b minicpm-v2.5 https://github.com/OpenBMB/llama.cpp.git
cd llama.cpp
```

Convert the PyTorch model to GGUF files (you can also download our pre-converted [GGUF files](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf)):

```bash
python ./examples/minicpmv/minicpmv-surgery.py -m ../MiniCPM-Llama3-V-2_5
python ./examples/minicpmv/minicpmv-convert-image-encoder-to-gguf.py -m ../MiniCPM-Llama3-V-2_5 --minicpmv-projector ../MiniCPM-Llama3-V-2_5/minicpmv.projector --output-dir ../MiniCPM-Llama3-V-2_5/ --image-mean 0.5 0.5 0.5 --image-std 0.5 0.5 0.5
python ./convert.py ../MiniCPM-Llama3-V-2_5/model --outtype f16 --vocab-type bpe

# quantize int4 version
./quantize ../MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf Q4_K_M
```
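The steps above should leave you with a language-model GGUF, a vision-projector GGUF, and (after quantization) an int4 model. A small sketch to sanity-check the outputs before building; `check_gguf` is a hypothetical helper, not part of llama.cpp, and the paths simply mirror the commands above:

```shell
# Hypothetical helper (not part of llama.cpp): verify that the converted
# GGUF files exist and are non-empty before moving on to the build step.
check_gguf() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    echo "all gguf files present"
}

# Paths mirror the conversion commands above; adjust to your layout.
# "|| true" keeps the sketch safe to paste even if your paths differ.
check_gguf ../MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf \
           ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf || true
```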

Build on Linux or macOS:

```bash
make
make minicpmv-cli
```

Run inference on Linux or macOS:
```
# run f16 version
./minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/model-8B-F16.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"

# run quantized int4 version
./minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"

# or run in interactive mode
./minicpmv-cli -m ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf --mmproj ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -i
```
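The three invocations above differ only in the model file and the final flag; the sampling settings are identical. If you find yourself retyping them, a small wrapper can assemble the command line. `build_minicpmv_cmd` is a hypothetical helper, and the sampling values are simply the ones used above:

```shell
# Hypothetical wrapper: assemble the minicpmv-cli command line with the
# sampling settings used in the examples above.
build_minicpmv_cmd() {
    local model="$1" mmproj="$2" image="$3" prompt="$4"
    printf '%s' "./minicpmv-cli -m $model --mmproj $mmproj -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image $image -p \"$prompt\""
}

# Print the command for the quantized model; pipe to sh to actually run it.
build_minicpmv_cmd ../MiniCPM-Llama3-V-2_5/model/ggml-model-Q4_K_M.gguf \
                   ../MiniCPM-Llama3-V-2_5/mmproj-model-f16.gguf \
                   xx.jpg "What is in the image?"
```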

### Android

#### Build on Android device using Termux
We found that building directly on the Android device yields better runtime performance, so we recommend building on-device.

[Termux](https://github.com/termux/termux-app#installation) is a terminal app for Android devices (no root required).

Install tools in Termux:
```
apt update && apt upgrade -y
apt install git make cmake
```

It's recommended to move your model into the home directory (`~/`) for best performance:
```
cd storage/downloads
mv model.gguf ~/
```

#### Building the Project using Android NDK
Obtain the [Android NDK](https://developer.android.com/ndk) and then build with CMake.

Execute the following commands on your computer to avoid downloading the NDK onto your mobile device. Alternatively, you can do this in Termux:

```bash
mkdir build-android
cd build-android
export NDK=/your_ndk_path
cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-23 -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod ..
make
```

Install [Termux](https://github.com/termux/termux-app#installation) on your device and run `termux-setup-storage` to get access to your SD card (on Android 11+, run the command twice).

Finally, copy the built binaries and the model files to your device storage. Because file permissions on the Android sdcard cannot be changed, copy the executables to `/data/data/com.termux/files/home/bin` and then run the following commands in Termux to make them executable:

(This assumes you have pushed the built executables to `/sdcard/llama.cpp/bin` using `adb push`.)
```
$ cp -r /sdcard/llama.cpp/bin /data/data/com.termux/files/home/
$ cd /data/data/com.termux/files/home/bin
$ chmod +x ./*
```

Download the models and push them to `/sdcard/llama.cpp/`, then move them to `/data/data/com.termux/files/home/model/`:

```
$ mv /sdcard/llama.cpp/ggml-model-Q4_K_M.gguf /data/data/com.termux/files/home/model/
$ mv /sdcard/llama.cpp/mmproj-model-f16.gguf /data/data/com.termux/files/home/model/
```

Now, you can start chatting:
```
$ cd /data/data/com.termux/files/home/bin
$ ./minicpmv-cli -m ../model/ggml-model-Q4_K_M.gguf --mmproj ../model/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image xx.jpg -p "What is in the image?"
```

### Result
We ran the following command on a Xiaomi 14 Pro; the measured result is shown below.
```
$ ./minicpmv-cli -m ../model/ggml-model-Q4_K_M.gguf --mmproj ../model/mmproj-model-f16.gguf -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 -t 6 --image xx.jpg -p "What is in the image?"
```
![alt text](assets/xiaomi14pro_test.jpeg)
Binary file added examples/minicpmv/assets/xiaomi14pro_test.jpeg