
Commit c12caff

HimariO authored and ggerganov committed
llama : add Qwen2VL support + multimodal RoPE (ggml-org#10361)
* Barebone Qwen2VL LLM converter
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script bug fix
* correct vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, outdated func args
* update `llama_hparams`
* update to keep up with upstream changes
* resolve linter, test errors
* add makefile entry, update special image padding token
* add mrope unit test, fix a few compiler warnings
* rename `mrope` related functions, params
* minor updates on debug util, bug fixes
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
* fix trailing whitespace
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remove old `rope_section` compare operator

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
1 parent ea373cf  commit c12caff

24 files changed: +1873 −114 lines
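The commit's headline feature is multimodal RoPE (`GGML_ROPE_TYPE_MROPE`): following the Qwen2-VL design, the rotary dimension pairs of each head are split into per-axis sections so that each section rotates by a different position component (temporal, height, width), while the frequency index runs across all sections as in standard RoPE. A minimal Python sketch of that idea, assuming a `[16, 24, 24]` section layout for a 128-dim head; the function name and layout are illustrative, not the ggml implementation:

```python
def mrope_angles(pos_thw, head_dim, sections, theta_base=10000.0):
    """Rotation angle for each rotary pair of one attention head.

    pos_thw : (temporal, height, width) position of the token
    sections: how many rotary *pairs* each axis owns, e.g. [16, 24, 24]
              for head_dim=128 (16 + 24 + 24 = 64 pairs)
    """
    angles = []
    pair_idx = 0  # global frequency index, shared across sections
    for n_pairs, pos in zip(sections, pos_thw):
        for _ in range(n_pairs):
            # Same inverse-frequency schedule as standard RoPE
            inv_freq = theta_base ** (-2.0 * pair_idx / head_dim)
            angles.append(pos * inv_freq)
            pair_idx += 1
    return angles
```

For ordinary text tokens all three position components are equal, so this collapses to standard one-dimensional RoPE.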

Makefile (+9)

@@ -22,6 +22,7 @@ BUILD_TARGETS = \
 	llama-infill \
 	llama-llava-cli \
 	llama-minicpmv-cli\
+	llama-qwen2vl-cli\
 	llama-lookahead \
 	llama-lookup \
 	llama-lookup-create \
@@ -1404,6 +1405,14 @@ llama-minicpmv-cli: examples/llava/minicpmv-cli.cpp \
 	$(OBJ_ALL)
 	$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual

+llama-qwen2vl-cli: examples/llava/qwen2vl-cli.cpp \
+	examples/llava/llava.cpp \
+	examples/llava/llava.h \
+	examples/llava/clip.cpp \
+	examples/llava/clip.h \
+	$(OBJ_ALL)
+	$(CXX) $(CXXFLAGS) $< $(filter-out %.h $<,$^) -o $@ $(LDFLAGS) -Wno-cast-qual
+
 ifeq ($(UNAME_S),Darwin)
 swift: examples/batched.swift
 	(cd examples/batched.swift; make build)

README.md (+1)

@@ -144,6 +144,7 @@ Instructions for adding support for new models: [HOWTO-add-model.md](docs/develo
 - [x] [Mini CPM](https://huggingface.co/models?search=MiniCPM)
 - [x] [Moondream](https://huggingface.co/vikhyatk/moondream2)
 - [x] [Bunny](https://github.com/BAAI-DCAI/Bunny)
+- [x] [Qwen2-VL](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d)

 </details>
convert_hf_to_gguf.py (+23)

@@ -2001,6 +2001,29 @@ def set_gguf_parameters(self):
         self.gguf_writer.add_rope_scaling_orig_ctx_len(self.hparams["rope_scaling"]["original_max_position_embeddings"])


+@Model.register("Qwen2VLForConditionalGeneration")
+class Qwen2VLModel(Model):
+    model_arch = gguf.MODEL_ARCH.QWEN2VL
+
+    def set_gguf_parameters(self):
+        super().set_gguf_parameters()
+        mrope_section = self.hparams["rope_scaling"]["mrope_section"]
+        mrope_section += [0] * max(0, 4 - len(mrope_section))
+        self.gguf_writer.add_rope_dimension_sections(mrope_section)
+
+    def set_vocab(self):
+        try:
+            self._set_vocab_sentencepiece()
+        except FileNotFoundError:
+            self._set_vocab_gpt2()
+
+    def get_tensors(self) -> Iterator[tuple[str, Tensor]]:
+        for name, data in super().get_tensors():
+            if name.startswith("visual."):
+                continue
+            yield name, data
+
+
 @Model.register("Qwen2MoeForCausalLM")
 class Qwen2MoeModel(Model):
     model_arch = gguf.MODEL_ARCH.QWEN2MOE
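The `set_gguf_parameters` hunk above zero-pads `mrope_section` so the GGUF metadata always carries four section entries even when the HF config ships only three (temporal, height, width). A small standalone sketch of that padding step; the helper name `pad_mrope_section` is mine, for illustration:

```python
def pad_mrope_section(mrope_section: list[int]) -> list[int]:
    # Zero-pad to 4 entries; lists already 4 or more long pass through unchanged.
    return mrope_section + [0] * max(0, 4 - len(mrope_section))
```

Storing a fixed-size list simplifies the C++ side, which (per the commit message) keeps `llama_hparams.rope_sections` in a fixed size array.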

examples/llava/CMakeLists.txt (+7)

@@ -43,3 +43,10 @@ set_target_properties(${TARGET} PROPERTIES OUTPUT_NAME llama-minicpmv-cli)
 install(TARGETS ${TARGET} RUNTIME)
 target_link_libraries(${TARGET} PRIVATE common llava ${CMAKE_THREAD_LIBS_INIT})
 target_compile_features(${TARGET} PRIVATE cxx_std_17)
+
+set(TARGET llama-qwen2vl-cli)
+add_executable(${TARGET} qwen2vl-cli.cpp)
+set_target_properties(${TARGET} PROPERTIES OUTPUT_NAME llama-qwen2vl-cli)
+install(TARGETS ${TARGET} RUNTIME)
+target_link_libraries(${TARGET} PRIVATE common llava ${CMAKE_THREAD_LIBS_INIT})
+target_compile_features(${TARGET} PRIVATE cxx_std_17)
