[Bug][iOS/Swift SDK] Multiple image input to vision models will throw error from TVM #3044

Neet-Nestor · 2024-11-22T04:08:50Z

🐛 Bug

When using the mlc-llm Swift package to chat with vision language models, specifically the Phi-3-vision-instruct model, errors occur when attempting to input an image for the second time or input multiple images simutaneously. The first image input processes correctly, but subsequent attempts result in one of the following errors:

NDArray size mismatch:

libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: [23:07:43] /Users/neet/code/mlc-llm/3rdparty/tvm/src/runtime/ndarray.cc:213: Check failed: relative_byte_offset + view_size <= curr_size (11046600 vs. 1017846) : ValueError: View with shape [1, 1700, 2166, 3] and datatype uint8 would have a size of 11046600 bytes. This would occupy bytes 0 <= i_byte < 11046600 within the backing array. However, the NDArray being viewed only contains 1017846 bytes (shape = [1, 618, 549, 3], dtype= uint8).

Embedding shape mismatch:

libc++abi: terminating due to uncaught exception of type tvm::runtime::InternalError: [23:22:38] /Users/neet/code/mlc-llm/cpp/serve/model.cc:1023: InternalError: Check failed: embedding->shape[0] + offset <= dst->shape[0] (2535 vs. 2048) :

To Reproduce

Steps to reproduce the behavior:

Send an image from the iOS app through the Swift mlc-llm SDK MLCEngine.chatCompletion() as base64 encoded url. This image shall input into the pipeline and decode tokens correctly.
Send a second image, this time it will throw one of the two errors above.

or,

Send multiple images from the iOS app through the Swift mlc-llm SDK MLCEngine.chatCompletion() as base64 encoded url.

Expected behavior

The model should process multiple image inputs without throwing exceptions.

Environment

Platform: iOS
Operating system: macOS
Device: macOS
How you installed MLC-LLM: conda & pip
How you installed TVM-Unity: Source

The text was updated successfully, but these errors were encountered:

Chris611 · 2024-12-22T22:38:12Z

I have the same issue on Ubuntu 24.04. Using openai API with mlc_llm serve of phi-3.5-vision. Compiled from source

Chris611 · 2024-12-22T22:39:23Z

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/chris/AI/mlc-llm-cuda/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/workspace/mlc-llm/cpp/serve/threaded_engine.cc", line 182, in mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
  File "/workspace/mlc-llm/cpp/serve/engine.cc", line 594, in mlc::llm::serve::EngineImpl::Step()
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 119, in mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
  File "/workspace/mlc-llm/cpp/serve/data.cc", line 96, in mlc::llm::serve::ImageDataNode::GetEmbedding(mlc::llm::serve::Model, tvm::runtime::ObjectRef*, int) const
  File "/workspace/mlc-llm/cpp/serve/model.cc", line 116, in mlc::llm::serve::ModelImpl::ImageEmbed(tvm::runtime::NDArray const&, tvm::runtime::ObjectRef*, int)
  File "/workspace/mlc-llm/cpp/serve/function_table.cc", line 320, in mlc::llm::serve::FunctionTable::CopyToWorker0(tvm::runtime::NDArray const&, tvm::runtime::String, tvm::runtime::ShapeTuple, bool)
ValueError: Traceback (most recent call last):
  7: mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
        at /workspace/mlc-llm/cpp/serve/threaded_engine.cc:182
  6: mlc::llm::serve::EngineImpl::Step()
        at /workspace/mlc-llm/cpp/serve/engine.cc:594
  5: mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:119
  4: mlc::llm::serve::ImageDataNode::GetEmbedding(mlc::llm::serve::Model, tvm::runtime::ObjectRef*, int) const
        at /workspace/mlc-llm/cpp/serve/data.cc:96
  3: mlc::llm::serve::ModelImpl::ImageEmbed(tvm::runtime::NDArray const&, tvm::runtime::ObjectRef*, int)
        at /workspace/mlc-llm/cpp/serve/model.cc:116
  2: mlc::llm::serve::FunctionTable::CopyToWorker0(tvm::runtime::NDArray const&, tvm::runtime::String, tvm::runtime::ShapeTuple, bool)
        at /workspace/mlc-llm/cpp/serve/function_table.cc:320
  1: tvm::runtime::NDArray::CreateView(tvm::runtime::ShapeTuple, DLDataType, unsigned long)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/ndarray.cc", line 213
ValueError: Check failed: relative_byte_offset + view_size <= curr_size (4320000 vs. 2260800) : View with shape [1, 1200, 1200, 3] and datatype uint8 would have a size of 4320000 bytes.  This would occupy bytes 0 <= i_byte < 4320000 within the backing array.  However, the NDArray being viewed only contains 2260800 bytes (shape = [1, 628, 1200, 3], dtype= uint8).

Chris611 · 2024-12-22T22:45:25Z

And it only goes wrong if the second image is larger than the first image.

Neet-Nestor added the bug Confirmed bugs label Nov 22, 2024

Neet-Nestor assigned mengshyu Nov 22, 2024

Neet-Nestor changed the title ~~[Bug] Sending images with large dimensions into vision models will break TVM~~ [Bug] Image input to vision models will throw error from TVM Nov 22, 2024

Neet-Nestor changed the title ~~[Bug] Image input to vision models will throw error from TVM~~ [Bug][iOS/Swift SDK] Image input to vision models will throw error from TVM Nov 22, 2024

Neet-Nestor changed the title ~~[Bug][iOS/Swift SDK] Image input to vision models will throw error from TVM~~ [Bug][iOS/Swift SDK] Multiple image input to vision models will throw error from TVM Nov 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug][iOS/Swift SDK] Multiple image input to vision models will throw error from TVM #3044

[Bug][iOS/Swift SDK] Multiple image input to vision models will throw error from TVM #3044

Neet-Nestor commented Nov 22, 2024 •

edited

Loading

Chris611 commented Dec 22, 2024

Chris611 commented Dec 22, 2024

Chris611 commented Dec 22, 2024

[Bug][iOS/Swift SDK] Multiple image input to vision models will throw error from TVM #3044

[Bug][iOS/Swift SDK] Multiple image input to vision models will throw error from TVM #3044

Comments

Neet-Nestor commented Nov 22, 2024 • edited Loading

🐛 Bug

To Reproduce

Expected behavior

Environment

Chris611 commented Dec 22, 2024

Chris611 commented Dec 22, 2024

Chris611 commented Dec 22, 2024

Neet-Nestor commented Nov 22, 2024 •

edited

Loading