This repository contains an efficient deployment solution for Imp models on mobile devices. It is built on the MLC-LLM framework and takes the MLC-MiniCPM project as a reference.
Note that our default Imp-3B model uses an input image size of 384x384, which produces 729 visual embeddings and costs too much time and memory on mobile devices. Therefore, we use an Imp variant with a reduced image size of 196x196. More details can be found in our technical report. The models run on Android are quantized to 4-bit, which takes about 1.9 GB of storage.
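As a rough back-of-the-envelope check on that footprint (a sketch only; the parameter split and the effective bit-width of q4f16_1 are assumptions, not numbers from this repo):

```python
# Rough storage estimate for the 4-bit model; all constants are assumptions.
params = 3.1e9           # assumed: ~2.7B language model + ~0.4B vision tower
bits_per_weight = 4.5    # assumed: 4-bit weights plus fp16 group scales
print(f"~{params * bits_per_weight / 8 / 1e9:.1f} GB")  # ~1.7 GB, in the ballpark of 1.9 GB on disk
```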
We provide two ways to run MLC-Imp on Android phones: you can directly install our precompiled .apk file, or compile it from scratch by yourself. A solution for iOS is still on the way.
- Download the precompiled ImpChat APK from here.
- Accept camera & photo permissions.
- Download the model: (1) press the download button; (2) wait for the progress bar to fill up; (3) start the chat.
- Chat with Imp: (1) wait for model initialization until "Ready to chat" pops up; (2) upload an image from the gallery or take a photo with the camera; (3) wait until "process image done" shows up; (4) enter your question to begin the conversation.
- Chat mode: both text-only and vision modes are supported.
- Note: image processing may take some time.
- Demo
- Follow https://llm.mlc.ai/docs/install/tvm.html to install the TVM Unity compiler.
- Follow https://llm.mlc.ai/docs/install/mlc_llm.html to install the MLC LLM Python package.
- Follow https://llm.mlc.ai/docs/deploy/android.html to prepare the Android requirements.
Download the model checkpoint into the dist/models folder.
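If you prefer to script this step, below is a minimal download sketch using huggingface_hub; the repo id is an assumption, so substitute the checkpoint's actual location:

```python
# Minimal download sketch. The repo id is a hypothetical placeholder,
# not a confirmed checkpoint location.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="MILVLG/imp-v1.5-3b-196",           # hypothetical repo id
    local_dir="./dist/models/imp-v1.5-3B-196",  # path expected by the commands below
)
```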
# convert the model weights from fp16 to 4-bit
mlc_llm convert_weight --model-type imp ./dist/models/imp-v1.5-3B-196 --quantization q4f16_1 -o ./dist/imp-v1.5-3B-196-q4f16_1
# generate config
mlc_llm gen_config ./dist/models/imp-v1.5-3B-196 --quantization q4f16_1 --conv-template imp -o ./dist/imp-v1.5-3B-196-q4f16_1
# compile for android
mlc_llm compile ./dist/imp-v1.5-3B-196-q4f16_1/mlc-chat-config.json --device android -o ./dist/libs/imp-v1.5-3B-196-q4f16_1-android.tar
cd ./android/library
./prepare_libs.sh
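# prepare_libs.sh bundles the compiled model library (the .tar produced above)
# with the TVM runtime for the Android app; see the MLC-LLM Android docs for
# the exact output artifacts it generates.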
Go to android/ and use Android Studio to build the app. (Follow https://llm.mlc.ai/docs/deploy/android.html)
Alternatively, MLC-Imp can also be used on Linux/Windows servers.
- Follow https://llm.mlc.ai/docs/install/tvm.html to install the TVM Unity compiler.
- Follow https://llm.mlc.ai/docs/install/mlc_llm.html to install the MLC LLM Python package.
Download the model checkpoint and put it into the dist/models folder.
Here we use Vulkan as an example:
# convert the imp model weights from fp16 to 4-bit
mlc_llm convert_weight --model-type imp ./dist/models/imp-v1.5-3B-196 --quantization q4f16_1 -o ./dist/imp-v1.5-3B-196-q4f16_1
# generate config
mlc_llm gen_config ./dist/models/imp-v1.5-3B-196 --quantization q4f16_1 --conv-template imp -o ./dist/imp-v1.5-3B-196-q4f16_1
# compile to vulkan
mlc_llm compile ./dist/imp-v1.5-3B-196-q4f16_1/mlc-chat-config.json --device vulkan -o ./dist/libs/imp-v1.5-3B-196-q4f16_1-vulkan.so
Then, you can use the following example Python script to run MLC-Imp on the server:
import tvm
from mlc_llm import ChatModule
from PIL import Image
from transformers.image_utils import (
ChannelDimension,
PILImageResampling,
to_numpy_array,
)
from transformers.image_transforms import (
convert_to_rgb,
normalize,
rescale,
resize,
to_channel_dimension_format,
)
from functools import partial, reduce
from transformers.image_processing_utils import BatchFeature
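# Fetch an image from an http(s) URL or open a local file, converted to RGB.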
def load_image(image_file):
from io import BytesIO
import requests
from PIL import Image
    if image_file.startswith(("http://", "https://")):
response = requests.get(image_file)
image = Image.open(BytesIO(response.content)).convert("RGB")
else:
image = Image.open(image_file).convert("RGB")
return image
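# Minimal image preprocessing: resize to 196x196, rescale pixel values to
# [0, 1], normalize with mean/std 0.5, and return channels-first tensors.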
def simple_image_processor(
images,
image_mean=(0.5, 0.5, 0.5),
image_std=(0.5, 0.5, 0.5),
size=(196, 196),
resample=PILImageResampling.BICUBIC,
rescale_factor=1 / 255,
data_format=ChannelDimension.FIRST,
return_tensors="pt"
):
if isinstance(images, Image.Image):
images = [images]
else:
assert isinstance(images, list)
transforms = [
convert_to_rgb,
to_numpy_array,
partial(resize, size=size, resample=resample, data_format=data_format),
partial(rescale, scale=rescale_factor, data_format=data_format),
partial(normalize, mean=image_mean, std=image_std, data_format=data_format),
partial(to_channel_dimension_format, channel_dim=data_format, input_channel_dim=data_format),
]
images = reduce(lambda x, f: [*map(f, x)], transforms, images)
data = {"pixel_values": images}
return BatchFeature(data=data, tensor_type=return_tensors)
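# Preprocess the example image and copy it to the Vulkan device as a TVM NDArray.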
image_path = "./assets/bus.jpg"
image = load_image(image_path)
image_features = tvm.nd.array(
    simple_image_processor(image)['pixel_values'].numpy().astype("float32"),
device=tvm.runtime.ndarray.vulkan(),
)
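# Create the chat module from the quantized weights and the compiled Vulkan library.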
cm = ChatModule(model="./dist/imp-v1.5-3B-196-q4f16_1", model_lib_path="./dist/libs/imp-v1.5-3B-196-q4f16_1-vulkan.so")
output = cm.generate(
prompt="<image>\nWhat are the colors of the bus in the image?",
pixel_values=image_features
)
print(output)
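
# A possible follow-up turn. Assumption: this fork keeps ChatModule's standard
# multi-turn interface, so another generate() call continues the conversation
# without re-sending the image.
print(cm.generate(prompt="Is there any text visible on the bus?"))

# Runtime speed statistics (prefill/decode), via the standard ChatModule helper.
print(cm.stats())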