The code is coming soon.

MouSi: Poly-Visual-Expert Vision-Language Models

A multimodal large language model that integrates multiple vision experts.

[Paper] [Demo]