
Meituan
- Beijing
- https://rollingwang.github.io/rolling.github.io/
Starred repositories
No fortress, purely open ground. OpenManus is Coming.
🐫 CAMEL: Finding the Scaling Law of Agents. The first and the best multi-agent framework. https://www.camel-ai.org
LLM-powered multiagent persona simulation for imagination enhancement and business insights.
Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
Text-to-Music Generation with Rectified Flow Transformers
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
An open source implementation of CLIP.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.
Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".
Scaling RWKV-Like Architectures for Diffusion Models
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can be efficiently fine-tuned on downstream datasets.
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
Using Low-rank adaptation to quickly fine-tune diffusion models.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
High-Resolution Image Synthesis with Latent Diffusion Models
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image …
Better Aligning Text-to-Image Models with Human Preference. ICCV 2023
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Open-source and strong foundation image recognition models.
Stable Diffusion web UI
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
A curated list of Multimodal Related Research.
📝Awesome and classical image retrieval papers
A curated list of awesome awesomeness about artificial intelligence