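- 🔥[6.27] We released a demo on Hugging Face and ModelScope where you can upload phone screenshots to try Mobile-Agent-v2. No model or device setup is required, so you can try it right away.
- [6.4] Modelscope-Agent now supports Mobile-Agent-v2, based on Android Adb Env; see the application.
- [6.4] We released Mobile-Agent-v2, a next-generation mobile device operation assistant that achieves effective navigation through multi-agent collaboration.
- [3.10] Mobile-Agent was accepted by the ICLR 2024 Workshop on Large Language Model (LLM) Agents.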
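- Mobile-Agent-v2 - A mobile device operation assistant with effective navigation via multi-agent collaboration
- Mobile-Agent - An autonomous mobile device operation agent built on visual perception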
If you find Mobile-Agent useful for your research and applications, please cite using this BibTeX:
@article{wang2024mobile2,
title={Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration},
author={Wang, Junyang and Xu, Haiyang and Jia, Haitao and Zhang, Xi and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
journal={arXiv preprint arXiv:2406.01014},
year={2024}
}
@article{wang2024mobile,
title={Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception},
author={Wang, Junyang and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
journal={arXiv preprint arXiv:2401.16158},
year={2024}
}
- AppAgent: Multimodal Agents as Smartphone Users
- mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- GroundingDINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- CLIP: Contrastive Language-Image Pretraining