Welcome to the repository accompanying our survey paper on Large Language Model-Brained GUI Agents. This repository contains the code for the searchable paper page and the assets used in the paper. LLM-Brained GUI Agents are:
Intelligent agents that operate within GUI environments, leveraging Large Language Models (LLMs) as their core inference and cognitive engine to generate, plan, and execute actions in a flexible and adaptive manner.
📖 Read the Paper: Large Language Model-Brained GUI Agents: A Survey
If you find our work useful, please consider citing:
@misc{zhang2024largelanguagemodelbrainedgui,
title={Large Language Model-Brained GUI Agents: A Survey},
author={Chaoyun Zhang and Shilin He and Jiaxu Qian and Bowen Li and Liqun Li and Si Qin and Yu Kang and Minghua Ma and Guyue Liu and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang and Qi Zhang},
year={2024},
eprint={2411.18279},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2411.18279},
}
🔍 Explore the Searchable Paper Page
The Searchable Paper Page is a web-based interface that allows you to search and filter through the papers in our survey. You can also view the papers by category, platform, and date.
🤝 Contributions Welcome!
We encourage the community to contribute to this repository. If you have suggestions for new papers, resources, or improvements, please open an issue or submit a pull request.
To contribute, follow these steps:
Find the `*.json` file in the `data` directory that matches the category of the paper you want to add. It should be one of `survey`, `framework`, `dataset`, `model`, `benchmark`, `gui-testing`, or `visual-assistant`.
- `survey`: Papers that provide a survey of LLM-Powered GUI Agents.
- `framework`: Papers that introduce a new framework or architecture for LLM-Powered GUI Agents.
- `dataset`: Papers that introduce a new dataset for optimizing models for LLM-Powered GUI Agents.
- `model`: Papers that introduce a new optimized model for LLM-Powered GUI Agents.
- `benchmark`: Papers that introduce a new benchmark for evaluating LLM-Powered GUI Agents.
- `gui-testing`: Papers that use LLM-powered agents for GUI testing. This category focuses mainly on the application-testing aspect.
- `visual-assistant`: Papers, open-source projects, or products that use LLM-powered agents for visual assistance, such as voice assistants, productized web agents, etc. This category focuses mainly on the applications aspect.
In the corresponding JSON file, add the paper details in the following format to the existing list of papers:
{
"Name": "Paper Title",
"Platform": "Device or OS Platform, e.g. Mobile, Web, Desktop, Android, Windows, etc.",
"Date": "Month Year",
"Paper_Url": "The paper link of the paper",
"Highlight": "A brief highlight of the paper, up to 2 sentences.",
"Code_Url": "The project or code link of the paper"
}
After adding the paper details, submit a pull request with the title `Add Paper: Paper Title`, and we will review it as soon as possible. Once the pull request is merged, the paper will be automatically added to the Searchable Paper Page.
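Before opening the pull request, it can help to check the new entry locally. The following is a minimal sketch, not part of this repository's tooling: the helper name `add_paper` is hypothetical, and it assumes that each category file is named `<category>.json`, that it holds a top-level JSON list, and that all six fields from the format above are present in every entry.

```python
import json
from pathlib import Path

# Category names as listed in the contribution steps above.
CATEGORIES = {
    "survey", "framework", "dataset", "model",
    "benchmark", "gui-testing", "visual-assistant",
}

# Field names taken from the entry format above. Treating all of them
# as required is an assumption of this sketch.
REQUIRED_FIELDS = ("Name", "Platform", "Date", "Paper_Url", "Highlight", "Code_Url")


def add_paper(data_dir, category, entry):
    """Append `entry` to <data_dir>/<category>.json and return the new paper count."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")

    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Assumed naming: one JSON file per category inside the data directory.
    path = Path(data_dir) / f"{category}.json"
    papers = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(papers, list):
        raise ValueError(f"{path} does not contain a JSON list")

    papers.append(entry)
    # json.dumps never emits a trailing comma, so the file stays valid JSON.
    path.write_text(
        json.dumps(papers, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return len(papers)
```

For example, `add_paper("data", "framework", entry)` would append `entry` to `data/framework.json`, provided that file exists and parses as a list. Writing the file through `json.dumps` may change its existing indentation, so review the diff before committing.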
Here are some other repositories that you might find useful:
⭐ If you find this repository helpful, please consider citing our paper and giving the repository a star!
If the authors of a paper wish to have it removed from the website, please contact chaoyun.zhang@microsoft.com.