This repository is for 👨‍💻 developing / 🛠️ constructing / 🧪 testing and 🚀 moonshotting ideas for our bachelor thesis: Natural Language-Instructed Autonomous Agent for Computer Control (A2C2).
As part of the Machine Learning Operations module, we developed a prototype of an A2C2 and integrated several tools covered in the module, so that the development of our prototype is represented as an ML pipeline.
- The ultimate AI application
- Assistant in using computer systems
- Helpful in everyday tasks
- Data Generation: How and where can training data be collected?
- Dynamic Action Inference: How can the actions relevant to an instruction be determined?
- Refinement with the User: Where does the agent require further information from the user?
- User Interaction for Critical Tasks: When are further enquiries to the user necessary?
- User-friendly Chatbot
- Critical Task Detection
- Missing Information Detection
- Basic Pipeline for ViT Training
- Screen Captioning
- Chatbot
- I/O Execution
- Storing Experience Embeddings
- Task Decomposition & Refinement
- Gathering real-life Data
- Model Fine-Tuning
- Critical Task Detection
- Missing Information Detection
1. UI interacts with the planning component through REST
- UI with Tkinter, Python & pyautogui
- Interaction through REST with FastAPI
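
The REST interaction between the UI and the planning component could look roughly like the sketch below, seen from the UI side. It uses only the standard library. The endpoint URL `http://localhost:8000/plan`, the payload field `instruction`, and both function names are assumptions made for illustration; they are not taken from the repository.

```python
import json
from urllib import request

# Hypothetical endpoint of the planning component (assumption, not from the repo).
PLANNER_URL = "http://localhost:8000/plan"


def build_plan_request(instruction: str) -> request.Request:
    """Build (but do not send) a JSON POST request carrying the user instruction."""
    body = json.dumps({"instruction": instruction}).encode("utf-8")
    return request.Request(
        PLANNER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_plan_request(instruction: str) -> dict:
    """Send the instruction to the planning component and decode the JSON reply."""
    with request.urlopen(build_plan_request(instruction), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload format easy to check without a running server.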
2. Planning component uses RAG to gather more information
- Data storage (decomposition prompts & planning prompts) with oxen.ai
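
As a rough illustration of the retrieval step in RAG, the sketch below ranks stored prompts by similarity to the user instruction and prepends the best matches to the planning prompt. It is a toy under stated assumptions: a bag-of-words cosine similarity stands in for a real embedding model, the stored prompts are passed in as a plain list instead of being loaded from oxen.ai, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(instruction: str, documents: List[str], k: int = 2) -> str:
    """Augment the instruction with the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(instruction, documents, k))
    return f"Context:\n{context}\n\nInstruction: {instruction}"
```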
3. Conversational validator checks for critical actions or missing information
- MAD (multi-agent debate; see "Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate")
- Data storage (debate prompts) with oxen.ai
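
The debate loop behind such a validator can be sketched as follows: two debaters argue in turns and a judge gives the final verdict. This is only an outline of the general scheme. The agents here are keyword-based stubs standing in for LLM calls, and the function names, the number of rounds, and the "critical"/"safe" labels are assumptions rather than the repository's actual prompts or logic.

```python
from typing import Callable, List

# An agent receives the instruction and the transcript so far and returns its argument.
Agent = Callable[[str, List[str]], str]


def debate(
    instruction: str,
    affirmative: Agent,
    negative: Agent,
    judge: Callable[[str, List[str]], bool],
    rounds: int = 2,
) -> bool:
    """Run alternating debate rounds, then let the judge decide (True = critical)."""
    transcript: List[str] = []
    for _ in range(rounds):
        transcript.append("affirmative: " + affirmative(instruction, transcript))
        transcript.append("negative: " + negative(instruction, transcript))
    return judge(instruction, transcript)


# --- Stub agents (placeholders for LLM-backed debaters) ---
def affirmative_stub(instruction: str, transcript: List[str]) -> str:
    """Argues 'critical' when the instruction mentions deleting something."""
    return "critical" if "delete" in instruction.lower() else "safe"


def negative_stub(instruction: str, transcript: List[str]) -> str:
    """Always takes the opposing, harmless position."""
    return "safe"


def judge_stub(instruction: str, transcript: List[str]) -> bool:
    """Flags the task as critical if any debater argued 'critical'."""
    return any(turn.endswith("critical") for turn in transcript)
```

For example, `debate("Delete all files in Documents", affirmative_stub, negative_stub, judge_stub)` would ask the user for confirmation, while a harmless instruction would pass through.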
4. Planning component uses a model for user instruction interpretation and visual analysis
- GPT-4 Vision (first)
- YOLOv8 (fine-tuned, but not yet optimized for use with the planning component)
5. Web crawler interacts with the browser to gather training data
- Gathering training data with Selenium
- Data storage (data from web crawling) with oxen.ai
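
A crawler of this kind might be sketched as below. What exactly the crawler records is not stated here, so the sketch assumes it saves a screenshot and a list of interactive elements per page; the function names, the chosen tags, and the output format are all hypothetical. Selenium is imported lazily so that the parsing part can be tried without a browser.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Tags treated as "interactive" in this sketch (an assumption).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementParser(HTMLParser):
    """Collects interactive elements and their attributes from page source."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Dict] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            self.elements.append({"tag": tag, "attrs": dict(attrs)})


def extract_interactive_elements(page_source: str) -> List[Dict]:
    """Return all interactive elements found in the given HTML."""
    parser = InteractiveElementParser()
    parser.feed(page_source)
    return parser.elements


def crawl(url: str, screenshot_path: str) -> List[Dict]:
    """Open a page in Chrome, save a screenshot, and return its interactive elements."""
    from selenium import webdriver  # imported here so the parser works without Selenium

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        driver.save_screenshot(screenshot_path)
        return extract_interactive_elements(driver.page_source)
    finally:
        driver.quit()
```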
6. Model is fine-tuned, stored and re-deployed
- Hyperparameter tuning with Ray Tune & wandb
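
To show the idea behind hyperparameter tuning, here is a plain-Python random search. It is deliberately not Ray Tune or wandb code: it stands in for the search loop that Ray Tune automates and for the trial logging that wandb would handle. The search space, the toy objective, and all names are assumptions for illustration only.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def sample_config(rng: random.Random) -> Dict:
    """Draw one configuration: log-uniform learning rate, categorical batch size."""
    return {
        "lr": 10 ** rng.uniform(-5, -1),
        "batch_size": rng.choice([8, 16, 32, 64]),
    }


def toy_objective(config: Dict) -> float:
    """Stand-in for a validation loss; a real run would train and evaluate the model."""
    lr_term = (math.log10(config["lr"]) + 3) ** 2
    batch_term = 0.01 * abs(config["batch_size"] - 32) / 32
    return lr_term + batch_term


def random_search(
    objective: Callable[[Dict], float],
    num_samples: int = 20,
    seed: int = 0,
) -> Tuple[Dict, List[Dict]]:
    """Evaluate num_samples random configs and return the best trial and all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        config = sample_config(rng)
        trials.append({"config": config, "loss": objective(config)})
    best = min(trials, key=lambda t: t["loss"])
    return best, trials
```

In a real setup, each trial's config and loss would be logged to an experiment tracker, and the best configuration would be used for the final fine-tuning run.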
(Steps 5 & 6)
- Workflow with GitHub Actions. We first tried to implement it with Airflow on Google Cloud, without success, and switched to GitHub Actions instead. The workflow is automated, though without time-based triggering.
(Steps 1-6)
- Automated testing with GitHub Actions (CI/CD pipeline for deployment). CI: Flake8 checks that the Python syntax is correct. CD: the pipeline is triggered by semantic release and builds executables for Windows and macOS.
TODO