Alefe Gadioli, Breno de Angelo, Daniel Gobbi, Eduardo Abreu, Luiz Cardoso, Mariana Rezende e Matheus Rezende
The 'AI Want Coffee' project assesses the potential of Artificial General Intelligence (AGI) using the 'Wozniak test', which challenges a robot to enter a house, make coffee, and serve it. It incorporates skills such as autonomous navigation and speech recognition. The focus is on testing OpenAI's GPT-4 model in real scenarios, exploring how its natural language processing and response generation can aid in developing AGI for complex tasks in unstructured environments.
A. Coffee-Assistant: This architecture assists the user in preparing coffee by suggesting the next steps. It starts with the user sending an image of the kitchen to GPT-4, which then formulates a list of procedures. After each action completed by the user, a new image is sent to GPT-4 for evaluation and to suggest the next step. The process heavily relies on human feedback.
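As a rough sketch, one Coffee-Assistant iteration amounts to packaging the kitchen photo and the last completed step into a GPT-4 Vision request. The function name, prompt wording, and model name below are illustrative assumptions, not the repository's actual code:

```python
import base64

# Hypothetical helper: build one Coffee-Assistant request from the
# current kitchen photo and the step the user just completed.
def build_step_request(image_bytes, last_step=None):
    """Build a GPT-4 Vision chat payload asking for the next coffee step."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    prompt = (
        "Here is the current state of the kitchen. "
        + (f"The user just completed: {last_step}. " if last_step else "")
        + "Suggest the single next step to prepare coffee."
    )
    return {
        "model": "gpt-4-vision-preview",  # assumed model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }

request = build_step_request(b"\xff\xd8fake-jpeg", last_step="Fill the kettle")
```

In the real loop, this payload would be sent to the chat completions endpoint after every user action, and the reply shown to the user as the next suggested step.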
B. Coffee-Agent: Unlike Coffee-Assistant, Coffee-Agent aims to allow GPT to act autonomously, controlling a robotic body with a set of defined instructions. A human substitutes the robotic body, following GPT's instructions. Initially, GPT receives a photo of the human's view in the kitchen, and in each iteration, it calls a predefined function, sending a new photo after each completed instruction.
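The "set of defined instructions" can be pictured as a small function catalog exposed to GPT, with each call relayed to the human standing in for the robotic body. Only "interact" is named in the project text; "move" and "grab" are hypothetical examples, and the schema shape follows the OpenAI function-calling convention:

```python
# Illustrative function set for the Coffee-Agent loop (names other
# than "interact" are assumptions, not the repository's actual list).
AGENT_FUNCTIONS = [
    {"name": "move",
     "description": "Walk toward a named object or location in view.",
     "parameters": {"type": "object",
                    "properties": {"target": {"type": "string"}},
                    "required": ["target"]}},
    {"name": "grab",
     "description": "Pick up a reachable object.",
     "parameters": {"type": "object",
                    "properties": {"object": {"type": "string"}},
                    "required": ["object"]}},
    {"name": "interact",
     "description": "Perform an abstract interaction with an object, "
                    "e.g. 'turn on the coffee maker'.",
     "parameters": {"type": "object",
                    "properties": {"action": {"type": "string"}},
                    "required": ["action"]}},
]

def dispatch(call_name, arguments):
    """Relay a GPT function call as an instruction to the human
    who substitutes for the robotic body."""
    return f"Please {call_name}: {arguments}"
```

After each dispatched instruction is carried out, a new photo is sent back to GPT so it can choose the next call.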
C. Coffee-Agent v2: This version seeks to solve problems from the previous architecture, such as impossible mission requests. It's an evolution that enhances the functioning and autonomy of the system.
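One way v2 can guard against impossible mission requests is to validate each function call before relaying it, feeding the rejection text back to GPT instead of failing. A minimal sketch, assuming a fixed action set (only "interact" appears in the project text; the rest are placeholders):

```python
# Assumed set of actions the agent body can actually perform.
ALLOWED_ACTIONS = {"move", "grab", "interact"}

def validate_call(name, arguments):
    """Return (ok, message); on rejection, the message can be sent
    back to GPT so it picks a feasible action on the next turn."""
    if name not in ALLOWED_ACTIONS:
        return False, (f"'{name}' is not an available action; "
                       f"choose one of {sorted(ALLOWED_ACTIONS)}")
    if not isinstance(arguments, dict):
        return False, "arguments must be a JSON object"
    return True, "ok"
```

This keeps the loop running when GPT requests something outside the robot's capabilities, which is the failure mode the previous architecture hit.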
Tests were conducted in different kitchens to cover a variety of scenarios, with images captured by cameras and sent to GPT.
- Python 3.6 or higher
- IP Webcam with RTSP enabled
- Windows: Download the installer from the official Python website
- Linux: Use your distribution's package manager, for example:
# Python3 - Linux
sudo apt-get install python3
- Open a terminal and clone the repository:
# ai-want-coffee
git clone https://github.com/lewislf/ai-want-coffee.git
- Open a terminal and navigate to the cloned project directory.
# Open the Folder
cd ai-want-coffee
- Install all required dependencies
# ai-want-coffee
pip install -r requirements.txt
- Obtain an API Key from OpenAI by creating an account at OpenAI.
- Open the file named api_key.py in the project directory. Set OPENAI_API_KEY to your key, and replace the ip and port in the LOCAL_CAMERA variable with your IP webcam's address and port.
# api_key.py
OPENAI_API_KEY = 'YOUR API KEY'
LOCAL_CAMERA = "rtsp://ip:port/h264_ulaw.sdp"
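Leaving the placeholders in api_key.py is an easy mistake at this step, so a small stdlib-only check (not part of the repository, just a sketch) can verify the values were actually replaced before running the scripts:

```python
from urllib.parse import urlparse

# Hypothetical sanity check for the api_key.py settings.
def check_config(api_key, camera):
    """Return a list of problems with the configured values."""
    problems = []
    if not api_key or api_key == "YOUR API KEY":
        problems.append("OPENAI_API_KEY still holds the placeholder value")
    parsed = urlparse(camera)
    if parsed.scheme != "rtsp":
        problems.append("LOCAL_CAMERA must be an rtsp:// URL")
    if parsed.hostname in (None, "ip"):
        problems.append("replace 'ip' with your webcam's IP address")
    return problems
```

An unedited file fails both checks; a filled-in key and a concrete address such as `rtsp://192.168.0.10:8080/h264_ulaw.sdp` pass.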
- Replace the prompt in the gpt4vision.py file with the AgentLegacy.py prompt
- Open the folder predict:
cd ai-want-coffee/predict/
- In the terminal, execute assistant_gpt.py:
python3 assistant_gpt.py
- Replace the prompt in the gpt4vision.py file with the AgentLegacy.py prompt
- Open the folder predict/Classes:
cd ai-want-coffee/predict/Classes
- In the terminal, execute main.py:
python3 main.py
- Replace the prompt in the gpt4vision.py file with the AgentLegacy.py prompt
- Open the folder predict/assistant-v2:
cd ai-want-coffee/predict/assistant-v2
- In the terminal, execute gpt.py:
python3 gpt.py
Future work includes exploring how to eliminate the "interact" function made available to the Coffee-Agent, since it amounts to an abstract order. Other directions fall outside the scope of the AGI subject but build on what was produced: improving integration with Coffee-Assistant through the voice communication module we developed, applying the system to tasks other than making coffee, and integrating with VR glasses.