pnmartinez/simple-computer-use

🤖 LLM PC Control

Voice Server Verification

For voice control from your phone, see the companion Android app: https://github.com/pnmartinez/computer-use-android-app

Control your computer with natural language commands using Large Language Models (LLMs), OCR, and voice input.

To control your PC by voice from your phone, get the Android app from the Computer Use Android App repo.

✨ Features

  • 🗣️ Natural Language Commands: Control your computer using everyday language
  • 🔍 UI Element Detection: Automatically detects UI elements on your screen
  • 📝 Multi-Step Commands: Execute complex sequences of actions with a single command
  • 👁️ OCR Integration: Reads text from your screen to better understand the context
  • ⌨️ Keyboard and Mouse Control: Simulates keyboard and mouse actions
  • 🎤 Voice Input Support: Control your PC with voice commands
  • 🌎 Multilingual Support: Automatic translation with preservation of UI element names
  • 🖥️ Multiple Deployment Options: Run locally or in Docker

🚀 Installation

Standard Installation

# Clone the repository
git clone https://github.com/pnmartinez/simple-computer-use.git
cd simple-computer-use

# Install the package
pip install -e .

Docker Installation

For a Docker-based setup:

  1. Make sure Docker and Docker Compose are installed
  2. Ensure Ollama is installed and running locally
  3. Run the setup script:
./scripts/docker/setup-docker-x11.sh

📋 Requirements

  • Python 3.8 or higher
  • Ollama (for local LLM inference)
  • EasyOCR and PaddleOCR (for text recognition)
  • PyAutoGUI (for keyboard and mouse control)
  • PyAudio (for voice input)
  • OpenAI Whisper (for speech-to-text)
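
Before starting the server, it can help to check that the requirements above are actually present. The following is a minimal sketch, not part of the project: the import names (`easyocr`, `paddleocr`, `pyautogui`, `pyaudio`, `whisper`) and the `ollama` executable name are assumptions based on how those packages are usually installed, and the helper names are hypothetical.

```python
import importlib.util
import shutil
import sys

# Assumed import names for the Python packages listed in the requirements.
REQUIRED_MODULES = ["easyocr", "paddleocr", "pyautogui", "pyaudio", "whisper"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.8 or higher."""
    return tuple(version_info[:2]) >= (3, 8)


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.8 or higher is required")
    for name in missing_modules(REQUIRED_MODULES):
        print(f"Missing Python module: {name}")
    # Ollama is a separate program, so look for it on PATH (assumed name: ollama).
    if shutil.which("ollama") is None:
        print("Ollama executable not found on PATH")
```

The check only confirms that the modules can be located; it does not import them, so it stays fast even when the OCR and speech models are large.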

📖 Usage

Voice Control Server

# Run the voice control server
python -m llm_control voice-server

# With custom options
python -m llm_control voice-server --port 8080 --whisper-model medium --ollama-model llama3.1

Simple Command

# Run a simple command
python -m llm_control simple-voice --command "click on the Firefox icon"

🖥️ Server API

The voice control server provides the following API endpoints:

  • GET /health: Check server status
  • POST /command: Execute a text command
  • POST /voice-command: Process a voice command from audio data
  • POST /transcribe: Transcribe audio without executing commands
  • POST /translate: Translate text to English

Example: Sending a Direct Command

curl -X POST http://localhost:5000/command \
  -H "Content-Type: application/json" \
  -d '{"command": "open Firefox, go to gmail.com and compose a new email"}'
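
The same request can be sent from Python. This is a sketch using only the standard library; it assumes the server listens on `http://localhost:5000` as in the curl example, and because the response format is not documented here, the body is returned as raw text. The function names are hypothetical.

```python
import json
import urllib.request


def build_command_request(command, base_url="http://localhost:5000"):
    """Build a POST /command request carrying the command as a JSON body."""
    payload = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/command",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(command, base_url="http://localhost:5000", timeout=60):
    """Send the command and return the raw response body as text."""
    request = build_command_request(command, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Requires a running voice control server.
    print(send_command("open Firefox, go to gmail.com and compose a new email"))
```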

Example: Sending a Voice Command

curl -X POST http://localhost:5000/voice-command \
  -F "audio_file=@recording.wav" \
  -F "translate=true" \
  -F "language=es"
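
For clients that cannot shell out to curl, the multipart upload can be built by hand. The sketch below uses only the standard library and mirrors the form fields from the curl example (`audio_file`, `translate`, `language`). The `audio/wav` content type, the helper names, and the raw-text return value are assumptions, not documented server behavior.

```python
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes, content_type="audio/wav"):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def send_voice_command(audio_path, language="es", translate=True,
                       base_url="http://localhost:5000", timeout=120):
    """Upload a recording to POST /voice-command and return the raw response text."""
    with open(audio_path, "rb") as handle:
        audio = handle.read()
    fields = {"translate": "true" if translate else "false", "language": language}
    body, header = encode_multipart(fields, "audio_file", "recording.wav", audio)
    request = urllib.request.Request(
        f"{base_url}/voice-command",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```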

🧪 Project Structure

llm-control/
├── llm_control/         # Main Python package
├── scripts/             # Utility scripts
│   ├── docker/          # Docker-related scripts
│   ├── setup/           # Installation scripts
│   └── tools/           # Utility tools
├── data/                # Data files
├── tests/               # Test suite
└── screenshots/         # Screenshots directory

💡 Command Examples

Here are some examples of commands you can use:

  • "Click on the Submit button"
  • "Type 'Hello, world!' in the search box"
  • "Press Enter"
  • "Move to the top-right corner of the screen"
  • "Double-click on the file icon"
  • "Right-click on the image"
  • "Scroll down"
  • "Click on the button, then type 'Hello', then press Enter"
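
To illustrate how a multi-step command like the last example breaks down, here is a deliberately naive sketch that splits on "then". It is not the project's parser; the function name is hypothetical. It will also mis-split text that contains ", then" inside quotes, and it does not handle steps joined only by commas or "and".

```python
import re

# Matches ", then " or " and then " between steps (case-insensitive).
_STEP_SEPARATOR = re.compile(r",\s*then\s+|\s+and\s+then\s+", re.IGNORECASE)


def split_steps(command):
    """Split a multi-step command into individual step strings."""
    steps = _STEP_SEPARATOR.split(command.strip())
    return [step.strip() for step in steps if step.strip()]


if __name__ == "__main__":
    for step in split_steps("Click on the button, then type 'Hello', then press Enter"):
        print(step)
```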

⚙️ How It Works

  1. 📸 Screenshot Analysis: Takes a screenshot of your screen
  2. 🔎 UI Detection: Analyzes the screenshot to detect UI elements
  3. 🔄 Command Parsing: Parses your natural language command into steps
  4. Action Generation: Generates the corresponding actions for each step
  5. ▶️ Execution: Executes the actions using PyAutoGUI
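
The stages above can be sketched as a small pipeline. Everything in this block is hypothetical scaffolding for illustration: the `UIElement` and `Action` types, the keyword-based matching, and the injected `detect_elements` and `execute` callables stand in for the project's real screenshot, detection, LLM, and PyAutoGUI code.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class UIElement:
    label: str                 # text found on or near the element
    center: Tuple[int, int]    # screen coordinates of the element's center


@dataclass
class Action:
    kind: str                                  # "click", "type", or "press"
    target: Optional[Tuple[int, int]] = None   # where to click
    text: Optional[str] = None                 # text to type or key to press


def find_element(step: str, elements: List[UIElement]) -> Optional[UIElement]:
    """Return the first detected element whose label appears in the step text."""
    lowered = step.lower()
    for element in elements:
        if element.label.lower() in lowered:
            return element
    return None


def generate_action(step: str, elements: List[UIElement]) -> Action:
    """Map one parsed step to an action (stage 4)."""
    stripped = step.strip()
    lowered = stripped.lower()
    if lowered.startswith("press "):
        return Action(kind="press", text=stripped[len("press "):])
    if lowered.startswith("type "):
        quoted = re.search(r"['\"](.+?)['\"]", stripped)
        text = quoted.group(1) if quoted else stripped[len("type "):]
        return Action(kind="type", text=text)
    element = find_element(stripped, elements)
    if element is None:
        raise ValueError(f"No UI element matches step: {stripped!r}")
    return Action(kind="click", target=element.center)


def run_pipeline(steps: List[str],
                 detect_elements: Callable[[], List[UIElement]],
                 execute: Callable[[Action], None]) -> List[Action]:
    """Detect elements once (stages 1-2), then generate and execute each action."""
    elements = detect_elements()
    actions = [generate_action(step, elements) for step in steps]
    for action in actions:
        execute(action)
    return actions
```

Injecting `execute` keeps the sketch testable without moving the real mouse. An executor built on PyAutoGUI could dispatch to `pyautogui.click`, `pyautogui.write`, and `pyautogui.press` based on `action.kind`.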

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

About

An open-source implementation of computer use, built on lightweight OCR models and LLMs.
