🤖 LLM PC Control

For voice control from your phone, see the companion Android app: https://github.com/pnmartinez/computer-use-android-app


Control your computer with natural language commands using Large Language Models (LLMs), OCR, and voice input.

To control your PC by voice from a phone, get the Android app from the Computer Use Android App repo.

✨ Features

🗣️ Natural Language Commands: Control your computer using everyday language
🔍 UI Element Detection: Automatically detects UI elements on your screen
📝 Multi-Step Commands: Execute complex sequences of actions with a single command
👁️ OCR Integration: Reads text from your screen to better understand the context
⌨️ Keyboard and Mouse Control: Simulates keyboard and mouse actions
🎤 Voice Input Support: Control your PC with voice commands
🌎 Multilingual Support: Automatic translation with preservation of UI element names
🖥️ Multiple Deployment Options: Run locally or in Docker

🚀 Installation

Standard Installation

# Clone the repository
git clone https://github.com/pnmartinez/simple-computer-use.git
cd simple-computer-use

# Install the package
pip install -e .

Docker Installation

For a Docker-based setup:

Make sure Docker and Docker Compose are installed
Ensure Ollama is installed and running locally
Run the setup script:

./scripts/docker/setup-docker-x11.sh

📋 Requirements

Python 3.8 or higher
Ollama (for local LLM inference)
EasyOCR and PaddleOCR (for text recognition)
PyAutoGUI (for keyboard and mouse control)
PyAudio (for voice input)
OpenAI Whisper (for speech-to-text)

📖 Usage

Voice Control Server

# Run the voice control server
python -m llm_control voice-server

# With custom options
python -m llm_control voice-server --port 8080 --whisper-model medium --ollama-model llama3.1

Simple Command

# Run a simple command
python -m llm_control simple-voice --command "click on the Firefox icon"

🖥️ Server API

The voice control server provides the following API endpoints:

GET /health: Check server status
POST /command: Execute a text command
POST /voice-command: Process a voice command from audio data
POST /transcribe: Transcribe audio without executing commands
POST /translate: Translate text to English
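Before sending commands it can help to confirm that the server is reachable. The sketch below polls GET /health with the Python standard library. It assumes the server listens on http://localhost:5000 (the address used in the curl examples) and treats any HTTP 200 answer as healthy; the response body format is not documented here, so it is ignored. The function name and the `opener` parameter are illustrative, not part of the project.

```python
import urllib.error
import urllib.request


def is_server_up(base_url="http://localhost:5000", timeout=3.0,
                 opener=urllib.request.urlopen):
    """Return True if GET /health answers with HTTP 200, False otherwise.

    `opener` is injectable so the function can be exercised without a
    running server; by default it is urllib.request.urlopen.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, ...
        return False
```

Calling `is_server_up()` with no arguments checks the default address; pass `base_url="http://localhost:8080"` if the server was started with `--port 8080`.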

Example: Sending a Direct Command

curl -X POST http://localhost:5000/command \
  -H "Content-Type: application/json" \
  -d '{"command": "open Firefox, go to gmail.com and compose a new email"}'
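The same request can be sent from Python. This is a minimal sketch using only the standard library; it mirrors the curl call above (JSON body with a `command` key, posted to /command). The helper names `build_command_request` and `send_command` are hypothetical, and the response is returned as raw text because its schema is not described in this README.

```python
import json
import urllib.request


def build_command_request(command, base_url="http://localhost:5000"):
    """Build the POST /command request without sending it."""
    payload = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/command",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(command, base_url="http://localhost:5000", timeout=30.0):
    """Send a text command and return the raw response body as text."""
    request = build_command_request(command, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `send_command("open Firefox, go to gmail.com and compose a new email")` issues the same request as the curl example.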

Example: Sending a Voice Command

curl -X POST http://localhost:5000/voice-command \
  -F "audio_file=@recording.wav" \
  -F "translate=true" \
  -F "language=es"
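For clients that cannot shell out to curl, the multipart upload can be built by hand. The sketch below encodes the same three form fields as the curl call (`audio_file`, `translate`, `language`) using the standard library only. The helper names are hypothetical, and the file part is labeled `application/octet-stream` as a neutral assumption; the server may or may not care about the part's content type.

```python
import uuid
import urllib.request


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f'\r\n--{boundary}--\r\n'.encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_voice_request(audio_path, language="es", translate=True,
                        base_url="http://localhost:5000"):
    """Build the POST /voice-command request for a local audio file."""
    with open(audio_path, "rb") as handle:
        audio = handle.read()
    body, content_type = encode_multipart(
        {"translate": "true" if translate else "false", "language": language},
        "audio_file", audio_path, audio,
    )
    return urllib.request.Request(
        base_url.rstrip("/") + "/voice-command",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the result of `build_voice_request("recording.wav")` to `urllib.request.urlopen` sends the upload.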

🧪 Project Structure

llm-control/
├── llm_control/         # Main Python package
├── scripts/             # Utility scripts
│   ├── docker/          # Docker-related scripts
│   ├── setup/           # Installation scripts
│   └── tools/           # Utility tools
├── data/                # Data files
├── tests/               # Test suite
└── screenshots/         # Screenshots directory

💡 Command Examples

Here are some examples of commands you can use:

"Click on the Submit button"
"Type 'Hello, world!' in the search box"
"Press Enter"
"Move to the top-right corner of the screen"
"Double-click on the file icon"
"Right-click on the image"
"Scroll down"
"Click on the button, then type 'Hello', then press Enter"
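To illustrate how a multi-step command like the last example breaks down into individual steps, here is a deliberately naive sketch that splits on ", then" and "and then". It is not the project's parser (which this README says relies on an LLM); the function name `split_steps` is hypothetical, and the regex would wrongly split quoted text that itself contains ", then".

```python
import re

# Separators between steps: ", then", ", and then" or " and then".
_STEP_SEPARATOR = re.compile(
    r",\s*(?:and\s+)?then\s+|\s+and\s+then\s+", re.IGNORECASE
)


def split_steps(command):
    """Split a multi-step command into a list of single-step strings."""
    return [step.strip() for step in _STEP_SEPARATOR.split(command) if step.strip()]


steps = split_steps("Click on the button, then type 'Hello', then press Enter")
# steps == ["Click on the button", "type 'Hello'", "press Enter"]
```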

⚙️ How It Works

📸 Screenshot Analysis: Takes a screenshot of your screen
🔎 UI Detection: Analyzes the screenshot to detect UI elements
🔄 Command Parsing: Parses your natural language command into steps
⚡ Action Generation: Generates the corresponding actions for each step
▶️ Execution: Executes the actions using PyAutoGUI
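The last three stages above can be sketched in a few lines. This is an illustrative toy, not the project's implementation: screenshot capture and UI detection are skipped and replaced by a hand-written list of detected elements, simple regexes stand in for LLM-based parsing, and the names `UIElement`, `Action`, `plan_action` and `execute` are all hypothetical. The `backend` argument only needs `click(x, y)`, `write(text)` and `press(key)`, which matches functions PyAutoGUI exposes, so the `pyautogui` module itself could be passed in on a machine with a display.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """A detected UI element: a label plus its bounding box in pixels."""
    label: str
    left: int
    top: int
    width: int
    height: int

    def center(self):
        return (self.left + self.width // 2, self.top + self.height // 2)


@dataclass(frozen=True)
class Action:
    kind: str         # "click", "type" or "press"
    argument: object  # (x, y) for click, text for type, key name for press


_CLICK = re.compile(r"^click on (?:the )?(.+?)(?: button| icon)?$", re.IGNORECASE)
_TYPE = re.compile(r"^type '(.*)'", re.IGNORECASE)
_PRESS = re.compile(r"^press (\w+)$", re.IGNORECASE)


def plan_action(step, elements):
    """Turn one parsed step into an Action, resolving click targets by label."""
    step = step.strip()
    match = _CLICK.match(step)
    if match:
        wanted = match.group(1).lower()
        for element in elements:
            if element.label.lower() == wanted:
                return Action("click", element.center())
        raise LookupError(f"no UI element labeled {match.group(1)!r}")
    match = _TYPE.match(step)
    if match:
        return Action("type", match.group(1))
    match = _PRESS.match(step)
    if match:
        return Action("press", match.group(1).lower())
    raise ValueError(f"unsupported step: {step!r}")


def execute(actions, backend):
    """Run actions on a backend exposing click(x, y), write(text), press(key)."""
    for action in actions:
        if action.kind == "click":
            backend.click(*action.argument)
        elif action.kind == "type":
            backend.write(action.argument)
        elif action.kind == "press":
            backend.press(action.argument)
```

Keeping planning separate from execution means the planning step can be checked against a recorded list of elements without moving the real mouse.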

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
