council-of-ai

Security measure for agentic LLMs using a council of AIs moderted by a veto system. The council judges an agent's actions outputs based on specified categories.

Objective

Implement a system to judge AI Agents outputs using a council of AI models. Decentralize the decision making power to avoid potential disasters.

Sections

How it Works

Language models, acting as a "judge", will rate an AI output out of 10. If any of the judges in the council (formed by a group of judges) vetoes an output (verdict == false), that output will be flagged as being potentially immoral/unjust/harmful/useless.

How to Use

Clone the repository via git clone https://github.com/seanpixel/council-of-ai.git and cd into the cloned repository.
Install required packages by doing: pip install -r requirements.txt
Download the ethics dataset from here and move it into root (same dir as main.py).
Create a .env file or plug in your key in judge.py (line 8), all you need is an OPENAI_API_KEY
Go to main.py and choose the test type using the choice variable (default is commonsense)
Run python main.py and see what kinds of judgements the council makes

Note: For for "commonsense" AITA (Am I the Asshole?) questions, "allowed" means you are the asshole and "blocked" means you are not the asshole (so it's kind of inverted).

More about the Project & Me

After creating Teenage-AGI, I wondered about potential implications of Agentic LLMs and some ways to moderate its unpredictable behaviors. From this, I thought of democracy and how a decentralized system of AIs could monitor other AIs from causing harm. So came council-of-ai. While contributing to the "acceleration" of technology, I still care about AI Safety and believe that safely guiding AI towards the future can be as fun and exciting as accelerating.

I'm a founder currently running a startup called DSNR and also a first-year at USC. Contact me on twitter about anything would love to chat.

What you can do

Create more "setups", these are basically the characteristics of the judges. Play around with more example Agent outputs and possbily use your own by adding them to "actions.yaml". Use more judges or even plug in your own local LLM. Or even better, implement the council on an unaligned base model (Llama?) and experiment. This is a growing initiative so any help would be appreciated.

Credits

Credits to @DanHendrycks for the Ethics dataset used in testing the idea.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
LICENSE		LICENSE
README.md		README.md
judge.py		judge.py
main.py		main.py
prompts.yaml		prompts.yaml
requirements.txt		requirements.txt
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

council-of-ai

Objective

Sections

How it Works

How to Use

More about the Project & Me

What you can do

Credits

About

Releases

Packages

Languages

License

seanpixel/council-of-ai

Folders and files

Latest commit

History

Repository files navigation

council-of-ai

Objective

Sections

How it Works

How to Use

More about the Project & Me

What you can do

Credits

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages