Loop - Onchain RLHF Platform

Built during EthSF 2024. 🏆 $1,500 Hedera Prize + 🏆 $1,500 AirDAO AI Prize

Loop brings Reinforcement Learning from Human Feedback (RLHF) on-chain and crowdsources the human feedback. The platform integrates on-chain reward systems, so users can help train AI models and earn rewards for quality feedback.

Demo video: LOOP_demo.mp4

What is Reinforcement Learning from Human Feedback (RLHF)?

RLHF is a machine learning approach that uses human feedback to guide the learning of AI models. When ChatGPT asks you to pick between two side-by-side answers, it is collecting the kind of preference data RLHF relies on: your choice becomes a training signal used to update the model weights and fine-tune the model.

RLHF inherently requires humans, because the qualities being judged are subjective rather than objective. For example, if we were building a FunnyGPT, we could use RLHF to score the 'funniness' of LLM outputs.

So, RLHF requires human feedback and judgment to steer AI learning by rewarding good outputs and penalizing bad outputs.
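
As a rough illustration of "rewarding good outputs and penalizing bad outputs", the sketch below turns human feedback labels into scalar rewards and averages them per output. The label names and reward values are assumptions chosen for illustration, not values taken from Loop, and a real RLHF pipeline would typically train a reward model on such data rather than use a fixed table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from a human feedback label to a scalar reward.
# These labels and values are illustrative assumptions, not Loop's actual scheme.
LABEL_REWARD = {
    "good": 1.0,
    "too_short": -0.5,
    "too_long": -0.5,
    "bad": -1.0,
}


def average_rewards(feedback):
    """Average the reward for each output across all human labels it received.

    `feedback` is an iterable of (output_id, label) pairs.
    """
    rewards = defaultdict(list)
    for output_id, label in feedback:
        rewards[output_id].append(LABEL_REWARD[label])
    return {output_id: mean(values) for output_id, values in rewards.items()}


def preferred(scores):
    """Return the output id with the highest average reward."""
    return max(scores, key=scores.get)


# Two candidate outputs from a hypothetical FunnyGPT, each rated twice.
feedback = [
    ("joke_a", "good"),
    ("joke_a", "too_long"),
    ("joke_b", "bad"),
    ("joke_b", "too_short"),
]
scores = average_rewards(feedback)
print(scores)
print(preferred(scores))
```

Averaging across several raters is one simple way to smooth out individual taste. The per-output scores could then serve as the reward signal during fine-tuning.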

Why is blockchain key for RLHF?

Crypto Incentives: With blockchain, we can easily mobilize a large crowd of people with token incentives. Because access is decentralized, anyone can use the models, give feedback, and earn tokens, which brings more diverse perspectives. This contrasts with the current approach, in which a company runs RLHF in-house on its own model.
Privacy + Security (future work): Privacy-preserving blockchains can be combined with techniques such as homomorphic encryption (computing on encrypted data) and zero-knowledge proofs (proving a statement without revealing the underlying data), so that AI models can be trained without exposing sensitive user data. In future developments, we hope to integrate such chains so that RLHF systems keep data confidential while still benefiting from human feedback. This would make RLHF usable under privacy-compliance requirements in industries like healthcare or finance.

How does it work?

UserA uploads any open-source model from Hugging Face, pays in chain-native tokens, and sets a bounty reward amount.
UserB can access LLMs uploaded by other users (like UserA) and use them for free. After each LLM output, UserB is prompted to give feedback (too short, too long, was responsive, bad output, and so on). In return for feedback, users earn crypto tokens deposited to their account.
UserA can open the admin dashboard to see all of the data collected from the crowdsourced RLHF. This data can be used to retrain the model and update its weights.
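
The three steps above can be sketched as a small in-memory model. This is not Loop's contract code: the class name `ModelBounty`, its fields, and the flat per-feedback reward are all assumptions made for illustration, and on-chain the balances and pool would live in a smart contract rather than a Python object.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ModelBounty:
    """In-memory stand-in for the bounty attached to one uploaded model (hypothetical)."""
    owner: str
    model_id: str              # e.g. a Hugging Face model name
    pool: int                  # tokens deposited by the owner, in the smallest unit
    reward_per_feedback: int   # assumed flat reward paid per feedback submission
    balances: dict = field(default_factory=dict)
    feedback_log: list = field(default_factory=list)

    def submit_feedback(self, user, label):
        """Record feedback and pay the user, as long as the pool can cover the reward."""
        if self.pool < self.reward_per_feedback:
            raise ValueError("bounty pool exhausted")
        self.pool -= self.reward_per_feedback
        self.balances[user] = self.balances.get(user, 0) + self.reward_per_feedback
        self.feedback_log.append((user, label))

    def dashboard(self):
        """Count feedback labels, roughly what the owner's admin view might show."""
        return Counter(label for _, label in self.feedback_log)


# Step 1: UserA uploads a model and funds the bounty.
bounty = ModelBounty(owner="UserA", model_id="example/model", pool=100, reward_per_feedback=10)

# Step 2: UserB uses the model and leaves feedback after each output.
bounty.submit_feedback("UserB", "too_long")
bounty.submit_feedback("UserB", "was_responsive")

# Step 3: UserA reviews the collected feedback.
print(bounty.balances, bounty.pool, dict(bounty.dashboard()))
```

Checking the pool before paying means the sketch never pays out more than UserA deposited. A real deployment would also need some way to judge feedback quality before rewarding it.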

Installation

React Web App

cd frontend
cd react-web-app
npm install --legacy-peer-deps
npm run start

Smart Contracts

To deploy on thirdweb:

cd crypto
cd thirdweb-deployment
npx thirdweb@latest deploy

To view the contracts:

cd crypto
cd Contracts

Flask Backend

cd backend
py flask_server.py

(py is the Python launcher on Windows. On macOS or Linux, run python3 flask_server.py instead.)

Smart Contract Deployments

Zircuit Testnet:

Polygon Amoy:

Neon EVM Devnet:

Morph:

Unichain:

Story:

AirDAO:

Hedera:

Rootstock:

Repository: shreybirmiwal/Loop-ethSF2024
