Repositories list
16 repositories
- Align Anything: Training All-modality Model with Feedback
- ProgressGym (Public)
- .github (Public)
- JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
- SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
- Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
- llms-resist-alignment (Public)
- NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
- ProAgent (Public): ProAgent: Building Proactive Cooperative Agents with Large Language Models
- ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models
- Safe-Policy-Optimization (Public): NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
- AI Alignment: A Comprehensive Survey
- beavertails (Public)