WeiXiongUST/README.md

Hi there 👋

I am Wei Xiong, currently a first-year Ph.D. student in computer science at UIUC. I work on RLHF for aligning language models.

Previously, I worked on the mathematical foundations of RL, where I was fortunate to collaborate with many great senior mentors and talented peers. I also worked on deep RL at Microsoft Research Asia.

You can find more information about me at:

Pinned

  1. RLHFlow/RLHF-Reward-Modeling

    Recipes to train reward models for RLHF.

    Python · 802 stars · 67 forks

  2. OptimalScale/LMFlow

    An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

    Python · 8.3k stars · 826 forks

  3. Decentralized-Proximal-Algorithm-with-Variance-Reduction

    This is the code for the preprint "PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction".

    Python · 15 stars

  4. multi-armed-bandit-test-framework

    This is the multi-armed bandit code used for my undergraduate thesis.

    Python · 5 stars

  5. ShenGroup/MPMAB_BEACON

    This is the official implementation for the paper "Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization" in NeurIPS 2021.

    Python · 3 stars

  6. RLHFlow/Online-RLHF

    A recipe for online RLHF and online iterative DPO.

    Python · 417 stars · 46 forks