Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

This repository contains the code for the SEPO algorithm presented in the paper:

Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods.

SEPO is an efficient, broadly applicable, and theoretically justified policy gradient algorithm for fine-tuning discrete diffusion models over general rewards.

The code will be uploaded in mid-February 2025.

In the meantime, enjoy this pretty GIF of a denoising diffusion process guided by SEPO, in the discrete case of language!

ozekri/SEPO
