This repository contains the code for the SEPO algorithm presented in the paper:
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods.
SEPO is an efficient, broadly applicable, and theoretically justified policy gradient algorithm for fine-tuning discrete diffusion models over general rewards.
The code will be uploaded in mid-February 2025...
In the meantime, enjoy this pretty GIF of a denoising diffusion process guided by SEPO, in the discrete case of language!