We need to re-implement this paper as part of the torchgfn library: https://github.com/GFNOrg/gfn-lm-tuning
In particular, many quality-of-life improvements would be welcome, such as being able to initialize a policy from a Hugging Face model with a clean API.
EDIT by @saleml: See proposed motivation and plan in the first comment below.
GFlowNet fine-tuning of language models (https://arxiv.org/abs/2310.04363) consists of fine-tuning the parameters of an LLM (say, a decoder-only transformer) so that it generates sequences of tokens that satisfy some constraints or preferences. This entails that:
trajectories in the DAG/GFlowNet correspond to sequences of tokens, where $s_0$ is the empty string and $s_t$ is the sequence generated so far. We need to store the whole generated sequence to preserve the auto-regressive property of decoder-only language models, where token $i$ is generated conditioned on tokens $1 \dots i-1$.
the original LLM is the initial forward policy $P_F$. Fine-tuning corresponds to updating the parameters of this forward policy (see the sketch below).
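As a minimal sketch (not torchgfn code; the model name and prompt are only illustrative), this is how a decoder-only LM from transformers already defines $P_F$: the softmax over the logits at the last position is a distribution over next tokens, and appending a sampled token yields the child state.

```python
# Sketch: a decoder-only LM as the forward policy P_F over next tokens.
# "gpt2" is an arbitrary choice; any causal LM from transformers would do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# s_t: the sequence of tokens generated so far (s_0 would be an empty/BOS prompt).
s_t = tokenizer("The cake is", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(s_t).logits  # shape: (batch, seq_len, vocab_size)

# P_F(. | s_t): a distribution over the vocabulary for the next token.
p_f = torch.softmax(logits[:, -1, :], dim=-1)
next_token = torch.multinomial(p_f, num_samples=1)

# The child state s_{t+1} is s_t with the sampled token appended.
s_next = torch.cat([s_t, next_token], dim=1)
```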
States and actions
The goal is to take inspiration from the code that accompanies the original paper to implement GFlowNet fine-tuning with torchgfn.
Following what is written above, the environment corresponds to the set of all possible sequences (up to a certain length limit). Actions correspond to adding a token, so the action space should correspond to the vocabulary. Different language models have different tokenizers with different vocabulary sizes, so the total number of tokens must be exposed as a parameter. States correspond to sequences of tokens and can be represented as:
strings (decoded sequences of tokens) — this is useful for immediate evaluation and investigation of generated sequences by humans
lists/tensors (sequences of tokens) — this is useful for processing by the different estimators, including the language model/policy $P_F$.
Fortunately, the preprocessor object needed to define an environment can handle casting between these two representations (see the sketch below)!
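To make the two representations concrete, here is a small sketch using a Hugging Face tokenizer (the model name is only illustrative); how this gets wired into torchgfn's preprocessor object is deliberately left out and should follow env.py.

```python
# Sketch: the two state representations and the tokenizer-based casting
# between them. The preprocessor would wrap exactly these conversions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative choice

# Tensor representation: what the estimators / policy P_F consume.
token_ids = tokenizer("Generate a sequence", return_tensors="pt").input_ids

# String representation: what a human inspects.
text = tokenizer.decode(token_ids[0], skip_special_tokens=True)

# Going back from the string to token ids (e.g. to rebuild a state).
round_trip = tokenizer(text, return_tensors="pt").input_ids

# The action space is the vocabulary; its size varies across tokenizers,
# hence the need to expose it as an environment parameter.
n_tokens = tokenizer.vocab_size  # possibly + 1 for an exit/EOS action
```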
First steps towards a PR
Following the introduction above, one can implement an environment that takes as input the language model and the tokenizer, and defines the preprocessor, the states class, and the actions class. It is important to look at env.py and examples such as hypergrid.py and discrete_ebm.py in order to implement the missing/abstract functions.
To test the implementation, we should be able to instantiate the environment with an arbitrary language model / tokenizer and generate random sequences. Generation will require Hugging Face's transformers library.
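As a reference point for such a test (again a sketch, with illustrative model and prompt choices), plain sampling through transformers' built-in generation already produces the kind of random sequences the finished environment plus $P_F$ sampler should yield:

```python
# Sketch: sampling random sequences from an arbitrary causal LM via
# transformers, as a reference for what the environment should reproduce.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # sample from the policy rather than decode greedily
    max_new_tokens=20,       # arbitrary length cutoff for the test
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```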