GFN LLM FineTuning #191

Open
josephdviviano opened this issue Oct 9, 2024 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@josephdviviano
Collaborator

josephdviviano commented Oct 9, 2024

We need to re-implement this work as part of the torchgfn library: https://github.com/GFNOrg/gfn-lm-tuning

In particular, many quality-of-life improvements would be welcome, such as being able to initialize a policy from a Hugging Face model through a clean API.

EDIT by @saleml: See the proposed motivation and plan in the first comment below.

@saleml
Collaborator

saleml commented Dec 3, 2024

Introduction

GFlowNet fine-tuning of language models (https://arxiv.org/abs/2310.04363) consists of fine-tuning the parameters of an LLM transformer (say, decoder-only) so that it generates sequences of tokens that satisfy some constraints or preferences. This entails that:

  • trajectories in the DAG/GFlowNet correspond to sequences of tokens, where $s_0$ is the empty string and $s_t$ is the sequence generated so far. We need to store the whole generated sequence to keep the auto-regressive property of decoder-only language models, where token $i$ is generated conditioned on tokens $1, \dots, i-1$.
  • the original LLM is the initial forward policy $P_F$. Fine-tuning corresponds to updating the parameters of the forward policy.
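
To make the correspondence explicit, here is one way to write it down. The notation $x_i$ for the $i$-th token, $p_\theta$ for the language model's next-token distribution, and $n$ for the final sequence length is introduced here for illustration and is not taken from the comment above. With $s_t = (x_1, \dots, x_t)$ and a trajectory $\tau = (s_0 \to s_1 \to \dots \to s_n)$:

$$
P_F(s_{t+1} \mid s_t; \theta) = p_\theta(x_{t+1} \mid x_1, \dots, x_t),
\qquad
P_F(\tau; \theta) = \prod_{t=0}^{n-1} p_\theta(x_{t+1} \mid x_1, \dots, x_t).
$$

Assuming actions only ever append a token, every non-empty sequence has exactly one parent (itself with the last token removed), so the state graph is a tree.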

States and actions

The goal is to implement GFlowNet fine-tuning with torchgfn, taking inspiration from the code that accompanies the original paper.

Following the above, the environment corresponds to the set of all possible token sequences (say, up to a maximum length). An action appends one token, so the action space is the vocabulary. Different language models use different tokenizers with different vocabulary sizes, so the total number of tokens must be a parameter of the environment. States correspond to sequences of tokens and can be represented as:

  • strings (decoded sequences of tokens): useful for immediate evaluation and inspection of generated sequences by humans
  • lists/tensors (sequences of token ids): the form processed by the different estimators, including the language model/policy $P_F$.

Fortunately, the preprocessor object needed to define an environment can handle casting between these two types!
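
As a rough illustration of the two views of a state, here is a minimal sketch of the round trip between them. The whitespace "tokenizer" and its six-word vocabulary are hypothetical stand-ins; a real implementation would delegate to the model's own tokenizer, and this is not the torchgfn preprocessor interface.

```python
from typing import List

# Toy stand-in for a real tokenizer; this vocabulary is hypothetical.
VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode(text: str) -> List[int]:
    """String view -> token-id view (the form a policy would consume)."""
    return [TOKEN_TO_ID[tok] for tok in text.split()]


def decode(token_ids: List[int]) -> str:
    """Token-id view -> string view (the form a human would read)."""
    return " ".join(VOCAB[i] for i in token_ids)


state_ids = encode("the cat sat")
print(state_ids)          # [1, 2, 3]
print(decode(state_ids))  # the cat sat
```

In this toy version the empty string encodes to the empty list, which matches $s_0$.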

First steps towards a PR

Following the introduction above, one can implement an environment that takes the language model and the tokenizer as input and defines the preprocessor, the states class, and the actions class. It is important to look at env.py and at examples such as hypergrid.py and discrete_ebm.py in order to implement the missing/abstract functions.

To test the implementation, we should be able to instantiate the environment with an arbitrary language model/tokenizer pair and generate random sequences. Generation will require the Hugging Face transformers library.
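
Before wiring anything into torchgfn, the dynamics described above can be sketched in a library-agnostic way. The class below is hypothetical and deliberately does not use the torchgfn `Env` interface (the abstract methods in env.py define the real contract). The `eos_id` and `max_len` parameters are assumptions about how sequences terminate; the plan above only mentions a length limit.

```python
from dataclasses import dataclass
from typing import Tuple

State = Tuple[int, ...]  # a state is the tuple of token ids generated so far


@dataclass(frozen=True)
class TokenSequenceEnv:
    """Hypothetical append-only token environment (not the torchgfn API)."""

    n_tokens: int  # vocabulary size, taken from the tokenizer
    eos_id: int    # assumed token id that terminates a sequence
    max_len: int   # hard cap on sequence length

    def initial_state(self) -> State:
        return ()  # s_0: the empty sequence

    def is_terminal(self, state: State) -> bool:
        ended = len(state) > 0 and state[-1] == self.eos_id
        return ended or len(state) >= self.max_len

    def step(self, state: State, action: int) -> State:
        """Append one token; an action is an index into the vocabulary."""
        if self.is_terminal(state):
            raise ValueError("cannot extend a terminal state")
        if not 0 <= action < self.n_tokens:
            raise ValueError(f"action {action} is outside the vocabulary")
        return state + (action,)


env = TokenSequenceEnv(n_tokens=6, eos_id=0, max_len=4)
s = env.step(env.initial_state(), 3)
s = env.step(s, 0)
print(s, env.is_terminal(s))  # (3, 0) True
```

Keeping `n_tokens` as a constructor argument reflects the point that vocabulary size depends on the tokenizer.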
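
A smoke test along these lines might look like the sketch below. It uses a uniform distribution as a stand-in for the language model so that it runs without downloading weights; with a real model, `policy` would instead return the softmax of the model's next-token logits for the given prefix (that wiring is not shown). The names `uniform_policy` and `sample_sequence`, and the EOS-based stopping rule, are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence

# A policy maps a token-id prefix to a next-token probability vector.
Policy = Callable[[Sequence[int]], List[float]]


def uniform_policy(n_tokens: int) -> Policy:
    """Stand-in for a language model: ignores the prefix, uniform over tokens."""
    return lambda prefix: [1.0 / n_tokens] * n_tokens


def sample_sequence(policy: Policy, eos_id: int, max_len: int,
                    rng: random.Random) -> List[int]:
    """Sample one trajectory token by token until EOS or max_len."""
    tokens: List[int] = []
    while len(tokens) < max_len:
        probs = policy(tokens)  # next-token distribution given the prefix
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


rng = random.Random(0)  # fixed seed so runs are reproducible
samples = [sample_sequence(uniform_policy(6), eos_id=0, max_len=8, rng=rng)
           for _ in range(3)]
print(samples)
```

Checks worth asserting on such samples: every token id lies in the vocabulary, no sequence exceeds `max_len`, and the EOS token only ever appears in the last position.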
