We need to re-implement this paper as part of the torchgfn library: https://github.com/GFNOrg/gfn-lm-tuning
In particular, many quality-of-life improvements would be welcome, such as being able to initialize a policy from a Hugging Face model with a clean API.
EDIT by @saleml: See proposed motivation and plan in the first comment below.
GFlowNet fine-tuning of language models (https://arxiv.org/abs/2310.04363) consists of fine-tuning the parameters of an LLM (say, a decoder-only transformer) so that it generates sequences of tokens that satisfy some constraints or preferences. This entails that:
trajectories in the DAG/GFlowNet correspond to sequences of tokens, where $s_0$ is the empty string and $s_t$ is the sequence generated so far. We need to store the whole generated sequence to preserve the auto-regressive property of decoder-only language models, where token $i$ is generated conditioned on tokens $1 \dots i-1$.
the original LLM is the initial forward policy $P_F$. Fine-tuning corresponds to updating the parameters of this forward policy (see the sketch below).
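As a minimal sketch (not torchgfn code; the model name and prompt are only illustrative), this is how a decoder-only LM from transformers already defines $P_F$: the softmax over the logits at the last position is a distribution over next tokens, and appending a sampled token yields the child state.

```python
# Sketch: a decoder-only LM as the forward policy P_F over next tokens.
# "gpt2" is an arbitrary choice; any causal LM from transformers would do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# s_t: the sequence of tokens generated so far (s_0 would be an empty/BOS prompt).
s_t = tokenizer("The cake is", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(s_t).logits  # shape: (batch, seq_len, vocab_size)

# P_F(. | s_t): a distribution over the vocabulary for the next token.
p_f = torch.softmax(logits[:, -1, :], dim=-1)
next_token = torch.multinomial(p_f, num_samples=1)

# The child state s_{t+1} is s_t with the sampled token appended.
s_next = torch.cat([s_t, next_token], dim=1)
```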
States and actions
The goal is to take inspiration from the code that accompanies the original paper to implement GFlowNet fine-tuning with torchgfn.
Following what is written above, the environment corresponds to the set of all possible sequences (up to a certain length limit). Actions correspond to adding a token, so the action space should correspond to the vocabulary. Different language models have different tokenizers with different vocabulary sizes, so the total number of tokens must be exposed as a parameter. States correspond to sequences of tokens and can be represented as:
strings (decoded sequences of tokens) — this is useful for immediate evaluation and investigation of generated sequences by humans
lists/tensors (sequences of tokens) — this is useful for processing by the different estimators, including the language model/policy $P_F$.
Fortunately, the preprocessor object needed to define an environment can handle casting between these two representations (see the sketch below)!
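To make the two representations concrete, here is a small sketch using a Hugging Face tokenizer (the model name is only illustrative); how this gets wired into torchgfn's preprocessor object is deliberately left out and should follow env.py.

```python
# Sketch: the two state representations and the tokenizer-based casting
# between them. The preprocessor would wrap exactly these conversions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative choice

# Tensor representation: what the estimators / policy P_F consume.
token_ids = tokenizer("Generate a sequence", return_tensors="pt").input_ids

# String representation: what a human inspects.
text = tokenizer.decode(token_ids[0], skip_special_tokens=True)

# Going back from the string to token ids (e.g. to rebuild a state).
round_trip = tokenizer(text, return_tensors="pt").input_ids

# The action space is the vocabulary; its size varies across tokenizers,
# hence the need to expose it as an environment parameter.
n_tokens = tokenizer.vocab_size  # possibly + 1 for an exit/EOS action
```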
First steps towards a PR
Following the introduction above, one can implement an environment that takes as input the language model and the tokenizer, and defines the preprocessor, the states class, and the actions class. It is important to look at env.py and examples such as hypergrid.py and discrete_ebm.py in order to implement the missing/abstract functions.
To test the implementation, we should be able to instantiate the environment with an arbitrary language model / tokenizer and generate random sequences. Generation will require Hugging Face's transformers library.
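As a reference point for such a test (again a sketch, with illustrative model and prompt choices), plain sampling through transformers' built-in generation already produces the kind of random sequences the finished environment plus $P_F$ sampler should yield:

```python
# Sketch: sampling random sequences from an arbitrary causal LM via
# transformers, as a reference for what the environment should reproduce.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # sample from the policy rather than decode greedily
    max_new_tokens=20,       # arbitrary length cutoff for the test
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```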