Proposal for the implementation of token alignment #1239
The aim of this PR is to explore a possible solution to the topic of token alignment. I decided to open a separate draft PR as the option considered here is quite different from what I had in mind in #531.
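For context, a minimal illustration of what token alignment refers to (the toy vocabulary and helper below are hypothetical, not code from this PR): when the prompt ends mid-token, we trim its trailing characters and constrain the first generated token to start with the trimmed characters.

```python
# Toy illustration of token alignment (hypothetical vocabulary, not outlines code).
vocab = ["hel", "hello", "help", "lo", "world", " "]

def crossing_tokens(removed_suffix: str, vocab: list[str]) -> list[str]:
    """Tokens that start with the removed prompt suffix and extend past it."""
    return [
        t for t in vocab
        if t.startswith(removed_suffix) and len(t) > len(removed_suffix)
    ]

prompt = "say hel"
removed = "hel"                           # trailing characters we remove
aligned_prompt = prompt[: -len(removed)]  # "say "
# Generation must now start with a token that "heals" the boundary.
print(crossing_tokens(removed, vocab))    # ['hello', 'help']
```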
I'm trying to find a solution that:

- does not modify the `states_to_token_maps` after initialization
- does not require changes to the `get_next_instruction` and `get_next_state` methods

For this test I've looked only at the `RegexGuide` and have not covered some things that will need to be added later, as I want to focus on the idea of changing the `states_to_token_maps`.
To do so, the idea is to create during the initialization of the `RegexGuide` a `states_to_token_maps` that could accommodate token alignment for any prompt that will be received later by including states that are "before" the initial state (anticipating that some characters at the end of the prompt will be removed for token alignment).
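To make the idea concrete, here is a minimal sketch of how such pre-initial states could be built, assuming a character-level transition function like the one backing `RegexGuide`. The names here (`fsm_transition`, the negative state ids, token strings as the vocabulary) are placeholders of mine, not the actual implementation in this PR:

```python
from collections import defaultdict

def build_alignment_states(vocab: list[str], fsm_transition, initial: int = 0):
    """For every way a token can be split into (removed prompt suffix, regex
    match), create a pre-initial state keyed by the removed suffix, with a
    transition on that token into the state the FSM reaches after reading
    the matching part of the token."""
    pre_states: dict[str, int] = {}   # removed suffix -> synthetic state id
    transitions = defaultdict(dict)   # state -> token id -> next state
    next_state_id = -1                # negative ids mark states "before" initial

    for token_id, token in enumerate(vocab):
        for split in range(1, len(token)):
            removed, rest = token[:split], token[split:]
            # Walk the FSM from the initial state over the part of the token
            # that must match the regex; discard the split if it cannot match.
            state = initial
            for char in rest:
                state = fsm_transition(state, char)
                if state is None:
                    break
            else:
                if removed not in pre_states:
                    pre_states[removed] = next_state_id
                    next_state_id -= 1
                transitions[pre_states[removed]][token_id] = state
    return pre_states, dict(transitions)
```

At generation time, one would then look up a suffix of the actual prompt among the `pre_states` keys, strip it from the prompt, and start decoding from the corresponding pre-initial state; since a state exists for every possible removed suffix, nothing in the `states_to_token_maps` needs to change after initialization.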
The downside of this approach is that it adds many states to the `states_to_token_maps` that are used/valid only for a given prompt. Thus, when running inference, some of the states cannot be reached for the prompt provided, which sounds a bit strange.

Another problem is that it adds overhead to the initialization of the generator. The amount of overhead varies a lot based on the size of the vocabulary and on how "constrained" the first characters of the regex are (the worst case is unconstrained text). I have not really looked at optimizing how long those operations take though, so maybe the cost could be reduced.
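To put rough numbers on that overhead, here is how one could measure the `build_alignment_states` sketch above in the worst case just mentioned (essentially unconstrained text); the random vocabulary and the trivial one-state FSM are mine, and any timings are illustrative only:

```python
import random
import string
import time

def make_vocab(n: int, max_len: int = 8) -> list[str]:
    """Random lowercase tokens standing in for a real tokenizer vocabulary."""
    return [
        "".join(random.choices(string.ascii_lowercase, k=random.randint(1, max_len)))
        for _ in range(n)
    ]

# Trivial one-state FSM accepting any lowercase text: the unconstrained case.
def fsm_transition(state, char):
    return 0 if char.islower() else None

for size in (1_000, 10_000, 50_000):
    vocab = make_vocab(size)
    start = time.perf_counter()
    build_alignment_states(vocab, fsm_transition)
    print(f"{size} tokens: {time.perf_counter() - start:.3f}s")
```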
I would be curious to hear your opinion on this, @rlouf.