Description
Guidance implements a method called token healing, which consists in correcting for the quirks introduced by modern encodings like BPE. See this notebook for a thorough explanation of why this is necessary. The implementation for Transformers
models is here.
This consists in backtracking one or several tokens and start generation by imposing that we reproduce the text that corresponds to the removed tokens. This can be integrated in the __call__
method of the Sequence
class.