Documentation of the `pattern` parameter in `pre_tokenizers.Split` is incorrect #1565
Labels: documentation
The documentation for `pre_tokenizers.Split` states:

> pattern (str or Regex) — A pattern used to split the string. Usually a string or a a regex built with tokenizers.Regex
However, this is incorrect. A `str` does not work, using tokenizers 0.19.1. The following example demonstrates:

Can you please update the documentation to state that it must be a `tokenizers.Regex` object, and give an example, as above, of how that can be done? It would also be good to document `tokenizers.Regex` somewhere. Another alternative would be to update the code so that it does work with a `str` value.

Note that PR 1264 improved the documentation, but it is still incorrect and confusing.
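For readers following along, here is a minimal sketch of the kind of comparison described above. It is not the original example from this report. The input string and the `\d+` pattern are illustrative choices, and the comment about literal matching reflects my understanding of how a plain `str` is handled, so treat it as an assumption rather than documented behavior.

```python
from tokenizers import Regex
from tokenizers.pre_tokenizers import Split

text = "abc123def"  # illustrative input, not taken from the report

# Pattern wrapped in tokenizers.Regex: compiled as a regular expression,
# so the run of digits is treated as the delimiter.
regex_split = Split(Regex(r"\d+"), "isolated")
regex_pieces = [piece for piece, _ in regex_split.pre_tokenize_str(text)]

# The same pattern passed as a plain str.
# Assumption: the str is matched as literal text ("\d+" character for
# character), so it never matches here and the input is left unsplit.
str_split = Split(r"\d+", "isolated")
str_pieces = [piece for piece, _ in str_split.pre_tokenize_str(text)]

print(regex_pieces)
print(str_pieces)
```

If the assumption holds, the two lists differ, which would explain why a regex written as a plain `str` appears not to work.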
Also, while you're at it, the documentation for `behavior` could use the example from the Rust docs.
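As a rough illustration of what such a `behavior` example could look like from Python, the sketch below runs the same literal `-` delimiter through each behavior name I believe the Python bindings accept. The input string is an illustrative choice, and no expected output is given because I have not verified it against 0.19.1.

```python
from tokenizers.pre_tokenizers import Split

text = "the-final--countdown"  # illustrative input

# Behavior names as I understand the Python bindings to accept them.
behaviors = [
    "removed",
    "isolated",
    "merged_with_previous",
    "merged_with_next",
    "contiguous",
]

results = {}
for behavior in behaviors:
    # A plain str pattern is used on purpose: "-" is a literal delimiter.
    splitter = Split("-", behavior)
    results[behavior] = [piece for piece, _ in splitter.pre_tokenize_str(text)]

for behavior, pieces in results.items():
    print(f"{behavior:>20}: {pieces}")
```

Printing the pieces side by side makes it easy to see where each behavior puts the delimiter: dropped, kept as its own piece, or attached to a neighbor.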