@wyattscarpenter

Using a raw string.

Collaborator

@ArthurZucker ArthurZucker left a comment


This file is automatically generated! It has `# Generated content DO NOT EDIT` as the header. If you want to modify something, it should be the doc here: https://github.com/huggingface/tokenizers/blob/main/bindings/python/src/pre_tokenizers.rs#L344

@wyattscarpenter
Author

Thanks. So it's something to do with the transformation from a Rust docstring.

/// This pre-tokenizer simply splits using the following regex: `\w+|[^\w\s]+`

This is a valid Rust docstring, but whatever turns it into a Python docstring needs to be updated to respect Python's escape rules: in a non-raw Python string, `\w` and `\s` are invalid escape sequences. (Or maybe it is meant to let you use Python escapes, in which case we must correct this docstring to escape the escapes? Hmm...)
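
For reference, here is a minimal sketch of the two options, checked with nothing but the standard library. The function `f` and the three source snippets are hypothetical stand-ins, not the actual contents of the generated file; the only thing taken from the thread is the regex text. It compiles each snippet and records any invalid-escape warning Python raises (a `DeprecationWarning` on older 3.x versions, a `SyntaxWarning` on newer ones).

```python
import warnings

# Hypothetical stand-ins for the generated Python stub (not the real file).
# The outer literals are raw so the snippets appear exactly as they would on disk.
PLAIN = r'''
def f():
    """Splits using the regex: `\w+|[^\w\s]+`"""
'''

RAW = r'''
def f():
    r"""Splits using the regex: `\w+|[^\w\s]+`"""
'''

ESCAPED = r'''
def f():
    """Splits using the regex: `\\w+|[^\\w\\s]+`"""
'''


def check(source):
    """Compile `source`; return (warning categories, resulting docstring of f)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        code = compile(source, "<stub>", "exec")
    namespace = {}
    exec(code, namespace)
    return [w.category for w in caught], namespace["f"].__doc__


if __name__ == "__main__":
    for name, source in [("plain", PLAIN), ("raw", RAW), ("escaped", ESCAPED)]:
        categories, doc = check(source)
        print(name, [c.__name__ for c in categories], doc)
```

Both the raw-string prefix and the doubled backslashes produce the same docstring text without a warning, so the choice mostly comes down to whether the generator is supposed to pass Python escapes through or not.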

@wyattscarpenter
Author

I don't really know how to deal with that myself, offhand, so I guess I'll just pivot to opening an issue about it instead.
