Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated characters, underscore and comma preprocessors to be TorchScriptable. #3602
Updated characters, underscore and comma preprocessors to be TorchScriptable. #3602
Changes from all commits
3e6ce11
95f0501
42ba7e3
23459ec
5635f8d
6fa1e63
7401aa6
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you are adapting an existing implementation, though this seems more complicated than I would expect (for example, why do we have a
get_tokens()
function that returns its own input?).@geoffreyangus, ooc does this also look strange to you, or is this imposed on us by torchscript?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like
NgramTokenizer
, which subclassesSpaceStringToListTokenizer
(which in turn subclasses the newStringSplitTokenizer
), seems to overrideget_tokens
: https://github.com/ludwig-ai/ludwig/pull/3602/files#diff-5cbace55f4f4fd07725c061b9f981b83fe43cb53b0045cf1257c9fb5d4931f0dR132-R142