processor.tokenize support non-lang-syms #796

Closed
yzmyyff opened this issue Nov 17, 2021 · 2 comments · Fixed by #819

Comments

@yzmyyff
Contributor

yzmyyff commented Nov 17, 2021

In the old IO framework, we can pass --non-lang-syms to scripts such as text2token.py and format_data. Special symbols such as [laugh] or [noise] are then kept intact as single tokens instead of being split into characters.

However, uio does not support this feature.

Are there any plans to support it? If not, how can I achieve the same result?
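A minimal sketch of the behavior described above, assuming character-level tokenization where whitespace is simply dropped. The function names `build_non_lang_pattern` and `tokenize_with_non_lang_syms` are hypothetical. They are not part of WeNet's `processor` API and are not the change that was eventually merged. The idea is that text is split into characters, while every occurrence of a listed non-language symbol survives as one token:

```python
import re
from typing import Iterable, List


def build_non_lang_pattern(non_lang_syms: Iterable[str]) -> "re.Pattern[str]":
    """Compile a regex with one capturing group matching any special symbol.

    Longer symbols are tried first so that a shorter symbol which is a
    prefix of a longer one cannot shadow it.
    """
    syms = sorted(set(non_lang_syms), key=len, reverse=True)
    # re.escape is required because "[" and "]" are regex metacharacters.
    return re.compile("(" + "|".join(re.escape(s) for s in syms) + ")")


def tokenize_with_non_lang_syms(text: str, non_lang_syms: Iterable[str]) -> List[str]:
    """Hypothetical tokenizer: split text into characters, but keep each
    non-language symbol (e.g. "[laugh]") as a single token.

    Assumption: whitespace carries no meaning and is dropped.
    """
    syms = list(non_lang_syms)
    if not syms:
        # No special symbols: plain character-level split.
        return [ch for ch in text if not ch.isspace()]
    pattern = build_non_lang_pattern(syms)
    tokens: List[str] = []
    # With one capturing group, re.split returns
    # [plain, symbol, plain, symbol, ..., plain], so odd indices are symbols.
    for i, part in enumerate(pattern.split(text)):
        if i % 2 == 1:
            tokens.append(part)  # a matched special symbol, kept whole
        else:
            tokens.extend(ch for ch in part if not ch.isspace())
    return tokens
```

For example, `tokenize_with_non_lang_syms("你好[laugh]世界", ["[laugh]", "[noise]"])` yields `["你", "好", "[laugh]", "世", "界"]`. A naive character split would instead break `[laugh]` into seven tokens. Using a single compiled regex means each transcript is scanned only once, no matter how many symbols are listed.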

@robin1001
Collaborator

I haven't looked into the details of --non-lang-syms. Do you have the bandwidth to add support for it?

@yzmyyff
Contributor Author

yzmyyff commented Nov 19, 2021

I'd like to, but I'm busy at the moment. Maybe next week or later.
