
GiNZA >= 5.1 cannot process long (over 49149 bytes) texts #242

Open
TatsuyaShirakawa opened this issue Mar 28, 2022 · 0 comments

TatsuyaShirakawa commented Mar 28, 2022

Texts longer than 49149 bytes cannot be processed by GiNZA >= 5.1. This is essentially due to a limitation in sudachi.rs.

According to the sudachi.rs code, the maximum input length (in bytes) is defined as `u16::MAX / 4 * 3` (= 65535 / 4 * 3 with integer division = 49149). If a given text is longer than this in bytes, SudachiPy (sudachi.rs) raises an `InputTooLong` error.
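As an illustration of the arithmetic, and of one possible way to stay under the limit, here is a minimal Python sketch. It assumes the limit applies to the UTF-8 encoding of the input (Rust strings are UTF-8 and their length is counted in bytes), so most Japanese characters, which take 3 bytes each, fit roughly 16383 to a call. The helpers `utf8_len`, `is_too_long` and `split_by_bytes` are hypothetical and are not part of GiNZA, SudachiPy or sudachi.rs.

```python
from __future__ import annotations

# Mirrors u16::MAX / 4 * 3 from sudachi.rs, using integer division.
U16_MAX = 0xFFFF  # 65535
MAX_INPUT_BYTES = U16_MAX // 4 * 3  # 16383 * 3 = 49149


def utf8_len(text: str) -> int:
    """Length of `text` in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def is_too_long(text: str, limit: int = MAX_INPUT_BYTES) -> bool:
    """True if the UTF-8 encoding of `text` exceeds `limit` bytes."""
    return utf8_len(text) > limit


def split_by_bytes(text: str, limit: int = MAX_INPUT_BYTES) -> list[str]:
    """Split `text` into chunks of at most `limit` UTF-8 bytes each.

    Hypothetical workaround: cuts only at character boundaries and, when a
    chunk must be cut, prefers to cut right after the last newline or "。".
    """
    chunks: list[str] = []
    start = 0
    while start < len(text):
        # Greedily extend the chunk one character at a time while it fits.
        end = start
        size = 0
        while end < len(text):
            char_bytes = len(text[end].encode("utf-8"))
            if size + char_bytes > limit:
                break
            size += char_bytes
            end += 1
        if end == start:
            raise ValueError("limit is smaller than a single character")
        if end < len(text):
            # Prefer a sentence or line boundary inside the fitted span.
            cut = max(text.rfind("\n", start, end), text.rfind("。", start, end))
            if cut >= start:
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk could then be passed to the pipeline separately (for example `nlp(chunk)` with a loaded GiNZA model). Splitting can change the analysis near chunk boundaries, which is why the sketch prefers to cut after a newline or "。" rather than mid-sentence.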

Here are the related lines in the sudachi.rs repo, which might help.

(I personally asked the Sudachi developers about this limitation, and they told me that the maximum length (`u16::MAX / 4 * 3`) was chosen for performance reasons.)
