AssertionError on empty input text #142

Closed
lifeiteng opened this issue Nov 25, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@lifeiteng

```python
from wtpsplit import SaT

sat = SaT("sat-3l")
# optionally run on GPU for better performance
# also supports TPUs via e.g. sat.to("xla:0"); in that case, pass `pad_last_batch=True` to sat.split
sat.half().to("cuda")

# the second input in the batch is an empty string, which triggers the error
[v for v in sat.split(["This is a test This is another test.", ""])]
```

```
wtpsplit/wtpsplit/extract.py", line 194, in extract
    assert current_chunk == num_chunks
AssertionError
```
markus583 added the bug label on Nov 27, 2024
@pf-crypto12

pf-crypto12 commented Nov 29, 2024

Similar issue here. It also happens when the input text consists only of newlines (e.g. \r\n\r\n\r\n\r\n\r\n\r\n), which is tokenized into an empty string and then raises the same AssertionError.
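One possible client-side workaround is to keep blank inputs away from `sat.split` and restore their positions afterwards. This is a minimal sketch, not part of wtpsplit: `split_skipping_blank` is a hypothetical helper, it treats any whitespace-only string as blank via `str.strip()` (an approximation that may not match exactly which inputs the tokenizer reduces to an empty string), and returning `[]` for a blank input is an assumed convention rather than what the library itself returns.

```python
from typing import Callable, Iterable, List


def split_skipping_blank(
    texts: List[str],
    split_fn: Callable[[List[str]], Iterable[List[str]]],
) -> List[List[str]]:
    """Hypothetical helper: call split_fn only on non-blank texts.

    Blank inputs (empty or whitespace-only, e.g. only newlines) are never
    passed to split_fn; their slot in the output is an empty list, which is
    an assumed convention, not wtpsplit behavior.
    """
    # Positions of inputs that contain at least one non-whitespace character.
    keep = [i for i, t in enumerate(texts) if t.strip()]
    # One batched call over the non-blank inputs; skip the call if none remain.
    results = list(split_fn([texts[i] for i in keep])) if keep else []
    # Start with [] everywhere, then fill the non-blank positions in input order.
    out: List[List[str]] = [[] for _ in texts]
    for i, sentences in zip(keep, results):
        out[i] = list(sentences)
    return out


if __name__ == "__main__":
    # Stand-in splitter so the sketch runs without wtpsplit; splits on ". ".
    def fake_split(batch):
        for t in batch:
            yield t.split(". ")

    print(split_skipping_blank(["a. b", "", "\r\n\r\n", "c"], fake_split))
    # → [['a', 'b'], [], [], ['c']]
```

With wtpsplit installed, the call would be `split_skipping_blank(texts, sat.split)`, since `sat.split` on a list yields one list of sentences per input text. Batching the non-blank inputs into a single call keeps the GPU batching that `sat.split` does internally.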

@markus583
Collaborator

Hi, thanks for raising this. Both cases are now handled in the current version (2.1.2), which I just released. Please let me know if this fixes your issues.
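To confirm that an environment has at least the release named above (2.1.2), the installed version can be read with the standard library's `importlib.metadata.version`. This is a sketch with hypothetical helper names; it only compares version numbers and does not test the behavior itself. It also ignores pre-release suffixes such as `rc1`, so `packaging.version.Version` is the more robust choice when full PEP 440 handling matters.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Release the maintainer says handles both cases (from the thread).
FIX_RELEASE: Tuple[int, ...] = (2, 1, 2)


def release_tuple(v: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. "2.1.2" -> (2, 1, 2).

    Simplification: any suffix such as "rc1" or ".dev0" is ignored, so
    pre-releases compare equal to the final release.
    """
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return tuple(int(p) for p in m.group(0).split("."))


def includes_empty_input_fix(v: str) -> bool:
    """True if version string v is at or above FIX_RELEASE."""
    a, b = list(release_tuple(v)), list(FIX_RELEASE)
    # Pad with zeros so "2.2" compares like "2.2.0".
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


def installed_wtpsplit_version() -> Optional[str]:
    """Version string of the installed wtpsplit distribution, or None."""
    try:
        return version("wtpsplit")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_wtpsplit_version()
    if v is None:
        print("wtpsplit is not installed")
    else:
        print(v, "includes fix:", includes_empty_input_fix(v))
```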
