Unify DocumentSplitter and NLTKDocumentSplitter #8600

Closed
davidsbatista opened this issue Dec 3, 2024 · 0 comments · Fixed by #8617

@davidsbatista
Contributor

These two classes are nearly identical. The only difference is that NLTKDocumentSplitter uses NLTK's sentence boundary detection algorithm. We should merge them into a single component.

The merged component could still let the user choose between a naive approach to sentence boundary detection (e.g., splitting on ".") and NLTK's sentence boundary detection.
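
To make the proposal concrete, here is a rough sketch of what a single entry point with a user-selectable strategy might look like. The function name `split_sentences` and the `use_nltk` flag are hypothetical and chosen only for illustration; they are not the existing Haystack API. The NLTK branch relies on `nltk.tokenize.sent_tokenize`, which needs NLTK installed and its Punkt model data downloaded.

```python
import re
from typing import List


def split_sentences(text: str, use_nltk: bool = False) -> List[str]:
    """Hypothetical helper: one function, two sentence-splitting strategies."""
    if use_nltk:
        # Imported lazily so NLTK is only required when the user opts in.
        # sent_tokenize needs the Punkt model data to be available locally.
        from nltk.tokenize import sent_tokenize

        return sent_tokenize(text)

    # Naive strategy: break after every "." that is followed by whitespace.
    parts = re.split(r"(?<=\.)\s+", text)
    return [part for part in parts if part]


# The naive strategy treats the period in "Dr." as a sentence end.
print(split_sentences("Dr. Smith arrived. He sat down."))
```

With the naive strategy, an abbreviation such as "Dr." is treated as a sentence end, which is the kind of case a trained sentence boundary detector is meant to handle better.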

@davidsbatista davidsbatista self-assigned this Dec 3, 2024
@julian-risch julian-risch added the P2 Medium priority, add to the next sprint if no P1 available label Dec 6, 2024
@davidsbatista davidsbatista added this to the 2.9.0 milestone Dec 12, 2024