Add a Recursive Chunking strategy #8548
Comments
@davidsbatista This sounds great! One idea I had for this is some way to indicate that we'd like to utilize something like NLTK to do sentence splitting. So normally I think the list of separator characters would look like What do you think?
Also I wanted to ask will the splitting by separators (e.g.
That's a good suggestion; I will take it into consideration.
I would suggest using I think we could use the What do you say? Also, this
That sounds good to me!
Yes I also agree. Let's reuse that and move it into utils.
This is totally correct! I asked the same question here and it does seem like we would like to merge these two in the future. Sounds like we should open an issue for this.
Hey @davidsbatista~ The idea seems to be exactly how Link: semchunk Also, can we add support for more chunking methods? Full disclosure: I write a lot of chunking and splitting methods at Chonkie. Thanks! 😊
Use a set of predefined separators to split text recursively. The process follows these steps:
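For illustration only, here is a minimal sketch of one common way to split text recursively by separators. It is not necessarily the strategy proposed in this issue. The function name `recursive_split`, the parameter names `separators` and `max_len`, and the character-based length limit are all assumptions made for this example. The sketch tries the coarsest separator first, merges adjacent pieces while they fit, and recurses with the remaining separators only on pieces that are still too long.

```python
from typing import List, Sequence


def recursive_split(text: str, separators: Sequence[str], max_len: int) -> List[str]:
    """Hypothetical helper: split `text` into chunks of at most `max_len` characters.

    `separators` is ordered from coarsest to finest and must not contain "".
    """
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut by character count.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, rest = separators[0], separators[1:]
    parts = text.split(sep)
    # Keep the separator attached to each piece (except the last) so no text is lost.
    pieces = [p + sep for p in parts[:-1]] + [parts[-1]]

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if not piece:
            continue
        if len(piece) > max_len:
            # Flush what we have, then recurse with the finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, rest, max_len))
        elif len(current) + len(piece) <= max_len:
            # Merge small neighbouring pieces into one chunk.
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb ccc\n\nddd eee"
chunks = recursive_split(sample, ["\n\n", "\n", " "], max_len=8)
# Every chunk respects the limit, and joining the chunks restores the input.
print(chunks)
```

Keeping the separator attached to each piece means the chunks concatenate back to the original text, which makes the behaviour easy to check. A sentence-level step (for example an NLTK-based splitter, as suggested in the comments) could take the place of one of the plain string separators, but that is not shown here.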