Add content chunkers #36
Comments
@rickychilcott @alchaplinsky Is this something that could be useful here? https://github.com/moekidev/baran @moekidev |
I think Langchainrb's module style is excellent. I actually have a project where I'm using only the chunker, without Langchain's LLM clients. |
@moekidev I was just asking if we could use your Baran library in this Langchain.rb project. |
@andreibondarev Sorry. I hope so. I just wanted to say that I think it would be better if this Langchainrb didn't have a chunker internally, regardless of whether Baran is useful or not. |
Oh! Why do you say that? Why not have chunkers here, in Langchainrb? |
@andreibondarev I don't think it's a good idea to build everything in-house, the way the Python or TypeScript versions do. Even for a chunker, contributions end up scattered because the same work overlaps with other full-stack LLM libraries. I think it's more effective for the community to contribute to an independent library, since that can ultimately lead to richer and safer features. So, regardless of whether it's Baran or not, I think it would be better to rely on an external chunker library. This is why I chose to release Baran instead of contributing to Langchainrb. |
@moekidev - this is an interesting take. I would like to design a chunking system in Langchain.rb that will allow for simple chunking, but allow developers to utilize their own chunking systems as needed. Given the work you've done on Baran, when I build this out, I may just use your libraries under the hood for our standard chunking, for now. This will allow me to focus on the interfaces of the classes and not re-invent the wheel. This will perhaps be most fruitful in allowing us to then be compatible (or at least play nicely) with other external libraries. I get the idea that Baran has utility outside of Langchain.rb so a separate gem is useful, but I think we want to provide a good development experience, batteries included, without having to find, locate, and stitch together many libraries -- especially for new AI/ML developers. |
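A minimal sketch of the design described in the comment above: a built-in default chunker, plus a way to pass in any other chunker. Every name here (`ChunkerSketch`, `FixedSizeChunker`, `ParagraphChunker`, the `chunker:` keyword, the `#chunks` method) is an assumption made for illustration. None of it is taken from Langchain.rb or Baran.

```ruby
# Hypothetical names throughout; this is not Langchain.rb's or Baran's actual API.
module ChunkerSketch
  # Default strategy: fixed-size character windows that overlap by a set amount.
  class FixedSizeChunker
    def initialize(chunk_size: 1000, chunk_overlap: 200)
      raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?
      unless (0...chunk_size).cover?(chunk_overlap)
        raise ArgumentError, "chunk_overlap must be in 0...chunk_size"
      end

      @chunk_size = chunk_size
      @chunk_overlap = chunk_overlap
    end

    # Returns an array of strings covering the whole text.
    def chunks(text)
      step = @chunk_size - @chunk_overlap
      result = []
      start = 0
      while start < text.length
        result << text[start, @chunk_size]
        # Stop once the current window reaches the end of the text.
        break if start + @chunk_size >= text.length

        start += step
      end
      result
    end
  end

  # An example of a developer-supplied strategy: split on blank lines.
  class ParagraphChunker
    def chunks(text)
      text.split(/\n{2,}/).map(&:strip).reject(&:empty?)
    end
  end

  # Entry point: uses the default chunker unless the caller passes
  # any object that responds to #chunks.
  def self.chunk(text, chunker: FixedSizeChunker.new)
    raise ArgumentError, "chunker must respond to #chunks" unless chunker.respond_to?(:chunks)

    chunker.chunks(text)
  end
end
```

With this shape, `ChunkerSketch.chunk(text)` covers the "batteries included" case, while `ChunkerSketch.chunk(text, chunker: SomeExternalChunker.new)` lets an external library plug in, provided it exposes (or is wrapped to expose) a `#chunks` method.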
@rickychilcott @andreibondarev That sounds good. If we maintain these perspectives, Langchainrb might be able to achieve a more sustainable design than the other language versions! Should I try adding a class that uses Baran to Langchainrb on my end? |
@rickychilcott is in the middle of it right now, I think! |
Hey there ✌️ The problem when adding data is actually using Another solution to process large files could be:
I'll proceed with that on my side, that could prob be a solution for large .docx files as well. WDYT? |
I know I took a stab at #40 a while ago but I just haven’t had the time to dedicate to trying again after some conversations and some of the major class changes. I’ve got some other things that I’m working on for this project and for my primary job. But I don’t imagine I’ll be able to attempt again for another week or two. So if anyone wants to pick this up.. by all means. I’ve held this hostage for too long. Sorry. |
I went ahead and added a basic text chunker: #188. I'd like to get some feedback, and then we can figure out what the next iteration looks like. For example, we may want to automatically pick a chunking strategy based on the original file format. I'm closing this issue for now. |
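One way the "pick a chunking strategy based on the original file format" idea could look, as a rough sketch only. The module name, the extension-to-strategy table, and the two strategies are all hypothetical and do not reflect what #188 implements.

```ruby
# Hypothetical mapping; the extensions and strategies are illustrative only.
module FormatAwareChunking
  # Split prose on blank lines, dropping empty paragraphs.
  PARAGRAPHS = ->(text) { text.split(/\n{2,}/).map(&:strip).reject(&:empty?) }
  # Treat each non-empty line as its own chunk (e.g. one CSV row per chunk).
  LINES = ->(text) { text.lines.map(&:chomp).reject(&:empty?) }

  STRATEGIES = {
    ".txt" => PARAGRAPHS,
    ".md" => PARAGRAPHS,
    ".csv" => LINES
  }.freeze

  # Looks up the strategy by file extension, case-insensitively,
  # and falls back to paragraph splitting for unknown formats.
  def self.strategy_for(path)
    STRATEGIES.fetch(File.extname(path).downcase, PARAGRAPHS)
  end

  def self.chunk(path, text)
    strategy_for(path).call(text)
  end
end
```

Keeping the table as plain data means a new format only needs a new entry, and callers who want a different behavior can bypass the lookup and call their own strategy directly.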