
Fix #29759: Use local chunk_size_ for looping in embed_documents #29761

Merged · 3 commits into langchain-ai:master · Feb 13, 2025

Conversation

chaymaeelaattabi (Contributor)

This fix ensures that the chunk size is determined once and applied consistently when batching text embeddings. Previously, embed_documents resolved chunk_size_ from the caller-supplied chunk_size (falling back to the default self.chunk_size when it was None), but the batching loop still iterated with self.chunk_size, so an explicitly passed chunk_size was silently ignored on this code path.

Now the loop uses the resolved chunk_size_, ensuring consistent chunking. This improves reliability when embedding large text inputs in batches and prevents surprising behavior when a caller overrides the default chunk size.
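For illustration, a minimal, self-contained sketch of the corrected batching logic. This is not the actual OpenAIEmbeddings class; embed_batch and DEFAULT_CHUNK_SIZE are hypothetical stand-ins for the real API call and the instance default:

```python
from typing import List, Optional

DEFAULT_CHUNK_SIZE = 1000  # hypothetical stand-in for self.chunk_size


def embed_batch(batch: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for the real embeddings API call.
    return [[float(len(text))] for text in batch]


def embed_documents(
    texts: List[str], chunk_size: Optional[int] = None
) -> List[List[float]]:
    # Resolve the effective batch size once: caller override or default.
    chunk_size_ = chunk_size or DEFAULT_CHUNK_SIZE
    embeddings: List[List[float]] = []
    # Use the resolved value as the loop stride. Before this PR, the real
    # method iterated with self.chunk_size here, so a caller-supplied
    # chunk_size had no effect on batching.
    for i in range(0, len(texts), chunk_size_):
        embeddings.extend(embed_batch(texts[i : i + chunk_size_]))
    return embeddings
```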

```diff
@@ -573,7 +573,7 @@ def embed_documents(
         chunk_size_ = chunk_size or self.chunk_size
         if not self.check_embedding_ctx_length:
             embeddings: List[List[float]] = []
-            for i in range(0, len(texts), self.chunk_size):
+            for i in range(0, len(texts), chunk_size_):
```
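As a hedged usage sketch of the behavior this hunk changes (names per langchain-openai at the time of this PR; running it requires OpenAI credentials):

```python
from langchain_openai import OpenAIEmbeddings

# check_embedding_ctx_length=False selects the code path patched above.
embedder = OpenAIEmbeddings(check_embedding_ctx_length=False)

# Before the fix, chunk_size=100 was ignored on this path and the loop
# strode by self.chunk_size; with the fix, it controls the batch size.
vectors = embedder.embed_documents(["doc one", "doc two"], chunk_size=100)
```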
Collaborator
Thanks. Interestingly, this is already done for the async method.

ccurme enabled auto-merge (squash) on February 13, 2025 at 01:24.
ccurme merged commit 4b08a7e into langchain-ai:master on February 13, 2025.
54 checks passed
Labels: 🤖:bug · Ɑ: embeddings · lgtm · size:S