-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix(vector-io): handle missing document_id in insert_chunks #3521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(vector-io): handle missing document_id in insert_chunks #3521
Conversation
cc: @leseb as per our discussions here is the bug fix. Thanks. |
Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr and RAG memory to handle different document_id locations. Added test for missing document_id scenarios. Fixes issue llamastack#3494 where /v1/vector-io/insert would crash with KeyError.
- Gate debug logging behind isEnabledFor check to avoid unnecessary computation - Add Chunk.document_id property to safely handle metadata/chunk_metadata extraction - Simplify RAG memory code using new property
db366c4
to
2510bd3
Compare
Required by project logging rules to use logging.DEBUG constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please address comment. lgtm otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add logging back in but otherwise lgtm too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add logging back in but otherwise lgtm too
Simplified logging to always execute instead of gating behind isEnabledFor check. Main fix is using safe chunk.document_id property instead of direct metadata access.
Added back the return statement in insert_chunks for consistency with the document_id property pattern where None is explicitly returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!
Address PR feedback to validate document_id type when accessed from metadata dict. Since metadata is dict[str, Any], we now fail fast with a clear TypeError when document_id exists but isn't a string, rather than silently returning None. Also removed redundant isinstance check for chunk_metadata.document_id since Pydantic already guarantees it's str | None. Added test coverage for the new validation behavior.
@ehhuang your comments have been taken care of, final look? Thanks! |
Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr and RAG memory to handle different document_id locations. Added test for missing document_id scenarios.
Fixes issue #3494 where /v1/vector-io/insert would crash with KeyError.
Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr and RAG memory to handle different document_id locations. Added test for missing document_id scenarios.
What does this PR do?
Fixes a KeyError crash in
/v1/vector-io/insert
when chunks are missingdocument_id
fields. The APIwas failing even though
document_id
is optional according to the schema.Closes #3494
Test Plan
Before fix:
/v1/vector-io/insert
with chunks → 500 KeyErrordocument_id
was placedAfter fix:
document_id
scenarios