skamenan7
Contributor

Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr and RAG memory to handle different document_id locations. Added test for missing document_id scenarios.

Fixes issue #3494 where /v1/vector-io/insert would crash with KeyError.

What does this PR do?

Fixes a KeyError crash in /v1/vector-io/insert when chunks are missing document_id fields. The API
was failing even though document_id is optional according to the schema.

Closes #3494
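
A minimal sketch of the failure mode described above, assuming chunk metadata is a plain dict. The helper names `document_id_unsafe` and `document_id_safe` are hypothetical and are not taken from the project's code; they only illustrate why direct indexing crashes on an optional key while a `.get` lookup does not.

```python
from typing import Any, Optional


def document_id_unsafe(metadata: dict[str, Any]) -> Any:
    # Direct indexing raises KeyError when the optional key is absent.
    return metadata["document_id"]


def document_id_safe(metadata: dict[str, Any]) -> Optional[Any]:
    # dict.get returns None for a missing key instead of raising.
    return metadata.get("document_id")


if __name__ == "__main__":
    chunk_metadata: dict[str, Any] = {"source": "example"}  # no document_id
    print(document_id_safe(chunk_metadata))
    try:
        document_id_unsafe(chunk_metadata)
    except KeyError as exc:
        print(f"KeyError: {exc}")
```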

Test Plan

Before fix:

  • POST to /v1/vector-io/insert with chunks → 500 KeyError
  • Happened regardless of where document_id was placed

After fix:

  • Same request works fine → 200 OK
  • Tested with Postman using FAISS backend
  • Added unit test covering missing document_id scenarios

@meta-cla meta-cla bot added the CLA Signed label Sep 22, 2025
@skamenan7 skamenan7 marked this pull request as ready for review September 22, 2025 21:25
@skamenan7
Contributor Author

cc: @leseb, as per our discussions, here is the bug fix. Thanks.

@skamenan7 skamenan7 requested a review from ehhuang September 24, 2025 21:20
Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata.
Updated logging to safely extract document_id using getattr and RAG memory
to handle different document_id locations. Added test for missing document_id scenarios.

Fixes issue llamastack#3494 where /v1/vector-io/insert would crash with KeyError.
- Gate debug logging behind isEnabledFor check to avoid unnecessary computation
- Add Chunk.document_id property to safely handle metadata/chunk_metadata extraction
- Simplify RAG memory code using new property
@skamenan7 skamenan7 force-pushed the fix/vector-io-document-id-keyerror-3494 branch from db366c4 to 2510bd3 Compare September 25, 2025 18:00
Required by project logging rules to use logging.DEBUG constant
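
For readers unfamiliar with the debug-logging gate mentioned in the commits above, here is a rough sketch using the standard library's `Logger.isEnabledFor` with the `logging.DEBUG` constant. The logger name, the function `log_chunk_ids`, and the dict-shaped chunks are assumptions made for illustration; the PR's actual logging code is not shown in this conversation.

```python
import logging

logger = logging.getLogger("vector_io_sketch")  # hypothetical logger name


def log_chunk_ids(chunks: list[dict]) -> bool:
    """Log document ids only when DEBUG is enabled; return whether logging ran."""
    if not logger.isEnabledFor(logging.DEBUG):
        # Skip building the id list when the message would be discarded anyway.
        return False
    ids = [chunk.get("document_id") for chunk in chunks]
    logger.debug("inserting chunks with document_ids=%s", ids)
    return True
```

The gate only pays off when computing the log arguments is expensive; for a cheap lookup, logging unconditionally is simpler.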
Contributor

@ehhuang ehhuang left a comment

please address comment. lgtm otherwise

Collaborator

@franciscojavierarceo franciscojavierarceo left a comment

please add logging back in but otherwise lgtm too

@franciscojavierarceo franciscojavierarceo dismissed their stale review October 7, 2025 14:41

Hit wrong button.

Collaborator

@franciscojavierarceo franciscojavierarceo left a comment

please add logging back in but otherwise lgtm too

Simplified logging to always execute instead of gating behind isEnabledFor check.
Main fix is using safe chunk.document_id property instead of direct metadata access.
Added back the return statement in insert_chunks for consistency with the
document_id property pattern where None is explicitly returned.
Collaborator

@franciscojavierarceo franciscojavierarceo left a comment

thank you!

Address PR feedback to validate document_id type when accessed from metadata dict.
Since metadata is dict[str, Any], we now fail fast with a clear TypeError when
document_id exists but isn't a string, rather than silently returning None.

Also removed redundant isinstance check for chunk_metadata.document_id since
Pydantic already guarantees it's str | None.

Added test coverage for the new validation behavior.
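
The behavior described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: it uses standard-library dataclasses in place of the project's Pydantic models, the field layout of `Chunk` and `ChunkMetadata` is simplified, and the lookup order (`metadata` first, then `chunk_metadata`) is assumed.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ChunkMetadata:
    # Stand-in for the typed metadata model; document_id is optional.
    document_id: Optional[str] = None


@dataclass
class Chunk:
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    chunk_metadata: Optional[ChunkMetadata] = None

    @property
    def document_id(self) -> Optional[str]:
        # metadata is an untyped dict, so validate the value's type here.
        doc_id = self.metadata.get("document_id")
        if doc_id is not None:
            if not isinstance(doc_id, str):
                raise TypeError(
                    f"metadata['document_id'] must be a string, got {type(doc_id).__name__}"
                )
            return doc_id
        # The typed model already constrains document_id, so no isinstance check.
        if self.chunk_metadata is not None:
            return self.chunk_metadata.document_id
        # Explicit None when neither location carries a document_id.
        return None
```

With this shape, a chunk with no `document_id` anywhere yields `None`, while a non-string value in `metadata` fails fast with `TypeError` instead of being silently dropped.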
@skamenan7 skamenan7 requested a review from leseb October 9, 2025 21:08
@leseb
Collaborator

leseb commented Oct 10, 2025

@ehhuang your comments have been taken care of, final look? Thanks!

@skamenan7 skamenan7 requested a review from leseb October 13, 2025 12:39
@skamenan7 skamenan7 requested a review from ehhuang October 15, 2025 17:54
@ehhuang ehhuang merged commit bc8b377 into llamastack:main Oct 15, 2025
21 checks passed
Labels

CLA Signed

Development

Successfully merging this pull request may close these issues.

Vector-IO Insert API fails with KeyError: 'document_id' on all requests

4 participants