DOC: Index requires that your documents table uses a GUID as your primary ID. #7607

alph486 · 2025-01-27T20:30:35Z

Checklist

I added a very descriptive title to this issue.
I included a link to the documentation page I am referring to (if applicable).

Issue with current documentation:

Re: https://js.langchain.com/docs/how_to/indexing

According to @langchain/core/dist/indexing/base.js, the index function computes a hash for each of your documents before indexing them. It then writes the documents to the vector store, passing the resulting IDs through the optional ids parameter of addDocuments:

        if (docsToIndex.length > 0) {
            await vectorStore.addDocuments(docsToIndex, { ids: uids });
            numAdded += docsToIndex.length - seenDocs.size;
            numUpdated += seenDocs.size;
        }
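To make the flow in that excerpt concrete, here is a heavily simplified model of it in TypeScript. It is a sketch, not the library's code: MiniDoc, VectorStoreLike, contentId and indexDocs are hypothetical names, and a toy FNV-1a hash stands in for whatever hashing @langchain/core actually uses. What it illustrates is that the IDs are computed from each document's content and metadata inside the indexing function and handed to addDocuments via the ids option, so the caller never picks them.

```typescript
// HYPOTHETICAL minimal document shape (stand-in for LangChain's Document class).
interface MiniDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// HYPOTHETICAL subset of a vector store: only the call made in the excerpt.
interface VectorStoreLike {
  addDocuments(docs: MiniDoc[], options?: { ids?: string[] }): void;
}

// Toy 32-bit FNV-1a hash; a stand-in ONLY, not the hash @langchain/core uses.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Deterministic ID from content + metadata, playing the role of `uids` in the excerpt.
function contentId(doc: MiniDoc): string {
  return fnv1a(doc.pageContent + JSON.stringify(doc.metadata));
}

// Simplified indexing step: hash each document, skip ones already indexed,
// then write the rest with explicit ids. Returns how many documents were written.
function indexDocs(docs: MiniDoc[], store: VectorStoreLike, alreadyIndexed: Set<string>): number {
  const docsToIndex: MiniDoc[] = [];
  const uids: string[] = [];
  for (const doc of docs) {
    const uid = contentId(doc);
    if (alreadyIndexed.has(uid) || uids.includes(uid)) continue; // unchanged or duplicate
    docsToIndex.push(doc);
    uids.push(uid);
  }
  if (docsToIndex.length > 0) {
    // The store receives IDs computed here, not IDs chosen by the caller.
    store.addDocuments(docsToIndex, { ids: uids });
  }
  return docsToIndex.length;
}

// Usage: a fake store that just records the ids it is given.
const receivedIds: string[][] = [];
const recordingStore: VectorStoreLike = {
  addDocuments: (_docs, options) => {
    receivedIds.push(options?.ids ?? []);
  },
};
const sampleDocs: MiniDoc[] = [{ pageContent: "hello", metadata: { source: "a.txt" } }];
const firstRun = indexDocs(sampleDocs, recordingStore, new Set());
const secondRun = indexDocs(sampleDocs, recordingStore, new Set(receivedIds[0]));
```

Re-running with the same content writes nothing, which is the point of deriving IDs from hashes; the flip side is that the ID format is decided by the indexing code rather than by your table schema.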

If you have set up your tables to use an integer or some other non-UUID type as the unique ID, this function will not work properly, if I'm understanding correctly.
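A minimal sketch of the mismatch being described, under loudly stated assumptions: IntegerKeyedStore is a hypothetical in-memory stand-in for a table whose primary key is an integer column, and its addDocuments rejects any ID that does not parse as an integer, roughly as a database would reject a string in an integer key column. Real vector store integrations may instead ignore, coerce, or store the IDs elsewhere, so this only illustrates the failure mode in question.

```typescript
interface MiniDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// HYPOTHETICAL stand-in for a table whose primary key is an INTEGER column.
class IntegerKeyedStore {
  readonly rows = new Map<number, MiniDoc>();

  addDocuments(docs: MiniDoc[], options?: { ids?: string[] }): void {
    docs.forEach((doc, i) => {
      const raw = options?.ids?.[i];
      // No explicit ID: take the next integer, like an auto-increment column would.
      const key = raw === undefined ? this.rows.size + 1 : Number(raw);
      // A string ID that is not an integer cannot be stored in this key column.
      if (!Number.isInteger(key)) {
        throw new Error(`ID "${raw}" is not valid for an integer primary key`);
      }
      this.rows.set(key, doc);
    });
  }
}

// Usage: auto-assigned integer keys work...
const intStore = new IntegerKeyedStore();
intStore.addDocuments([{ pageContent: "first chunk", metadata: {} }]);

// ...but a UUID-style string ID is rejected.
let caught = "";
try {
  intStore.addDocuments([{ pageContent: "second chunk", metadata: {} }], {
    ids: ["6ba7b810-9dad-11d1-80b4-00c04fd430c8"],
  });
} catch (e) {
  caught = (e as Error).message;
}
```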

Idea or request for content:

Please update the documentation to either (a) add this as a notice, or (b) add an example or configuration showing how to use it with a different type of ID, as I expect this to be relatively common.
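One possible shape for option (b), sketched under assumptions and not taken from the LangChain docs: keep the integer primary key, and store the string ID passed via ids in a separate text column. TwoKeyStore, the langchainId column, and the delete({ ids }) signature are all hypothetical; the delete method is included on the assumption that the indexing cleanup step removes stale documents by the same IDs it wrote. How a real integration maps ids to columns would need checking against that integration's own docs.

```typescript
interface MiniDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// HYPOTHETICAL row: integer primary key plus a text column for the external ID.
interface Row {
  id: number;
  langchainId: string;
  doc: MiniDoc;
}

class TwoKeyStore {
  private nextId = 1; // emulates an auto-increment integer primary key
  readonly rows: Row[] = [];

  addDocuments(docs: MiniDoc[], options?: { ids?: string[] }): void {
    docs.forEach((doc, i) => {
      const langchainId = options?.ids?.[i] ?? `auto-${this.nextId}`;
      // Upsert on the text column so re-indexing the same ID does not duplicate rows.
      const existing = this.rows.find((r) => r.langchainId === langchainId);
      if (existing) {
        existing.doc = doc;
        return;
      }
      this.rows.push({ id: this.nextId++, langchainId, doc });
    });
  }

  // HYPOTHETICAL delete-by-external-ID, looked up via the text column.
  delete(params: { ids: string[] }): void {
    const toRemove = new Set(params.ids);
    for (let i = this.rows.length - 1; i >= 0; i--) {
      if (toRemove.has(this.rows[i].langchainId)) this.rows.splice(i, 1);
    }
  }
}

// Usage: string IDs go to langchainId; the integer key is assigned by the "table".
const twoKey = new TwoKeyStore();
twoKey.addDocuments(
  [
    { pageContent: "a", metadata: {} },
    { pageContent: "b", metadata: {} },
  ],
  { ids: ["hash-a", "hash-b"] },
);
twoKey.addDocuments([{ pageContent: "a2", metadata: {} }], { ids: ["hash-a"] });
twoKey.delete({ ids: ["hash-b"] });
```

The design choice here is to treat the string ID as a secondary unique key rather than the primary key, which keeps the existing integer schema intact at the cost of an extra indexed column.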


nick-w-nick · 2025-02-02T07:47:18Z

@alph486 I am pretty sure that is just the name of the variable and doesn't reflect any actual format requirement. I have personally used multiple ID formats in my vector stores, many of which are not UUIDs.

For example, Pinecone's docs mention that you can even use custom delimiters in your IDs to make it easier to associate individual document chunks with their parent document, which implies that you can use basically anything you want as a document ID as long as it is unique.
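A small sketch of the kind of ID scheme described above: each chunk's ID is its parent document's ID, a delimiter, and the chunk index, so all chunks of one document can be selected by prefix. The "#" delimiter and the chunkId, parseChunkId and chunksOf helpers are illustrative choices, not a Pinecone or LangChain API.

```typescript
// HYPOTHETICAL helpers for "parentId#index" style chunk IDs.
const DELIMITER = "#";

function chunkId(parentId: string, index: number): string {
  // Forbid the delimiter in parent IDs so parsing stays unambiguous.
  if (parentId.includes(DELIMITER)) {
    throw new Error(`parent ID must not contain "${DELIMITER}"`);
  }
  return `${parentId}${DELIMITER}${index}`;
}

function parseChunkId(id: string): { parentId: string; index: number } {
  const at = id.lastIndexOf(DELIMITER);
  if (at < 0) throw new Error(`not a chunk ID: ${id}`);
  return { parentId: id.slice(0, at), index: Number(id.slice(at + 1)) };
}

// Select every chunk ID of one parent; the appended delimiter keeps "doc1" from matching "doc10".
function chunksOf(parentId: string, ids: string[]): string[] {
  return ids.filter((id) => id.startsWith(parentId + DELIMITER));
}

// Usage
const chunkIds = [chunkId("doc1", 0), chunkId("doc1", 1), chunkId("doc2", 0)];
const parsed = parseChunkId(chunkIds[1]);
const doc1Chunks = chunksOf("doc1", chunkIds);
```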

alph486 · 2025-02-02T14:53:58Z

@nick-w-nick were you using the 'index' feature when you got that result? This issue is about the way the indexing feature handles IDs, not the way the vector stores themselves handle them.

In this case, index() manages the writes to the vector store for you and, as far as I can tell, assumes UUIDs.

dosubot bot added the auto:documentation Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder label Jan 27, 2025
