Cohere rerank exception: invalid request: list of documents must not be empty #7370

Open
5 tasks done
ahm750 opened this issue Dec 15, 2024 · 4 comments · May be fixed by #7372
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

ahm750 commented Dec 15, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code


const retriever = vectorstore.asRetriever({
    k: 10,
});

const cohereRerank = new CohereRerank({
    apiKey: COHERE_API_KEY,
    model: "rerank-v3.5",
    topN: 3,
});

const contextualCompressionRetriever = new ContextualCompressionRetriever({
    baseCompressor: cohereRerank,
    baseRetriever: retriever,
});


const historyAwareCompressedRetriever = await createHistoryAwareRetriever({
    llm: qaGeneratorModel,
    retriever: contextualCompressionRetriever,
    rephrasePrompt: contextualizeQPrompt,
});


const ragChain = await createRetrievalChain({
    retriever: historyAwareCompressedRetriever,
    combineDocsChain: questionAnswerChain,
});

const stream = await ragChain.stream({
    chat_history: chatHistoryParsedArray,
    input: sanitizedText,
});

Error Message and Stack Trace (if applicable)

BadRequestError
Status code: 400
Body: {
"message": "invalid request: list of documents must not be empty"
}
/node_modules/langchain/dist/chains/combine_documents/base.js:8
const formattedDocs = await Promise.all(documents.map((document) => documentPrompt
^

TypeError: Cannot read properties of undefined (reading 'map')
at formatDocuments (/node_modules/langchain/dist/chains/combine_documents/base.js:8:55)
at RunnableLambda.func (/node_modules/langchain/dist/chains/combine_documents/stuff.js:27:90)
at /node_modules/@langchain/core/dist/runnables/base.js:1735:44
at MockAsyncLocalStorage.run (/node_modules/@langchain/core/dist/singletons/async_local_storage/index.js:10:16)
at AsyncLocalStorageProvider.runWithConfig (/node_modules/@langchain/core/dist/singletons/async_local_storage/index.js:53:24)
at output (/node_modules/@langchain/core/dist/runnables/base.js:1733:53)
at new Promise (<anonymous>)
at RunnableLambda._transform (/node_modules/@langchain/core/dist/runnables/base.js:1732:30)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async RunnableLambda._transformStreamWithConfig (/node_modules/@langchain/core/dist/runnables/base.js:301:30)

Description

I'm using the Cohere reranker, and it throws an exception when the retriever returns no documents. This prevents the AI from generating a response.

The ideal behavior should be to return an empty list so that the AI can generate a response based on the instructions we've given it.

Edit: this issue is similar to the ones reported in the Java and Python versions of LangChain, which were fixed a long time ago:

langchain4j/langchain4j#770
langchain-ai/langchain#5304

System Info

Package versions:
"langchain": "^0.3.7"
"@langchain/cohere": "^0.3.1"
"@langchain/core": "^0.3.23"


dosubot bot commented Dec 15, 2024

To resolve the "Cohere rerank exception: invalid request: list of documents must not be empty" error, ensure that the list of documents passed to the CohereRerank is not empty. This error occurs because the CohereRerank class requires a non-empty list of documents to function correctly.

Here's what you can do:

  1. Check the Retriever Output: Before passing the documents to the CohereRerank, verify that the retriever (vectorstore.asRetriever) is returning a non-empty list of documents. You can add a check to ensure that the list is not empty before proceeding with the reranking.

  2. Handle Empty Document Lists: Implement a conditional check to handle cases where the retriever returns an empty list. You might want to log a message or handle this scenario gracefully by skipping the reranking step if no documents are retrieved.

Here's a basic example of how you might implement such a check:

// Retrievers in LangChain.js are invoked with invoke(); there is no retrieve() method.
const documents = await retriever.invoke(sanitizedText);

if (documents.length === 0) {
  console.error("No documents retrieved. Skipping reranking.");
  // Handle the empty case as needed, e.g., return a default response or log an error.
} else {
  const rerankedDocuments = await cohereRerank.compressDocuments(documents, sanitizedText);
  // Proceed with using rerankedDocuments
}

By ensuring that the list of documents is not empty before calling the rerank method, you can avoid this error [1].
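
Until a fix lands in the library, the same check could be packaged as a small wrapper so that the rest of the chain stays unchanged. The following is only a sketch: `guardEmptyDocuments` and `fakeReranker` are hypothetical names introduced here for illustration, not part of LangChain.js or the Cohere SDK, and the fake reranker merely imitates the "must not be empty" rejection described in this issue.

```javascript
// Hypothetical helper (not part of LangChain.js): wraps any object exposing
// compressDocuments(documents, query) and short-circuits on empty input.
function guardEmptyDocuments(compressor) {
  return {
    async compressDocuments(documents, query, ...rest) {
      // Nothing to rerank: return early instead of sending an empty list upstream.
      if (documents == null || documents.length === 0) {
        return [];
      }
      return compressor.compressDocuments(documents, query, ...rest);
    },
  };
}

// Stand-in for a reranker that rejects empty input, imitating the error in this issue.
const fakeReranker = {
  calls: 0,
  async compressDocuments(documents, query) {
    this.calls += 1;
    if (documents.length === 0) {
      throw new Error("invalid request: list of documents must not be empty");
    }
    // Pretend the first document is the most relevant one.
    return documents.slice(0, 1);
  },
};

const guarded = guardEmptyDocuments(fakeReranker);
```

With the code from the issue, one would try passing `guardEmptyDocuments(cohereRerank)` as `baseCompressor`. This assumes that ContextualCompressionRetriever only ever calls `compressDocuments` on the compressor, which has not been verified here.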

@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Dec 15, 2024

ahm750 commented Dec 15, 2024

P.S.: This is similar to the issues reported in the Java and Python versions of LangChain (langchain4j/langchain4j#770 and langchain-ai/langchain#5304), which were fixed a long time ago.

@hwchase17 Could you apply the patch from the above packages to langchainjs?


ahm750 commented Dec 15, 2024

Below are the two issues that are causing this error and the proposed fixes:

  1. In the formatDocuments method in langchain/dist/chains/combine_documents/base.js, an empty string should be returned if documents is null, undefined, or empty. Because this check is missing, documents.map() throws a TypeError when documents is undefined.

Example:

export async function formatDocuments({ documentPrompt, documentSeparator, documents, config, }) {
    // PROPOSED FIX
    if (documents == null || documents.length == 0) {
        return "";
    }

    const formattedDocs = await Promise.all(documents.map((document) => documentPrompt
        .withConfig({ runName: "document_formatter" })
        .invoke({ ...document.metadata, page_content: document.pageContent }, config)));
    return formattedDocs.join(documentSeparator);
}

  2. In the compressDocuments method in @langchain/cohere/dist/rerank.js, an empty array should be returned when documents is null, undefined, or empty, so that no request with an empty document list is sent to Cohere.

Example:

    async compressDocuments(documents, query) {
        // PROPOSED FIX
        if (documents == null || documents.length == 0) {
            return [];
        }
        const _docs = documents.map((doc) => doc.pageContent);
        const { results } = await this.client.rerank({
            model: this.model,
            query,
            documents: _docs,
            topN: this.topN,
            maxChunksPerDoc: this.maxChunksPerDoc,
        });
        const finalResults = [];
        for (let i = 0; i < results.length; i += 1) {
            const result = results[i];
            const doc = documents[result.index];
            doc.metadata.relevanceScore = result.relevanceScore;
            finalResults.push(doc);
        }
        return finalResults;
    }
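
The first guard can be exercised in isolation, without LangChain installed. The sketch below copies the guarded formatDocuments from above and feeds it a stub in place of a real prompt template. `stubPrompt` is a hypothetical stand-in introduced only for this example; it implements just the two methods the function calls and simply echoes the page content.

```javascript
// Hypothetical stub standing in for a prompt template; only the two methods
// used by formatDocuments are provided.
const stubPrompt = {
  withConfig() {
    return this;
  },
  async invoke(values) {
    return values.page_content;
  },
};

// Copy of formatDocuments with the proposed guard applied.
async function formatDocuments({ documentPrompt, documentSeparator, documents, config }) {
  // Guard: no documents means there is nothing to format.
  if (documents == null || documents.length === 0) {
    return "";
  }
  const formattedDocs = await Promise.all(
    documents.map((document) =>
      documentPrompt
        .withConfig({ runName: "document_formatter" })
        .invoke({ ...document.metadata, page_content: document.pageContent }, config)
    )
  );
  return formattedDocs.join(documentSeparator);
}
```

Calling it with `documents: undefined`, which is the case that produced the TypeError in the stack trace, now resolves to an empty string instead of throwing, while a non-empty list is formatted and joined as before.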

ahm750 added a commit to ahm750/langchainjs that referenced this issue Dec 16, 2024
ahm750 added a commit to ahm750/langchainjs that referenced this issue Dec 16, 2024
…nts if the documents parameter is null or empty

ahm750 commented Dec 16, 2024

Created PRs to address this issue. Please review. Thanks!
