fix: assign current_time to datetime.now() if current_time is None #5045

@mbchang mbchang commented May 20, 2023

Assign datetime.now() to current_time in time_weighted_retriever when current_time is None

Fixes #4825

As implemented, add_documents in TimeWeightedVectorStoreRetriever sets doc.metadata["last_accessed_at"] and doc.metadata["created_at"] to datetime.datetime.now() when current_time is not in kwargs and the document does not already carry those keys.

    def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]:
        """Add documents to vectorstore."""
        current_time = kwargs.get("current_time", datetime.datetime.now())
        # Avoid mutating input documents
        dup_docs = [deepcopy(d) for d in documents]
        for i, doc in enumerate(dup_docs):
            if "last_accessed_at" not in doc.metadata:
                doc.metadata["last_accessed_at"] = current_time
            if "created_at" not in doc.metadata:
                doc.metadata["created_at"] = current_time
            doc.metadata["buffer_idx"] = len(self.memory_stream) + i
        self.memory_stream.extend(dup_docs)
        return self.vectorstore.add_documents(dup_docs, **kwargs)

However, GenerativeAgentMemory calls add_documents with current_time passed as a kwarg whose value is None:

    def add_memory(
        self, memory_content: str, now: Optional[datetime] = None
    ) -> List[str]:
        """Add an observation or memory to the agent's memory."""
        importance_score = self._score_memory_importance(memory_content)
        self.aggregate_importance += importance_score
        document = Document(
            page_content=memory_content, metadata={"importance": importance_score}
        )
        result = self.memory_retriever.add_documents([document], current_time=now)
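
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `lookup_current_time` is a hypothetical helper that copies only the `kwargs.get` lookup from add_documents. It shows that the default passed to `dict.get` applies only when the key is absent, so an explicit `current_time=None` comes back as None, and subtracting that None from a datetime raises the TypeError named in the linked issue.

```python
from datetime import datetime
from typing import Any, Optional


def lookup_current_time(**kwargs: Any) -> Optional[datetime]:
    """Hypothetical helper: the same lookup add_documents performs."""
    # The default is used only when "current_time" is absent from kwargs.
    return kwargs.get("current_time", datetime.now())


missing = lookup_current_time()                         # key absent: default applies
explicit_none = lookup_current_time(current_time=None)  # key present: None is returned

# Subtracting the stored None from a datetime fails with a TypeError.
try:
    datetime.now() - explicit_none
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(type(missing).__name__, explicit_none)
print(error_message)
```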

The default of now was set in #4658 to be None. The proposed fix is the following:

    def add_documents(self, documents: List[Document], **kwargs: Any) -> List[str]:
        """Add documents to vectorstore."""
        current_time = kwargs.get("current_time", datetime.datetime.now())
        # `current_time` may exist in kwargs, but may still have the value of None.
        if current_time is None:
            current_time = datetime.datetime.now()
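
To see the effect of the proposed check end to end, here is a self-contained sketch under stated assumptions: `Doc` is a hypothetical stand-in for the library's Document class, and `stamp_documents` is a hypothetical function that reproduces only the timestamp logic of add_documents, without the memory stream or the vector store.

```python
import datetime
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for the library's Document class."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def stamp_documents(documents: List[Doc], **kwargs: Any) -> List[Doc]:
    """Hypothetical function: only the timestamp logic of add_documents."""
    current_time = kwargs.get("current_time", datetime.datetime.now())
    # `current_time` may exist in kwargs but still have the value None.
    if current_time is None:
        current_time = datetime.datetime.now()
    # Avoid mutating input documents
    dup_docs = [deepcopy(d) for d in documents]
    for doc in dup_docs:
        if "last_accessed_at" not in doc.metadata:
            doc.metadata["last_accessed_at"] = current_time
        if "created_at" not in doc.metadata:
            doc.metadata["created_at"] = current_time
    return dup_docs


original = Doc("saw a bird", {"importance": 0.5})
stamped = stamp_documents([original], current_time=None)
print(stamped[0].metadata)
```

With the None check in place, both timestamps are real datetime values even though the caller passed `current_time=None`, and the input document is left untouched.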

Alternatively, we could set the default of now to datetime.datetime.now() everywhere instead. Thoughts @hwchase17? If we want to keep the default as None, this PR should fix the above issue. If we want the default to be datetime.datetime.now() instead, I can update this PR with that alternative fix. EDIT: judging from #5018, we would prefer to keep the default as None, in which case this PR should fix the error.
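
One general Python consideration when weighing the two options: if the alternative were written as a literal parameter default (an assumption about how it might be implemented, not something this PR shows), the timestamp would be evaluated once at function definition time and then reused. The sketch below uses two hypothetical functions, `stamp_eager` and `stamp_lazy`, to contrast that with the None-sentinel pattern.

```python
import time
from datetime import datetime
from typing import Optional


def stamp_eager(now: datetime = datetime.now()) -> datetime:
    """Hypothetical: the default is evaluated once, when the function is defined."""
    return now


def stamp_lazy(now: Optional[datetime] = None) -> datetime:
    """Hypothetical: None is a sentinel, so the clock is read on every call."""
    return datetime.now() if now is None else now


first_eager, first_lazy = stamp_eager(), stamp_lazy()
time.sleep(0.05)
second_eager, second_lazy = stamp_eager(), stamp_lazy()

# The eager default returns the same frozen timestamp on both calls,
# while the lazy version reads a fresh timestamp each time.
print(first_eager == second_eager)
print(second_lazy > first_lazy)
```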

Who can review?

Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested:

@hwchase17
@dev2049

@hwchase17 hwchase17 left a comment

if doing this, can just do current_time = kwargs.get("current_time") in line above

mbchang commented May 21, 2023

current_time = kwargs.get("current_time") is not sufficient on its own: the current_time key exists in kwargs, but the value passed for it is None. This sets doc.metadata["last_accessed_at"] = None and doc.metadata["created_at"] = None, which causes issue #4825.

@hwchase17

> current_time = kwargs.get("current_time") is not sufficient, because even though the variable current_time exists as a kwarg, the value passed for the current_time is None. This causes doc.metadata["last_accessed_at"] = None and doc.metadata["created_at"] = None, which causes issue #4825

i mean keep the added lines, but just remove the default of datetime.now; since you check for None afterwards, the default doesn't do anything
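
The reviewer's point can be checked directly. This is a small sketch with two hypothetical functions, `with_default` and `without_default`, that differ only in whether `kwargs.get` is given a default. Once the None check follows the lookup, both behave the same for a missing key, an explicit None, and an explicit datetime.

```python
from datetime import datetime
from typing import Any


def with_default(**kwargs: Any) -> datetime:
    """Hypothetical: lookup with a default, followed by the None check."""
    current_time = kwargs.get("current_time", datetime.now())
    if current_time is None:
        current_time = datetime.now()
    return current_time


def without_default(**kwargs: Any) -> datetime:
    """Hypothetical: lookup without a default; a missing key also yields None."""
    current_time = kwargs.get("current_time")
    if current_time is None:
        current_time = datetime.now()
    return current_time


fixed = datetime(2023, 5, 20, 12, 0)
cases = {
    "missing": (with_default(), without_default()),
    "none": (with_default(current_time=None), without_default(current_time=None)),
    "explicit": (with_default(current_time=fixed), without_default(current_time=fixed)),
}
for name, (a, b) in cases.items():
    print(name, type(a).__name__, type(b).__name__)
```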

@dev2049 dev2049 mentioned this pull request May 22, 2023
mbchang commented May 22, 2023

oh yeah makes sense; I've updated it

@dev2049 dev2049 merged commit e173e03 into langchain-ai:master May 22, 2023
hwchase17 pushed a commit that referenced this pull request May 24, 2023
…ever (#5155)

# Same as PR #5045, but for async

Fixes #4825 

I had forgotten to apply the bug fix from PR #5045 to the asynchronous
counterpart `aadd_documents`, so this PR fixes `aadd_documents` as well.
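
The body of `aadd_documents` is not shown in this thread, so the following is only a sketch of how the same None check carries over to a coroutine. `astamp_metadata` is a hypothetical function, and `asyncio.sleep(0)` stands in for awaiting the vector store.

```python
import asyncio
import datetime
from typing import Any, Dict, List


async def astamp_metadata(
    metadatas: List[Dict[str, Any]], **kwargs: Any
) -> List[Dict[str, Any]]:
    """Hypothetical coroutine: the None check applied on an async path."""
    current_time = kwargs.get("current_time")
    if current_time is None:
        current_time = datetime.datetime.now()
    stamped = []
    for metadata in metadatas:
        copied = dict(metadata)  # avoid mutating the caller's dict
        copied.setdefault("last_accessed_at", current_time)
        copied.setdefault("created_at", current_time)
        stamped.append(copied)
    await asyncio.sleep(0)  # stand-in for awaiting the vector store
    return stamped


result = asyncio.run(astamp_metadata([{"importance": 0.5}], current_time=None))
print(result[0])
```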

## Who can review?

Community members can review the PR once tests pass. Tag
maintainers/contributors who might be interested:
@dev2049

vowelparrot pushed a commit that referenced this pull request May 24, 2023
…ever (#5155)

@danielchalef danielchalef mentioned this pull request Jun 5, 2023
This was referenced Jun 25, 2023
Successfully merging this pull request may close these issues.

TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'NoneType'
3 participants