-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds semantic cache scoped access and additional functionality #180
Conversation
Args: | ||
document_ids (Union[str, List[str]]): The document ID or IDs to remove from the cache. | ||
""" | ||
if isinstance(document_ids, List): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a similar theme to the work around clear()
, to me this feels like an addition to that method by supporting an optional list of ids? by default clear removes everything. Maybe if this list is provided, instead of clearing everything it will just focus on using this pipeline code to clear the subset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of combining methods. What about something like
clear(keys: Optional[List[str], str]):
"""Remove all or specific entries from the cache by it's ID.
Args:
document_ids (Union[str, List[str]]): The document ID or IDs to remove from the cache. If none are provided then all documents are removed.
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will move this pipeline code into index.
@@ -16,6 +18,9 @@ class SemanticCache(BaseLLMCache): | |||
entry_id_field_name: str = "id" | |||
prompt_field_name: str = "prompt" | |||
vector_field_name: str = "prompt_vector" | |||
inserted_at_field_name: str = "inserted_at" | |||
updated_at_field_name: str = "updated_at" | |||
tag_field_name: str = "scope_tag" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to understand this data model a bit more. As it stands here, it looks like there would be 1 field for scoping (scope_tag
) and filtering. What if I have both year, location, and user ID data that I'd like to filter my cache check on?
I think we might need to generalize a bit more and support a set of optional/customizable filterable fields (providable to the class on init??)
So if staying with Hash, the schema could be something like:
id
prompt
prompt_vector
response
inserted_at
updated_at
metadata
scope_tag
???- ?
Just thinking out loud!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a comment, but I will address it later in my PR this evening!
@@ -182,6 +194,14 @@ def delete(self) -> None: | |||
index.""" | |||
self._index.delete(drop=True) | |||
|
|||
def drop(self, document_ids: Union[str, List[str]]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for this, are we expecting users to pass the full redis key? or just the id portion (without prefix)? I think the terminology we try to use for this throughout the library is id
when we are referring to the part without the prefix. So maybe we just use ids
instead of document_ids
?
This PR is the first of a few PRs that adds to RedisVL’s existing semantic cache class more functionality around dropping and updating cache entries. It also adds scoped access control and meta data fields.