add multiple gsite feature #239
YujingYang666777
commented
Nov 28, 2023
- modified SearchTool class in the tools.py file
- added 4 unit tests
- passed the integration tests
Thanks for the PR. I think this is a great feature to have. I've left some comments.
src/sherpa_ai/tools.py
Outdated
if self.config.gsite:
    gsite_list = self.config.gsite.split(", ")
    gsite_list = [i for i in gsite_list if i != " " and i != "\n" and i != None]
    if False in [validate_url(i) for i in gsite_list]:
This check is somewhat redundant, since it runs on every search call. By this point, the URLs in the configuration should already be valid. I think we should move the check into AgentConfig: create a new list attribute called search_domains and parse the gsite string into a list of URLs there. Then we don't need to repeat the check here.
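The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the project's actual AgentConfig: the dataclass layout, the `is_valid_url` helper (a stand-in for the PR's `validate_url`), and splitting on a bare comma with whitespace stripped (the PR splits on `", "`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Hypothetical stand-in for the PR's validate_url helper."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


@dataclass
class AgentConfig:
    # Raw comma-separated string, as supplied by the user.
    gsite: Optional[str] = None
    # Parsed and validated once, when the config is constructed.
    search_domains: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.gsite:
            return
        # Assumption: split on "," and strip whitespace around each entry.
        candidates = [part.strip() for part in self.gsite.split(",")]
        candidates = [part for part in candidates if part]
        invalid = [url for url in candidates if not is_valid_url(url)]
        if invalid:
            raise ValueError(f"Invalid search domain(s): {invalid}")
        self.search_domains = candidates
```

With validation done at construction time, the search tool can read `config.search_domains` directly and never needs to re-check the URLs on each call.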
src/sherpa_ai/tools.py
Outdated
        args={"error": f"The input URL is not valid"},
    )
gsite_list = [query + " site:" + i for i in gsite_list]
if len(gsite_list) >= 5:
Similarly, this truncation can be done in the AgentConfig as well.
src/sherpa_ai/tools.py
Outdated
else:
    gsite_list = [query]
top_k = int(10 / len(gsite_list))
We should make 10 a parameter of the search method.
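One way to read this suggestion is to pass the total result budget in as an argument and derive the per-query count from it. The sketch below is an assumption-laden illustration: the function names `build_site_queries` and `results_per_query` and the parameter name `max_results` are hypothetical and do not come from the PR.

```python
from typing import List


def build_site_queries(query: str, search_domains: List[str]) -> List[str]:
    """Return one query per domain, or the bare query if no domain is set."""
    if not search_domains:
        return [query]
    return [f"{query} site:{domain}" for domain in search_domains]


def results_per_query(num_queries: int, max_results: int = 10) -> int:
    """Split a total result budget evenly across queries (at least 1 each)."""
    if num_queries < 1:
        raise ValueError("num_queries must be at least 1")
    # Integer division mirrors int(10 / len(gsite_list)) in the diff;
    # the floor of 1 is an added safeguard, not part of the original code.
    return max(1, max_results // num_queries)
```

Keeping the default at 10 preserves the behavior in the diff while letting callers choose a different budget.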
src/sherpa_ai/tools.py
Outdated
gsite_list = self.config.gsite.split(", ")
gsite_list = [i for i in gsite_list if i != " " and i != "\n" and i != None]
if False in [validate_url(i) for i in gsite_list]:
    return TaskAction(
Why do we return a TaskAction for an error?
    assert error == expected_error


def test_search_query_invalid_format():
    site = "https://www.google.com,https://www.langchain.com,https://openai.com"
Same as above.
    )
    assert search_result is not None
    assert search_result == expected_result
Another test should check that the results are combined correctly. For this, we should mock the Google Search API. We can discuss this in more detail separately.
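As a starting point for that discussion, a mocked test might look something like the sketch below. It does not use the project's real SearchTool or search wrapper: `combined_search` is a hypothetical helper written only for this example, and the search backend is injected as a callable so that `unittest.mock.Mock` can stand in for the Google Search API.

```python
from typing import Callable, List
from unittest.mock import Mock


def combined_search(
    queries: List[str],
    top_k: int,
    search_fn: Callable[[str, int], List[str]],
) -> str:
    """Hypothetical helper: run each query and join all results in order."""
    results: List[str] = []
    for q in queries:
        results.extend(search_fn(q, top_k))
    return "\n".join(results)


def test_results_are_combined():
    # The mock replaces the real search backend, so no network call is made.
    fake_search = Mock(side_effect=[["result A"], ["result B"]])
    queries = [
        "llm site:https://openai.com",
        "llm site:https://www.langchain.com",
    ]

    combined = combined_search(queries, top_k=5, search_fn=fake_search)

    # Results from both site-restricted queries appear, in query order.
    assert combined == "result A\nresult B"
    assert fake_search.call_count == 2
    fake_search.assert_any_call(queries[0], 5)
```

Injecting the search function keeps the test independent of API keys and network access; patching the real wrapper with `unittest.mock.patch` would be an alternative if the tool constructs its own client.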
Looks good to me now, merging...