Add tutorial for citation validation and fix a bug in it #371
963e414 to 10dbeab
The `DocumentSearch` action inherits from the `BaseAction` class, which has an `add_resources` method that can be used to add a citation to the response. The `add_resources` method takes a list of dictionaries, each of which should contain the following keys:

- `Document`: Content of the resource.
is this specifically the chunk that was placed in the context? if so perhaps clarify that
I added some description to it.
The above example shows how to add citations to the Google search action. However, we may sometimes also want to add citations to the responses from the document search action. In this case, we need to add the citation to the response manually.

The `DocumentSearch` action inherits from the `BaseAction` class, which has an `add_resources` method that can be used to add a citation to the response. The `add_resources` method takes a list of dictionaries, each of which should contain the following keys:
hmm I feel like this should be "add_source", not "add_resource". like you're adding the "source" of the insight, right?
I think since the method takes both the source of the document as well as the document content, we should keep the name as add_resource
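For readers following this thread, here is a minimal sketch of the shape of the data being discussed: a list of dictionaries that pairs each document's content with its source. It does not import the library. `SketchAction` and `build_resources` are hypothetical stand-ins, and the `Source` key name is an assumption, since only the `Document` key is visible in the diff above.

```python
from dataclasses import dataclass, field

# Keys each resource dictionary is expected to carry.
# "Document" appears in the tutorial text; "Source" is an assumed name.
REQUIRED_KEYS = {"Document", "Source"}


@dataclass
class SketchAction:
    """Hypothetical stand-in for an action class, not the real BaseAction."""

    resources: list = field(default_factory=list)

    def add_resources(self, resources: list) -> None:
        # Reject entries that lack either the content or its source,
        # so every stored resource can later be cited.
        for resource in resources:
            missing = REQUIRED_KEYS - resource.keys()
            if missing:
                raise KeyError(f"resource is missing keys: {sorted(missing)}")
            self.resources.append(resource)


def build_resources(chunks: dict) -> list:
    """Turn a {chunk_id: chunk_text} mapping into resource dictionaries."""
    return [
        {"Document": text, "Source": chunk_id}
        for chunk_id, text in chunks.items()
    ]


action = SketchAction()
action.add_resources(build_resources({"chunk_5": "Text of the retrieved chunk."}))
```

Keeping the content and its source in the same dictionary is what lets a citation point back to the exact chunk that was placed in the context.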
Ask me a question: What is data leakage
2024-05-09 00:24:57.552 | INFO | sherpa_ai.agents.base:run:70 - Action selected: ('DocumentSearch', {'query': 'What is data leakage'})
Data leakage refers to the potential for data to be unintentionally exposed or disclosed to unauthorized parties [1](doc:chunk_5), [3](doc:chunk_45). In the context provided, data leakage is discussed in relation to the presence of inter-dataset code duplication and the implications for the evaluation of language models in software engineering research [1](doc:chunk_5). It is highlighted as a potential threat that researchers need to consider when working with pre-training and fine-tuning datasets for language models [1](doc:chunk_5). By acknowledging the risk of data leakage due to code duplication, researchers can enhance the robustness of their evaluation methodologies and improve the validity of their results [1](doc:chunk_5).
"chunk_5" etc is what you're calling "source" above? does that need to be a unique ID? in some cases I'm assuming that we would want to allow the front end to show the text of the chunk when clicked on this link
I added instructions to output the chunk table so that one can check the chunk associated with the chunk id
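To illustrate the lookup discussed here, the sketch below pulls the `[n](doc:chunk_id)` markers out of a response like the one quoted above and resolves each chunk id against a chunk table. The chunk table is assumed to be a plain dictionary from chunk id to chunk text; the function names and the sample table contents are hypothetical and not taken from the tutorial.

```python
import re

# Matches citation markers of the form [1](doc:chunk_5), as seen in the
# example response. Captures the citation number and the chunk id.
CITATION_PATTERN = re.compile(r"\[(\d+)\]\(doc:([^)\s]+)\)")


def extract_citations(response: str) -> list:
    """Return unique (number, chunk_id) pairs in order of first appearance."""
    seen = []
    for number, chunk_id in CITATION_PATTERN.findall(response):
        pair = (int(number), chunk_id)
        if pair not in seen:
            seen.append(pair)
    return seen


def resolve_citations(response: str, chunk_table: dict) -> dict:
    """Map each cited chunk id to its text, or None if the id is unknown."""
    return {
        chunk_id: chunk_table.get(chunk_id)
        for _, chunk_id in extract_citations(response)
    }


# Hypothetical chunk table and a shortened response for illustration.
chunk_table = {"chunk_5": "Placeholder text for chunk 5."}
response = (
    "Data leakage refers to ... [1](doc:chunk_5), [3](doc:chunk_45). "
    "It is highlighted as a potential threat [1](doc:chunk_5)."
)
cited = resolve_citations(response, chunk_table)
```

A chunk id that resolves to `None` signals a citation with no matching entry in the table, which is the case a front end would need to handle before showing chunk text on click.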
Description
Type of change
Checklists
To speed up the review process, please follow these checklists:
Development
- `make format && make lint`
- `make test`
See the testing guidelines for help on tests, especially those involving web services.
Code review
💔 Thank you for submitting a pull request!