Search Refinement #393

Merged
merged 4 commits into from
Jul 17, 2024

Conversation

Bubbletea98
Contributor

@Bubbletea98 Bubbletea98 commented Jun 23, 2024

Description

Resolve #363

Added refinement function to extract key answers by running LLM with input query.

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Maintenance
  • New release

Related issues

Mention related GitHub and Linear issues. E.g. Closes #xxx or Fixes #xxx. Otherwise delete this section.

Checklists

To speed up the review process, please follow these checklists:

Development

  • The Pull Request is small and focused on one topic
  • Lint rules pass locally (make format && make lint)
  • The code changed/added as part of this pull request has been covered with tests
  • All tests related to the changed code pass in development (make test)
  • The changes generate no new warnings (or explain any new warnings and why they're ok)
  • Commit messages are detailed
  • Changed code is self-explanatory and/or I added comments
  • I updated the documentation (docstrings, /docs)
    See the testing guidelines for help on tests, especially those involving web services.

Code review

  • This pull request has a descriptive title and information useful to a reviewer. There may be a screenshot or screencast attached.
  • I have performed a self-review of my code
  • Issue from task tracker has a link to this pull request

💔 Thank you for submitting a pull request!

@20001LastOrder
Collaborator

@Eyobyb, I'm good with these changes. Please take a look at them as well.

@Eyobyb
Collaborator

Eyobyb commented Jul 1, 2024

Have you encountered hallucinations with this? It adds unwanted details: rather than rearranging the documents, it elaborates on them.

Take this example:

document = [
    "Iron Man fears Hulk more than anybody.",
    "Hulk was named the strongest Avenger on Sakaar.",
    "Natasha loves Bruce Banner.",
    "SHIELD built a contingency plan only for Hulk if he gets angry."
]

query = "Why is Hulk the strongest Avenger?"

It returns all of the documents in the same order, but it elaborates on each one so that it appears to answer the question.

@20001LastOrder
Collaborator

This is an interesting observation. In general, if the answer is relevant, I'm fine with the LLM elaborating on it. But we should probably fix the prompt so that it filters out documents that are not relevant (e.g. by returning an empty string). For now, it seems that the LLM describes why the document is not relevant instead of returning an empty string.

Also, the prompt can be configured by the user. But we should try to give a solid baseline.
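
A minimal sketch of that filtering idea, under stated assumptions: the prompt text, the `refine_documents` function, and the `fake_llm` stand-in are all hypothetical and are not taken from this PR. The prompt asks for an empty string when a document is irrelevant, the template can be overridden by the caller, and empty answers are dropped afterwards.

```python
from typing import Callable, List

# Hypothetical baseline prompt; a user could pass a different template.
DEFAULT_PROMPT = (
    "Question: {query}\n"
    "Document: {document}\n"
    "Extract only the parts of the document that help answer the question. "
    "If nothing in the document is relevant, reply with an empty string."
)


def refine_documents(
    query: str,
    documents: List[str],
    llm: Callable[[str], str],
    prompt_template: str = DEFAULT_PROMPT,
) -> List[str]:
    """Call the LLM once per document and keep only non-empty answers."""
    refined = []
    for document in documents:
        prompt = prompt_template.format(query=query, document=document)
        answer = llm(prompt).strip()
        if answer:  # an empty string marks the document as irrelevant
            refined.append(answer)
    return refined


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: echoes the document if it mentions Hulk."""
    document = prompt.split("Document: ", 1)[1].split("\n", 1)[0]
    return document if "Hulk" in document else ""
```

With the example documents above, the stand-in model would drop the "Natasha loves Bruce Banner." entry and keep the other three. A real model may still ignore the instruction and explain itself, so a stricter post-check could be needed.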

@20001LastOrder
Collaborator

Also note that this feature is about extracting relevant information from each document rather than reranking the documents.
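
To make the distinction concrete, here is a toy sketch. It uses plain word overlap in place of an LLM, and the `rerank` and `extract` functions are hypothetical illustrations, not code from this PR. Reranking returns the same documents in a new order; extraction returns only the relevant pieces.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    """Lowercased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rerank(query: str, documents: List[str]) -> List[str]:
    """Reranking: same documents, reordered by word overlap with the query."""
    q = _words(query)
    return sorted(documents, key=lambda d: len(q & _words(d)), reverse=True)


def extract(query: str, documents: List[str]) -> List[str]:
    """Extraction: keep only the sentences that share a word with the query."""
    q = _words(query)
    kept = []
    for document in documents:
        sentences = re.split(r"(?<=[.!?])\s+", document.strip())
        kept.extend(s for s in sentences if q & _words(s))
    return kept
```

Reranking never changes the number or content of the documents, while extraction can shrink or drop them.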

@Bubbletea98
Contributor Author

Hi @Eyobyb, thank you so much for your comment. I improved the prompt to reduce hallucinations and added a function to remove irrelevant answers.

@amirfz
Contributor

amirfz commented Jul 2, 2024 via email

@amirfz
Contributor

amirfz commented Jul 3, 2024

image

@20001LastOrder
Collaborator

@Bubbletea98 Let's go with the new refinery so we can move this PR forward!

…t will only come from original input sentences. 1. tokenize and index sentences 2. let llm pick the relevant sentences' index number. Updated test case in refinement.py as well.
@Bubbletea98
Contributor Author

Bubbletea98 commented Jul 11, 2024

Hi Team, I updated the refinement function:

  • Added a new refinement function: RefinementBySentence
    • Tokenize the input into sentences and assign each sentence an index
    • Let the LLM pick the index numbers of the relevant sentences by comparing the question (query) with the tokenized input
  • Added a new test case for RefinementBySentence: check that the refined output consists only of input sentences
  • Removed the {k} variable (the maximum number of sentences) from RefinementByQuery

Please let me know if there are any additional conditions we need to include. :)
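
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code merged in this PR: the regex sentence splitter, the prompt wording, the `refine_by_sentence` name, and the `fake_llm` stand-in are all hypothetical. Because the LLM only returns indices, the output can contain nothing but original input sentences.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive regex splitter (assumption); a real tokenizer could be used."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def build_prompt(query: str, sentences: List[str]) -> str:
    """Number each sentence so the model can answer with indices only."""
    numbered = "\n".join(f"{i}: {s}" for i, s in enumerate(sentences))
    return (
        f"Question: {query}\n"
        f"Sentences:\n{numbered}\n"
        "Reply with the indices of the relevant sentences, comma-separated."
    )


def refine_by_sentence(
    query: str, text: str, llm: Callable[[str], str]
) -> List[str]:
    """Return the input sentences whose indices the LLM selected."""
    sentences = split_sentences(text)
    reply = llm(build_prompt(query, sentences))
    picked = []
    for token in re.findall(r"\d+", reply):
        index = int(token)
        # Ignore out-of-range and duplicate indices from the model
        if index < len(sentences) and index not in picked:
            picked.append(index)
    return [sentences[i] for i in picked]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model; index 7 is deliberately out of range."""
    return "0, 2, 7"
```

A test in the spirit of the one described above would assert that every returned sentence appears in `split_sentences(text)`, which holds by construction here even when the model returns invalid indices.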

@amirfz amirfz merged commit acf9916 into Aggregate-Intellect:main Jul 17, 2024
1 check passed
Development

Successfully merging this pull request may close these issues.

[FEATURE] Refinement of Search Results
4 participants