Added support for the HyDE method in query analysis for the RAG module #1413
base: main
Mock functions (`mock_openai_embedding`, `mock_azure_embedding`, `mock_gemini_embedding`, and `mock_ollama_embedding`) have been added. Reason: they fix an issue where static methods were not callable. The previous code passed static methods as parameters to a parameterized test, but those static methods were not callable objects, resulting in a `TypeError`.
Factory.py
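A minimal, stdlib-only sketch of the failure and the fix described above. The mock bodies are placeholders (they just return provider names), not the PR's actual mocks, and `EmbeddingTestsBefore` is a hypothetical class name. Referencing a `@staticmethod` by name inside the class body yields the raw `staticmethod` wrapper; calling that wrapper raises `TypeError` on Python versions before 3.10, whereas module-level functions are callable on every version.

```python
from typing import Callable, List


class EmbeddingTestsBefore:
    """Hypothetical illustration of the pattern that broke."""

    @staticmethod
    def mock_openai_embedding() -> str:
        return "openai"  # placeholder body

    # Inside the class body this name is bound to the raw `staticmethod` wrapper,
    # not a plain function; before Python 3.10, calling it raises
    # "TypeError: 'staticmethod' object is not callable".
    BROKEN_MOCKS = [mock_openai_embedding]


# The fix: plain module-level functions (placeholder bodies).
def mock_openai_embedding() -> str:
    return "openai"


def mock_azure_embedding() -> str:
    return "azure"


def mock_gemini_embedding() -> str:
    return "gemini"


def mock_ollama_embedding() -> str:
    return "ollama"


MOCKS: List[Callable[[], str]] = [
    mock_openai_embedding,
    mock_azure_embedding,
    mock_gemini_embedding,
    mock_ollama_embedding,
]

if __name__ == "__main__":
    # Every entry can be called directly, as a parameterized test would do.
    for mock in MOCKS:
        print(mock.__name__, "->", mock())
```

With pytest, the module-level list can then be passed to `@pytest.mark.parametrize("mock_func", MOCKS)` and each `mock_func` called inside the test body.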
Codecov Report
Attention: Patch coverage is
Coverage Diff:
            main     #1413       +/-
Coverage   30.64%   55.66%   +25.01%
Files        320      323        +3
Lines      19426    19458       +32
Hits        5954    10831     +4877
Misses     13472     8627     -4845
config/config2.example.yaml
@@ -20,6 +20,10 @@ embedding:
  embed_batch_size: 100
  dimensions: # output dimension of embedding model

# RAG Analysis
hyde:
Use a structure like the following to support more configuration inside `rag`:
    rag:
      query:
        hyde:
          include_original: True
api_key: "YOUR_API_KEY"
No need to commit this file if there are no related changes.
examples/rag_pipeline.py
from pydantic import BaseModel

from metagpt.const import DATA_PATH, EXAMPLE_DATA_PATH
from metagpt.logs import logger
from metagpt.rag.engines import SimpleEngine
from metagpt.rag.factories.HyDEQueryTransformFactory import HyDEQueryTransformFactory
File names are usually lowercase, with words separated by '_'.
examples/rag_pipeline.py
@@ -212,6 +214,22 @@ async def init_and_query_es(self):
        answer = await engine.aquery(TRAVEL_QUESTION)
        self._print_query_result(answer)

    async def use_HyDe(self):
Rename to `use_hyde`, and keep the spelling uniform: HyDE, not HyDe.
metagpt/config2.py
@@ -51,6 +52,9 @@ class Config(CLIParams, YamlModel):
    # RAG Embedding
    embedding: EmbeddingConfig = EmbeddingConfig()

    # RAG Analysis
    hyde: HydeConfig = HydeConfig()
Use `HyDEConfig` as the class name.
@@ -0,0 +1,5 @@
from metagpt.utils.yaml_model import YamlModel
Use `rag_config.py` to support an independent RAG configuration.
metagpt/rag/query_analysis/HyDE.py
        if self._include_original:
            embedding_strs.extend(query_bundle.embedding_strs)
        logger.info(f" Hypothetical doc:{embedding_strs} ")
We usually don't log the embedding strings: they are too long and make a poor log message.
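To make the point concrete, here is a stdlib-only sketch of a HyDE query transform that logs a short summary (number of strings and hypothetical-document length) instead of the strings themselves. `QueryBundle` here is a simplified stand-in for llama_index's class of the same name, and `HyDETransformSketch` and its `generate_doc` callable (standing in for the LLM call) are hypothetical names, not the PR's code.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List

logger = logging.getLogger(__name__)


@dataclass
class QueryBundle:
    """Simplified stand-in for llama_index's QueryBundle (assumption)."""

    query_str: str
    custom_embedding_strs: List[str] = field(default_factory=list)

    @property
    def embedding_strs(self) -> List[str]:
        # Fall back to the raw query when no custom strings are set.
        return self.custom_embedding_strs or [self.query_str]


class HyDETransformSketch:
    """Hypothetical HyDE transform: embed an LLM-written answer to the query."""

    def __init__(self, generate_doc: Callable[[str], str], include_original: bool = True):
        self._generate_doc = generate_doc  # stands in for an LLM call
        self._include_original = include_original

    def run(self, query_bundle: QueryBundle) -> QueryBundle:
        hypothetical_doc = self._generate_doc(query_bundle.query_str)
        embedding_strs = [hypothetical_doc]
        if self._include_original:
            embedding_strs.extend(query_bundle.embedding_strs)
        # Log a compact summary rather than the (potentially long) strings.
        logger.info(
            "HyDE produced %d embedding string(s); hypothetical doc length=%d chars",
            len(embedding_strs),
            len(hypothetical_doc),
        )
        return QueryBundle(query_str=query_bundle.query_str, custom_embedding_strs=embedding_strs)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    transform = HyDETransformSketch(lambda q: f"A plausible passage answering: {q}")
    print(transform.run(QueryBundle("What does Bob like?")).custom_embedding_strs)
```

If the full text is occasionally needed for debugging, emitting it at `DEBUG` level keeps it out of normal `INFO` logs.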
examples/rag_pipeline.py
        engine = SimpleEngine.from_docs(input_files=[TRAVEL_DOC_PATH])
        # create HyDE query engine
        hyde_query_transformr = HyDEQueryTransformFactory().create_hyde_query_transform()
        hyde_query_engine = TransformQueryEngine(engine, hyde_query_transformr)
Please integrate this with `SimpleEngine` rather than using `TransformQueryEngine` directly.
What I mean is that a single engine entry point should support query rewriting, reranking, and so on.
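One way to read this suggestion is a single engine whose `retrieve` call chains optional query transforms (such as HyDE), retrieval, and rerankers. The sketch below is a plain-Python illustration under that assumption; `SingleEntryEngineSketch` and the callable type aliases are hypothetical and do not reflect `SimpleEngine`'s actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interfaces (not MetaGPT or llama_index APIs).
ScoredText = Tuple[str, float]
QueryTransform = Callable[[str], List[str]]  # e.g. a HyDE-style rewrite
Retriever = Callable[[str], List[ScoredText]]  # returns (text, score) pairs
Reranker = Callable[[str, List[ScoredText]], List[ScoredText]]


@dataclass
class SingleEntryEngineSketch:
    """One entry point chaining query rewriting, retrieval, and reranking."""

    retriever: Retriever
    transforms: Sequence[QueryTransform] = field(default_factory=list)
    rerankers: Sequence[Reranker] = field(default_factory=list)

    def retrieve(self, query: str) -> List[ScoredText]:
        # 1. Query rewriting: each transform maps every current query to one or more queries.
        queries = [query]
        for transform in self.transforms:
            queries = [rewritten for q in queries for rewritten in transform(q)]

        # 2. Retrieval: keep the best score seen for each text across all queries.
        best: Dict[str, float] = {}
        for q in queries:
            for text, score in self.retriever(q):
                best[text] = max(score, best.get(text, float("-inf")))
        results = sorted(best.items(), key=lambda item: item[1], reverse=True)

        # 3. Reranking: each reranker sees the original user query.
        for rerank in self.rerankers:
            results = rerank(query, results)
        return results
```

Keeping the best score per text is one simple way to merge results when a transform fans a query out into several; another fusion strategy, such as reciprocal rank fusion, would slot into the same step.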
examples/rag_pipeline.py
        # 1. save docs
        engine = SimpleEngine.from_docs(input_files=[TRAVEL_DOC_PATH])
        # create HyDE query engine
        hyde_query_transformr = HyDEQueryTransformFactory().create_hyde_query_transform()
Please add a comparison of results on datasets with and without the HyDE method.
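A comparison like this could start from a small harness that runs the same labelled queries through a baseline retriever and a HyDE-enabled one, then reports hit rate@k and MRR@k. This is a hypothetical sketch that assumes exactly one relevant document id per query; the retriever callables are placeholders, and no results are implied.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: query -> ranked list of document ids.
Retrieve = Callable[[str], List[str]]


def hit_rate_and_mrr(
    retrieve: Retrieve, queries: Sequence[str], relevant: Dict[str, str], k: int = 5
) -> Dict[str, float]:
    """Hit rate@k and MRR@k, assuming one relevant document id per query."""
    hits, reciprocal_rank_sum = 0, 0.0
    for query in queries:
        top_k = retrieve(query)[:k]
        target = relevant[query]
        if target in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(target) + 1)  # ranks are 1-based
    n = len(queries)
    return {"hit_rate": hits / n, "mrr": reciprocal_rank_sum / n}


def compare(
    baseline: Retrieve, with_hyde: Retrieve, queries: Sequence[str], relevant: Dict[str, str], k: int = 5
) -> Dict[str, Dict[str, float]]:
    """Evaluate both retrieval setups on the same labelled queries."""
    return {
        "without_hyde": hit_rate_and_mrr(baseline, queries, relevant, k),
        "with_hyde": hit_rate_and_mrr(with_hyde, queries, relevant, k),
    }
```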
config/config2.example.yaml
@@ -23,13 +23,9 @@ rag:
  # RAG Query Analysis
  query_analysis:
    hyde:
      include_original: true # In the query rewrite, determines whether to include the original
      include_original: True # In the query rewrite, determines whether to include the original
Use lowercase `true`, not `True`.
metagpt/rag/query_analysis/hyde.py
@@ -0,0 +1,63 @@
from typing import Any, Dict, Optional
from llama_index.core.llms import LLM
Why import this? It is not used.
  api_version: ""
  embed_batch_size: 100
  dimensions: # output dimension of embedding model
embedding:
Don't change this `embedding` section.
# Conflicts: # metagpt/config2.py
The configuration information and results from running the configurations with and without the HyDE method using
    """This example shows how to use HyDE. HyDE enhances search results by generating a hypothetical document (a virtual
    article); for more details, see the paper: http://arxiv.org/abs/2212.10496
    Query Result:
        Bob likes traveling.
Is the comment correct?
from metagpt.configs.redis_config import RedisConfig
from metagpt.configs.s3_config import S3Config
from metagpt.configs.search_config import SearchConfig
from metagpt.configs.workspace_config import WorkspaceConfig
from metagpt.const import CONFIG_ROOT, METAGPT_ROOT
from MetaGPT.metagpt.configs.rag_config import RAGConfig
This import needs to be deleted.

class RAGConfig(YamlModel):
    embedding: EmbeddingConfig = EmbeddingConfig()
    query_analysis: QueryAnalysisConfig = QueryAnalysisConfig()
I recommend defining `QueryAnalysisConfig` and `EmbeddingConfig` directly in `rag_config.py` instead of adding separate `hyde_config.py` and `query_analysis_config.py` files.
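A sketch of the consolidated layout recommended here, with every RAG config class in one `rag_config.py`. It uses standard-library dataclasses in place of the project's pydantic-based `YamlModel` so that it runs on its own. `EmbeddingConfig` includes only the fields visible in the diffs above, the nesting follows the `query_analysis: hyde:` structure from the config diff, and `from_dict` is a hypothetical helper for loading the parsed `rag:` YAML section.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class HyDEConfig:
    # In the query rewrite, whether to keep the original query alongside the hypothetical doc.
    include_original: bool = True


@dataclass
class QueryAnalysisConfig:
    hyde: HyDEConfig = field(default_factory=HyDEConfig)


@dataclass
class EmbeddingConfig:
    # Subset of fields visible in the config diff; the real class has more.
    api_version: str = ""
    embed_batch_size: int = 100
    dimensions: Optional[int] = None  # output dimension of the embedding model


@dataclass
class RAGConfig:
    embedding: EmbeddingConfig = field(default_factory=EmbeddingConfig)
    query_analysis: QueryAnalysisConfig = field(default_factory=QueryAnalysisConfig)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RAGConfig":
        """Build from the parsed `rag:` section; missing or empty keys use defaults."""
        query_analysis = data.get("query_analysis") or {}
        return cls(
            embedding=EmbeddingConfig(**(data.get("embedding") or {})),
            query_analysis=QueryAnalysisConfig(hyde=HyDEConfig(**(query_analysis.get("hyde") or {}))),
        )
```

`default_factory` is used for the nested defaults because dataclass instances are unhashable and cannot be shared as plain default values.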
Features
Added the HyDE method for query analysis in the RAG module, including an example that demonstrates its use.
Fixed the issue where the static methods in `TestRAGEmbeddingFactory` were not callable. The previous code passed static methods as parameters to a parameterized test, but a static method referenced from the class body is a raw `staticmethod` object, which is not callable before Python 3.10, leading to a `TypeError`. This was resolved by converting the static methods to regular functions defined outside the class.
Feature Docs
No additional documentation provided.
Influence
As an optional step in RAG, query analysis rewrites queries to improve search results.
Result
All unit tests for the new features have passed.
The query-analysis process in the RAG module runs smoothly, effectively rewriting and optimizing queries for better search results.
Other
Added a detailed description of the changes and fixes made in the submission.