Facing issue generating correct evaluation report for RAG application #1997
-
Hello, I'm currently evaluating my RAG application, which uses a WebSocket for interaction. I successfully created the knowledge base, generated the test set, and performed the evaluation. However, when I check the evaluation report, I see the following:
I’ve implemented an asynchronous method to call the WebSocket API and get the response. This same method is used in the get_answer function. This is the code that I am using currently from dotenv import load_dotenv, find_dotenv _ = load_dotenv(find_dotenv()) openai.api_key = os.getenv("OPENAI_API_KEY") set_llm_api("openai") sid = "a3bf-cac7defc43aa" -------- To generate test suite and evaluate the application ----------CHUNK_SIZE = 800 def split_into_chunks(content, chunk_size, overlap_size): def read_files_from_folder(folder_path):
folder_path = "src/main/giskardevaluation/docs" df = pd.DataFrame(file_data, columns=["file_name", "content"]) knowledge_base = KnowledgeBase(df) testset = generate_testset( testset.save("src/main/giskardevaluation/my_testset.jsonl") loaded_testset = QATestset.load("src/main/giskardevaluation/my_testset.jsonl") #------ Function to call websocket api and get response ----------- # async def receive_complete_response(websocket): async def get_answer_from_agent(messages): ----------- get answer function ---------------def get_answer_fn(question: str, history=None) -> str:
report = evaluate(get_answer_fn, testset=loaded_testset, knowledge_base=knowledge_base) report.to_html("src/main/giskardevaluation/rag_eval_report.html") Could anyone please help me understand why the generator, evaluator, and rewriter metrics are showing 0%, while the router and knowledge base are at 100%? Any guidance on where I might be going wrong would be greatly appreciated. Thank you! |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment
-
Hello, can you check manually that you get correct answers when you call your When all answers are incorrect, it is expected that the knowledge base and router components are at 100% (e.g. for the knowledge base component it is computed as the 1 minus the gap between the topics with best and worst correctness, but if every answer is incorrect, the gap is 0). The other components are computed directly from the correctness on the question and therefore are at 0% when every answer is wrong. Also I see that your agent does not support history, the |
Beta Was this translation helpful? Give feedback.
Hello, can you check manually that you get correct answers when you call your
get_answer_fn
? I suspect that you don't get any answer at all.When all answers are incorrect, it is expected that the knowledge base and router components are at 100% (e.g. for the knowledge base component it is computed as the 1 minus the gap between the topics with best and worst correctness, but if every answer is incorrect, the gap is 0). The other components are computed directly from the correctness on the question and therefore are at 0% when every answer is wrong.
Also I see that your agent does not support history, the
generate_testset
function will generate some questions split into two messages (conv…