From 6dea219a224cd775d4eaca850cc58781770ac0b9 Mon Sep 17 00:00:00 2001
From: Marco Lehner
Date: Tue, 1 Oct 2024 12:21:47 +0200
Subject: [PATCH] Update README.md

---
 README.md | 62 ++-----------------------------------------------------
 1 file changed, 2 insertions(+), 60 deletions(-)

diff --git a/README.md b/README.md
index 4e6cd77..5e2a820 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,4 @@
-![logo](assets/logoSecondOpinion.gif)
-
-Detect hallucinated content in generated answers for any RAG system.
+This repo provides the summarisation functionality for the second opinion demo.
 
 :warning: This is only a Proof of Concept and not ready for production use.
 
@@ -35,64 +33,8 @@ curl -X 'POST' \
 Since this API is also designed for demonstration purposes, it is possible to enforce hallucinations by setting the
 `honest` parameter to `false`. As the model you can choose either `gpt-3.5-turbo` or `gpt-4-turbo`.
 
-
-## Fact check
-
-The endpoint `check` checks whether a `sentence` is contained in the `source`.
-
-If this test passes, the endpoint returns the boolean value `result` as `true`. If the information from the sentence is
-not contained in the source, it returns `false`.
-
-Besides this boolean value, the endpoint returns an array `answers`, which spells out for each sentence in the source
-why it is or is not contained in the source.
-
-As a URL parameter you can pass the threshold; a lower threshold means higher latency and possibly better accuracy.
-The default value is 0.65.
-
-```shell
-curl -X 'POST' \
-  'http://localhost:3000/check?semantic_similarity_threshold=0.65&model=gpt-3.5-turbo' \
-  -H 'accept: application/json' \
-  -H 'Content-Type: application/json' \
-  -d '{
-  "source": "string",
-  "sentence": "string"
-}'
-```
-
-## Evaluation
-This repository contains two scripts designed to evaluate and enhance the accuracy of our hallucination detection systems.
-The script `evaluation.py` validates the effectiveness of the system by comparing its predictions with the
-gold standard dataset, ultimately providing a measure of accuracy.
-
-The script `predictor.py` processes the test data set using the provided API to create a set to validate against.
-
-### Available Test and Training Data
-
-The test and training data is purely synthetic. It is generated by a random dump from our vector store containing
-BR24 articles, split by `\n\n` into paragraphs. For the test set, 150 of those paragraphs are randomly sampled and saved to
-`data/test.csv`.
-
-This file is used by `create_training_data.py` to generate a question which can be answered given the paragraph.
-
-Using this question and the paragraph, GPT 3.5 Turbo is used to generate answers to the questions. In some cases
-the LLM is explicitly asked to add wrong but plausible content to the answer.
-
-## Hypothesis data
-
-Your hypothesis data should be placed in the data folder and be suffixed with `_result.jsonl`. Each row shall contain a
-JSON object with the following structure:
-
-```json
-{
-  "id": "string",
-  "hallucination": true,
-  "prob": 0.01
-}
-```
-
 
 ## Evaluation
 To run the evaluation, simply run `python evaluate.py` after you've placed your results in the data folder.
 The evaluation script calculates the accuracy, i.e. the percentage of correctly predicted samples.
-The current
\ No newline at end of file
+The current
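
For reference, the `check` endpoint removed above returns a boolean `result` plus a per-sentence `answers` array. A response could look roughly like the following sketch; only the two top-level field names are documented in the README, and the shape of each `answers` entry is an assumption:

```json
{
  "result": false,
  "answers": [
    {
      "sentence": "First sentence from the source.",
      "contained": false,
      "explanation": "The claim does not appear in this sentence."
    }
  ]
}
```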
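The training-data generation the removed section describes (one question per paragraph, then answers from GPT 3.5 Turbo, some deliberately wrong) could be sketched as below; the prompts, the `text` column name, and the honest/dishonest split are illustrative assumptions, not the actual `create_training_data.py` code:

```python
import csv

from openai import OpenAI  # assumes the official openai client is available

client = OpenAI()


def ask(prompt: str) -> str:
    """Single-turn call to GPT 3.5 Turbo."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


with open("data/test.csv", newline="") as f:
    for i, row in enumerate(csv.DictReader(f)):
        paragraph = row["text"]  # column name is an assumption
        # Step 1: a question that can be answered given the paragraph.
        question = ask(f"Write a question that this paragraph answers:\n\n{paragraph}")
        # Step 2: an answer; every other sample is forced to hallucinate.
        extra = "" if i % 2 == 0 else " Add one plausible but false detail."
        answer = ask(
            f"Answer the question using only the paragraph.{extra}\n\n"
            f"Question: {question}\n\nParagraph: {paragraph}"
        )
        print(question, answer, sep="\n---\n")
```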
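Likewise, the evaluation run (`python evaluate.py` over `*_result.jsonl` hypothesis files, with accuracy as the share of correctly predicted samples) can be sketched in a few lines; the gold-label file name and its `id`/`hallucination` columns are assumptions about the repo layout:

```python
import csv
import glob
import json

# Gold labels; the file name and column names are assumptions, adjust as needed.
with open("data/test.csv", newline="") as f:
    gold = {row["id"]: row["hallucination"] == "True" for row in csv.DictReader(f)}

# Score every hypothesis file in the data folder (suffix per the README).
for path in glob.glob("data/*_result.jsonl"):
    correct = total = 0
    with open(path) as f:
        for line in f:
            pred = json.loads(line)  # {"id": ..., "hallucination": ..., "prob": ...}
            if pred["id"] in gold:
                total += 1
                correct += pred["hallucination"] == gold[pred["id"]]
    if total:
        print(f"{path}: accuracy = {correct / total:.2%}")
```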