![logo](assets/logoSecondOpinion.gif)

Detect hallucinated content in generated answers for any RAG system.
This repo provides the summarisation functionality for the second opinion demo.

:warning: This is only a Proof of Concept and not ready for production use.

Since this API is also designed for demonstration purposes, it is possible to enforce hallucinations by setting the
`honest` parameter to `false`. As the model you can choose either `gpt-3.5-turbo` or `gpt-4-turbo`.
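
As an illustration, a dishonest run could be requested like this. This is only a sketch: the `/summarise` path, the
placement of the `honest` and `model` query parameters, and the request body are assumptions, since the full request
format is not shown here.

```shell
# Sketch only: endpoint path, query parameters and request body are assumptions.
curl -X 'POST' \
  'http://localhost:3000/summarise?honest=false&model=gpt-4-turbo' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "source": "string"
}'
```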


## Fact check

The `check` endpoint verifies whether a `sentence` is contained in the `source`.

If this check passes, the endpoint returns the boolean field `result` as `true`. If the information from the sentence is
not contained in the source, it returns `false`.

Besides this boolean value, the endpoint returns an array `answers`, which spells out for each sentence of the source
why the claim is or is not contained in it.

As a URL parameter you can pass the `semantic_similarity_threshold`; a lower threshold means higher latency and
possibly better accuracy. The default value is 0.65.

```shell
curl -X 'POST' \
'http://localhost:3000/check?semantic_similarity_threshold=0.65&model=gpt-3.5-turbo' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"source": "string",
"sentence": "string"
}'
```
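
An illustrative response shape, based on the description above. Only the `result` and `answers` fields are taken from
the text; the wording and the exact structure of the `answers` entries are made up here.

```json
{
  "result": false,
  "answers": [
    "Sentence 1 of the source talks about a different topic and does not contain the claim.",
    "Sentence 2 of the source mentions the entity but not the claimed fact."
  ]
}
```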

## Evaluation
This repository contains two scripts designed to evaluate and enhance the accuracy of our hallucination detection systems.
The script `evaluation.py` aims to validate the effectiveness of the system by comparing its predictions with the
gold standard dataset, ultimately providing a measure of accuracy.

The script `predictor.py` processes the test data set using the provided API to create a result set to validate against.
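
A minimal sketch of such a prediction run, assuming the `check` endpoint shown above and the `_result.jsonl` format
described below under "Hypothesis data". The CSV column names (`id`, `source`, `answer`) and the way `prob` is filled
are assumptions.

```python
# Sketch only: CSV column names and the meaning of "prob" are assumptions.
import csv
import json

import requests

API = "http://localhost:3000/check?semantic_similarity_threshold=0.65&model=gpt-3.5-turbo"

with open("data/test.csv", newline="") as src, open("data/baseline_result.jsonl", "w") as out:
    for row in csv.DictReader(src):
        resp = requests.post(
            API,
            json={"source": row["source"], "sentence": row["answer"]},
        ).json()
        # The API reports whether the answer is contained in the source;
        # a hallucination is the negation of that result.
        hallucination = not resp["result"]
        out.write(json.dumps({
            "id": row["id"],
            "hallucination": hallucination,
            "prob": 0.99 if hallucination else 0.01,  # placeholder confidence
        }) + "\n")
```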

### Available Test and Training Data

The test and training data is purely synthetic. It is generated from a random dump of our vector store containing
BR24 articles, split into paragraphs at `<p>` tags. For the test set, 150 of those paragraphs are randomly sampled and
saved to `data/test.csv`.

This file is used by `create_training_data.py` to generate, for each paragraph, a question that can be answered from it.

Using this question and the paragraph, GPT-3.5 Turbo is then used to generate an answer. In some cases the LLM is
explicitly asked to add wrong but plausible content to the answer.
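
A rough sketch of that generation step, assuming the OpenAI chat completions client. The prompts and the way the
dishonest cases are selected are assumptions; `create_training_data.py` may do this differently.

```python
# Sketch only: prompts and selection of the dishonest cases are assumptions.
import random

from openai import OpenAI

client = OpenAI()

def generate_answer(paragraph: str, question: str, honest: bool) -> str:
    instruction = "Answer the question using only the information in the paragraph."
    if not honest:
        instruction += " Additionally, add one wrong but plausible detail to the answer."
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": f"Paragraph:\n{paragraph}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content

# Roughly half of the answers could be generated with honest=False:
# answer = generate_answer(paragraph, question, honest=random.random() < 0.5)
```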

## Hypothesis data

Your hypothesis data should be placed in the data folder and be suffixed with `_result.jsonl`. Each row must contain a
JSON object with the following structure:

```json
{
"id": "string",
"hallucination": true,
"prob": 0.01
}
```

## Running the Evaluation
To run the evaluation, simply run `python evaluate.py` after you've placed your results in the data folder.
The evaluation script calculates the accuracy, i.e. the percentage of correctly predicted samples.
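
In essence, the accuracy computation boils down to something like the following sketch. The location and column name
of the gold labels (`hallucination` in `data/test.csv`) are assumptions.

```python
# Sketch only: the gold labels are assumed to live in data/test.csv
# in a boolean column named "hallucination".
import csv
import json

gold = {}
with open("data/test.csv", newline="") as f:
    for row in csv.DictReader(f):
        gold[row["id"]] = row["hallucination"] == "True"

correct = total = 0
with open("data/baseline_result.jsonl") as f:
    for line in f:
        pred = json.loads(line)
        correct += pred["hallucination"] == gold[pred["id"]]
        total += 1

print(f"Accuracy: {correct / total:.2%}")
```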

The current
