diff --git a/README.md b/README.md
index ebd49d1..a9d8476 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,14 @@
 
-Detect hallucinated content in generated answers for any RAG system.
+Detect hallucinated content in generated answers for any RAG system. A live demo of this tool can be
+tried [here](https://interaktiv.brdata-dev.de/second-opinion-demo/).
 
 :warning: This is only a Proof of Concept and not ready for production use.
 
 ## Installation
 
-This API is tested with Python version 3.11 on Debian but should run on most recent Python versions and operation systems.
+This API is tested with Python version 3.11 on Debian but should run on most recent Python versions and operating
+systems.
 
 1. Create virtual environment `pyenv virtualenv 3.11.7 NAME && pyenv activate NAME`
 2. Install dependencies `pip install -r requirements.txt`
@@ -19,7 +21,7 @@ You can now view the Swagger documentation of the API in your browser under `loc
 
 ## Fact check
 
-The endpoint `check` performs a check if a `sentence` is contained in the `source`.
+The endpoint `check` checks whether a `sentence` is contained in the `source`.
 If this test is passed, the endpoint returns a boolean value `result` as `true`. If the information from the sentence is
 not contained in the source, it will return false.
@@ -27,7 +29,7 @@ not contained in the source, it will return false.
 Besides this boolean value, the endpoint returns an array `answers`, which spell out for each sentence in the source why
 or why not it is contained in the source.
 
-As an URL paramater you can pass the threshold, a lower threshold means higher latency and possibly better accuracy.
+As a URL parameter you can pass the threshold; a lower threshold means higher latency and possibly better accuracy.
 The default value is 0.65.
 
 ```shell
@@ -42,20 +44,20 @@ curl -X 'POST' \
 ```
 
 ## Evaluation
 
-This repository contains two scripts designed to evaluate and enhance the accuracy of our hallucination detection systems.
+
+This repository contains two scripts designed to evaluate and enhance the accuracy of our hallucination detection
+systems.
 The script `evaluation.py` aims to validate the effectiveness of the system by comparing its predictions with the gold
 standard dataset, ultimately providing a measure of accuracy.
-The script `predictor.py` focuses on processing the test data set using the provided API to create set to validate against.
-
-:warning: This test set is designed to detect wrong answers to questions if the source is given. We think the two tasks
-are similar enough to use this on our domain as well.
+The script `predictor.py` focuses on processing the test data set using the provided API to create a set to validate
+against.
 
 ### Available Test- and Training Data
 
-The test and training data ist purely synthetic. It is generated by a random dump from our vector store containing
+The test and training data is purely synthetic. It is generated by a random dump from our vector store containing
 BR24 articles, split by paragraphs. For the test set 150 of those paragraphs are randomly sampled and saved to
-`data/test.csv`.
+`data/test.csv`.
 
 This file is used by `create_training_data.py` to generate a question which can be answered given the paragraph.
@@ -64,7 +66,7 @@ the LLM is explicitly asked to add wrong but plausible content to the answer.
 
 ## Hypothesis data
 
-Your hypothesis data should be placed in the data folder and be suffixed with `_result.jsonl`. Each row shall contain a
+Your hypothesis data should be placed in the data folder and be suffixed with `_result.jsonl`. Each row shall contain a
 JSON object with the structure as follows:
 
 ```json
@@ -76,7 +78,6 @@ JSON object with the structure as follows:
 ```
 
 ## Evaluation
-To run the evaluation simply run `python evaluate.py` after you've placed your results in the data folder.
-The evaluation script calculates the accuracy - e.g. the percentage of correctly predicted samples.
-The current
+To run the evaluation simply run `python evaluate.py` after you've placed your results in the data folder.
+The evaluation script calculates the accuracy, i.e. the percentage of correctly predicted samples.
\ No newline at end of file
diff --git a/data/DatasetCardHallucinations.md b/evaluation/dataset/DatasetCardHallucinations.md
similarity index 100%
rename from data/DatasetCardHallucinations.md
rename to evaluation/dataset/DatasetCardHallucinations.md
diff --git a/data/create_hallucination_training_data.py b/evaluation/dataset/create_hallucination_training_data.py
similarity index 100%
rename from data/create_hallucination_training_data.py
rename to evaluation/dataset/create_hallucination_training_data.py
diff --git a/data/train.jsonl b/evaluation/dataset/train.jsonl
similarity index 100%
rename from data/train.jsonl
rename to evaluation/dataset/train.jsonl
diff --git a/data/verified.txt b/evaluation/dataset/verified.txt
similarity index 100%
rename from data/verified.txt
rename to evaluation/dataset/verified.txt
diff --git a/src/prompts.py b/src/prompts.py
index f77528c..f9511db 100644
--- a/src/prompts.py
+++ b/src/prompts.py
@@ -11,7 +11,7 @@
 - Vorhandensein und Korrektheit von Quellenangaben
 - Inhaltliche Übereinstimmung der Informationen mit dem Ausgangstext
 
-Falls einer dieser Aspekte misachtet wird, gilt die Aussage als nicht unterstützt.
+Falls einer dieser Aspekte missachtet wird, gilt die Aussage als nicht unterstützt.
 
 Für jede Faktenprüfungsaufgabe erhalten Sie:
 Einen Satz, der eine oder mehrere zu überprüfende Behauptungen enthält
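The README hunks above describe the `check` endpoint: a POST request carrying `sentence` and `source`, a `threshold` query parameter defaulting to 0.65, and a response with `result` and `answers`. A minimal client sketch follows; note the base URL and port are assumptions (the diff truncates the actual host shown in the Swagger-docs line), so adjust them to your deployment:

```python
import json
import urllib.request

# Hypothetical base URL: the diff only shows "localhost" truncated as `loc`,
# so host and port here are placeholders -- adjust to your setup.
BASE_URL = "http://localhost:8000"


def build_check_request(sentence: str, source: str, threshold: float = 0.65):
    """Build the URL and JSON body for the `check` endpoint.

    The `sentence`/`source` fields and the `threshold` query parameter come
    from the README; 0.65 is the documented default threshold.
    """
    url = f"{BASE_URL}/check?threshold={threshold}"
    body = json.dumps({"sentence": sentence, "source": source}).encode("utf-8")
    return url, body


def check(sentence: str, source: str, threshold: float = 0.65) -> dict:
    """POST to `check`; per the README the response carries a boolean
    `result` and an `answers` array explaining each decision."""
    url, body = build_check_request(sentence, source, threshold)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A lower `threshold` trades latency for possibly better accuracy, so exposing it as a keyword argument mirrors the URL-parameter design of the API.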
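The evaluation section states that `evaluate.py` computes accuracy as the percentage of correctly predicted samples over `_result.jsonl` files in the data folder. The diff elides the exact JSON structure of those rows, so the field names in this sketch (`label`, `prediction`) are placeholders, not the repository's actual schema:

```python
import json
from pathlib import Path


def accuracy(gold: list[bool], predicted: list[bool]) -> float:
    """Accuracy as described in the README: the share of samples whose
    prediction matches the gold label."""
    assert len(gold) == len(predicted) and gold, "need equal-length, non-empty lists"
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def load_result_rows(data_dir: str = "data") -> list[dict]:
    """Collect rows from every `*_result.jsonl` hypothesis file.

    The per-row keys are not specified in the diff; downstream code would
    pull gold labels and predictions out of whatever fields the real
    schema defines.
    """
    rows = []
    for path in Path(data_dir).glob("*_result.jsonl"):
        with path.open(encoding="utf-8") as fh:
            rows.extend(json.loads(line) for line in fh if line.strip())
    return rows
```

For example, `accuracy([True, False, True, True], [True, True, True, False])` returns `0.5`, i.e. two of four samples predicted correctly.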