Add retrieval evaluation pipeline #2
Conversation
)
tokens_user_prompt = get_total_tokens_from_string(user_prompt_formatted)

if tokens_system_prompt + tokens_user_prompt < 10000:
this 10k could be a constant defined at the top of the file
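The suggested fix can be sketched as follows. `MAX_PROMPT_TOKENS` and `fits_in_budget` are hypothetical names for illustration, not identifiers from the PR:

```python
# Hoist the magic number into a module-level constant so the budget is
# documented and changed in one place. MAX_PROMPT_TOKENS is a hypothetical name.
MAX_PROMPT_TOKENS = 10_000


def fits_in_budget(tokens_system_prompt: int, tokens_user_prompt: int) -> bool:
    """Return True when the combined prompt stays under the token budget."""
    return tokens_system_prompt + tokens_user_prompt < MAX_PROMPT_TOKENS


print(fits_in_budget(4000, 5000))  # True
```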
Fixed
fhir_file = '../data/Abraham100_Oberbrunner298_9dbb826d-0be6-e8f9-3254-dbac25d83be6.json'
flat_file_path = '../data/flat_files'

output_json_file = '../data/FHIR_chunks.json'
output_report_file = '../data/FHIR_report.txt'
output_openai_batch_requests_file = '../data/openai_requests.jsonl'
All these could be parameters parsed using argparse. You can set these values as defaults
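A minimal sketch of that suggestion, keeping the hard-coded paths from the diff as the argparse defaults (the option names are an assumption, not from the PR):

```python
import argparse


def parse_args(argv=None):
    """Turn the hard-coded paths into CLI options with the old values as defaults."""
    parser = argparse.ArgumentParser(description="Chunk a FHIR bundle")
    parser.add_argument(
        "--fhir-file",
        default="../data/Abraham100_Oberbrunner298_9dbb826d-0be6-e8f9-3254-dbac25d83be6.json")
    parser.add_argument("--flat-file-path", default="../data/flat_files")
    parser.add_argument("--output-json-file", default="../data/FHIR_chunks.json")
    parser.add_argument("--output-report-file", default="../data/FHIR_report.txt")
    parser.add_argument("--output-openai-batch-requests-file",
                        default="../data/openai_requests.jsonl")
    return parser.parse_args(argv)


args = parse_args([])          # no CLI args: defaults apply
print(args.flat_file_path)     # ../data/flat_files
```

Running the script with no flags behaves exactly as before, while any path can now be overridden per run.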
Total openai queries: 1915
Total approximate costs: 0.23
Total approximate input costs: 0.054
Total approximate output costs: 0.172
You could update your estimate of output tokens to reflect the current costs
metrics["Total contexts"] = total_contexts

for series_name, value in metrics.items():
    task.get_logger().report_scalar(
You can also use logger.report_single_value(metric, value) and results will be shown in a table of metrics, instead of a chart by iteration, as you have no iterations here
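The difference in call shape can be sketched with a stand-in logger (a `FakeLogger` stub is used here so the snippet runs without a ClearML server; with a real task you would call `task.get_logger().report_single_value(...)`):

```python
# report_single_value(name, value) produces one row per metric in a summary
# table, whereas report_scalar(title, series, value, iteration) plots a point
# on an iteration axis -- unhelpful when there is only one "iteration".
class FakeLogger:
    """Stand-in for clearml.Logger, used only to illustrate the call shape."""
    def __init__(self):
        self.single_values = {}

    def report_single_value(self, name, value):
        self.single_values[name] = value


metrics = {"Total contexts": 42, "Hit rate": 0.87}  # illustrative values
logger = FakeLogger()
for name, value in metrics.items():
    logger.report_single_value(name, value)  # one table row per metric

print(logger.single_values)
```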
@@ -0,0 +1,109 @@
from dotenv import load_dotenv
What is the purpose of this file?
The purpose is to test the speed of the generation process with different context lengths
app/services/search_documents.py
Outdated
def search_query(query_text, embedding_model,
                 es_client, index_name=settings.index_name,
-                k=3, threshold=0.2):
+                k=5, threshold=0.2,
+                text_boost=1.0, embedding_boost=1.0):
These should be updated to the tuned values
with open(output_file, "w") as out:
    json.dump(documents, out, indent=2)

measure_tokens_lenghts("../data/tokens_lenghts.png", tokens_list)
This path should be a config, constant or parameter
chunk_size=500,
chunk_overlap=50,
These are relevant parameters which should also be stored in the experiment tracker, which also means that you should pass them in as parameters
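One way to do both at once is to gather the knobs in a dict, pass them into the chunking call, and register the same dict with the tracker. The `chunk_text` helper below is illustrative, not the PR's implementation; with a real ClearML task the dict would be logged via `task.connect(chunk_params)`:

```python
# Chunking knobs collected in one place; defaults taken from the diff above.
chunk_params = {"chunk_size": 500, "chunk_overlap": 50}


def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into overlapping character chunks (illustrative only)."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# With a real ClearML task: task.connect(chunk_params) records them per run,
# so experiments with different chunk sizes stay comparable.
chunks = chunk_text("a" * 1200, **chunk_params)
print(len(chunks))  # 3
```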
@@ -0,0 +1,140 @@
import base64
If this logic is taken from another open-source project, you should add a comment here saying where you got the code
fhir_file = '../data/Abraham100_Oberbrunner298_9dbb826d-0be6-e8f9-3254-dbac25d83be6.json'
flat_file_path = '../data/flat_files'

output_json_file = '../data/FHIR_chunks.json'
output_report_file = '../data/FHIR_report.txt'
output_openai_batch_requests_file = '../data/openai_requests.jsonl'
Would be good to have all of these as argparse parameters
return len(encoding.encode(string))


def measure_tokens_lenghts(file_path, tokens_lengths):
this function seems to be duplicated several times in different files. You should prefer to implement it once and reuse it
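The usual shape of that refactor is a small shared module, say a hypothetical `scripts/token_utils.py`, which every script imports instead of re-defining the helper. The tokenizer below defaults to whitespace splitting purely so the sketch is self-contained; the real scripts encode with their tiktoken encoding:

```python
# token_utils.py -- hypothetical shared module. Each script replaces its local
# copy with: from token_utils import get_total_tokens_from_string
def get_total_tokens_from_string(string, encode=str.split):
    """Count tokens in a string.

    `encode` defaults to whitespace splitting here only for illustration;
    pass the tiktoken encoding's .encode in the real scripts.
    """
    return len(encode(string))


print(get_total_tokens_from_string("three word prompt"))  # 3
```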
return resource


def remove_urls_from_fhir(data):
This also looks like a function which could be reused in multiple strategies
Retrieval evaluation pipeline in scripts folder