[PDSM] Medical Testcases for benchmarking #157
- Remove no_podcast.py because it is not that important
Data science questions
ADD Emergency Testcase
# Conflicts: # benchmark/data/benchmark_data.yaml
Meli test file
FIXED Encoding Bug
# Conflicts: # benchmark/data/benchmark_data.yaml
@slobentanzer We have now implemented part of the SW engineering and adapted the questions. We would be ready to do the big run.
Hi @ytehran and all,
could you kindly address the open issues and anything else you may want to fix before we merge into main? I'd like to do this soon to be able to proceed with the manuscript. :)
benchmark/conftest.py
Outdated
@@ -1,29 +1,28 @@
import os

import requests
from dotenv import load_dotenv
@ytehran could you or another team member address this? prevents me from merging.
benchmark/conftest.py
Outdated
@@ -387,6 +386,9 @@ def evaluation_conversation():
prompts={},
correct=False,
)
# delete first dots if venv is in project env
cus_path = os.getcwd() + "../../venv/bin/.env"
@ytehran please someone remove this so we can merge this PR.
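Independent of the request to remove it, the concatenation in the hunk above has no separator between the working directory and the first `..`, so it does not resolve to a parent directory. A minimal sketch of the difference, where the function names `naive_env_path` and `joined_env_path` and the example directory are hypothetical and not part of the PR:

```python
import os


def naive_env_path(cwd: str) -> str:
    # Mirrors the string concatenation from the diff: the last path
    # component of cwd is fused with "..", e.g. ".../benchmark../../venv".
    return cwd + "../../venv/bin/.env"


def joined_env_path(cwd: str) -> str:
    # Hypothetical alternative: join the components, then normalise so the
    # ".." segments are collapsed against cwd.
    return os.path.normpath(os.path.join(cwd, "..", "..", "venv", "bin", ".env"))


if __name__ == "__main__":
    # Made-up directory, used only for illustration.
    example_cwd = os.path.join(os.sep, "a", "b", "c")
    print(naive_env_path(example_cwd))
    print(joined_env_path(example_cwd))
```

Either way the path depends on where the virtual environment sits on one machine, which is presumably why the reviewer asks for the lines to be removed rather than repaired.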
openAIKey.py
Outdated
@@ -0,0 +1,9 @@
import os
@ytehran please address this (remove) in order to be able to merge into main
:)
Added documentation of functions
Stats graphs
@slobentanzer We changed our code to get the PR ready and also made some updates and improvements to our stats. If something is missing, please let me know.
Please note that this PR is still in progress.
What does this PR do?
Notes
For the analysis:
Adapted benchmark_utils for the failure_groups and added new methods
To test for synonyms, “nltk” is used, which must