entity validator #295
string_comparison_with_jaccard_and_levenshtein(
    tester[0], tester[1], lev_constant
)
assert True
This test as written is not useful; it will only fail if the string comparison function raises an exception.
Instead,
- Make assertions about the correctness of the comparisons, for each item in lists_of_test
- Use parameterized tests with @pytest.mark.parametrize
- Test the happy path and the failure path
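A minimal sketch of what such a test table could look like. The `is_match` helper below is a hypothetical stand-in (plain token-level Jaccard similarity), not the PR's string_comparison_with_jaccard_and_levenshtein, whose return value is not shown here. The cases and the 0.5 threshold are also assumptions chosen for illustration.

```python
# Hypothetical stand-in for the comparison under test: token-level Jaccard only.
def jaccard(a: str, b: str) -> float:
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(a: str, b: str, threshold: float = 0.5) -> bool:
    return jaccard(a, b) >= threshold


# Each row states the inputs and the expected verdict (happy and failure paths).
CASES = [
    ("United Nations", "united nations", True),   # same tokens, different case
    ("United Nations", "United States", False),   # one of three tokens shared
    ("", "", True),                               # both empty
    ("European Union", "", False),                # nothing shared
]


# Under pytest this function would be decorated with
# @pytest.mark.parametrize("a, b, expected", CASES) and named test_...
def check_case(a: str, b: str, expected: bool) -> None:
    assert is_match(a, b) is expected
```

With the decorator applied, each row is reported as its own test, so a wrong verdict for one pair fails without hiding the others.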
return {"entity_exisist": True, "messages": message}


def check_entities_match(result: str, source: str, stage: int):
I suggest we remove this function unless we have a clear use for it. Right now there's only a test calling it.
If we need it,
- let's discuss
- The "stage" part is hard to understand and would have to be replaced by meaningful named values
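If the function is kept, one way to replace the integer could be an Enum. This is only a sketch: the member names and what each stage does are invented for illustration, since the meaning of the PR's stage values is not stated in this review.

```python
from enum import Enum


class MatchStage(Enum):
    # Hypothetical stages; the real ones would come from the discussion.
    EXACT = "exact"
    CASE_INSENSITIVE = "case_insensitive"


def entities_match(result: str, source: str, stage: MatchStage) -> bool:
    """Compare two entity strings according to a named stage."""
    if stage is MatchStage.EXACT:
        return result == source
    if stage is MatchStage.CASE_INSENSITIVE:
        return result.lower() == source.lower()
    raise ValueError(f"unsupported stage: {stage!r}")
```

A call site then reads entities_match(a, b, MatchStage.EXACT) instead of passing a bare number.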
data_numbers = expected_numbers
logger.error(results[0].content)
for number in extract_numbers_from_text(results[0].content):
    if number in data_numbers or len(data_numbers) == 0:
- This code is checking whether len(data_numbers) == 0 every time it runs through the outer for loop. Instead, check that once before the for loop starts, and remember the cached result.
- Instead of doing a for loop, use a Python set. See @20001LastOrder/hydra_config 37dbe54, where I refactored verify_numbers_against_source to use sets.
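A rough sketch of both points together, not the code from the referenced commit. The helper names and the regular expression are assumptions; the empty-expected-set behaviour mirrors the condition in the snippet above, where every number passes when there is nothing to compare against.

```python
import re


def extract_numbers(text: str) -> set[str]:
    # Hypothetical extractor: integers and simple decimals, kept as strings.
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def unverified_numbers(result_text: str, source_text: str) -> set[str]:
    """Return numbers that appear in the result but not in the source."""
    source_numbers = extract_numbers(source_text)
    # Checked once, before any comparison: nothing to verify against.
    if not source_numbers:
        return set()
    # Set difference replaces the loop with a per-item membership test.
    return extract_numbers(result_text) - source_numbers
```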
message = ""
levenshtein_constant = 0.5

print("********************************************")
There are a lot of print statements throughout this PR. Please remove them. Maybe a few need to be left in as logger.debug statements.
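For the few that are worth keeping, a sketch using the standard logging module. The logger name and the function are hypothetical.

```python
import logging

logger = logging.getLogger("entity_validator")  # hypothetical logger name


def missing_entities(result_entities: list[str], source_entities: list[str]) -> list[str]:
    # Lazy %-style arguments: the message is only formatted if DEBUG is enabled.
    logger.debug(
        "comparing %d result entities against %d source entities",
        len(result_entities),
        len(source_entities),
    )
    return [entity for entity in result_entities if entity not in source_entities]
```

Unlike print, these messages stay silent unless the logger is configured at DEBUG level.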
return filtered_entities


def is_subset(str1, str2):
unused function
def extract_entities(text):
    # text = """
remove the commented out lines
# """
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
entity_types = ["NORP", "ORG", "GPE", "LAW"]
add a comment to explain the types
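One possible shape for that comment, sketched as a mapping so the explanation lives next to the labels. The descriptions follow spaCy's label glossary for the English models; the constant and helper names are hypothetical.

```python
# Entity labels the validator keeps, with what each label covers.
ENTITY_TYPES = {
    "NORP": "nationalities or religious or political groups",
    "ORG": "companies, agencies, institutions",
    "GPE": "countries, cities, states",
    "LAW": "named documents made into laws",
}


def keep_entity(label: str) -> bool:
    """Return True if an entity with this label should be validated."""
    return label in ENTITY_TYPES
```

spacy.explain("NORP") returns the same kind of description at runtime if spaCy is installed.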
self.belief.get_histories_excluding_types(token_counter=self.llm.get_num_tokens , exclude_type=[EventType.result]),
)
if count == self.validation_count:
    result = result + "you might not see some of the entities mentioned in the context be aware of that ."
you might not see some of the entities mentioned in the context be aware of that .
-->
Be aware that you might not see some of the entities mentioned in the context.