entity validator #295

Eyobyb · 2024-02-13T13:36:56Z

Your checklist for this pull request

Thank you for submitting a pull request! To speed up the review process, please follow this checklist:

My Pull Request is small and focused on one topic so it can be reviewed easily
My code follows the style guidelines of this project (make format)
Commit messages are detailed
I have performed a self-review of my code
I commented hard-to-understand parts of my code
I updated the documentation (docstrings, /docs)
My changes generate no new warnings (or explain any new warnings and why they're ok)
I have added tests that prove my fix is effective or that my feature works
All tests pass when I run pytest tests (offline mode)

Additional steps for code with networking dependencies:

I followed the offline and online testing guidelines
All tests pass when I run pytest tests --external_api (online mode, making network calls)

Description

Describe your pull request here.

What does this PR implement or fix? Explain.

If this PR resolves any currently open issues then mention them like this: Closes #xxx.
Github will close such issues automatically when your PR is merged into main.

Any relevant logs, error output, etc?

Any other comments? For example, will other contributors need to install new libraries via poetry install after picking up these changes?

💔Thank you!

oshoma · 2024-02-16T02:05:44Z

src/tests/unit_tests/test_util.py

+        string_comparison_with_jaccard_and_levenshtein(
+            tester[0], tester[1], lev_constant
+        )
+    assert True


This test as written is not useful: it fails only if the string comparison function raises an exception.

Instead:

Make assertions about the correctness of the comparison for each item in lists_of_test.

Use parameterized tests (@pytest.mark.parametrize).

Test both the happy path and the failure path.
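
A minimal sketch of what such a parameterized test could look like. The function combined_distance, its 0.6 threshold, and the test cases are hypothetical stand-ins, since the thread does not show what string_comparison_with_jaccard_and_levenshtein returns; only the lev_constant argument comes from the diff.

```python
import pytest


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def combined_distance(a: str, b: str, lev_constant: float) -> float:
    """Hypothetical stand-in: weighted mix of normalized Levenshtein
    distance and token-level Jaccard distance, in the range [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    lev = levenshtein(a, b) / longest
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    jaccard = 1 - len(tokens_a & tokens_b) / len(union) if union else 0.0
    return lev_constant * lev + (1 - lev_constant) * jaccard


@pytest.mark.parametrize(
    "left, right, should_match",
    [
        ("United Nations", "united nations", True),  # happy path: case differs
        ("United Nations", "United Nation", True),   # happy path: near miss
        ("United Nations", "Toyota", False),         # failure path: unrelated
    ],
)
def test_combined_distance(left, right, should_match):
    # The 0.6 cut-off is an assumption made for this sketch only.
    distance = combined_distance(left, right, lev_constant=0.5)
    assert (distance < 0.6) == should_match
```

Each tuple becomes its own test case in the pytest report, so a wrong comparison result fails the specific pair that caused it instead of passing silently.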

oshoma · 2024-02-16T02:14:54Z

src/sherpa_ai/utils.py

+    return {"entity_exisist": True, "messages": message}
+
+
+def check_entities_match(result: str, source: str, stage: int):


I suggest we remove this function unless we have a clear use for it. Right now there's only a test calling it.

If we do need it:

Let's discuss.

The "stage" parameter is hard to understand and would have to be replaced by meaningfully named values.
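
For illustration only, here is one way an integer flag can be replaced by named values using an Enum. The member names and the matching behaviour below are invented for this sketch; the thread does not say what each stage of check_entities_match actually does.

```python
from enum import Enum


class MatchStage(Enum):
    """Hypothetical named replacements for the integer 'stage' argument."""

    EXACT = 0
    CASE_INSENSITIVE = 1


def entities_match(result: str, source: str, stage: MatchStage) -> bool:
    """Illustrative behaviour only, not the logic from the PR."""
    if stage is MatchStage.EXACT:
        return result == source
    return result.lower() == source.lower()
```

A call such as entities_match(a, b, MatchStage.CASE_INSENSITIVE) documents itself at the call site, which a bare 0 or 1 does not.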

oshoma · 2024-02-16T02:23:50Z

src/tests/integration_tests/test_entity_citation_validator.py

+        data_numbers = expected_numbers
+        logger.error(results[0].content)
+        for number in extract_numbers_from_text(results[0].content):
+            if number in data_numbers or len(data_numbers) == 0:


This code checks len(data_numbers) == 0 on every pass through the outer for loop. Instead, check it once before the loop starts and reuse the result.

Alternatively, replace the for loop with a Python set operation. See @20001LastOrder/hydra_config 37dbe54, where I refactored verify_numbers_against_source to use sets.
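
A rough sketch of the set-based approach, assuming numbers are compared as strings. The helper names and the regex here are stand-ins written for this example; they are not the refactored verify_numbers_against_source from the referenced commit, nor the project's extract_numbers_from_text.

```python
import re


def extract_numbers(text: str) -> set:
    """Stand-in extractor: integers and simple decimals, as strings."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def unexpected_numbers(content: str, expected_numbers: list) -> set:
    """Return the numbers in content that are not in expected_numbers.

    An empty expected list means there is nothing to validate against,
    which mirrors the len(data_numbers) == 0 branch in the diff.
    """
    expected = set(expected_numbers)
    if not expected:  # checked once, before any comparison
        return set()
    return extract_numbers(content) - expected  # set difference replaces the loop
```

A test can then assert that unexpected_numbers(...) is an empty set, and a failure prints exactly which numbers were not supported by the source.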

oshoma · 2024-02-16T02:24:42Z

src/sherpa_ai/utils.py

+    message = ""
+    levenshtein_constant = 0.5
+
+    print("********************************************")


There are a lot of print statements throughout this PR. Please remove them. A few may be worth keeping as logger.debug statements.
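
As a sketch of the difference, the snippet below uses the standard library logging module, which is an assumption; the project's own logger object may come from a different library. The helper find_missing_entities is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def find_missing_entities(expected, found):
    """Return the expected entities that do not appear in found."""
    found_lower = {entity.lower() for entity in found}
    missing = [entity for entity in expected if entity.lower() not in found_lower]
    # Debug-level output stays hidden unless the caller enables it,
    # unlike print, which always writes to stdout.
    logger.debug("expected=%s found=%s missing=%s", expected, found, missing)
    return missing
```

Calling logging.basicConfig(level=logging.DEBUG) while investigating a problem turns the message on without touching the function.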

oshoma · 2024-02-16T02:25:38Z

src/sherpa_ai/utils.py

+    return filtered_entities
+
+
+def is_subset(str1, str2):


This function is unused.

oshoma · 2024-02-16T02:26:25Z

src/sherpa_ai/utils.py

+
+
+def extract_entities(text):
+    # text = """


Remove the commented-out lines.

oshoma · 2024-02-16T02:26:48Z

src/sherpa_ai/utils.py

+    #         """
+    nlp = spacy.load("en_core_web_sm")
+    doc = nlp(text)
+    entity_types = ["NORP", "ORG", "GPE", "LAW"]


Add a comment explaining the entity types.
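
One possible shape for that documentation is sketched below. The descriptions paraphrase what spacy.explain reports for these labels, as far as I recall, and are worth verifying against the installed spaCy version. The helper keep_relevant_entities is hypothetical and works on plain (text, label) pairs, so the sketch runs without loading a spaCy model.

```python
# Entity labels kept by the filter, with a short description of each.
ENTITY_TYPES = {
    "NORP": "nationalities or religious or political groups",
    "ORG": "companies, agencies, institutions",
    "GPE": "countries, cities, states",
    "LAW": "named documents made into laws",
}


def keep_relevant_entities(labelled_entities):
    """Filter (text, label) pairs, e.g. built from (ent.text, ent.label_)."""
    return [text for text, label in labelled_entities if label in ENTITY_TYPES]
```

Keeping the descriptions next to the labels means the explanation cannot drift away from the list it documents.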

oshoma · 2024-02-16T02:28:23Z

src/sherpa_ai/agents/qa_agent.py

+                self.belief.get_histories_excluding_types(token_counter=self.llm.get_num_tokens , exclude_type=[EventType.result]),
+            )
+        if count == self.validation_count:
+            result = result + "you might not see some of the entities mentioned in the context be aware of that ."


you might not see some of the entities mentioned in the context be aware of that .
-->
Be aware that you might not see some of the entities mentioned in the context.

entity validator

b5b4a5b

oshoma requested changes Feb 16, 2024

Eyobyb mentioned this pull request Feb 19, 2024

Entity validation #298

Closed

11 tasks

Eyobyb closed this Feb 19, 2024
