Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

entity validator #295

Closed

Conversation

Eyobyb
Copy link
Collaborator

@Eyobyb Eyobyb commented Feb 13, 2024

Your checklist for this pull request

Thank you for submitting a pull request! To speed up the review process, please follow this checklist:

  • My Pull Request is small and focused on one topic so it can be reviewed easily
  • My code follows the style guidelines of this project (make format)
  • Commit messages are detailed
  • I have performed a self-review of my code
  • I commented hard-to-understand parts of my code
  • I updated the documentation (docstrings, /docs)
  • My changes generate no new warnings (or explain any new warnings and why they're ok)
  • I have added tests that prove my fix is effective or that my feature works
  • All tests pass when I run pytest tests (offline mode)

Additional steps for code with networking dependencies:

Description

Describe your pull request here.

What does this PR implement or fix? Explain.

If this PR resolves any currently open issues then mention them like this: Closes #xxx.
Github will close such issues automatically when your PR is merged into main.

Any relevant logs, error output, etc?

Any other comments? For example, will other contributors need to install new libraries via poetry install after picking up these changes?

💔Thank you!

Sorry, something went wrong.

string_comparison_with_jaccard_and_levenshtein(
tester[0], tester[1], lev_constant
)
assert True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test as written is not useful; it will only fail if the string comparison function raises an exception.

Instead,

  • make assertions about the correctness of the comparisons, for each item in lists_of_test
  • Use parameterized tests @pytest.mark.parametrize
  • Test the happy path and the failure path

return {"entity_exisist": True, "messages": message}


def check_entities_match(result: str, source: str, stage: int):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we remove this function unless we have a clear use for it. Right now there's only a test calling it.

If we need it,

  • let's discuss
  • The "stage" part is hard to understand and would have to be replaced by meaningful named values

data_numbers = expected_numbers
logger.error(results[0].content)
for number in extract_numbers_from_text(results[0].content):
if number in data_numbers or len(data_numbers) == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This code is checking whether len(data_numbers) == 0 every time it runs through the outer for loop. Instead, check that once before the for loop starts, and remember the cached result.

  2. Instead of doing a for loop use a Python set. See @20001LastOrder/hydra_config 37dbe54 where I refactored verify_numbers_against_source to use sets.

message = ""
levenshtein_constant = 0.5

print("********************************************")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of print statements throughout this PR. Please remove them. Maybe a few need to be left in as logger.debug statements.

return filtered_entities


def is_subset(str1, str2):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused function



def extract_entities(text):
# text = """
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the commented out lines

# """
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
entity_types = ["NORP", "ORG", "GPE", "LAW"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment to explain the types

self.belief.get_histories_excluding_types(token_counter=self.llm.get_num_tokens , exclude_type=[EventType.result]),
)
if count == self.validation_count:
result = result + "you might not see some of the entities mentioned in the context be aware of that ."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might not see some of the entities mentioned in the context be aware of that .
-->
Be aware that you might not see some of the entities mentioned in the context.

@Eyobyb Eyobyb mentioned this pull request Feb 19, 2024
11 tasks
@Eyobyb Eyobyb closed this Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants