Utility function to check if numbers in one string exists in another #260

Eyobyb · 2023-12-11T13:23:21Z

Number existence checking utility

This pull request introduces a utility function designed to determine whether the "source" set of text data contains the numbers referenced in the "result" set.

This utility function will serve as a temporary solution for verifying whether the numerical results from the language model originated from scraped data rather than being hallucinated.
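For readers following the thread, here is a minimal sketch of the check described above. It is not the code from this PR. The regex is copied from the diff under review. The argument order `(result, source)` is an assumption, since the tests in this thread pass the arguments in different orders. The `source_link` parameter from the diff is left out for brevity, and the return keys follow the spelling agreed on later in the review.

```python
import re

# Pattern copied from the diff under review:
# decimals, comma-grouped numbers, then plain integers.
NUMBER_PATTERN = r"\d+\.\d+|\d+\,\d+|\d+"


def extract_numbers_from_text(text):
    """Return every number-like substring found in text, in order."""
    return re.findall(NUMBER_PATTERN, text)


def check_if_number_exist(result, source):
    """Report which numbers in result do not appear in source.

    Assumed argument order: the text to verify first, the scraped text second.
    """
    source_numbers = set(extract_numbers_from_text(source))
    missing = [n for n in extract_numbers_from_text(result) if n not in source_numbers]
    if missing:
        message = ", ".join(missing) + " not mentioned in the source."
        return {"number_exists": False, "messages": message}
    return {"number_exists": True, "messages": ""}
```

Numbers are compared as strings, so "45,000" in the result will not match "45000" in the source. Whether to normalize them first is a design choice this sketch does not make.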

20001LastOrder

Thanks for the PR, please check the comments below.

src/sherpa_ai/utils.py

20001LastOrder · 2023-12-13T03:04:14Z

src/tests/unit_tests/test_util.py

+    result = "Labore deserunt 12.45 $45,000  sit velit nulla. Sint ipsum reprehenderit sint cupidatat amet est id anim exercitation fugiat adipisicing elit. Id est dolore minim magna occaecat aute. Est dolore culpa laborum non esse nostrud."
+    check_result = check_if_number_exist(source ,result , 'jack.com')
+    assert check_result['number_exisit'] == False


Besides asserting the existence, we also want to check the numbers extracted are correct...

I added a separate test specifically for that.

oshoma · 2023-12-14T01:08:01Z

src/sherpa_ai/utils.py

+        if data not in source_numbers:
+            message.append(f"{data} is not mentioned in the {source_link}. ")
+    if len(message)>0:
+        return {"number_exisit": False , "messages":message}


typo number_exisit => number_exists

src/sherpa_ai/utils.py

oshoma · 2023-12-14T01:11:33Z

src/sherpa_ai/utils.py

@@ -300,3 +304,19 @@ def check_url(url):
    else:
        return True

+def extract_numbers_from_text(text):


Add unit tests to assure yourself this regex pattern works against the types of numbers you are expecting to match
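As a rough illustration of what such tests could cover, the sketch below runs the pattern from the diff against a handful of inputs. The sample strings and the `CASES` table are made up for this example and are not part of the PR. The last two cases show inputs where the pattern splits one number into two matches, which may or may not be acceptable for the intended use.

```python
import re

# Pattern copied from the diff under review.
PATTERN = r"\d+\.\d+|\d+\,\d+|\d+"

# (input text, expected matches); the inputs are illustrative only.
CASES = [
    ("costs 12.45 dollars", ["12.45"]),
    ("$45,000 raised", ["45,000"]),
    ("9 lives", ["9"]),
    ("no digits here", []),
    # Only one comma group is consumed, so longer numbers are split.
    ("1,234,567 users", ["1,234", "567"]),
    # A comma group followed by a decimal part is split as well.
    ("1,234.56 total", ["1,234", "56"]),
]

for text, expected in CASES:
    actual = re.findall(PATTERN, text)
    assert actual == expected, f"{text!r}: got {actual}, expected {expected}"
```

In the real test file the same table could be fed to `pytest.mark.parametrize` so that each case is reported separately.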

oshoma · 2023-12-14T01:12:01Z

src/sherpa_ai/utils.py

+    pattern = r"\d+\.\d+|\d+\,\d+|\d+"
+    matches = re.findall(pattern, text)
+
+    return matches


add newline after return matches and delete newline before

20001LastOrder

I left some more comments mostly for the new tests. Please take a look.

20001LastOrder · 2023-12-19T16:33:16Z

src/tests/unit_tests/test_util.py

+    numbers_in_source_data = ['12.45','9', '45,000']
+    print(extracted_number)
+    if len(numbers_in_source_data) != len(extracted_number):


You can just use assert len(numbers_in_source_data) == len(extracted_number) without the if-else. It will fail right away if the lengths do not match.

20001LastOrder · 2023-12-19T16:37:32Z

src/tests/unit_tests/test_util.py

+        assert False , "failed to extract a number"
+    else:
+        for number in extracted_number:


To compare two lists that may contain duplicate elements, an easy way is to use Counter. You just need:

from collections import Counter
assert Counter(extracted_number) == Counter(numbers_in_source_data)
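A small sketch of why Counter suits this comparison, using made-up lists rather than the fixtures from the PR: it ignores order but still counts duplicates, which a set comparison would hide.

```python
from collections import Counter

# Illustrative data only; "9" appears twice on purpose.
expected = ["12.45", "9", "45,000", "9"]
extracted = ["9", "12.45", "9", "45,000"]  # same elements, different order

# Order does not matter, but the number of occurrences does.
assert Counter(extracted) == Counter(expected)

# A set comparison would not notice a missing duplicate; Counter does.
missing_duplicate = ["12.45", "9", "45,000"]
assert set(missing_duplicate) == set(expected)
assert Counter(missing_duplicate) != Counter(expected)
```

If the order of extraction should also be checked, comparing the two lists directly with `==` would be the stricter option.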

20001LastOrder · 2023-12-19T16:38:43Z

src/tests/unit_tests/test_util.py

+    extracted_number = extract_numbers_from_text(source_data)
+    numbers_in_source_data = ['12.45','9', '45,000']
+    print(extracted_number)


Remove these print statements. If the assertions are written correctly, the contents of these lists will be printed out when an assertion fails.

20001LastOrder · 2023-12-19T16:38:53Z

src/tests/unit_tests/test_util.py

+
+def test_extract_numbers_from_text(source_data):
+    extracted_number = extract_numbers_from_text(source_data)


Rename to extracted_numbers.

20001LastOrder · 2023-12-19T16:39:12Z

src/tests/unit_tests/test_util.py

+def test_extract_numbers_from_text_pass(source_data, correct_result_data):
+    check_result = check_if_number_exist(source_data ,correct_result_data)
+    assert check_result['number_exists'] == True


assert check_result['number_exists']

20001LastOrder · 2023-12-19T16:39:21Z

src/tests/unit_tests/test_util.py

+def test_extract_numbers_from_text_fails(source_data, incorrect_result_data):
+    check_result = check_if_number_exist(incorrect_result_data , source_data)
+    assert check_result['number_exists'] == False


assert not check_result['number_exists']

Add a newline at the end of the file. Maybe we can take a look together to set up a formatter in the IDE?

20001LastOrder · 2023-12-19T16:40:48Z

src/sherpa_ai/utils.py

+            message += numbers + ", "
+        message += f"not mentioned in the {source_link}. \n"
+        return {"number_exists": False , "messages":message}
+    return {"number_exists": True , "messages":message}


add a new line at the end

test for util to check if number exist

5cba558

Eyobyb requested review from 20001LastOrder and oshoma December 11, 2023 13:24

oshoma assigned Eyobyb Dec 12, 2023

oshoma mentioned this pull request Dec 12, 2023

Sherpa does not accurately extract numeric values from search results #80

Closed

oshoma added the theme-qa Quality assurance work label Dec 12, 2023

20001LastOrder requested changes Dec 13, 2023


oshoma reviewed Dec 14, 2023


oshoma requested changes Dec 14, 2023


Eyobyb added 3 commits December 14, 2023 18:19

fix typo , add test

90471eb

remove space

0a01593

add new line

5963717

20001LastOrder requested changes Dec 19, 2023


Eyobyb closed this Jan 11, 2024

Eyobyb mentioned this pull request Jan 11, 2024

Extract number from text #273

Merged
