Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes 67: Adding test coverage for louis-crawler #69

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

melanie-fressard
Copy link
Contributor

@melanie-fressard melanie-fressard commented Dec 13, 2023

issue #67

Commands used to see the coverage :

python -m coverage run -m unittest discover -s tests
python -m coverage report

@melanie-fressard melanie-fressard self-assigned this Dec 13, 2023
@melanie-fressard melanie-fressard linked an issue Dec 13, 2023 that may be closed by this pull request
@melanie-fressard melanie-fressard marked this pull request as draft December 13, 2023 21:00
@melanie-fressard
Copy link
Contributor Author

Coverage got up to 95%
image

@melanie-fressard melanie-fressard marked this pull request as ready for review December 15, 2023 19:55
@melanie-fressard melanie-fressard marked this pull request as draft December 18, 2023 21:59
Copy link
Member

@k-allagbe k-allagbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Impressive 95% test coverage overall. Testing edge cases should increase it even more.
  • Be careful not to use real data in tests. Instead, a good strategy would be to populate the tables with test data before each test and rollback after. There are other strategies you can explore as well.
  • Make sure assertions actually assert the effects of the functions.

def test_fetch_chunk_id_without_embedding(self):
"""sample test to check if fetch_chunk_id_without_embedding works"""
with db.cursor(self.connection) as cursor:
cursor.execute(test.embedding_table.format(embedding_model='test-model'))
rows = crawler.fetch_chunk_id_without_embedding(cursor, 'test-model')
_entity_id = rows[0]
self.connection.rollback()

def test_store_chunk_item(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case seems to assert the happy path for store_chunk_item. I suggest also testing the edge case where row is None here:

        row = cursor.fetchone()
        if row is not None:
            data['chunk_id'] = row['id']

in store_chunk_item. This is important because if there is a possibility that row is None, then we should know exactly what the end result would be.

To go about this, I suggest breaking store_chunk_item into more atomic functions (which would be tested separately), then mocking the one that would normally produce row to return None instead in the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broke down store_chunk_item into tinier functions
still need to test if there is a case with none

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

tests/test_db_crawler.py Outdated Show resolved Hide resolved
tests/test_db_crawler.py Outdated Show resolved Hide resolved
tests/test_db_crawler.py Outdated Show resolved Hide resolved
with db.cursor(self.connection) as cursor:
id = crawler.fetch_crawl_ids_without_chunk(cursor)
self.connection.rollback()
self.assertEqual(id, [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion isn't adequately testing fetch_crawl_ids_without_chunk. A better test would be to first populate the table with a mix of crawls, some with chunk IDs and some without. Then, we should verify that the function's results match exactly the crawls that don't have chunk IDs, before performing a rollback.

def test_fetch_crawl_row(self):
"""Test fetching a crawl row."""
with db.cursor(self.connection) as cursor:
row = crawler.fetch_crawl_row(cursor, "https://inspection.canada.ca/a-propos-de-l-acia/structure-organisationnelle/mandat/fra/1299780188624/1319164463699")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data might not exist in the future, in which case this test will fail. A more suitable test would be to first populate the relevant tables with test items, perform the fetch, then assert the result matches the test items.

fetch_crawl_row also has multiple paths and edge cases worth testing.

def test_fetch_chunk_token_row(self):
"""Test fetching a chunk token row."""
with db.cursor(self.connection) as cursor:
row = crawler.fetch_chunk_token_row(cursor, "469812c5-190c-4e56-9f88-c8621592bcb5")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data might not exist in the future, in which case this test will fail. A more suitable test would be to first populate the relevant tables with test items, perform the fetch, then assert the result matches the test items.

@melanie-fressard
Copy link
Contributor Author

Pushing it to 96% :
image
Will apply correction requested by @k-allagbe soon :)

tests/test_db_crawler.py Outdated Show resolved Hide resolved
@melanie-fressard
Copy link
Contributor Author

To generate documentation based on docstring, python -m pydoc -p 1234 will open a localhost on port 1234 to display the documentation on html pages. Unfortunately, it do not detect the tests module and therefore is not doing any documentation on that despite dosctrings.

@melanie-fressard
Copy link
Contributor Author

Tried to apply more correction but cannot because I have trouble with adding test data to the database. As it is true that test-data should be added for better testing, I would suggest to start from here if anyone is assigned to this issue.
Otherwise, pushed tests to 97% :
image

@rngadam
Copy link
Contributor

rngadam commented Jan 11, 2024

@k-allagbe can you follow up yourself and get this PR in?

@k-allagbe
Copy link
Member

@k-allagbe can you follow up yourself and get this PR in?

On it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding test coverage for louis-crawler
4 participants