-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes 67: Adding test coverage for louis-crawler #69
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Impressive 95% test coverage overall. Testing edge cases should increase it even more.
- Be careful not to use real data in tests. Instead, a good strategy would be to populate the tables with test data before each test and rollback after. There are other strategies you can explore as well.
- Make sure assertions actually assert the effects of the functions.
def test_fetch_chunk_id_without_embedding(self): | ||
"""sample test to check if fetch_chunk_id_without_embedding works""" | ||
with db.cursor(self.connection) as cursor: | ||
cursor.execute(test.embedding_table.format(embedding_model='test-model')) | ||
rows = crawler.fetch_chunk_id_without_embedding(cursor, 'test-model') | ||
_entity_id = rows[0] | ||
self.connection.rollback() | ||
|
||
def test_store_chunk_item(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case seems to assert the happy path for store_chunk_item
. I suggest also testing the edge case where row
is None
here:
row = cursor.fetchone()
if row is not None:
data['chunk_id'] = row['id']
in store_chunk_item
. This is important because if there is a possibility that row
is None
, then we should know exactly what the end result would be.
To go about this, I suggest breaking store_chunk_item
into more atomic functions (which would be tested separately), then mocking the one that would normally produce row
to return None
instead in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
broke down store_chunk_item into tinier functions
still need to test if there is a case with none
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
with db.cursor(self.connection) as cursor: | ||
id = crawler.fetch_crawl_ids_without_chunk(cursor) | ||
self.connection.rollback() | ||
self.assertEqual(id, []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion isn't adequately testing fetch_crawl_ids_without_chunk
. A better test would be to first populate the table with a mix of crawls, some with chunk IDs and some without. Then, we should verify that the function's results match exactly the crawls that don't have chunk IDs, before performing a rollback.
def test_fetch_crawl_row(self): | ||
"""Test fetching a crawl row.""" | ||
with db.cursor(self.connection) as cursor: | ||
row = crawler.fetch_crawl_row(cursor, "https://inspection.canada.ca/a-propos-de-l-acia/structure-organisationnelle/mandat/fra/1299780188624/1319164463699") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This data might not exist in the future, in which case this test will fail. A more suitable test would be to first populate the relevant tables with test items, perform the fetch, then assert the result matches the test items.
fetch_crawl_row
also has multiple paths and edge cases worth testing.
def test_fetch_chunk_token_row(self): | ||
"""Test fetching a chunk token row.""" | ||
with db.cursor(self.connection) as cursor: | ||
row = crawler.fetch_chunk_token_row(cursor, "469812c5-190c-4e56-9f88-c8621592bcb5") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This data might not exist in the future, in which case this test will fail. A more suitable test would be to first populate the relevant tables with test items, perform the fetch, then assert the result matches the test items.
Pushing it to 96% : |
To generate documentation based on docstring, python -m pydoc -p 1234 will open a localhost on port 1234 to display the documentation on html pages. Unfortunately, it do not detect the tests module and therefore is not doing any documentation on that despite dosctrings. |
@k-allagbe can you follow up yourself and get this PR in? |
On it. |
issue #67
Commands used to see the coverage :