Conversation

@afg1 (Contributor) commented Oct 27, 2025

Our unit tests for the pipeline don't run very often, and when they do, they fail on things that are irrelevant, partly because they use live APIs and data changes.

This PR aims to remove a lot of the network access by mocking the API calls. This has been done by recording the output of API calls and using that to build mock data structures, applied by unittest.mock patches so the tests don't use the real API.

This is a work in progress, and will touch many files, so I intend to do incremental PRs to keep reviews manageable.

This batch removes network calls for the OLS, tmRNA, ensembl karyotype and core PDBe tests.
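The record-and-patch approach can be sketched with unittest.mock roughly like this (the module layout, helper names, URL, and payload are illustrative assumptions, not the pipeline's actual code):

```python
import json
import urllib.request
from unittest import mock

# Recorded output of a previous live API call, stored as plain data
# (hypothetical payload; the PR records real OLS/PDBe/Ensembl responses).
RECORDED_RESPONSE = {"id": "GO:0008150", "label": "biological_process"}

def fetch_term(term_id):
    """Real helper: hits the live API. Never called once patched."""
    url = f"https://www.ebi.ac.uk/ols4/api/terms/{term_id}"  # illustrative URL
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def lookup_label(term_id):
    """Code under test: depends on fetch_term, which the test patches out."""
    return fetch_term(term_id)["label"]

def test_lookup_label_without_network():
    # Patch the helper in this module so no network request is made.
    with mock.patch(f"{__name__}.fetch_term",
                    return_value=RECORDED_RESPONSE) as fake:
        assert lookup_label("GO:0008150") == "biological_process"
        fake.assert_called_once_with("GO:0008150")
```

The same pattern generalises across the OLS, tmRNA, karyotype, and PDBe tests: record the payload once, commit it as fixture data, and patch the call site in each test.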

@afg1 afg1 requested review from blakesweeney and pmb59 October 27, 2025 09:16
@afg1 afg1 self-assigned this Oct 27, 2025
@afg1 (Contributor, Author) commented Oct 27, 2025

/gemini review

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces changes to test configurations and adds mocking to remove network dependencies in several tests. The changes involve modifying pyproject.toml and pytest.ini to include new test markers, and updating test files to use mocking for external API calls. This aims to improve the reliability and speed of unit tests by eliminating their reliance on live network connections.

Comment on lines 51 to 53


@pytest.mark.ols

high

This test is marked as @pytest.mark.ols but it is missing the mock_ols_api fixture as an argument. This means the test will still attempt to make network calls, defeating the purpose of the mocking fixture. This needs to be corrected.

Suggested change
 @pytest.mark.ols
-def test_can_fetch_a_go_term():
+def test_can_fetch_a_go_term(mock_ols_api):
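For context, a fixture like mock_ols_api typically patches the API helper for the duration of the test and is activated simply by naming it in the test's signature. A minimal sketch (the fixture body and the helper name fetch_go_term are assumptions, not the PR's actual implementation):

```python
import pytest
from unittest import mock

# Hypothetical recorded OLS payload committed alongside the tests.
RECORDED_GO_TERM = {"id": "GO:0008150", "label": "biological_process"}

def fetch_go_term(term_id):
    """Stand-in for the pipeline's real OLS call (hypothetical name)."""
    raise RuntimeError("unit test attempted live network access")

@pytest.fixture
def mock_ols_api():
    # Replace the helper with recorded data while the test runs.
    with mock.patch(f"{__name__}.fetch_go_term",
                    return_value=RECORDED_GO_TERM) as fake:
        yield fake

@pytest.mark.ols
def test_can_fetch_a_go_term(mock_ols_api):
    # Because the fixture is in the signature, this returns recorded data.
    assert fetch_go_term("GO:0008150")["label"] == "biological_process"
```

Without the fixture argument, the patch is never applied and the unpatched helper runs, which is exactly the failure mode the bot flags above.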

"ci: Tests that should run in continuous integration",
"slow: Tests that take a long time",
"db: Test that require access to our database",
"network: Tests that require network",

medium

Adding a network marker to pyproject.toml is useful for categorizing tests, but it is important to ensure that every test that actually uses the network carries the marker, and that no test is marked as requiring network access unnecessarily.
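Concretely, the marker list above lives under pytest's configuration table; a sketch of how it might look in pyproject.toml:

```toml
[tool.pytest.ini_options]
markers = [
    "ci: Tests that should run in continuous integration",
    "slow: Tests that take a long time",
    "db: Test that require access to our database",
    "network: Tests that require network",
]
```

With the marker registered, the network-dependent tests can be deselected in CI with `pytest -m "not network"`.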

}

@pytest.mark.ensembl
@pytest.mark.network

medium

The @pytest.mark.network marker is no longer needed here, as the test has been refactored to use mocking and no longer depends on a network connection. Removing this marker will accurately reflect the test's independence from network resources.

Suggested change
 @pytest.mark.ensembl
-@pytest.mark.network
 def test_builds_with_known_bands():

@blakesweeney (Member)

I'm not sure I like this overall. The upside to these tests failing is that sometimes the APIs and the like change without notice, and these failures do catch that. There are ways to just log which tests are flaky, and I'd rather start with that, to be honest.

@afg1 (Contributor, Author) commented Oct 28, 2025

100%, we need tests to catch changes in the APIs. My plan was to separate them out: we keep our assumptions about the data format in these tests, which run fast and can go into CI on GitHub Actions or similar. Then we have separate tests that run less frequently (a month before release?) to check that the API still matches our assumptions.

My problem is that these tests fail for completely irrelevant reasons, like a citation count changing or an extra full stop appearing in an ontology term definition. If we keep them on the live API they will always fail, and my worry is that we will assume a failure is due to one of these dumb reasons when in fact a bigger API change has happened.

I would propose having bigger integration tests that use the live API but don't check fine details the way these do, or specific API data-contract checks that compare the mocked structure to the real one.

This was just something I decided should be fixed one evening, so very happy to have any input!
