[DRAFT] Quality assurance strategy for 3rd party API calls #224

Closed
oshoma opened this issue Nov 14, 2023 · 4 comments
Labels
draft-proposal A draft proposal for how to manage this repository

Comments


oshoma commented Nov 14, 2023

PROBLEM

Some of our tests call 3rd party APIs such as Serper (Google Search). This can give us non-deterministic results, slow tests, and tests that cost real money. It could also result in pushback from service providers due to bursty API call volumes. See background discussion.

SOLUTION

We need to define how we will test the quality of code that depends on 3rd party APIs (typically accessed via client libraries). Here's a draft proposal for comment:

Goals

  1. When run in its default mode, `make test` will operate offline. Tests will not perform network calls, including calls to 3rd party APIs. Furthermore, the tests will be crafted to be deterministic. This avoids the problems described above, and is a necessary stepping stone towards automatic testing on developer desktops and in GitHub Actions.
  2. A second, network-connected mode of `make test` will perform calls to 3rd party APIs. The purpose of this mode is to run end-to-end integration tests that demonstrate our system works in the real world. Since these tests are expected to be relatively slow and will cost real money, they should only be run occasionally, e.g. when we are preparing to deploy into production.

Developer Guidelines

  1. The job of our tests is to ensure our own code is high quality. We are not testing whether a 3rd party API works... that's the job of the 3rd party.
  2. Create stubs to mimic the results of 3rd party API calls.
  3. Occasionally refresh the stub values with new responses from the 3rd party service.
  4. Always use stubbed values in tests of code that relies on 3rd party APIs.
  5. Block network calls during default test runs. To achieve this we can integrate pytest-socket into the top level of our test configuration (see the sketch after this list).
  6. Create some integration tests that call 3rd party services, to be run selectively. These tests will run the real code (no stubs or mocks), resulting in actual network calls to 3rd party APIs. Design these tests so that they succeed when the 3rd party API returns successfully, and fail otherwise. Design these tests so that they do not issue a sustained high velocity burst of 3rd party API calls.
  7. Mark the tests that call 3rd party APIs, e.g. `@pytest.mark.external_api`. Modify pyproject.toml so these tests do not run by default; instead they can be run selectively by passing a flag to pytest, e.g. `pytest -v -m external_api` (see the sketch after this list).
  8. Run the `external_api` integration tests before deploying new code to production. Keep in mind that these tests can fail for unexpected reasons, including the 3rd party vendor being unavailable or crashing at the moment the tests run.
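
A minimal sketch of how guidelines 5 and 7 could work together, assuming pytest-socket is installed; the `external_api` marker name follows the rename in the commit below, and the conftest.py layout is illustrative:

```python
# conftest.py -- illustrative sketch; assumes pytest-socket is installed.
from pytest_socket import disable_socket, enable_socket


def pytest_runtest_setup(item):
    # Default runs are offline (guideline 5): block all socket use,
    # except for tests explicitly marked as calling external APIs.
    if "external_api" in item.keywords:
        enable_socket()
    else:
        disable_socket()
```

To exclude the marked tests by default (guideline 7), `[tool.pytest.ini_options]` in pyproject.toml can register the marker with `markers = ["external_api: calls real 3rd party APIs"]` and set `addopts = "-m 'not external_api'"`; running `pytest -v -m external_api` then opts back in.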

Implementation Thoughts

Here are additional thoughts from 2023-11-16 discussion with @20001LastOrder about how we might implement this:

  1. Set LLM temperature to 0 in tests
  2. Always use the same LLM when testing
  3. Use a special logger to capture all LLM calls and responses
    a) send to normal console output
    b) send to fixture files, to be used for tests
    ...based on a command line option
  4. Offline test mode (the default)
    • Use LangChain's Fake LLM to simulate LLM responses (see the sketch after this list)
    • load canned LLM responses from fixture files, previously captured from real LLM usage
    • tests should run quickly, e.g. < 15 seconds for the entire test suite
    • test the behavioral logic in our own code, which interacts with LLMs
    • test expectations about how many times our code calls LLMs, and with what parameters
  5. Online (network-connected) mode
    • tests call the real 3rd party APIs
    • we instantiate a real LLM when testing
    • test suite will run slowly
    • test suite will cost real money to run
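
As a concrete sketch of the offline mode, assuming a LangChain version that provides `FakeListLLM` and a hypothetical fixture file of previously captured responses (the path and test names are illustrative):

```python
# Offline-mode sketch. Assumes `from langchain.llms.fake import FakeListLLM`
# resolves (true for LangChain versions circa late 2023); the fixture path
# below is hypothetical.
import json
from pathlib import Path

from langchain.llms.fake import FakeListLLM

FIXTURE_PATH = Path("tests/fixtures/llm_responses.json")  # hypothetical


def load_canned_responses() -> list[str]:
    # Canned responses previously captured from real LLM usage (see point 3).
    return json.loads(FIXTURE_PATH.read_text())


def test_llm_logic_is_deterministic_offline():
    responses = load_canned_responses()
    llm = FakeListLLM(responses=responses)
    # FakeListLLM replays the canned responses in order, so the run is fast,
    # fully deterministic, and makes no network calls.
    assert llm.invoke("summarize the search results") == responses[0]
```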
@oshoma oshoma added the draft-proposal A draft proposal for how to manage this repository label Nov 14, 2023
@oshoma oshoma changed the title [PROPOSAL] Quality assurance strategy for 3rd party API calls [DRAFT] Quality assurance strategy for 3rd party API calls Nov 14, 2023
oshoma added a commit that referenced this issue Nov 16, 2023
Improve support for command line options

- Use ArgumentParser to make parsing of command line options more robust
- Add support for `gsite` option to scope search to a particular URL
- Rename `real` tests to `external_api` to indicate tests that call 3rd party APIs (see #224 for more info)
- Fix some broken tests

There are 7 failing tests. This is a known issue and will be fixed separately, as the test breaks were introduced by other commits.
Fixes #215

oshoma commented Nov 17, 2023

@20001LastOrder I added thoughts from our discussion today about making tests robust

@20001LastOrder commented

To move this forward, here are my recent thoughts:

  1. The content of external API calls is not really important when testing offline, unless of course the test is about features that use that API call directly. The content that ultimately matters is generated by the LLM, and LLM calls are already handled with caching.
  2. Based on the above observation, I think
    • If the API call is only incidental to the integration test (e.g. a QA agent using Google Search), we can mock it with anything that follows the same format as the API's response.
    • If the API call is an essential part of the integration test, we should mock it with meaningful content following the format of the API. (For example, testing number validation with Google Search should be mocked with sample Google Search results containing numbers.) A sketch follows below.
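
A self-contained sketch of the second case, with a canned response loosely shaped like Serper's JSON (`organic` results with snippets); the validator and all names here are illustrative, not the project's actual code:

```python
# Sketch: the search content matters, so the mock returns meaningful,
# format-faithful data rather than arbitrary placeholders.

SAMPLE_SERPER_RESPONSE = {
    "organic": [
        {
            "title": "World population 2023",
            "snippet": "The world population reached 8.0 billion in 2023.",
            "link": "https://example.com/population",
        }
    ]
}


def fake_google_search(query: str) -> dict:
    # Stand-in for the real Serper call; never touches the network.
    return SAMPLE_SERPER_RESPONSE


def contains_number(search_fn, query: str) -> bool:
    # Toy "number validation with Google Search": the logic under test is
    # ours, while the search results are canned.
    snippets = (result["snippet"] for result in search_fn(query)["organic"])
    return any(any(ch.isdigit() for ch in s) for s in snippets)


def test_number_validation_uses_canned_results():
    assert contains_number(fake_google_search, "world population")
```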


oshoma commented Feb 9, 2024 via email

@20001LastOrder commented

Closed by #309 and #202
