feat: add pytest for e2e testing #1188

Merged
merged 4 commits into dev from j/add-pytest-for-e2e-testing on Sep 12, 2024
Conversation

@namchuai (Contributor) commented Sep 11, 2024

Describe Your Changes

  • Sample testing for our CLI and API server

Items

  • Engine list
  • Engine get
  • Engine install
  • Engine uninstall
  • Adding some tests for HTTP requests

How to use

  1. Build the cortex binary first. The binary should be available at engine/build/cortex-cpp.
  2. Install Python.
  3. pip install pytest requests
    3.1. On Windows, I had to use pip install pytest requests --user
  4. python e2e-test/main.py
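
For orientation, a test in this suite looks roughly like the sketch below: it shells out to the cortex binary, asserts on the exit code and stdout, and uses pytest markers to skip platform-specific variants. This is a minimal illustration only; the run_cli helper, the binary path, the engines list subcommand, and the expected output string are assumptions, not the exact code in this PR.

```python
# Hypothetical sketch of a platform-gated CLI e2e test; not the PR's exact code.
import platform
import subprocess

import pytest

CORTEX_BINARY = "build/cortex-cpp"  # assumed path, relative to the engine/ folder


def run_cli(*args):
    """Run the cortex binary with the given arguments and capture its output."""
    return subprocess.run(
        [CORTEX_BINARY, *args], capture_output=True, text=True, timeout=60
    )


class TestCliEngineList:
    @pytest.mark.skipif(platform.system() != "Darwin", reason="macOS-specific test")
    def test_engines_list_run_successfully_on_macos(self):
        result = run_cli("engines", "list")  # assumed subcommand spelling
        assert result.returncode == 0
        # Assumed output check: the list should at least mention llamacpp.
        assert "llamacpp" in result.stdout
```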

Remaining things to do

  • Retest on Windows to see if this works as expected
  • Wire this into our CI/CD?

Screenshots

[Screenshot 2024-09-11 at 12:38:11]

Windows:

[image]

Sample output

~/w/c/engine ❯❯❯ python e2e-test/main.py
===================================================================== test session starts ======================================================================
platform darwin -- Python 3.11.4, pytest-8.3.2, pluggy-1.5.0 -- /Users/jamesnguyen/.pyenv/versions/3.11.4/bin/python
cachedir: .pytest_cache
rootdir: /Users/jamesnguyen/workspace/cortex/engine
plugins: anyio-4.3.0
collected 7 items                                                                                                                                              

e2e-test/main.py::TestCliEngineList::test_engines_list_run_successfully_on_windows SKIPPED (Windows-specific test)                                       [ 14%]
e2e-test/main.py::TestCliEngineList::test_engines_list_run_successfully_on_macos PASSED                                                                  [ 28%]
e2e-test/main.py::TestCliEngineList::test_engines_list_run_successfully_on_linux SKIPPED (Linux-specific test)                                           [ 42%]
e2e-test/main.py::TestCliEngineGet::test_engines_list_run_successfully_on_windows SKIPPED (Windows-specific test)                                        [ 57%]
e2e-test/main.py::TestCliEngineGet::test_engines_get_run_successfully_on_macos PASSED                                                                    [ 71%]
e2e-test/main.py::TestCliEngineGet::test_engines_get_llamacpp_on_macos PASSED                                                                            [ 85%]
e2e-test/main.py::TestCliEngineGet::test_engines_list_run_successfully_on_linux SKIPPED (Linux-specific test)                                            [100%]

================================================================= 3 passed, 4 skipped in 2.12s =================================================================

Fixes Issues

Self Checklist

  • Added relevant comments, esp in complex areas
  • Updated docs (for bug fixes / features)
  • Created issues for follow-up changes or refactoring needed

@dan-homebrew (Contributor) commented Sep 11, 2024

@namchuai I think this is a great start, but I'd like to push our testing strategy to be more rigorous:

  • What are the key failure modes of engine installation? (e.g. detecting hardware type?)
  • Or are we pushing hardware detection to a separate group of tests?
  • EDIT: I see what you are doing with checking for CLI output - I agree that it's the correct way to do it

Also: can I check my understanding:

  • Are we intending to do an actual pull of TensorRT-LLM or ONNX engines? (i.e. e2e)
  • How often do we intend this test to run?

(Resolved, now-outdated review threads on engine/e2e-test/test_runner.py and engine/e2e-test/test_cli_engine_uninstall.py)
@namchuai (Contributor, Author) replied:

> @namchuai I think this is a great start, but I'd like to push our testing strategy to be more rigorous:
>
>   • What are the key failure modes of engine installation? (e.g. detecting hardware type?)
>   • I see that most of our tests are cursory log level checks
>   • Or are we pushing hardware detection to a separate group of tests?

  1. What are the key failure modes of engine installation? (e.g. detecting hardware type?)
    IMO, some of the failure modes of engine installation are:
    • Unsupported hardware, or a CUDA or driver version that is too old.
    • Network issues (an engine such as TensorRT can be a heavy download, so a user could end up with a corrupted file).
  2. I see that most of our tests are cursory log-level checks.
    This PR is intended to lay a foundation for end-to-end testing. It's not meant to cover every feature of the app; other devs should check in and write test cases for their own features.

To support this, I added:
  • An API to start and stop the API server, so other developers can easily write tests against the API; it is covered by setup and teardown for each test suite (sketched below).
  • An API to run a command, in case a developer wants to write tests for the CLI.
  • A test for engine download that checks cortex actually downloads the engine suitable for the user's specs.

IMO, end-to-end testing is written from the user's perspective. It ensures that whatever we change or update in the future, the feature still works, and it will mostly involve stdout and stderr validation.
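
For illustration, here is a rough sketch of that setup/teardown pattern, assuming the API server is launched from the cortex binary and answers on a local HTTP port; the launch command, port, and /v1/engines endpoint are placeholders rather than the actual test_runner.py API:

```python
# Hypothetical setup/teardown sketch; launch command, port and endpoint are placeholders.
import subprocess
import time

import pytest
import requests

API_BASE = "http://127.0.0.1:3928"  # assumed local port


@pytest.fixture(scope="class")
def api_server():
    """Start the API server before a test class runs and stop it afterwards."""
    proc = subprocess.Popen(["build/cortex-cpp"])  # assumed launch command
    try:
        # Poll until the server answers, giving up after ~10 seconds.
        for _ in range(20):
            try:
                requests.get(f"{API_BASE}/v1/engines", timeout=1)  # assumed endpoint
                break
            except requests.ConnectionError:
                time.sleep(0.5)
        yield API_BASE
    finally:
        proc.terminate()
        proc.wait(timeout=10)


class TestApiEngineList:
    def test_engines_list_returns_ok(self, api_server):
        response = requests.get(f"{api_server}/v1/engines")  # assumed endpoint
        assert response.status_code == 200
```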

  3. Or are we pushing hardware detection to a separate group of tests?
    Other tests should come in other PRs; let's keep this PR as the foundation.
    Also, I'm a bit curious about the hardware detection test. Since users' hardware specs vary, I don't think we can cover that in e2e tests.

I think we can separate it into 2 parts:
Part 1: Detect the user's hardware.
Part 2: Based on the result of Part 1, pick the most suitable engine variant.

With this split, we can mock Part 1 and unit test Part 2 with ease (sketched below).
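
To illustrate that split, here is a minimal sketch of how Part 2 could be unit tested by stubbing Part 1's result; HardwareInfo, select_engine_variant, and the variant names are hypothetical names invented for this example, not existing cortex code:

```python
# Hypothetical example: stub Part 1 (hardware detection) to unit test Part 2 (variant selection).
from dataclasses import dataclass
from typing import Optional


@dataclass
class HardwareInfo:  # stand-in for the output of Part 1
    os: str
    arch: str
    has_avx2: bool
    gpu_vendor: Optional[str]


def select_engine_variant(hw: HardwareInfo) -> str:
    """Toy version of Part 2: map detected hardware to an engine variant name."""
    if hw.gpu_vendor == "nvidia":
        return "llamacpp-cuda"
    if hw.os == "mac" and hw.arch == "arm64":
        return "llamacpp-mac-arm64"
    return "llamacpp-avx2" if hw.has_avx2 else "llamacpp-noavx"


def test_selects_cuda_variant_when_nvidia_gpu_present():
    hw = HardwareInfo(os="linux", arch="x86_64", has_avx2=True, gpu_vendor="nvidia")
    assert select_engine_variant(hw) == "llamacpp-cuda"


def test_falls_back_to_cpu_variant_without_gpu():
    hw = HardwareInfo(os="linux", arch="x86_64", has_avx2=True, gpu_vendor=None)
    assert select_engine_variant(hw) == "llamacpp-avx2"
```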

Part 1 is more complicated and involves:

  • CPU checks (AVX flags, ARM/x86_64, etc.)
    -> Current state: we can gather all of this information in our code base.
  • GPU checks (NVIDIA/AMD/Intel Arc/integrated GPUs, etc.)
    -> Current state: we have to rely on other software and parse its output to read the information (nvidia-smi, vulkanSDKInfo, etc.)
    Since the inputs vary so much, I can't think of a way to test this against every kind of hardware.

For this, I think we can write an e2e test where cortex-cpp runs tinyllama, expect the run to succeed, and improve from there.
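
As a starting point, such a smoke test could look roughly like the sketch below; the run subcommand, the model name handling, and the output check are all assumptions about the CLI, not verified behavior:

```python
# Hypothetical smoke test for running a small model end to end; command and output are assumed.
import subprocess


def test_run_tinyllama_succeeds():
    result = subprocess.run(
        ["build/cortex-cpp", "run", "tinyllama"],  # assumed CLI invocation
        capture_output=True,
        text=True,
        timeout=600,  # generous timeout, since the model may need to be downloaded
    )
    # Assumed success criteria: clean exit and some mention of the model in stdout.
    assert result.returncode == 0
    assert "tinyllama" in result.stdout.lower()
```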

@dan-homebrew (Contributor) replied:

@namchuai Got it - thank you for thinking through this very clearly. Approved, and fingers crossed that these new e2e tests will help us catch bugs before they go out to users 🙏

@dan-homebrew (Contributor) left a review comment:


Thanks for driving this 🙏

@namchuai merged commit 848af19 into dev on Sep 12, 2024
4 checks passed
@namchuai deleted the j/add-pytest-for-e2e-testing branch on September 12, 2024 at 08:25