[onnx] Build real onnx frontend cli/API #18289

Open

stellaraccident opened this issue Aug 19, 2024 · 2 comments
Labels
integrations/onnx (ONNX integration work) · quality of life 😊 (Nice things are nice; let's have some)

Comments

@stellaraccident (Collaborator)

The current ONNX compilation flow is serviceable but somewhat user-hostile, directly requiring too much low-level manipulation of the system. Recommend creating a real ONNX frontend API and CLI that handle all of the following properly. This would replace the existing front door here: https://iree.dev/guides/ml-frameworks/onnx/

  • Automatic version upgrades to the current minimum supported opset version (17). There are a lot of old models out there, and we have only implemented >= 17 compatibility. We shouldn't be asking users to roll their own Python upgrader script.
  • Proper separation of parameters in the default path (perhaps with an option to leave everything inlined).
  • Serialize intermediates with MLIR bytecode vs. the text format.
  • Enable MLIR reproducers and document what users should do with them (this will work much better if parameter separation is the default).
  • Invoke the compiler pipeline by default vs. needing the user to separate import/compile (can still have flags to do import only, etc.).
  • Present error messages with command lines for using lower-level tools for repro/diagnostic purposes.
  • Exposed as a supported Python API with a CLI wrapper.
  • When used as a CLI, maybe some eye-candy text UI progress, etc.

For the record, when I took myself down this journey recently, I started with something simple and ended up with this: https://gist.github.com/stellaraccident/1b3366c129c3bc8e7293fb1353254407
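For illustration, here is a minimal sketch of the manual flow such a frontend would wrap, following the documented `iree-import-onnx` / `iree-compile` path from the guide linked above; the file names, opset target, and backend flag are placeholders, not part of any proposed API:

```python
# Minimal sketch of the current manual ONNX -> IREE flow (what the proposed
# frontend would automate). Paths, opset target, and the backend flag are
# illustrative placeholders.
import subprocess

import onnx
from onnx import version_converter

# 1. Upgrade an old model to the minimum supported opset (17).
model = onnx.load("model.onnx")
upgraded = version_converter.convert_version(model, 17)
onnx.save(upgraded, "model_v17.onnx")

# 2. Import ONNX to MLIR (today this is a separate user-visible step).
subprocess.check_call(
    ["iree-import-onnx", "model_v17.onnx", "-o", "model.mlir"])

# 3. Compile the imported MLIR to a deployable module.
subprocess.check_call(
    ["iree-compile", "model.mlir",
     "--iree-hal-target-backends=llvm-cpu",
     "-o", "model.vmfb"])
```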

ScottTodd added the integrations/onnx (ONNX integration work) and quality of life 😊 (Nice things are nice; let's have some) labels on Aug 19, 2024
@stellaraccident (Collaborator, Author)

A couple more items are needed:

  • Ability to set the entry-point name (this seems to come from some ONNX graph ID of some kind and seems all over the place: I see things like "torch-jit-export" when importing now). A possible workaround is sketched below.
  • Ability to set the module name (we often just call these "module", but that keeps them from co-existing when loading multiple into the same context).
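For the entry-point item, one possible interim workaround is renaming the graph before import. This assumes the importer derives the function name from the ONNX graph name ("torch-jit-export" is a common default graph name from older torch exporters); that behavior is an assumption here, not confirmed, and the file names are illustrative:

```python
# Possible interim workaround, assuming the importer derives the entry-point
# name from the ONNX graph name. File names are illustrative.
import onnx

model = onnx.load("model.onnx")
model.graph.name = "main"  # becomes the entry-point name, per the assumption above
onnx.save(model, "model_renamed.onnx")
```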

stellaraccident added a commit to nod-ai/shark-ai that referenced this issue Sep 6, 2024
This test is not particularly inspired (and the API needs to be simplified) but it represents the first full system test in the repo.

To run, the test downloads a mobilenet ONNX file from the zoo, upgrades it, and compiles it. In the future, I'd like to switch this to a simpler model like MNIST for basic functionality, but I had some issues getting that to work via ONNX import and punted. While a bit inefficient (it will fetch on each pytest run), this will keep things held together until we can do something more comprehensive. Note that my experience here prompted me to file iree-org/iree#18289, as this is way too much code and too many sharp edges to compile from ONNX (but it does work). The test verifies numerics against a silly test image.

Includes some fixes:

* Reworked the system-detect marker so that we only run system-specific tests (like amdgpu) on opt-in via a `--system amdgpu` pytest arg (sketched below). This refinement was prompted by an ASAN violation in the HIP runtime code that was tripping me up when enabled by default. Filed here: iree-org/iree#18449
* Fixed a bug revealed while writing the test: an exception thrown from main could trigger a use-after-free because we were clearing workers at shutdown (vs. at destruction), even though all objects owned at the system level need a lifetime no shorter than the system's.
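A rough conftest.py sketch of the opt-in mechanism described in the first fix above; the option and marker names mirror the commit message, but the actual shark-ai implementation may differ:

```python
# conftest.py: rough sketch of an opt-in "--system <name>" pytest option that
# skips system-specific tests unless explicitly requested.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--system", action="store", default=None,
        help="opt in to system-specific tests, e.g. --system amdgpu")


def pytest_configure(config):
    config.addinivalue_line(
        "markers", "system(name): test requires the named system (e.g. amdgpu)")


def pytest_collection_modifyitems(config, items):
    requested = config.getoption("--system")
    for item in items:
        marker = item.get_closest_marker("system")
        if marker is None:
            continue  # not system-specific; always runs
        required = marker.args[0] if marker.args else None
        if required != requested:
            item.add_marker(pytest.mark.skip(
                reason=f"requires --system {required}"))
```

A test would then opt in with `@pytest.mark.system("amdgpu")`.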
@ScottTodd (Member)

Some of my own observations from a day of coding:

  • An authoritative data type enum mapping could be hosted in a common location, similar to https://onnx.ai/onnx/api/mapping.html (see the sketch below).
  • A step beyond that would be mapping between function signatures (ONNX proto inputs/outputs) and IREE in-process runtime function signatures or iree-run-module / MLIR tooling signatures. Not sure at what level we'd want to share that code... might be test utils, but it could also be useful for "real" usage.
  • I found myself writing files to disk that could stay in memory, and sometimes loading the same file/model into memory in multiple places. It would be nice to have the option to stay fully in memory (ideally including parameter archives, I guess?).
  • ONNX Runtime has its own data structures and bindings: https://onnxruntime.ai/docs/api/python/api_summary.html#data-inputs-and-outputs. We may want to interop with those... though perhaps that would only need to happen in an execution provider, not in the standalone/standard tools.
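As a sketch of what such a shared table might contain; the MLIR column and the dictionary/function names here are hypothetical, not an existing IREE API:

```python
# Illustrative sketch of a shared data type table: ONNX tensor element types
# mapped to numpy dtypes (via onnx.helper) and to MLIR element type strings.
# ONNX_TO_MLIR_ELEMENT_TYPE and describe() are hypothetical names.
import numpy as np
from onnx import TensorProto, helper

ONNX_TO_MLIR_ELEMENT_TYPE = {
    TensorProto.FLOAT: "f32",
    TensorProto.FLOAT16: "f16",
    TensorProto.BFLOAT16: "bf16",
    TensorProto.DOUBLE: "f64",
    TensorProto.INT8: "i8",
    TensorProto.INT32: "i32",
    TensorProto.INT64: "i64",
    TensorProto.BOOL: "i1",
}


def describe(tensor_dtype: int) -> tuple[np.dtype, str]:
    """Return the (numpy dtype, MLIR element type) pair for an ONNX dtype."""
    return (helper.tensor_dtype_to_np_dtype(tensor_dtype),
            ONNX_TO_MLIR_ELEMENT_TYPE[tensor_dtype])
```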

ScottTodd added a commit to iree-org/iree-test-suites that referenced this issue Sep 19, 2024
Progress on #6.

A sample test report HTML file is available here:
https://scotttodd.github.io/iree-test-suites/onnx_models/report_2024_09_17.html

These new tests:

* Download models from https://github.com/onnx/models
* Extract metadata from the models to determine which functions to call
with random data
* Run the models through [ONNX Runtime](https://onnxruntime.ai/) as a
reference implementation
* Import the models using `iree-import-onnx` (until we have a better
API: iree-org/iree#18289)
* Compile the models using `iree-compile` (currently just for `llvm-cpu`
but this could be parameterized later)
* Run the models using `iree-run-module`, checking outputs using
`--expected_output` and the reference data

Tests are written in Python using a set of pytest helper functions. As
the tests run, they can log details about what commands they are
running. When run locally, the `artifacts/` directory will contain all
the relevant files. More can be done in follow-up PRs to improve the
ergonomics there (like generating flagfiles).

Each test case can use XFAIL like
`@pytest.mark.xfail(raises=IreeRunException)`. As we test across
multiple backends or want to configure the test suite from another repo
(e.g. [iree-org/iree](https://github.com/iree-org/iree)), we can explore
more expressive marks.

Note that unlike the ONNX _operator_ tests, these tests use
`onnxruntime` and `iree-import-onnx` at test time. The operator tests
handle that as an infrequently run offline step. We could do something
similar here, but the test inputs and outputs can be rather large for
real models and that gets into Git LFS or cloud storage territory.

If this test authoring model works well enough, we can do something
similar for other ML frameworks like TFLite
(#5).
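A rough sketch of the per-model check these tests perform: ONNX Runtime produces reference outputs, and iree-run-module compares against them via --expected_output. Whether those flags accept .npy files as shown, the single-output assumption, and the entry-point name "main" are assumptions; the suite's actual flag syntax may differ:

```python
# Rough sketch: reference run through ONNX Runtime, then compare the compiled
# module's output with iree-run-module. Shapes, file names, and the use of
# .npy files for --input/--expected_output are illustrative assumptions.
import subprocess

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # illustrative shape
(reference,) = session.run(None, {input_name: data})  # assumes a single output

np.save("input_0.npy", data)
np.save("expected_0.npy", reference)

subprocess.check_call([
    "iree-run-module",
    "--module=model.vmfb",
    "--device=local-task",
    "--function=main",              # assumes the entry point is named "main"
    "--input=@input_0.npy",
    "--expected_output=@expected_0.npy",
])
```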