Add ONNX exporter #1826
Conversation
@MathijsdeBoer I think all the points you've written above are valid when applying nnUNet in a production environment. I think Python will remain a required dependency for the foreseeable future, but removing PyTorch would already be a huge win.
@thangngoc89 I've since had the chance to run the exporter locally, and it produces an ONNX file successfully! I managed to get a very basic pipeline set up (no TTA, no parallelism, no ensembling, very naive sliding window, etc.) and it produces a very similar prediction on my local Quadro RTX 5000 machine in about a minute. For comparison, this is about the same time it takes our A100 server to create a prediction with the current nnUNet codebase (albeit with all the additional features included).

Some caveats to the exporter, though: due to floating point behaviour the models will never be 100% the same, so outputs will vary ever so slightly. Usually this won't be noticeable, except for those edge cases where the model is uncertain.

As for your questions:

Todo: Beyond the code, all that's left is for someone more intimately familiar with the codebase to double check my solution, to see if I'm not mis-/abusing anything. Perhaps I'm not quite loading the model correctly? Finally, to see if there is any interest in this functionality at all.

Requiring nnUNet codebase:

Dependencies:
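For readers wondering what such a stripped-down pipeline might look like, here is a minimal sketch of naive, non-overlapping sliding-window inference with onnxruntime. The function, its arguments, and the padding/cropping logic are illustrative only and are not the code used in this PR:

```python
import numpy as np
import onnxruntime as ort


def naive_sliding_window(image: np.ndarray, patch_size: tuple, num_classes: int,
                         model_path: str) -> np.ndarray:
    """Non-overlapping sliding-window inference over a preprocessed (C, X, Y, Z) volume."""
    session = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    input_name = session.get_inputs()[0].name

    logits = np.zeros((num_classes, *image.shape[1:]), dtype=np.float32)
    for x in range(0, image.shape[1], patch_size[0]):
        for y in range(0, image.shape[2], patch_size[1]):
            for z in range(0, image.shape[3], patch_size[2]):
                patch = image[:, x:x + patch_size[0], y:y + patch_size[1], z:z + patch_size[2]]
                # Pad edge patches up to the full patch size before running the model
                pad = [(0, 0)] + [(0, p - s) for p, s in zip(patch_size, patch.shape[1:])]
                pred = session.run(
                    None, {input_name: np.pad(patch, pad)[None].astype(np.float32)}
                )[0][0]
                # Crop the prediction back to the (possibly smaller) edge region
                dx, dy, dz = (min(p, s - o) for p, s, o in
                              zip(patch_size, image.shape[1:], (x, y, z)))
                logits[:, x:x + dx, y:y + dy, z:z + dz] = pred[:, :dx, :dy, :dz]
    return logits.argmax(axis=0)
```

nnU-Net's actual sliding window uses overlapping patches with Gaussian-weighted aggregation and mirroring TTA, which is exactly the kind of detail an ONNX user would have to reimplement themselves.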
@MathijsdeBoer thank you for a very detailed answer.
Definitely. I was talking about not needing PyTorch at inference time. I will try to test this PR locally and provide feedback soon.
Hey @MathijsdeBoer thank you so much for all this work! Exporting to onnx is certainly something we should support. One of the reasons I have never done that so far is because I don't want to be responsible for people not being able to reproduce our results with their onnx files. After all, they would need to reimplement a lot of things around the model: cropping, resampling, intensity normalization, sliding window inference with proper logit aggregation, resizing of the probabilities to the original data shape, conversion to segmentation, revert cropping. There are many opportunities for errors.

I am happy to accept this PR once you are happy with the config.json file etc. Just ping me when everything is complete.

One more thing (and please let me know if that doesn't make any sense - I am not very familiar with onnx and optimizations): Shouldn't it be possible to use the onnx optimized model in the regular nnU-Net inference code as well? Have you experimented with that?
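To make the scale of that reimplementation concrete, here is a very rough NumPy/SciPy sketch of a few of the steps listed above. It deliberately ignores nnU-Net's actual plans (per-channel normalization schemes, separate-z resampling, Gaussian-weighted aggregation, etc.), and all function names are placeholders rather than nnU-Net API:

```python
import numpy as np
from scipy import ndimage


def crop_to_nonzero(image: np.ndarray):
    """Crop a (C, X, Y, Z) image to the bounding box of its nonzero voxels."""
    coords = np.argwhere(image.sum(axis=0) != 0)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    bbox = tuple(slice(l, h) for l, h in zip(lo, hi))
    return image[(slice(None),) + bbox], bbox


def resample(volume: np.ndarray, source_spacing, target_spacing, order: int = 3) -> np.ndarray:
    """Resample every channel to the target voxel spacing."""
    zoom = np.asarray(source_spacing) / np.asarray(target_spacing)
    return np.stack([ndimage.zoom(channel, zoom, order=order) for channel in volume])


def zscore_normalize(volume: np.ndarray) -> np.ndarray:
    """Per-channel z-score normalization (only one of nnU-Net's normalization schemes)."""
    mean = volume.mean(axis=(1, 2, 3), keepdims=True)
    std = volume.std(axis=(1, 2, 3), keepdims=True)
    return (volume - mean) / np.clip(std, 1e-8, None)


def logits_to_segmentation(logits: np.ndarray, original_shape, bbox) -> np.ndarray:
    """Resize the class maps back to the cropped region, take the argmax, and revert the cropping."""
    crop_shape = tuple(s.stop - s.start for s in bbox)
    zoom = np.asarray(crop_shape) / np.asarray(logits.shape[1:])
    resized = np.stack([ndimage.zoom(channel, zoom, order=1) for channel in logits])
    segmentation = np.zeros(original_shape, dtype=np.uint8)
    segmentation[bbox] = resized.argmax(axis=0)
    return segmentation
```

Each of these steps has to match what nnU-Net did during preprocessing and training, which is where the reproducibility risk comes from.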
Hey Fabian, no worries. It didn't end up being a massive amount of work, as PyTorch has a built-in exporter! Most of the time was spent reading through the documentation and nnUNet core code to see if I could reuse as many existing functions as possible. That should ensure that any updates to the core code keep the exporter up to date as well. Barring any fundamental rewrites, of course.
Yep, I recall seeing you mention something like that a few times before. I could make the warning text a lot stronger and add a disclaimer about any future support. Maybe something like:
Of course, there's always going to be someone who might ask questions on this, but this will at least make it pretty clear that they shouldn't expect any hand holding. One thing that might be nice, however, is a short document that gives a step-wise overview of how the inference works, optionally with pointers to where to find these things in the code. Just as a general reference for pipeline implementations.

As for being happy with the config file, I think I've managed to include everything someone might need to accurately rebuild the pipeline with all possible export configurations. The only thing I haven't been able to test is what happens if someone has a label with multiple values. Fortunately, given the free nature of onnx pipeline implementations, there will be a lot of room for hardcoding things to fit their particular needs, so it might not be a big problem after all.

I think that if you want to use ONNX in the existing nnUNet pipeline, you'd have to decouple the ...
```python
# PyTorch
data = preprocess_data()
data = torch.tensor(data)
data = data.to(device)

torch_model = get_model()
torch_model.eval()

with torch.no_grad():
    pred = torch_model(data)
pred = pred.detach().cpu().numpy()

postprocess_pred(pred)
...

# ONNX
data = preprocess_data()

ort_model = ort.InferenceSession(
    model_filepath,
    providers=["CUDAExecutionProvider"],  # Falls back to CPU automatically
)

# ONNX models use named inputs/outputs; the exporter simply calls it "input", but this is more general
ort_inputs = {ort_model.get_inputs()[0].name: data}
# Identical to the following in our case:
# ort_inputs = {"input": data}

pred = ort_model.run(None, ort_inputs)[0]

postprocess_pred(pred)
...
```

There are a few more things one can do before starting inference sessions, such as offline optimizations ahead of time. This should make the model faster and/or smaller, but unfortunately might still impart a slight performance penalty, so these tradeoffs would be up to the end-user.
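For context, "offline" optimization with onnxruntime can look roughly like the following sketch. The file names are placeholders, and a graph optimized this way is tied to the environment it was optimized in, so it is best done on the target hardware:

```python
import onnxruntime as ort

# Placeholder file names. Creating the session below runs ONNX Runtime's graph
# optimizations once and writes the optimized model to disk, so later sessions
# can load it directly instead of re-optimizing at every startup.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
sess_options.optimized_model_filepath = "model.optimized.onnx"

ort.InferenceSession("model.onnx", sess_options, providers=["CPUExecutionProvider"])
```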
Purpose
Hey there! I, and others, have occasionally wondered whether it was possible to export trained nnU-Nets to the ONNX format for deployment or other sharing. To this end, I've cobbled together an exporter using the built-in PyTorch ONNX exporter.
In the past, I had managed to get an exporter working for v1, but that no longer works since the release of v2 changed some things around. Nevertheless, the exported information let us build a very minimalist copy of the entire nnU-Net pipeline that produced outputs faster than the v1 inference pipeline, and with less overhead.
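For anyone curious what "using the built-in PyTorch ONNX exporter" boils down to, a minimal, self-contained sketch is below. The tiny `Conv3d`, the file name, and the patch size are stand-ins so the snippet runs on its own; they are not what this PR actually exports:

```python
import torch

# `network` stands in for a trained nnU-Net model; a tiny Conv3d is used here only
# so the snippet is runnable. The patch size is likewise an arbitrary example.
network = torch.nn.Conv3d(in_channels=1, out_channels=2, kernel_size=3, padding=1)
network.eval()

dummy_input = torch.randn(1, 1, 128, 128, 128)  # (batch, channels, *patch_size)

torch.onnx.export(
    network,
    dummy_input,
    "fold_0.onnx",
    input_names=["input"],   # the exporter in this PR also names its input "input"
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```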
Implementation
I've "repurposed" bits and pieces of code I found around the project (mainly `nnunetv2.inference.predict_from_raw_data.py`) to load in a network, set the parameters, and acquire any information the user might want for building their own ONNX pipeline. The command works roughly the same as the `nnUNetv2_export_model_to_zip` command, but exports an `.onnx` file for each fold and configuration combination instead. Each `.onnx` file is accompanied by a `.json` file that contains some basic information the user might need to build their own ONNX pipeline.

Finally, a big warning box printed before the export begins makes it perfectly clear that any ONNX pipelining is the sole responsibility of the end-user, not the maintainers of nnU-Net.
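As a quick sanity check of the exported artifacts, one could load such a pair like this. The file names are hypothetical; the exporter writes one `.onnx`/`.json` pair per fold and configuration:

```python
import json

import onnx

# Hypothetical file names for one exported fold/configuration pair.
model = onnx.load("fold_0.onnx")
onnx.checker.check_model(model)  # basic structural validation of the exported graph

with open("fold_0.json") as f:
    metadata = json.load(f)  # pipeline information exported alongside the model

print([inp.name for inp in model.graph.input])  # expected to contain "input"
print(sorted(metadata.keys()))
```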
Motivation
While I can't speak for others' reasons, our main reason for wanting an ONNX format is the deployment of trained models to an inference server. Due to server costs, we might want to use some of ONNX's excellent optimization techniques for faster inference, and maybe move away from a Python-based pipeline in favor of compiled languages like C, C++ or Rust. Finally, a lot of CUDA+Python Docker containers produce gigantic images (for example, the Docker image I use to train the nnU-Nets is 24 GB when exported), which take an extremely long time to boot up on servers.
A lot of clinicians would really like to see neural networks applied in their practice, and quite a few companies are willing to make that investment. However, server costs are still a large burden, so offering a way to export an nnU-Net to a common format, even without further support beyond the exporter, will hopefully allow broader adoption outside of pure research.