Provide an efficient inference implementation using sparsification/quantization #206
Comments
jpata changed the title from "Provide an efficient inference implementation using sparsification/quantization" to "Provide an efficient GNN inference implementation using sparsification/quantization" on Sep 14, 2023
jpata changed the title from "Provide an efficient GNN inference implementation using sparsification/quantization" to "Provide an efficient GNN inference implementation using sparsification/quantization with ONNX" on Sep 29, 2023
adding @raj2022
jpata changed the title from "Provide an efficient GNN inference implementation using sparsification/quantization with ONNX" to "Provide an efficient inference implementation using sparsification/quantization" on Apr 11, 2024
Also related: #315
Basically, to summarize: I'm closing this issue and putting it on the roadmap to study ONNX post-training static quantization separately.
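For reference, a minimal sketch of what ONNX Runtime post-training static quantization could look like. The model paths (`mlpf_fp32.onnx`, `mlpf_int8.onnx`), the input tensor name (`Xfeat_normed`), and the calibration input shape are placeholders for illustration, not actual artifacts of this repository.

```python
# Sketch: ONNX Runtime post-training static quantization (placeholder paths/names).
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class MLPFCalibrationReader(CalibrationDataReader):
    """Feeds a few representative event batches to the calibrator."""

    def __init__(self, batches, input_name):
        self._iter = iter(batches)
        self._input_name = input_name

    def get_next(self):
        batch = next(self._iter, None)
        if batch is None:
            return None
        return {self._input_name: batch.astype(np.float32)}


# Placeholder calibration data shaped like the model input.
batches = [np.random.rand(1, 256, 25) for _ in range(16)]
reader = MLPFCalibrationReader(batches, input_name="Xfeat_normed")

quantize_static(
    model_input="mlpf_fp32.onnx",   # exported FP32 model (assumed path)
    model_output="mlpf_int8.onnx",  # quantized INT8 model
    calibration_data_reader=reader,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
)
```

The calibration batches would in practice come from real events, so the activation ranges reflect physics inputs rather than random numbers.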
Goal: reduce inference time of the model using quantization
We made some CPU inference performance results public for 2021 in CMS (https://cds.cern.ch/record/2792320/files/DP2021_030.pdf, slide 16): “For context, on a single CPU thread (Intel i7-10700 @ 2.9GHz), the baseline PF requires approximately (9 ± 5) ms, the MLPF model approximately 320 ± 50 ms for Run 3 ttbar MC events”.
Now is a good time to make the inference as fast as possible, while minimizing any physics impact.
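As a rough way to quantify any gain, one could time the FP32 and quantized models on a single CPU thread with ONNX Runtime, mirroring the single-thread setup quoted above. The model paths and input shape below are illustrative placeholders.

```python
# Sketch: single-thread CPU latency comparison of FP32 vs. quantized ONNX models.
import time
import numpy as np
import onnxruntime as ort


def time_model(path, x, n_runs=20):
    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 1  # single CPU thread, matching the quoted setup
    opts.inter_op_num_threads = 1
    sess = ort.InferenceSession(path, opts, providers=["CPUExecutionProvider"])
    name = sess.get_inputs()[0].name
    sess.run(None, {name: x})  # warm-up run
    t0 = time.perf_counter()
    for _ in range(n_runs):
        sess.run(None, {name: x})
    return (time.perf_counter() - t0) / n_runs * 1000.0  # average ms per event


x = np.random.rand(1, 256, 25).astype(np.float32)  # placeholder event
print("fp32: %.1f ms" % time_model("mlpf_fp32.onnx", x))
print("int8: %.1f ms" % time_model("mlpf_int8.onnx", x))
```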
Resources: