WIP: Tensor zero point sparsity #1304
Closed

Commits (10):
b12d72f Update onednn.js (lutzroeder)
99c0381 Merge branch 'main' of https://github.com/kylesayrs/netron into main (kylesayrs)
1b2cf28 Merge branch 'main' of https://github.com/lutzroeder/netron into main (kylesayrs)
bf425ca Merge branch 'main' of https://github.com/lutzroeder/netron into main (kylesayrs)
8b46d9a Merge branch 'main' of https://github.com/lutzroeder/netron into main (kylesayrs)
932b101 WIP: zero point (kylesayrs)
2a5eb19 implement the rest of the op types (kylesayrs)
effbf09 Update TorchScript test files (#1067) (lutzroeder)
dda6828 Merge branch 'main' of https://github.com/lutzroeder/netron into tens… (kylesayrs)
fa41bde fix lints (kylesayrs)
In order for a tensor to know its zero point, it must get its sibling zero-point initializer on a quantized node. The ONNX format doesn't allow for easy lookup from initializer to node or from initializer to initializer, so it's best to call `setZero` after the `onnx.Tensor` has been constructed.
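A minimal sketch of what that could look like (hypothetical: the `graph` shape, the `initializers` name-to-tensor map, and the `setZero` signature are assumptions; the input indices follow the ONNX operator signatures):

```javascript
// Pair each quantized data tensor with its sibling zero-point initializer
// while walking the graph, after the onnx.Tensor objects already exist.
const quantizedInputs = new Map([
    [ 'DequantizeLinear', { data: 0, zeroPoint: 2 } ], // x, x_zero_point
    [ 'QuantizeLinear',   { data: 0, zeroPoint: 2 } ], // x, y_zero_point
    [ 'QLinearConv',      { data: 3, zeroPoint: 5 } ]  // w, w_zero_point
]);
for (const node of graph.node) {
    const indices = quantizedInputs.get(node.op_type);
    if (indices && node.input.length > indices.zeroPoint) {
        const tensor = initializers.get(node.input[indices.data]);
        const zeroPoint = initializers.get(node.input[indices.zeroPoint]);
        if (tensor && zeroPoint) {
            tensor.setZero(zeroPoint); // set after construction
        }
    }
}
```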
An initializer is a parameter to a node. A node should not modify an initializer.
This code happens within `onnx.Context.Graph`, so it is the onnx context graph which is modifying initializers. Setting the zero point from `onnx.Context.Graph` seems like the best option?

Attempting to force the zero point to be set during initializer construction seems like a bad idea. Initializers must know the context of their parent node in order to know their zero point. Searching the nodes for each tensor's parent would be O(N*M) if N is the number of nodes and M is the number of tensors, which seems like unacceptable runtime. A lookup table would have to live outside `onnx.Tensor` (i.e., in some place like `onnx.Context.Graph`), which is similar to the current implementation. It also requires all nodes to be ingested into the lookup table before tensors are constructed, which also seems like an unacceptable solution.
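For comparison, a sketch of the lookup-table alternative being weighed here (hypothetical shapes): one pass over the nodes builds the table in O(N), after which each tensor resolves its parent in O(1), but all nodes must be ingested before any tensor is constructed:

```javascript
// One-pass lookup table, O(N + M) overall, versus scanning all nodes
// per tensor, O(N * M). 'usedBy' maps an initializer name to its node.
const usedBy = new Map();
for (const node of graph.node) { // all nodes ingested first
    for (const input of node.input) {
        usedBy.set(input, node);
    }
}
for (const initializer of graph.initializer) { // then O(1) per tensor
    const node = usedBy.get(initializer.name);
    // ...derive the zero point from 'node' as in the sketch above
}
```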
The initializer has no zero point. Only the context of the operation or result of the operation does.
@lutzroeder The zero point is needed by the initializer/tensor to correctly calculate the sparsity metric. Without the zero point, the sparsity metric for quantized weights is meaningless. It would be nice if the sparsity metric for quantized tensors reflected the true number of zeros that would be skipped by a sparsity-aware inference engine.
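A minimal sketch of the zero-point-aware sparsity being argued for (hypothetical helper, not Netron API):

```javascript
// Sparsity measured against the zero point rather than literal zero.
const sparsity = (values, zeroPoint) => {
    let zeros = 0;
    for (const value of values) {
        if (value === zeroPoint) {
            zeros++;
        }
    }
    return values.length ? zeros / values.length : 0;
};

// An int8 weight quantized with zero point 12:
sparsity(new Int8Array([ 12, 12, 12, 7 ]), 12); // 0.75 instead of 0
```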
@lutzroeder Sorry, let me see if I can summarize what you're suggesting.

While it's true that these calculations are somewhat similar to runtime execution, I don't consider them to be the same, since calculating sparsity and flops can be done much faster and can be done lazily/piece-wise behind expanders, as previously suggested.

I agree that work will need to be done to support different formats. Metrics calculations may go unsupported for some formats. I'm not sure I understand the alternatives, though.

When you refer to the runtime itself, I assume you mean ORT or other third-party runtimes? Would the intention be to have the user run the model outside of Netron? The obvious problem would be the non-standardization or absence of metrics output files. Even if there were a sparsity-aware runtime for each format that outputted metrics files, those files would be in different formats, and we would have the same problem as option 1, where we now have to do work to support every runtime's output format.

This part I do not understand. This seems similar to the job of a profiler specific to the runtime? How would Netron know the internals of how a runtime performed on a network? What would it look like for the user to "attach a runtime"?

I truly apologize if I am missing something obvious.
Exactly. These would be the harder questions that need better answers. It isn't clear what the right path is.

Are these a few exceptions, or the beginning of discovering that many useful metrics require deeper runtime integration or crossing over into profiling?

Which metrics output file formats and tools do exist? How are these problems solved today?

The UX approach isn't clear. How can users intuitively understand what's happening? The sparsity is not the sparsity of this tensor; it's the result of an operation. Some formats tweaking source tensors and others not is confusing at best. Users wouldn't be able to tell without reading the source.
@lutzroeder I have implemented sparsity and ops calculations for ONNX in the past: neuralmagic/sparsezoo#173. The only metric that was somewhat runtime-specific was the flops calculation for conv.

If we choose not to calculate skipped padding for conv ops, then we don't need to iterate through tensor values at all and can calculate ops from the raw number of zeros (see the sketch below). This makes metrics for all common onnx operations computable in one linear-time pass.

In terms of performance, I suspect that calculating most individual tensor metrics will be negligible, especially if they're behind expanders. Calculating metrics for all tensors/nodes so that they can be searched will likely take some time no matter which approach is taken. I believe users would be willing to endure some wait time for these metrics; giving users the option to use them in exchange for that wait seems like a valuable feature.

I think options 2 and 3 are viable, but having users use an external engine or attach a runtime will likely be slower than Netron calculating the metrics directly, not to mention more inconvenient.
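A sketch of that one-pass calculation for a MatMul-like op (hypothetical helper; the arithmetic is standard dense-MAC counting scaled by weight sparsity):

```javascript
// Effective MACs for an [M, K] x [K, N] matmul from the raw zero count
// of the [K, N] weight, without iterating per output position. Each
// weight element participates in M multiply-accumulates, so a
// sparsity-aware engine skips M * zeroCount of the M * K * N total.
const effectiveMacs = (m, k, n, zeroCount) => {
    const dense = m * k * n;
    const sparsity = zeroCount / (k * n);
    return dense * (1 - sparsity);
};

effectiveMacs(1, 4, 4, 8); // 8 of 16 MACs remain at 50% weight sparsity
```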
Just because Netron supports option 1 doesn't mean it can't support 2 and/or 3 down the line. Indeed, I think Netron should support 2 in some capacity. All of these options require format-specific implementations; at least with option 1, more of the logic is shared.
Yes, trying to summarize:

This pull request should not merge, as modifying the source tensors is running the computation and storing the result in the source tensor, which will lead to conflicts. It would not be transparent to users that the source tensor was modified, and confusing, as this happens in some cases but not others. A better way to model this might be sparsity of the operation (e.g. can `Node` have a `sparsity`?).

This leads to the next level of questions: how the computation is performed, and the tensor access needed to perform these computations. Not sure if these issues need to be solved immediately towards making metrics a minimum viable feature, or if this can be more exploratory. There seem to be more basic changes, like lazy expansion for tensor properties and basic metrics like `min`, `max`, `stddev`. Also, now that some effort went into tensor properties, would tensor visualizations, for example, be more valuable and broadly applicable than handling the zero point in special operators in a single framework?

On the explorations side, is it possible to generalize metrics that need tensor data and computation, to avoid implementations for each framework, or is that too far-fetched? If the latter, will this lead to more of an execution layer that needs to be implemented for each framework (similar question for shape inference), and should this code always be loaded? Starting down this route will create the expectation that this gets implemented and maintained more broadly, e.g. how to guarantee these "partial execution implementations" are more than prototypes adding maintenance scope. Other routes worth exploring would be driving metric implementations via external tools or the runtime and serializing to a metrics file (which might also be desirable for performance metrics), or attaching to Python runtimes to avoid duplicate execution implementations and enable live execution.
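As one possible shape for those basic metrics, a sketch of a lazily computed, cached metrics object that could sit behind an expander (hypothetical; not the Netron API):

```javascript
// Basic tensor metrics computed in one pass on first access, then cached,
// so closed expanders cost nothing.
class TensorMetrics {
    constructor(values) {
        this._values = values;
    }
    get metrics() {
        if (!this._metrics) {
            let min = Infinity;
            let max = -Infinity;
            let sum = 0;
            let sumSquares = 0;
            for (const value of this._values) {
                min = Math.min(min, value);
                max = Math.max(max, value);
                sum += value;
                sumSquares += value * value;
            }
            const n = this._values.length;
            const mean = sum / n;
            const stddev = Math.sqrt(Math.max(0, sumSquares / n - mean * mean));
            this._metrics = { min, max, mean, stddev };
        }
        return this._metrics;
    }
}
```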