-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TOPI] Vectorize depthwise conv2d output operator #14519
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't be the value of
ci
limited somehow? What if the value ofci
is huge? You can usefilter
argument indefine_split
to prevent this situation.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. I'm not sure what would happen if
ci
was huge, I guess LLVM would legalize the vectors and loop over the size of the vector registers in hardware when they are storing 32bit floats.I sort of just copied this from above though using the same scheduling that we use for the actual convolution.
Since the fallback scheduling is to use a loop splitting where
ci = 8
and for the hardware in question this is the number of 32bit float in a vector I think we should be okay. In the case we aren't using the fallback i.e. auto-scheduling I would hope the auto-scheduler is able to find the optimal choice forci
which should be at least as good as8
or better, but perhaps there are scenarios where it could choose a bad value?We could do something similar to how we handle regular convolutions, and define a tunable parameter for vectorization, but that would potentially change the scheduling for the depthwise convolution here, rather than its output, which is sort of beyond the initial scope of this PR.
What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your answer. I think that to avoid guesses about the size of
ci
(are use sure that it will be equal to8
?) you can just add a filter parameter to define split, e.g.:In this case I believe we can guarantee that
ci
won't be bigger than 32. What do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are already defining a split here though, this is where I got the value of
8
from.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This split will work only for run w/o tuning statistic (in default mode). I suggest restricting the search space in tuning by adding filter value in
define_split
. It will help to improve tuning time and dropout useless configurations.By the way, I reread your message and noticed that probably I was wrong than suggest to restrict vectorization size by 32. You wrote that it is storing 32bit floats, that means that the maximum possible capacity for vectorization is 8, am I right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for the suggestion, I've added a filter on the split.
Yeah I think you are almost correct, for neon where vectors are
128
bits andsizeof(float) = 32
, then there will be128 / 32 = 4
elements processed in each vector operation. I think the backend will split the 8 element llvm vectors into two 4 element vectors during legalization, so this isn't a problem.