-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cranelift-wasm: Only allocate if vectors need bitcasts #4543
Merged
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it help at all to use a
SmallVec
here, tuned to some size that captures most cases?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about using
SmallVec
but figured first I'd try the experiment that doesn't require any tuning. I'm not sure: how would you suggest picking a size? In the past I've usually picked however many elements will fit in twousize
, because that's the minimum size the inline part of aSmallVec
will consume anyway. But in this case that's 1, which hardly seems worth doing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
SmallVec
in this case will be on-stack so we can be a bit looser with its size I think (andsub rsp, 128
is almost always faster thanmalloc(128)
). For some cases like this I have just picked a reasonable value but to be more objective about it we could try 1, 4, 16 and see what the difference is...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the intgemm-simd benchmark, Sightglass says plain
Vec
is 53-73% faster than a size-16 SmallVec, and 8-22% faster than a size-4 SmallVec. I don't understand that result, do you?Unless there's a simple explanation I'm not seeing, I'm inclined to not think further about SmallVec right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's somewhat surprising, but I can hypothesize at least one potential explanation: if the return value is almost always an empty list, then the empty SmallVec takes more stack space than an empty Vec and hence creates a little more spread in the memory usage (so bad cache effects). I'm stretching a bit with that though.
Anyway if it's rare enough then a simple
Vec
is totally fine here; it's certainly better than what we had before!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to give size-1 SmallVec a shot too for completeness: if it's a matter of different stack frame sizes, then that should be no worse than Vec. But Vec was 44-58% faster on intgemm-simd than even size-1 SmallVec. So I remain thoroughly puzzled. 🤷