Inline Index and IndexMut implementations #79
I ran into this when optimizing simdnoise. With gather ops no longer being a thing, I reimplemented them as standard loops. This led to a 20-30% slowdown in my application (which spends around half its time in simdnoise, so simdnoise itself is about 40-60% slower).
I thought the software gather ops were just inherently slower, but the actual problem is this line, with the innocuous-looking `indices[i]` access. `i` ranges from 0 to `WIDTH`, so the compiler should be able to easily remove the bounds check, but it couldn't, because the indexing operations couldn't be inlined.

With that fixed, simdnoise should now be as fast as it used to be before updating to simdeez 2.
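To illustrate the kind of change described here, below is a minimal sketch, not the actual simdeez code. `I32Lanes`, `WIDTH`, and `gather` are hypothetical stand-ins for a SIMD vector type, its lane count, and a software gather loop. Whether the real change uses `#[inline]` or the stronger `#[inline(always)]` is an assumption on my part; the sketch uses `#[inline]`, which is what lets a non-generic function be inlined across crate boundaries without LTO.

```rust
use std::ops::{Index, IndexMut};

// Hypothetical lane count; the real value depends on the SIMD backend.
pub const WIDTH: usize = 8;

// Hypothetical stand-in for a SIMD integer vector, backed by a plain array.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I32Lanes(pub [i32; WIDTH]);

impl Index<usize> for I32Lanes {
    type Output = i32;

    // Without an inline hint, a call from another crate stays an opaque
    // function call, so the caller cannot see the bounds check inside it.
    #[inline]
    fn index(&self, i: usize) -> &i32 {
        &self.0[i]
    }
}

impl IndexMut<usize> for I32Lanes {
    #[inline]
    fn index_mut(&mut self, i: usize) -> &mut i32 {
        &mut self.0[i]
    }
}

/// Software gather: out[i] = table[indices[i]] for every lane.
pub fn gather(table: &[f32], indices: &I32Lanes) -> [f32; WIDTH] {
    let mut out = [0.0f32; WIDTH];
    for i in 0..WIDTH {
        // Once `index` is inlined, the compiler can see that i < WIDTH and
        // has a chance to drop the bounds check on `indices[i]`. The check
        // on `table[...]` remains, since those values are only known at runtime.
        out[i] = table[indices[i] as usize];
    }
    out
}
```

Inspecting the generated assembly (for example with `cargo rustc --release -- --emit asm`) is one way to confirm whether the bounds check on the lane access actually disappears for a given build.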