[Experiment] Clone blocks with bounds checks #112595
This PR also sort of closes #109983. For this case we don't actually have to clone the blocks, but it's better than nothing 🙂
@EgorBot -amd -arm --filter benchmarkLU
Actually, this should be good as is (as a start). Diffs - obviously, a size regression, but mostly a clean PerfScore improvement:
The PR scans all GT_BOUNDS_CHECK nodes in a block, then groups them by "base indexVN + lengthVN" in a hash table (sort of [...]).
I decided to perform this optimization after the range check phase, which lets it handle only those bounds checks that no earlier phase was able to remove.
A somewhat unfortunate phase-ordering issue is that CSE sometimes decides to CSE the index tree (because it's used in both the bounds check and the actual array access), and when my algorithm drops the bounds check node, we're left with a redundant local:
This is the main source of the regressions; it'd be nice to have some late forward sub to clean these up. The issue is also solved by JitOptRepeat, so perhaps my phase could request an extra iteration of it in the future (not today, as JitOptRepeat has issues).
PTAL @jakobbotsch @AndyAyersMS @dotnet/jit-contrib
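To make the grouping step above concrete, here is a minimal standalone C++ sketch. It is not the JIT code from this PR: `BoundsCheck`, `Group`, and `GroupBoundsChecks` are hypothetical names, value numbers are modelled as plain `int`s, and `std::map` stands in for the hash table to keep the example short. It only shows the idea of bucketing checks that share a base index VN and a length VN, and tracking the offset range that a single combined check would have to cover.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GT_BOUNDS_CHECK node whose index has the
// shape "value(baseIndexVN) + offset" (assumption made for this sketch).
struct BoundsCheck
{
    int baseIndexVN; // value number of the non-constant part of the index
    int offset;      // constant part of the index
    int lengthVN;    // value number of the array length
};

// Summary of all checks that share the same (baseIndexVN, lengthVN) key.
struct Group
{
    int         minOffset;
    int         maxOffset;
    std::size_t count;
};

using GroupKey = std::pair<int, int>; // (baseIndexVN, lengthVN)

// Buckets the checks of one block by key and records the offset range.
std::map<GroupKey, Group> GroupBoundsChecks(const std::vector<BoundsCheck>& checks)
{
    std::map<GroupKey, Group> groups;
    for (const BoundsCheck& bc : checks)
    {
        const GroupKey key{bc.baseIndexVN, bc.lengthVN};
        auto           it = groups.find(key);
        if (it == groups.end())
        {
            groups.emplace(key, Group{bc.offset, bc.offset, 1});
        }
        else
        {
            it->second.minOffset = std::min(it->second.minOffset, bc.offset);
            it->second.maxOffset = std::max(it->second.maxOffset, bc.offset);
            it->second.count++;
        }
    }
    return groups;
}
```

In this model, a group with `count` of 2 or more is a candidate for cloning, since its individual checks could be replaced by one condition covering `minOffset` through `maxOffset`.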
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
This LGTM now. Cool opt 🙂
Closes #112524
Closes #109983
For an arbitrary block
In order to remove bounds checks, the JIT can clone the whole block to have fast and slow paths under "block cloning conditions":
It works not only for stores, but for any expressions with bounds checks, e.g.
arr[1] + arr[2] + arr[3]
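As a rough source-level analogy of the fast/slow split for this expression (a sketch only, not the IR the JIT actually produces): the C++ below models a bounds-checked access with `std::vector::at` and an unchecked one with `operator[]`. `SumChecked` and `SumCloned` are hypothetical names, and the single `size() > 3` test is an assumed form of the cloning condition for constant indices 1..3.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Original shape: every access carries its own bounds check.
int SumChecked(const std::vector<int>& arr)
{
    return arr.at(1) + arr.at(2) + arr.at(3);
}

// Cloned shape: one up-front condition selects between two copies.
int SumCloned(const std::vector<int>& arr)
{
    if (arr.size() > 3)
    {
        // Fast path: the single check above covers indices 1, 2 and 3,
        // so the accesses themselves are unchecked.
        return arr[1] + arr[2] + arr[3];
    }

    // Slow path: the original block with all checks intact, so a failing
    // access still throws std::out_of_range as it did before.
    return arr.at(1) + arr.at(2) + arr.at(3);
}
```

Keeping the slow path as an unmodified copy is what preserves behaviour: for any input, `SumCloned` returns the same value or throws the same exception type as `SumChecked`.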
Codegen example:
Current codegen (4 bounds checks):
New codegen (single SIMD store in the fast path!):