-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port remaining HECO evaluation artifacts and ensure they produce equivalently optimal results. #571
Comments
This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things: - Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors. - Adding two patterns to splat constants into tensors during rotations insertion. This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic. Part of #571 PiperOrigin-RevId: 621247689
This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things: - Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors. - Adding two patterns to splat constants into tensors during rotations insertion. This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic. Part of #571 PiperOrigin-RevId: 621247689
This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things: - Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors. - Adding two patterns to splat constants into tensors during rotations insertion. This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic. Note canonicalize slows the box_blur_64x64 test to a crawl (> 15m), so I converted it to "enormous" size so that it is skipped in CI. Part of #571 PiperOrigin-RevId: 621247689
This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things: - Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors. - Adding two patterns to splat constants into tensors during rotations insertion. This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic. Note canonicalize slows the box_blur_64x64 test to a crawl (> 15m), so I converted it to "enormous" size so that it is skipped in CI. Part of #571 PiperOrigin-RevId: 621247689
Last one appears to be upgrading the roberts cross 4x4 to 64x64 (though we know this will be very slow due to #589). Maybe in the mean time we can limit it to 32x32. |
This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things: - Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors. - Adding two patterns to splat constants into tensors during rotations insertion. This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic. Note canonicalize slows the box_blur_64x64 test to a crawl (> 15m), so I converted it to "enormous" size so that it is skipped in CI. Part of #571 PiperOrigin-RevId: 621247689
This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things: - Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors. - Adding two patterns to splat constants into tensors during rotations insertion. This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic. Note canonicalize slows the box_blur_64x64 test to a crawl (> 15m), so I converted it to "enormous" size so that it is skipped in CI. Part of #571 PiperOrigin-RevId: 621321174
I'm trying to get up to speed after RWC/Easter Holidays, and I saw that you mentioned compile times in the order of 50 minutes in #589, which sounds an order of magnitude worse than what HECO gets. In HECO, I did a few tricks to speed things up, such as moving one or two patterns out of normal canonicalization into their own pass, maybe we can do something similar here. EDIT: It seems like the PR (#587) mentioned in that issue is somehow borked - there's no commits?
Mh, that's odd - the repo, and the
Sure, let me run those for you and figure out how best to post them. |
That is bizarre about copybara. Here's the commit: 4d84f04 |
Yeah I was thinking I could extract the needed patterns (converting the cleartext weight tensor to individual constants would suffice), but haven't had the chance to figure out exactly which pattern does that in canonicalize. |
Here's that HECO output, I figured adding it to the repo as another tag would be easier than creating a bunch of gists: HECO@artifact_mlir_output |
Summarizing:
So remaining work is:
|
👀 I think HECO generates the same program as porcupine here (which should be optimal, so I'd be surprised to see something better) but let me check whether this is actually the case. EDIT: I looked it up and Porcupine's solution is better, but it still needs 3 rotations, not 2. |
After discussion, we realized the reason HEIR seems to do worse on the box_blur example is because HECO's kernel is 2x2, while HEIR's is 3x3, so HEIR's should be equivalent to HECO on that example. robertscross_64x64 is the remaining example to inspect. |
Now that the first e2e runtime test is done (simple_sum_test.cpp), I'm looking at adding analogous tests for the rest of the HECO examples to bolster our confidence in their correctness. @AlexanderViand-Intel: do you happen to have any input-output test pairs lying around from HECO? Just to save some time manually generating them. |
Roberts cross had a bug in the input MLIR that is fixed in #676 |
Sorry, everything from HECO would be SEAL based (e.g., Porcupine-based SEAL implementation), so I think what you ended up doing is the most effective way forward. |
I think this is the last open item in this issue, right? |
From #571 (comment), we agreed that it was 7 because the kernel is 3x3, whereas HECO did 2x2. The gx_kernel still has one more rotation than HECO that I haven't explained, and it also doesn't have an end-to-end test yet. |
🤦 Sorry, I got confused by seeing the intermediate-value rotations, which is structurally different from what HECO generates. This is actually a good thing, though , as HEIR does in fact get 3 rotations in 2x2 kernel mode, but HECO for 3x3 requires (as I'd expect) 8 rotations (kernel_size - 1).
I wouldn't be surprised if it turns out this is another bug in the HECO inputs 🙈 as it'd be odd for HEIR to successfully handle kernels of this style but not two such kernels in sequence. |
Closing and removing remaining work to #758 |
@AlexanderViand-Intel I may need your help here. I haven't figured out how to build HECO yet (everything succeeds in your repo instructions up to the linking step, then it fails with a bazillion linker errors to LLVM).
What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.
The text was updated successfully, but these errors were encountered: