Port remaining HECO evaluation artifacts and ensure they produce equivalently optimal results. #571

j2kun · 2024-03-27T21:50:52Z

@AlexanderViand-Intel I may need your help here. I haven't figured out how to build HECO yet (everything succeeds in your repo instructions up to the linking step, then it fails with a bazillion linker errors to LLVM).

What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.
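A minimal sketch of the rotation-count comparison described above, assuming the rotation ops appear in the textual MLIR as `tensor_ext.rotate` (HEIR) and `fhe.rotate` (HECO). Both op names are assumptions and should be adjusted to whatever the dumped IR actually contains; `count_rotations` is a hypothetical helper, not part of either project.

```python
import re
from collections import Counter

# Assumed op names; change these if the dumped IR uses different ones.
ROTATE_OP = re.compile(r"\b(tensor_ext\.rotate|fhe\.rotate)\b")


def count_rotations(mlir_text: str) -> Counter:
    """Count occurrences of rotation ops in textual MLIR, per op name."""
    counts = Counter()
    for line in mlir_text.splitlines():
        # Drop trailing `//` comments so commented-out ops are not counted.
        code = line.split("//", 1)[0]
        for match in ROTATE_OP.finditer(code):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = """
    func.func @f(%arg0: tensor<16xi16>) -> tensor<16xi16> {
      %c1 = arith.constant 1 : index
      %0 = tensor_ext.rotate %arg0, %c1 : tensor<16xi16>, index
      %1 = arith.addi %arg0, %0 : tensor<16xi16>
      // tensor_ext.rotate in a comment is ignored
      return %1 : tensor<16xi16>
    }
    """
    print(dict(count_rotations(sample)))
```

Running the same function over the output of each compiler for one benchmark gives the two numbers to compare.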

This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things:

- Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors.
- Adding two patterns to splat constants into tensors during rotation insertion.

This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic.

Part of #571

PiperOrigin-RevId: 621247689

This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things:

- Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors.
- Adding two patterns to splat constants into tensors during rotation insertion.

This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic.

Note canonicalize slows the box_blur_64x64 test to a crawl (> 15m), so I converted it to "enormous" size so that it is skipped in CI.

Part of #571

PiperOrigin-RevId: 621247689

j2kun · 2024-04-02T21:00:42Z

Last one appears to be upgrading the roberts cross 4x4 to 64x64 (though we know this will be very slow due to #589). Maybe in the meantime we can limit it to 32x32.

This PR ports the gx_kernel evaluation artifact from HECO, and supports it by doing two things:

- Adding a canonicalize pass before insert-rotate but after full-loop-unroll, to ensure that ND tensors of constants are materialized to constants of tensors.
- Adding two patterns to splat constants into tensors during rotation insertion.

This leaves unanswered an important question: how should we detect and handle plaintext types that are not constants (say, function inputs or function inputs that are modified by some IR-internal ops). I will file a followup issue on that topic.

Note canonicalize slows the box_blur_64x64 test to a crawl (> 15m), so I converted it to "enormous" size so that it is skipped in CI.

Part of #571

PiperOrigin-RevId: 621321174

AlexanderViand-Intel · 2024-04-04T11:51:05Z

I'm trying to get up to speed after RWC/Easter Holidays, and I saw that you mentioned compile times on the order of 50 minutes in #589, which sounds an order of magnitude worse than what HECO gets. In HECO, I did a few tricks to speed things up, such as moving one or two patterns out of normal canonicalization into their own pass; maybe we can do something similar here.

EDIT: It seems like the PR (#587) mentioned in that issue is somehow borked: there are no commits?

> @AlexanderViand-Intel I may need your help here. I haven't figured out how to build HECO yet (everything succeeds in your repo instructions up to the linking step, then it fails with a bazillion linker errors to LLVM).

Mh, that's odd - the repo, and the artifact tag specifically, should "just" work, having been through the USENIX artifact evaluation process. What linker did you use? lld or something else?

> What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.

Sure, let me run those for you and figure out how best to post them.

j2kun · 2024-04-04T13:00:45Z

That is bizarre about copybara. Here's the commit: 4d84f04

j2kun · 2024-04-04T13:54:25Z

Yeah I was thinking I could extract the needed patterns (converting the cleartext weight tensor to individual constants would suffice), but haven't had the chance to figure out exactly which pattern does that in canonicalize.

AlexanderViand-Intel · 2024-04-08T14:16:09Z

> > What I think I'd want is just the output MLIR from each of the evaluation artifacts, so that I can compare the rotation counts in them with what HEIR produces. I tried manually porting roberts-cross, and I think I see where HEIR is still lacking, but I want some confirmation.
>
> Sure, let me run those for you and figure out how best to post them.

Here's that HECO output. I figured adding it to the repo as another tag would be easier than creating a bunch of gists: HECO@artifact_mlir_output

j2kun · 2024-04-08T18:12:34Z

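Summarizing the rotation counts:

|      | boxblur_64x64.mlir | dotproduct_8.mlir | gxkernel_64x64.mlir | hammingdistance_4.mlir | linearpolynomial_64.mlir | quadraticpolynomial_64.mlir | robertscross_64x64.mlir |
|------|--------------------|-------------------|---------------------|------------------------|--------------------------|-----------------------------|-------------------------|
| HEIR | 7                  | 3                 | 6 (8x8, not 64x64)  | 3                      | 0                        | 0                           | 2 (4x4)                 |
| HECO | 3                  | 3                 | 5                   | 3                      | 0                        | 0                           | 4                       |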

So remaining work is:

- Improve box_blur
- Upgrade gx_kernel to 64x64 from 8x8 and roberts_cross from 4x4 to 64x64 (Upgrade roberts cross and gx_kernel to 64x64 #604)
- Scrutinize roberts_cross to ensure that HEIR's "better" program is actually correct.

AlexanderViand-Intel · 2024-04-09T08:38:17Z

> Scrutinize roberts_cross to ensure that HEIR's "better" program is actually correct.

👀 I think HECO generates the same program as Porcupine here (which should be optimal, so I'd be surprised to see something better), but let me check whether this is actually the case.

EDIT: I looked it up and Porcupine's solution is better, but it still needs 3 rotations, not 2.

j2kun · 2024-04-18T13:14:08Z

After discussion, we realized the reason HEIR seems to do worse on the box_blur example is that HECO's kernel is 2x2, while HEIR's is 3x3, so HEIR's should be equivalent to HECO on that example. robertscross_64x64 is the remaining example to inspect.

j2kun · 2024-04-26T17:07:44Z

Now that the first e2e runtime test is done (simple_sum_test.cpp), I'm looking at adding analogous tests for the rest of the HECO examples to bolster our confidence in their correctness.

@AlexanderViand-Intel: do you happen to have any input-output test pairs lying around from HECO? Just to save some time manually generating them.
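A minimal sketch of how such input-output pairs could be generated by hand with a cleartext reference implementation. It assumes the textbook Roberts cross operator without the final square root, applied to a row-major flattened n x n image with wraparound on the flattened index. Whether this matches the indexing of the HECO or HEIR kernels is an assumption to check against the input MLIR, and `roberts_cross_reference` is a hypothetical helper name.

```python
def roberts_cross_reference(img, n):
    """Cleartext Roberts cross on a row-major flattened n x n image.

    For each slot i (pixel a), with b to its right, c below it, and d on
    the diagonal, the result is (a - d)^2 + (b - c)^2. Neighbours are
    taken modulo n * n, i.e. wraparound on the flattened index.
    """
    size = n * n
    out = []
    for i in range(size):
        a = img[i]
        b = img[(i + 1) % size]
        c = img[(i + n) % size]
        d = img[(i + n + 1) % size]
        out.append((a - d) ** 2 + (b - c) ** 2)
    return out


if __name__ == "__main__":
    n = 4
    image = [(3 * i) % 7 for i in range(n * n)]  # arbitrary small test input
    expected = roberts_cross_reference(image, n)
    print(image)
    print(expected)
```

The values are plain Python integers; a test against a compiled FHE program would additionally need to account for the plaintext modulus or integer width in use.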

j2kun · 2024-05-08T18:48:14Z

Roberts cross had a bug in the input MLIR that is fixed in #676

AlexanderViand-Intel · 2024-05-13T06:12:18Z

> @AlexanderViand-Intel: do you happen to have any input-output test pairs lying around from HECO? Just to save some time manually generating them.

Sorry, everything from HECO would be SEAL based (e.g., Porcupine-based SEAL implementation), so I think what you ended up doing is the most effective way forward.

AlexanderViand-Intel · 2024-05-13T06:14:21Z

> So remaining work is:
>
> - Improve box_blur

I think this is the last open item in this issue, right?
According to the box_blur_64x64.mlir test, HEIR still generates a solution with 7 rotations, some of which are rotations not of the input image, but of intermediate results.

j2kun · 2024-05-13T16:44:42Z

> > So remaining work is:
> >
> > - Improve box_blur
>
> I think this is the last open item in this issue, right? According to the box_blur_64x64.mlir test, HEIR still generates a solution with 7 rotations, some of which are rotations not of the input image, but of intermediate results.

From #571 (comment), we agreed that it was 7 because the kernel is 3x3, whereas HECO did 2x2.

The gx_kernel still has one more rotation than HECO that I haven't explained, and it also doesn't have an end-to-end test yet.

AlexanderViand-Intel · 2024-05-13T17:29:16Z

> From #571 (comment), we agreed that it was 7 because the kernel is 3x3, whereas HECO did 2x2.

🤦 Sorry, I got confused by seeing the intermediate-value rotations, which is structurally different from what HECO generates. This is actually a good thing, though, as HEIR does in fact get 3 rotations in 2x2 kernel mode, but HECO for 3x3 requires (as I'd expect) 8 rotations (kernel_size - 1, where kernel_size is the number of kernel elements).

> The gx_kernel still has one more rotation than HECO that I haven't explained, and it also doesn't have an end-to-end test yet.

I wouldn't be surprised if it turns out this is another bug in the HECO inputs 🙈 as it'd be odd for HEIR to successfully handle kernels of this style but not two such kernels in sequence.
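The rotation counts discussed above for 2x2 and 3x3 box-blur kernels can be reproduced with a small cleartext simulation. This is a sketch under stated assumptions, not what either compiler emits: the image is a row-major flattened n x n list, the window is anchored at its top-left element, wraparound is on the flattened index, and every window element other than the anchor costs one rotation of the input. The function names are hypothetical.

```python
def rotate(v, k):
    """Cyclic rotation: result[i] == v[(i + k) % len(v)]."""
    k %= len(v)
    return v[k:] + v[:k]


def box_blur_with_rotations(img, n, k):
    """Sum a k x k window at every slot using one rotation per offset.

    Returns (result, rotation_count). Only the input is rotated, never an
    intermediate sum, so the count is k * k - 1.
    """
    acc = list(img)
    rotations = 0
    for di in range(k):
        for dj in range(k):
            offset = di * n + dj
            if offset == 0:
                continue  # the unrotated input is already in acc
            shifted = rotate(img, offset)
            rotations += 1
            acc = [a + s for a, s in zip(acc, shifted)]
    return acc, rotations


def box_blur_reference(img, n, k):
    """Direct per-slot sum with the same flattened wraparound indexing."""
    size = n * n
    return [
        sum(img[(i + di * n + dj) % size] for di in range(k) for dj in range(k))
        for i in range(size)
    ]


if __name__ == "__main__":
    n = 4
    image = list(range(n * n))
    for k in (2, 3):
        result, rotations = box_blur_with_rotations(image, n, k)
        assert result == box_blur_reference(image, n, k)
        print(f"{k}x{k} kernel: {rotations} rotations")
```

For a 4x4 image this gives 3 rotations for a 2x2 window and 8 for a 3x3 window. It does not model the 7-rotation 3x3 solution mentioned in the thread, which also rotates intermediate results.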

j2kun · 2024-06-30T12:30:33Z

Closing and moving remaining work to #758

This was referenced Mar 28, 2024:

- Port roberts_cross from HECO, and use sparse constant analysis to identify static indices #573 (Merged)
- Rewrite rotation analysis to support dot product port from HECO #575 (Merged)
- Port gxkernel example #577 (Closed)

copybara-service bot mentioned this issue Apr 2, 2024:

- Handle one-sided constants in insert-rotate #587 (Closed)

j2kun mentioned this issue Apr 5, 2024:

- upgrade roberts cross to 32x32 #600 (Closed)

j2kun mentioned this issue Apr 8, 2024:

- Upgrade roberts cross and gx_kernel to 64x64 #604 (Merged)

AlexanderViand-Intel mentioned this issue Apr 9, 2024:

- Port HECO's heuristic "Automatic SIMD-ification" and target slot selection to HEIR #475 (Closed)

j2kun mentioned this issue May 8, 2024:

- add e2e roberts cross example #676 (Merged)

j2kun mentioned this issue Jun 30, 2024:

- Write an end-to-end test for gx_kernel, try to optimize further #758 (Open)

j2kun closed this as completed Jun 30, 2024
