Add test that exports to MLIR a small sharded Llama model #220
Conversation
Could you include a link to the export failure in a gist? We should be able to know why the export is failing. There is a good chance we are successfully generating IR and just failing during the conversion from […]
        torch.testing.assert_close(
            sharded_prefill_result, expected_prefill_result, atol=1e-3, rtol=1e-2
        )

    def make_decode_args(self, model: PagedLlamaModelV1) -> Dict[str, Any]:
        seq_lens = torch.randint(
Be specific in random testing where possible. If we want to check variable sequence length we should pick a set of numbers instead of randomizing.
We are setting the random seed at the test setup stage, so it should be consistent between runs. Isn't the torch random generator going to be stable between releases?
I changed them to fixed values.
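For context, a minimal sketch of what fixed, hand-picked sequence lengths can look like in a test fixture; the class and attribute names below are hypothetical and not taken from this repository:

```python
import unittest

import torch


class ShardedLlamaTest(unittest.TestCase):
    """Hypothetical fixture; the point is the hand-picked lengths."""

    def setUp(self):
        torch.manual_seed(12345)
        self.batch_size = 3
        # Fixed, unequal lengths instead of torch.randint, so the test
        # deterministically exercises ragged batches and padding.
        self.prefill_seq_lens = torch.tensor([14, 9, 3], dtype=torch.int64)
```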
-            high=vocabulary_size,
-            size=[batch_size, 1],
+            high=self.vocabulary_size,
+            size=[self.batch_size, 1],
             dtype=torch.int32,
         )
         decode_seq_lens = torch.randint(
Same as above. We would expect the decode sequence length to be one more than the prefill sequence length. We should maintain that behavior.
Regarding the sequence length, I think that is the case, but the cache is being regenerated anyway. In this test it does not really matter whether we get plausible numbers, just that the numbers are close. I can make it behave more like the "real" world.
I made them prefill_seq_lens + 1.
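A minimal sketch of that relationship; the helper name is hypothetical, and only the prefill_seq_lens + 1 arithmetic reflects the change described here:

```python
import torch


def make_decode_seq_lens(prefill_seq_lens: torch.Tensor) -> torch.Tensor:
    # Decode extends each sequence by exactly one token, so the decode lengths
    # are derived from the prefill lengths instead of being re-randomized.
    return prefill_seq_lens + 1


prefill_seq_lens = torch.tensor([14, 9, 3], dtype=torch.int64)
decode_seq_lens = make_decode_seq_lens(prefill_seq_lens)  # tensor([15, 10,  4])
```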
Given we have a sharded attention block test, let's keep both around and rename the original one to sharded_attention_test.py. It would be better to increase test coverage rather than replace it. We can always delete it in the future.
I actually had the name wrong in my previous PR. This test checks the whole model, not just the attention. I refactored a lot to avoid replicating code in the export test.
The test calls just […]
@rsuderman is this your PR llvm/torch-mlir#3738 that may fix the type conversion issue?
I added a link with the full error in the description.
I tried with the changes from llvm/torch-mlir#3738, but they do not solve the type conversion issue.
The decode step requires exporting in non-strict torch mode due to pytorch/pytorch#135061. This export required extending the registration functionality of our custom tensor types by providing `flatten_with_keys_fn`. This also required bumping the PyTorch version to >=2.4 for the other export tests. The export to MLIR fails with
TypeError: Unsupported torch type conversion for !torch.vtensor<[3,1,7],complex<f32>>
Needs further debugging.
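For readers unfamiliar with the two mechanisms mentioned above, here is a minimal, self-contained sketch. The ShardedTensorLike container and Wrapper module are invented for illustration and are not the types from this repository; only the torch.utils._pytree.register_pytree_node arguments (including `flatten_with_keys_fn`) and torch.export.export(..., strict=False) are real PyTorch (>=2.4) APIs:

```python
import torch
from torch.utils._pytree import GetAttrKey, register_pytree_node


class ShardedTensorLike:
    """Hypothetical container holding one tensor per shard."""

    def __init__(self, shards):
        self.shards = list(shards)


register_pytree_node(
    ShardedTensorLike,
    # flatten_fn: return (children, static_context)
    lambda t: (t.shards, None),
    # unflatten_fn: rebuild the container from (children, static_context)
    lambda values, context: ShardedTensorLike(values),
    serialized_type_name="ShardedTensorLike",
    # flatten_with_keys_fn: like flatten_fn, but pairs each child with a key
    # so non-strict export can name leaves in its input specs and errors.
    flatten_with_keys_fn=lambda t: (
        [(GetAttrKey(f"shards[{i}]"), s) for i, s in enumerate(t.shards)],
        None,
    ),
)


class Wrapper(torch.nn.Module):
    def forward(self, x: ShardedTensorLike):
        return sum(s.sum() for s in x.shards)


example = ShardedTensorLike([torch.randn(2, 3), torch.randn(2, 3)])
# strict=False selects the non-strict (no TorchDynamo) export path, which is
# what the description above refers to for the decode step.
exported = torch.export.export(Wrapper(), (example,), strict=False)
```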
Add test that exports to MLIR a small sharded Llama model
The decode step requires exporting in non-strict torch mode due to
pytorch/pytorch#135061
This export required extending the registration functionality of our custom tensor types by providing flatten_with_keys_fn. This also required bumping the PyTorch version to >=2.4 for the other export tests.
The export to MLIR fails with
TypeError: Unsupported torch type conversion for !torch.vtensor<[3,1,7],complex<f32>>
Needs further debugging.
Detailed error here.