
Add test that exports to MLIR a small sharded Llama model #220

Merged
merged 2 commits into nod-ai:main on Sep 27, 2024

Conversation

sogartar
Contributor

@sogartar sogartar commented Sep 26, 2024

Add test that exports to MLIR a small sharded Llama model

The decode step requires exporting in non-strict torch mode due to
pytorch/pytorch#135061

This export required extending the registration functionality of our custom
tensor types by providing flatten_with_keys_fn. This is also required in order
to bump the PyTorch version to >=2.4 for the other export tests.

The export to MLIR fails with
TypeError: Unsupported torch type conversion for !torch.vtensor<[3,1,7],complex<f32>>
Needs further debugging.
Detailed error here.
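
For context, the flatten_with_keys_fn registration mentioned above looks roughly like the sketch below. This is a minimal illustration assuming PyTorch >= 2.4; ShardedTensor and its fields here are hypothetical stand-ins, not the actual custom tensor types in the repo.

    from typing import Any, List, Tuple

    import torch
    from torch.utils._pytree import SequenceKey, register_pytree_node

    class ShardedTensor:  # placeholder for a custom sharded tensor type
        def __init__(self, shards: List[torch.Tensor]):
            self.shards = shards

    def _flatten(t: ShardedTensor) -> Tuple[List[torch.Tensor], Any]:
        # Children plus whatever static context is needed to rebuild the object.
        return t.shards, None

    def _flatten_with_keys(t: ShardedTensor):
        # Same children, but each is paired with a key describing its position;
        # non-strict torch.export uses these keys to name traced inputs.
        children, context = _flatten(t)
        return [(SequenceKey(i), c) for i, c in enumerate(children)], context

    def _unflatten(children, context) -> ShardedTensor:
        return ShardedTensor(list(children))

    register_pytree_node(
        ShardedTensor,
        _flatten,
        _unflatten,
        flatten_with_keys_fn=_flatten_with_keys,
    )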

@sogartar sogartar force-pushed the sharded-llama-iree-export-test branch from 5fc031d to e8011ec on September 26, 2024 18:16
@sogartar sogartar changed the title from "WIP Add test that exports to IREE a small sharded Llama model" to "Add test that exports to IREE a small sharded Llama model" on Sep 26, 2024
@sogartar sogartar marked this pull request as ready for review September 26, 2024 18:18
@sogartar sogartar marked this pull request as draft September 26, 2024 18:19
@sogartar sogartar force-pushed the sharded-llama-iree-export-test branch from e8011ec to 7931d44 on September 26, 2024 20:34
@sogartar sogartar marked this pull request as ready for review September 26, 2024 20:34
@sogartar sogartar changed the title from "Add test that exports to IREE a small sharded Llama model" to "Add test that exports to MLIR a small sharded Llama model" on Sep 26, 2024
@sogartar sogartar force-pushed the sharded-llama-iree-export-test branch from 7931d44 to 375a436 on September 26, 2024 20:35
@rsuderman
Contributor

rsuderman commented Sep 26, 2024

Could you include a link to the export failure in a gist? We should be able to know why the export is failing. There is a good chance we are successfully generating IR and just failing during the conversion from torch to linalg (this would match the issue that Kyle saw).

    torch.testing.assert_close(
        sharded_prefill_result, expected_prefill_result, atol=1e-3, rtol=1e-2
    )

    def make_decode_args(self, model: PagedLlamaModelV1) -> Dict[str, Any]:
        seq_lens = torch.randint(
Contributor

Be specific in random testing where possible. If we want to check variable sequence length we should pick a set of numbers instead of randomizing.

Contributor Author

We are setting the random seed at the test setup stage, so it should be consistent between runs. Isn't the torch random generator going to be stable between releases?

Contributor Author

I changed them to fixed values.
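
Roughly, the change amounts to something like this sketch; the concrete numbers are illustrative, not necessarily the values that landed.

    import torch

    batch_size = 3
    # Before: seq_lens = torch.randint(low=1, high=..., size=[batch_size])
    # After: fixed, deterministic lengths that still vary across the batch.
    prefill_seq_lens = torch.tensor([14, 9, 12], dtype=torch.int64)
    assert prefill_seq_lens.numel() == batch_size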

high=vocabulary_size,
size=[batch_size, 1],
high=self.vocabulary_size,
size=[self.batch_size, 1],
dtype=torch.int32,
)
decode_seq_lens = torch.randint(
Contributor

Same as above. We would expect the decode to be one more than the sequence length of the encode. We should maintain that behavior.

Contributor Author

@sogartar sogartar Sep 26, 2024

Regarding the sequence length, I think that is the case, but the cache is being regenerated anyway. In this test it does not really matter whether the numbers are plausible, just that the sharded and unsharded results are close. I can make it behave more like the "real" world, though.

Contributor Author

I made them prefill_seq_lens + 1.
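
That is, along these lines (a sketch of the idea, not the exact diff):

    # Derive decode lengths from the prefill lengths instead of sampling anew,
    # so decode consumes exactly one more position than prefill did.
    decode_seq_lens = prefill_seq_lens + 1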

Contributor

Given we have a sharded attention block test, let's keep both around and rename the original one to sharded_attention_test.py. It would be better to increase test coverage rather than replace it. We can always delete it in the future.

Contributor Author

I actually had the name wrong in my previous PR. This test checks the whole model, not just the attention. I just refactored a lot to avoid replicating code in the export test.

@sogartar sogartar requested a review from rsuderman September 26, 2024 21:50
@sogartar
Contributor Author

Could you include a link to the export failure in a gist? We should be able to know why the export is failing. There is a good chance we are successfully generating IR and just failing during the conversion from torch to linalg (this would match the issue that Kyle saw).

The test just calls shark_turbine.aot.exporter.export, which I think does not compile with IREE. It returns an object that can then compile the IR. Are there any conversions to linalg before that?
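
For reference, the flow in question looks roughly like the sketch below. The method names on the returned object (save_mlir, compile) reflect my reading of the shark_turbine.aot API and are an assumption here, not something verified in this PR.

    import torch
    from shark_turbine import aot

    class TinyModel(torch.nn.Module):  # stand-in for the sharded Llama model
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x * 2

    # export() only produces torch-dialect MLIR; nothing is compiled by IREE yet.
    export_output = aot.export(TinyModel(), torch.zeros(3, 4))
    export_output.save_mlir("tiny_model.mlir")

    # A torch-to-linalg conversion would only run as part of compilation, e.g.
    # export_output.compile(save_to="tiny_model.vmfb")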

@sogartar
Contributor Author

@rsuderman, is your PR llvm/torch-mlir#3738 the one that may fix the type conversion issue?

TypeError: Unsupported torch type conversion for !torch.vtensor<[3,1,7],complex<f32>>

@sogartar
Contributor Author

sogartar commented Sep 27, 2024

Could you include a link to the export failure in a gist?

I added a link with the full error to the description.

@sogartar
Contributor Author

I tried with the changes from llvm/torch-mlir#3738, but it does not solve the type conversion issue.

The decode step requires exporting in non-strict torch mode due to
pytorch/pytorch#135061

This export required extending the registration functionality of our custom
tensor types by providing `flatten_with_keys_fn`. This is also required in order
to bump the PyTorch version to >=2.4 for the other export tests.

The export to MLIR fails with
TypeError: Unsupported torch type conversion for !torch.vtensor<[3,1,7],complex<f32>>
Needs further debugging.
@sogartar sogartar force-pushed the sharded-llama-iree-export-test branch from eb925e5 to 9f09fee on September 27, 2024 17:37
@sogartar sogartar enabled auto-merge (squash) September 27, 2024 17:38
@sogartar sogartar merged commit fca29f4 into nod-ai:main Sep 27, 2024
8 checks passed