Example for Llama2 on Inf2 #2458
It would be great to have a unit test for the handler. You can mock out the Inferentia- and model-related parts. This example shows how to mock the context, etc.:
serve/test/pytest/test_example_scriptable_tokenzier.py
Line 222 in f1c22a5
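As a rough illustration of what such a test could look like, here is a minimal sketch using only `unittest.mock`. Note that `LlamaHandler`, `_load_model`, `make_mock_context`, and `run_handler_with_mocks` are hypothetical names made up for this sketch, not the handler in this PR; the real handler's method names and the context fields it reads may differ.

```python
from unittest.mock import MagicMock, patch


class LlamaHandler:
    """Hypothetical stand-in for the handler under review (not the PR's code)."""

    def __init__(self):
        self.model = None
        self.initialized = False

    def _load_model(self, model_dir):
        # Assumption: the real handler compiles/loads the model on Inferentia here.
        raise RuntimeError("requires Inferentia hardware")

    def initialize(self, ctx):
        # Read the model directory from the (mocked) TorchServe-style context.
        model_dir = ctx.system_properties.get("model_dir")
        self.model = self._load_model(model_dir)
        self.initialized = True

    def inference(self, prompts):
        return [self.model.generate(p) for p in prompts]


def make_mock_context(model_dir="/tmp/model"):
    # Mocked context exposing only the fields this sketch reads.
    ctx = MagicMock()
    ctx.system_properties = {"model_dir": model_dir, "gpu_id": None}
    ctx.manifest = {"model": {"serializedFile": "dummy"}}
    return ctx


def run_handler_with_mocks():
    # Fake model so that no Inferentia device or real weights are needed.
    fake_model = MagicMock()
    fake_model.generate.side_effect = lambda prompt: prompt + " world"

    handler = LlamaHandler()
    with patch.object(LlamaHandler, "_load_model", return_value=fake_model) as loader:
        handler.initialize(make_mock_context())
        loader.assert_called_once_with("/tmp/model")

    assert handler.initialized
    return handler.inference(["hello"])
```

In a pytest file, the body of `run_handler_with_mocks` would become a test function, with the hardware-dependent loader patched in the same way.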
@namannandan I am wondering if compilation can be done ahead of time so that we just load the compiled graphs here, the way it worked for inf1?
I tested _save_compiled_artifacts. It is able to generate a Neuron model; however, transformers_neuronx still needs to recompile. I have already let the Neuron team know that the experimental _save_compiled_artifacts feature needs more work.