Update README.md
Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
philschmid and JingyaHuang authored May 8, 2024
1 parent a406f81 commit 8c9fc10
Showing 1 changed file with 1 addition and 1 deletion.
README.md
@@ -160,7 +160,7 @@ The custom module can override the following methods:
## 🏎️ Deploy Models on AWS Inferentia2

The SageMaker Hugging Face Inference Toolkit provides support for deploying Hugging Face models on AWS Inferentia2. To deploy a model on Inferentia2, you have 3 options (see the sketch after this list):
- * Provide an already compiled model with a `model.neuron` file as `HF_MODEL_ID`, e.g. `optimum/tiny_random_bert_neuron`
+ * Provide `HF_MODEL_ID`, the model repo id on huggingface.co that contains the compiled model in `.neuron` format, e.g. `optimum/bge-base-en-v1.5-neuronx`
* Provide the `HF_OPTIMUM_BATCH_SIZE` and `HF_OPTIMUM_SEQUENCE_LENGTH` environment variables to compile the model on the fly, e.g. `HF_OPTIMUM_BATCH_SIZE=1 HF_OPTIMUM_SEQUENCE_LENGTH=128`
* Include a `neuron` dictionary in the [config.json](https://huggingface.co/optimum/tiny_random_bert_neuron/blob/main/config.json) file in the model archive, e.g. `"neuron": {"static_batch_size": 1, "static_sequence_length": 128}`
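
Not part of the original README or this change, but as a rough illustration of the compile-on-the-fly option above: a minimal sketch of passing these environment variables when deploying with the SageMaker Python SDK. The model id, task, IAM role, container versions, and instance type below are placeholders/assumptions and must match a Neuron-compatible Hugging Face DLC available in your account and region.

```python
from sagemaker.huggingface import HuggingFaceModel

# Environment for on-the-fly Neuron compilation.
# Model id and task are illustrative placeholders.
env = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
    "HF_TASK": "text-classification",
    "HF_OPTIMUM_BATCH_SIZE": "1",
    "HF_OPTIMUM_SEQUENCE_LENGTH": "128",
}

# Versions below are assumptions; choose ones that resolve to an
# Inferentia2 (Neuronx) Hugging Face DLC, or pass image_uri explicitly.
model = HuggingFaceModel(
    env=env,
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder role ARN
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
)

# Deploy to an Inferentia2 instance.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
)

print(predictor.predict({"inputs": "I love using Inferentia2!"}))
```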

