From d63b862139dd3da85764730c688f025929cdae39 Mon Sep 17 00:00:00 2001
From: Manfei <41607353+ManfeiBai@users.noreply.github.com>
Date: Sun, 5 Nov 2023 23:54:22 -0800
Subject: [PATCH] Update TORCH_XLA_USER_GUIDE.md

---
 TORCH_XLA_USER_GUIDE.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/TORCH_XLA_USER_GUIDE.md b/TORCH_XLA_USER_GUIDE.md
index 10c561f85..55d156285 100644
--- a/TORCH_XLA_USER_GUIDE.md
+++ b/TORCH_XLA_USER_GUIDE.md
@@ -91,11 +91,11 @@ gcloud compute tpus tpu-vm scp params_70b.json ${TPU_NAME}:params.json --zone ${
 gcloud compute tpus tpu-vm ssh ${TPU_NAME} --zone ${ZONE} --project ${PROJECT_ID} --worker=all --command="cd $HOME/llama && PJRT_DEVICE=TPU XLA_FLAGS=--xla_dump_to=/tmp/dir_name PROFILE_LOGDIR=/tmp/home/ python3.8 example_text_completion.py --ckpt_dir . --tokenizer_path $HOME/llama/t5_tokenizer/spiece.model --max_seq_len 2048 --max_gen_len 1000 --max_batch_size 2 --mp True --dynamo True"
 ```
 
-## Commands to Run Llama2 using XLA:GPU (e.g. L4 or H100)
+## Commands to Run Llama2 using XLA:GPU (e.g. L4 or H100) without Quantization
 
-`example_text_completion.py` can also be ran on GPUs with XLA:GPU. To do that, you need different wheels than the above such
+`example_text_completion.py` can also be run on GPUs with XLA:GPU without quantization. To do that, you need different wheels than the above such
 that you have XLA:GPU support. Please refer to [pytorch/xla](https://github.com/pytorch/xla#wheel) repo to download
-a suitable GPU nightly wheel for your environment.
+a suitable GPU nightly wheel (dated 2023/04/22) for your environment.
 
 After that, you can run the following the command:
 ```