
feat: RayServe with vLLM using AWS Neuron on Amazon EKS #607

Merged · 6 commits · Aug 20, 2024

Conversation

ratnopamc
Collaborator

What does this PR do?

Adds the capability to deploy LLMs for inference on AWS Inferentia with Ray and vLLM.
🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.

Motivation

#591

More

  • [ ] Yes, I have tested the PR using my local account setup (provide any test evidence report under Additional Notes)
  • [ ] Mandatory for new blueprints. Yes, I have added an example to support my blueprint PR
  • Mandatory for new blueprints. Yes, I have updated the website/docs or website/blog section for this feature
  • Yes, I ran pre-commit run -a with this PR. Link for installing pre-commit locally

For Moderators

  • E2E Test successfully complete before merge?

Additional Notes

@ratnopamc ratnopamc requested a review from vara-bonthu August 8, 2024 12:29
Contributor

@omrishiv omrishiv left a comment


Thanks for adding this! I think it's going to be really helpful. I left some comments, hopefully they make sense.

data:
hf-token: $HUGGING_FACE_HUB_TOKEN
---
apiVersion: ray.io/v1
Contributor

@omrishiv omrishiv Aug 12, 2024


I think we want to be a little more consistent with how we set up the resource configuration. We have 3 max replicas, so we should decide whether we want to scale nodes or scale actors. If we want to allow for intra-node scaling, I believe you want to set

neuron_cores = (6 * 2) / 3

or, more generally,

(NUM_NEURON_DEVICES * 2) / NUM_REPLICAS_PER_NODE

We also want to make sure we set num_cpus = total_node_cpus / num_replicas.

If you want only full-node scaling, max out the single-node replica count by setting the denominator to 1.
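The per-replica arithmetic in the comment above can be sketched as a small helper. A minimal sketch: the device count (6) and the 2 NeuronCores per device are from the comment; the 96-vCPU node size is an assumption for illustration, as is the helper's name.

```python
def per_replica_resources(num_neuron_devices: int,
                          total_node_cpus: int,
                          replicas_per_node: int) -> tuple[float, float]:
    """Split one node's NeuronCores and vCPUs evenly across Ray replicas."""
    # Each Neuron device exposes 2 NeuronCores.
    neuron_cores = (num_neuron_devices * 2) / replicas_per_node
    num_cpus = total_node_cpus / replicas_per_node
    return neuron_cores, num_cpus

# Intra-node scaling: 3 replicas share one node.
print(per_replica_resources(6, 96, 3))  # → (4.0, 32.0)

# Full-node scaling: a denominator of 1 gives a single replica the whole node.
print(per_replica_resources(6, 96, 1))  # → (12.0, 96.0)
```

With these values, each of the 3 replicas would request 4 NeuronCores and 32 CPUs in its Ray Serve deployment config.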

gen-ai/inference/vllm-rayserve-inf2/Dockerfile (outdated; resolved)
gen-ai/inference/vllm-rayserve-inf2/vllm_asyncllmengine.py (outdated; resolved)
gen-ai/inference/vllm-rayserve-inf2/Dockerfile (outdated; resolved)
gnupg2 \
&& sudo rm -rf /var/lib/apt/lists/*

RUN sudo wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB > ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
Contributor

These can be grouped into one layer by having only 1 RUN command and using && \ between all of the commands, like:

RUN sudo wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB > ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB && \
sudo gpg --no-default-keyring --keyring ./aws_neuron_keyring.gpg --import ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB && \
sudo gpg --no-default-keyring --keyring ./aws_neuron_keyring.gpg --export > /etc/apt/trusted.gpg.d/aws_neuron.gpg && \
sudo rm ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

RUN sudo mv ./aws_neuron.gpg /etc/apt/trusted.gpg.d/
RUN sudo rm ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

RUN sudo add-apt-repository -y "deb https://apt.repos.neuron.amazonaws.com jammy main"
Contributor

You can also group all the apt commands together and follow it up with rm -rf /var/lib/apt/lists/*

RUN sudo apt-get -y install aws-neuronx-runtime-lib=2.*
RUN sudo apt-get -y install aws-neuronx-tools=2.*
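Grouping the apt commands into one layer, as the reviewer suggests, might look like the following sketch. The package pins come from the two RUN lines above; the `apt-get update` and trailing cleanup are assumptions based on common Dockerfile practice.

```dockerfile
RUN sudo apt-get update && \
    sudo apt-get -y install \
        aws-neuronx-runtime-lib=2.* \
        aws-neuronx-tools=2.* && \
    sudo rm -rf /var/lib/apt/lists/*
```

Collapsing the installs into a single RUN keeps the apt metadata out of the final image and produces one cacheable layer instead of several.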

RUN pip3 config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
Contributor

Please group all the pip installs
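A sketch of what grouping the pip steps into a single layer might look like. The extra-index-url is taken from the Dockerfile line above; the package names are hypothetical placeholders, not taken from the PR.

```dockerfile
# Package names below are placeholders -- substitute the blueprint's actual
# pip requirements.
RUN pip3 config set global.extra-index-url https://pip.repos.neuron.amazonaws.com && \
    pip3 install --no-cache-dir neuronx-cc torch-neuronx
```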

ENV VLLM_TARGET_DEVICE=neuron
RUN git clone https://github.com/vllm-project/vllm.git
RUN cd vllm && git checkout v0.5.0
COPY patches/vllm_v0.5.0_neuron.patch vllm/vllm_v0.5.0_neuron.patch
Contributor

If you put the copy ahead of the RUN, you can chain all the RUNs together
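Moving the COPY ahead of the RUN and chaining the RUNs, as suggested, might look like this sketch. The `git apply` step is an assumption about how the patch is consumed; the diff shown here does not include that step.

```dockerfile
ENV VLLM_TARGET_DEVICE=neuron
COPY patches/vllm_v0.5.0_neuron.patch /tmp/vllm_v0.5.0_neuron.patch
# Assumption: the patch is applied with `git apply`; the PR may do this
# differently.
RUN git clone https://github.com/vllm-project/vllm.git && \
    cd vllm && \
    git checkout v0.5.0 && \
    git apply /tmp/vllm_v0.5.0_neuron.patch
```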

schedulerName: my-scheduler # Correct placement
containers:
- name: worker
image: public.ecr.aws/data-on-eks/vllm-ray-neuron-mistral7b:latest
Collaborator

@ratnopamc Update this image with new one

Collaborator Author

Sure, will create a new ecr repo and image.

Collaborator Author

@vara-bonthu , Pushed the below changes -

  • Fixed indentation, whitespace, and formatting of the Neuron patch file.
  • Created a public repository vllm-ray2.32.0-inf2-llama3 under data-on-eks and pushed image vllm-ray2.32.0-inf2-llama3 built with the above patch.
  • Updated deployment.yaml to include openai install command.
  • Tested autoscaling across nodes, works fine!

Please review.

@vara-bonthu vara-bonthu changed the title feat: Add blueprint for using rayserve with vLLM on Inferentia2 feat: vLLM and RayServe with Neuron on Amazon EKS Aug 20, 2024
@vara-bonthu
Collaborator

Next Steps once this PR is merged

1/ Add HF Token to deployment yaml and config map serving script to handle gated models
2/ Website Doc for the deployment

@ratnopamc ratnopamc changed the title feat: vLLM and RayServe with Neuron on Amazon EKS feat: RayServe with vLLM using AWS Neuron on Amazon EKS Aug 20, 2024
@vara-bonthu vara-bonthu merged commit 1f682cd into awslabs:main Aug 20, 2024
37 of 38 checks passed
rm -rf /var/lib/apt/lists/* && \
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | gpg --no-default-keyring --keyring ./aws_neuron_keyring.gpg --import && \
gpg --no-default-keyring --keyring ./aws_neuron_keyring.gpg --export > /etc/apt/trusted.gpg.d/aws_neuron.gpg && \
add-apt-repository -y "deb https://apt.repos.neuron.amazonaws.com jammy main" && \
Contributor

I think we'd want to separate this into another layer to make it easier to update

Collaborator

@ratnopamc Could you please update this in your second PR along with Website doc?

@ratnopamc ratnopamc deleted the ray-vllm-inf2-updates branch September 3, 2024 22:30
lindarr915 added a commit to lindarr915/data-on-eks that referenced this pull request Sep 4, 2024
* fix: bump data on eks addons to 1.33 to support karpenter helm resources with bottlerocket

* feat: RayServe with vLLM using AWS Neuron on Amazon EKS (awslabs#607)

Co-authored-by: Vara Bonthu <vara.bonthu@gmail.com>

* feat: Mountpoint S3 for loading additional Spark Jars (awslabs#606)

Co-authored-by: Karanbir Bains <bainskb@amazon.com>

* fixes for pre-commit

* fix pre-commit on the merged main

* chore: Delete ai-ml/kubeflow directory (awslabs#619)

* feat: Updated mountpoint-s3 for spark readme (awslabs#618)

Co-authored-by: Karanbir Bains <bainskb@amazon.com>

* feat: Trainium blueprint upgrade (awslabs#622)

* feat: Neuron scheduler update for trainium-inferentia blueprints (awslabs#624)

* feat: Website Updates (awslabs#626)

* feat: Updates to the sidebar (awslabs#627)

* feat: Added deprecating notes; added Jark stack doc;added warnings for ML p… (awslabs#628)

* feat: NVIDIA NIM Updates (awslabs#631)

* feat: Udate NVIDIA NIM blueprint with grafana dashboard and docs (awslabs#633)

* feat: Add OpenWebUI for vllm-rayserve-inf2 blueprint (awslabs#635)

---------

Co-authored-by: Ratnopam Charabarti <ratnopamc@yahoo.com>
Co-authored-by: Vara Bonthu <vara.bonthu@gmail.com>
Co-authored-by: Karanbir Bains <166257900+bainskb@users.noreply.github.com>
Co-authored-by: Karanbir Bains <bainskb@amazon.com>
Co-authored-by: Apoorva Kulkarni <kuapoorv@amazon.com>
lindarr915 pushed a commit to lindarr915/data-on-eks that referenced this pull request Sep 6, 2024
Co-authored-by: Vara Bonthu <vara.bonthu@gmail.com>
4 participants