feat: RayServe with vLLM using AWS Neuron on Amazon EKS #607
Conversation
Thanks for adding this! I think it's going to be really helpful. I left some comments, hopefully they make sense.
```yaml
data:
  hf-token: $HUGGING_FACE_HUB_TOKEN
---
apiVersion: ray.io/v1
```
I think we want to be a little more consistent with how we set up the resource configuration. We have 3 max replicas, so we should decide whether we want to scale nodes or scale actors. If we want to allow for intra-node scaling, I believe you want to set

```
neuron_cores = (6 * 2) / 3
```

or, more generally,

```
(NUM_NEURON_DEVICES * 2) / NUM_REPLICAS_PER_NODE
```

We also want to make sure we set `num_cpus = total_node_cpus / num_replicas`. If you want to use only full-node scaling, we should max out the single-node replica by setting the denominator to 1.
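The arithmetic above can be sketched as a small helper. The factor of 2 reflects that each Neuron device exposes two NeuronCores; the example values (6 devices per node, as on an inf2.24xlarge, and 96 node CPUs) are illustrative assumptions, not taken from the blueprint:

```python
def cores_per_replica(num_neuron_devices: int, replicas_per_node: int) -> int:
    """Neuron cores to request per Ray actor; each Neuron device has 2 cores."""
    total_cores = num_neuron_devices * 2
    if total_cores % replicas_per_node != 0:
        raise ValueError("replica count must divide the node's neuron cores evenly")
    return total_cores // replicas_per_node

def cpus_per_replica(total_node_cpus: int, replicas_per_node: int) -> int:
    """CPUs to request per replica so replicas pack evenly onto one node."""
    return total_node_cpus // replicas_per_node

# Intra-node scaling: 3 replicas share one node with 6 Neuron devices
print(cores_per_replica(6, 3))   # (6 * 2) / 3 = 4
# Full-node scaling: denominator of 1 gives one maxed-out replica per node
print(cores_per_replica(6, 1))   # 12
```

Setting the denominator to 1 collapses both formulas to "one replica owns the whole node", which is the full-node-scaling case described above.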
```dockerfile
    gnupg2 \
    && sudo rm -rf /var/lib/apt/lists/*

RUN sudo wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB > ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
```
These can be grouped into one layer by using a single RUN command and joining the commands with `&& \`, like:

```dockerfile
RUN sudo wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB > ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB && \
    sudo gpg --no-default-keyring --keyring ./aws_neuron_keyring.gpg --import ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB && \
    sudo gpg --no-default-keyring --keyring ./aws_neuron_keyring.gpg --export > /etc/apt/trusted.gpg.d/aws_neuron.gpg && \
    sudo rm ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
```
```dockerfile
RUN sudo mv ./aws_neuron.gpg /etc/apt/trusted.gpg.d/
RUN sudo rm ./GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

RUN sudo add-apt-repository -y "deb https://apt.repos.neuron.amazonaws.com jammy main"
```
You can also group all the apt commands together and follow them up with `rm -rf /var/lib/apt/lists/*`.
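A sketch of what that grouped apt layer could look like (the package names and version pins are copied from the surrounding diff; treat this as illustrative, not the final Dockerfile):

```dockerfile
# One layer: add the Neuron repo, install, then clean the apt lists in the same RUN
RUN sudo add-apt-repository -y "deb https://apt.repos.neuron.amazonaws.com jammy main" && \
    sudo apt-get update && \
    sudo apt-get -y install aws-neuronx-runtime-lib=2.* aws-neuronx-tools=2.* && \
    sudo rm -rf /var/lib/apt/lists/*
```

Cleaning the lists in the same RUN matters because a later `rm` in its own layer does not shrink the image; the lists would already be baked into the earlier layer.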
```dockerfile
RUN sudo apt-get -y install aws-neuronx-runtime-lib=2.*
RUN sudo apt-get -y install aws-neuronx-tools=2.*

RUN pip3 config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
```
Please group all the pip installs
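One way the grouped pip layer might look; the package names here (`neuronx-cc`, `torch-neuronx`) are assumptions for illustration, not taken from the PR:

```dockerfile
# Single layer: configure the Neuron index and run all pip installs together
RUN pip3 config set global.extra-index-url https://pip.repos.neuron.amazonaws.com && \
    pip3 install --no-cache-dir neuronx-cc torch-neuronx
```

`--no-cache-dir` keeps the pip download cache out of the layer, for the same reason as the apt-lists cleanup.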
```dockerfile
ENV VLLM_TARGET_DEVICE=neuron
RUN git clone https://github.com/vllm-project/vllm.git
RUN cd vllm && git checkout v0.5.0
COPY patches/vllm_v0.5.0_neuron.patch vllm/vllm_v0.5.0_neuron.patch
```
If you put the COPY ahead of the RUN, you can chain all the RUNs together.
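A rough sketch of the reordering being suggested; the use of `git apply` and the `/tmp` destination are assumptions for illustration (the blueprint may apply the patch differently):

```dockerfile
# COPY first so the patch is already in the build context for the RUN
ENV VLLM_TARGET_DEVICE=neuron
COPY patches/vllm_v0.5.0_neuron.patch /tmp/vllm_v0.5.0_neuron.patch
RUN git clone https://github.com/vllm-project/vllm.git && \
    cd vllm && \
    git checkout v0.5.0 && \
    git apply /tmp/vllm_v0.5.0_neuron.patch
```

Chaining the clone, checkout, and patch into one RUN also avoids the pitfall that `RUN cd vllm` in its own layer does not persist the working directory for later RUNs.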
```yaml
schedulerName: my-scheduler # Correct placement
containers:
  - name: worker
    image: public.ecr.aws/data-on-eks/vllm-ray-neuron-mistral7b:latest
```
@ratnopamc Please update this image with the new one.
Sure, I will create a new ECR repo and image.
@vara-bonthu, pushed the below changes:

- Fixed indentation, whitespace, and formatting of the neuron patch file.
- Created a public repository `vllm-ray2.32.0-inf2-llama3` under data-on-eks and pushed the image `vllm-ray2.32.0-inf2-llama3` built with the above patch.
- Updated deployment.yaml to include the openai install command.
- Tested autoscaling across nodes, works fine!

Please review.

Next steps once this PR is merged: 1/ Add HF token to the deployment yaml and config map serving script to handle gated models
```dockerfile
    rm -rf /var/lib/apt/lists/* && \
    wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | gpg --no-default-keyring --keyring ./aws_neuron_keyring.gpg --import && \
    gpg --no-default-keyring --keyring ./aws_neuron_keyring.gpg --export > /etc/apt/trusted.gpg.d/aws_neuron.gpg && \
    add-apt-repository -y "deb https://apt.repos.neuron.amazonaws.com jammy main" && \
```
I think we'd want to separate this into another layer to make it easier to update
@ratnopamc Could you please update this in your second PR along with the website doc?
* fix: bump data on eks addons to 1.33 to support karpenter helm resources with bottlerocket
* feat: RayServe with vLLM using AWS Neuron on Amazon EKS (awslabs#607) (Co-authored-by: Vara Bonthu <vara.bonthu@gmail.com>)
* feat: Mountpoint S3 for loading additional Spark Jars (awslabs#606) (Co-authored-by: Karanbir Bains <bainskb@amazon.com>)
* fixes for pre-commit
* fix pre-commit on the merged main
* chore: Delete ai-ml/kubeflow directory (awslabs#619)
* feat: Updated mountpoint-s3 for spark readme (awslabs#618) (Co-authored-by: Karanbir Bains <bainskb@amazon.com>)
* feat: Trainium blueprint upgrade (awslabs#622)
* feat: Neuron scheduler update for trainium-inferentia blueprints (awslabs#624)
* feat: Website Updates (awslabs#626)
* feat: Updates to the sidebar (awslabs#627)
* feat: Added deprecating notes; added Jark stack doc; added warnings for ML p… (awslabs#628)
* feat: NVIDIA NIM Updates (awslabs#631)
* feat: Update NVIDIA NIM blueprint with grafana dashboard and docs (awslabs#633)
* feat: Add OpenWebUI for vllm-rayserve-inf2 blueprint (awslabs#635)

Co-authored-by: Ratnopam Charabarti <ratnopamc@yahoo.com>
Co-authored-by: Vara Bonthu <vara.bonthu@gmail.com>
Co-authored-by: Karanbir Bains <166257900+bainskb@users.noreply.github.com>
Co-authored-by: Karanbir Bains <bainskb@amazon.com>
Co-authored-by: Apoorva Kulkarni <kuapoorv@amazon.com>
Co-authored-by: Vara Bonthu <vara.bonthu@gmail.com>
What does this PR do?
Adds the capability to deploy LLMs for inference on AWS Inferentia with Ray and vLLM.
🛑 Please open an issue first to discuss any significant work and flesh out details/direction - we would hate for your time to be wasted.
Consult the CONTRIBUTING guide for submitting pull-requests.
Motivation
#591
More
- [ ] Updated the website/docs or website/blog section for this feature
- [ ] Ran `pre-commit run -a` with this PR. Link for installing pre-commit locally

For Moderators
Additional Notes