Prepull ray images to all cluster nodes #739

Merged (2 commits) on Jul 6, 2023

@psschwei (Collaborator) commented Jul 5, 2023

Summary

Fixes #728
Add a daemonset to prepull the ray images to all nodes in the cluster.

Details and comments

A daemonset ensures that every node runs a copy of a pod. Ours uses the ray node image (taken from the gateway's ray image key) as an init container and the Google "pause" container as the main container (the "pause" container more or less just sleeps and has a minimal footprint). Because it is a daemonset, whenever a new node is added to the cluster, the ray image is pulled to that node as well.
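
For illustration, a minimal sketch of the pattern described above (not the actual manifest from this PR). The resource name, labels, image references, and the init container's command are all assumptions; in the chart, the init container image would come from the gateway ray image value rather than being hard-coded.

```yaml
# Hypothetical prepull daemonset; every name and image below is a placeholder.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ray-image-prepuller            # assumed name
spec:
  selector:
    matchLabels:
      app: ray-image-prepuller
  template:
    metadata:
      labels:
        app: ray-image-prepuller
    spec:
      initContainers:
        # Scheduling this container makes the kubelet pull the ray image
        # onto the node; the command exits right away so the pod can proceed.
        - name: prepull-ray-node
          image: example.com/ray-node:placeholder   # assumed; set from the gateway ray image value
          command: ["sh", "-c", "true"]              # assumes the image ships a shell
      containers:
        # The pause container keeps the pod alive with a minimal footprint.
        - name: pause
          image: registry.k8s.io/pause:3.9           # assumed registry and tag
```

Since the init container exits immediately, the only long-running process on each node is the pause container, while the pulled ray image stays in the node's local image cache.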

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
@akihikokuroda (Collaborator) commented

This may be too much, but I wonder whether we want to allow a node selector or node affinity to be set in the values.yaml file for some environments.

@psschwei (Collaborator, Author) commented Jul 5, 2023

I think I'd lean towards that being too much for now, unless there's a specific use case that needs it.

@akihikokuroda (Collaborator) left a comment

LGTM. Thanks!

@psschwei merged commit e2965a1 into Qiskit:main on Jul 6, 2023
@psschwei deleted the prepull-ray-node branch on July 6, 2023 12:34
Development

Successfully merging this pull request may close these issues.

Infrastructure: ray cluster image caching
2 participants