Feature request TPU v4 support #247

Open
rivershah opened this issue Oct 20, 2022 · 3 comments

Comments

@rivershah

With TPU v4, Google has really cleaned up the user experience around TPU VMs. Does the google-cls-v2 provider allow provisioning of TPU v4 machines? If so, could you please share an example that illustrates provisioning and loading any drivers needed to make the TPU v4 accelerator types visible to jobs submitted via dsub?

@wnojopra
Contributor

Thanks @rivershah! This is something I'll be looking into more. Judging from some of the TPU documentation, it sounds like something we should be able to support, though some Life Sciences API changes might be needed.

Would you mind clarifying what you mean by Google having "cleaned up" the experience? I'd be interested to hear about it. Is there any specific documentation you follow for your work with TPUs?

@rivershah
Author

@wnojopra Please take a look here: https://www.youtube.com/watch?v=W7A-9MYvPwI&t=301s

TPUs now follow the same provisioning model as GPUs: root access to the host VM, with the accelerators attached to the host. I am not sure how relevant this provisioning-model refactor is to using the Life Sciences API, but it seems like a unification with the existing GPU provisioning model, which works very nicely with dsub.

@wnojopra
Contributor

wnojopra commented Oct 24, 2022

Thanks for sharing. Wanted to highlight an important bit from the video for others to read:

"""
In the past, for using TPUs on Cloud, a network attached architecture was used. The user would connect to a VM and then interact with the TPUs through gRPC calls. This was difficult to debug and sometimes introduced delays in the experience. With all new TPU VM Architecture, you have root access to every TPU VM you create. So you can install and run any software you wish in a tight loop with your TPU accelerators. You can use local storage, execute custom code in your input pipelines, and more easily integrate Cloud TPUs into your research and production workflows.
"""
