Issues with RunPod as an offline provider #1118
Comments
@TheBits |
@Bihan |
@TheBits B. For the case when the user provides gpu_count and region filters. Eg: Note: I have observed that Runpod has changed its web layout. It may have changed its API too. I am checking it. |
@peterschmidt85 Yes that is true. |
There is also option C – make our version of the static catalog that includes all the offers. |
@peterschmidt85 Yes, we can do that, but what if the catalog changes faster than the trigger interval? I want to share an idea to make Runpod online. Basically, the idea is to follow the flow the Runpod web console uses. Eg: Case A: User requests with gpu argument I can explore the cases and try an implementation. |
Not sure I'm fond of this one TBH. |
@peterschmidt85 Getting datacenter information requires 8 API calls (the number of datacenters) and takes 2s. If 2s is acceptable performance, then I can make Runpod online. |
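The thread does not say whether the 8 calls are issued one after another. If they are, a rough sketch of issuing them in parallel looks like the following. Everything here is an assumption for illustration: `fetch_offers` is a stand-in that only sleeps to simulate latency, the datacenter IDs are placeholders, and no real RunPod API call is made.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder datacenter IDs; the thread only says there are 8 of them.
DATACENTERS = [f"dc-{i}" for i in range(8)]


def fetch_offers(datacenter_id, delay=0.05):
    """Stand-in for one per-datacenter API call (hypothetical, not RunPod's API)."""
    time.sleep(delay)  # simulated network latency
    return [{"datacenter": datacenter_id, "gpu": "placeholder-gpu"}]


def collect_all_offers(datacenters, max_workers=8):
    """Issue the per-datacenter calls in parallel and flatten the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_datacenter = list(pool.map(fetch_offers, datacenters))
    return [offer for offers in per_datacenter for offer in offers]


start = time.perf_counter()
offers = collect_all_offers(DATACENTERS)
elapsed = time.perf_counter() - start
print(f"{len(offers)} offers collected in {elapsed:.2f}s")
```

With parallel requests, the total time is closer to the slowest single call than to the sum of all calls, provided the API tolerates concurrent requests.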
@Bihan But what about the option C I suggested above? |
@peterschmidt85 What if Runpod changes its catalog before the trigger happens? |
In my opinion, 2 seconds is not a significant lag.
@peterschmidt85 The number of offers with availability fluctuates frequently. At night, there were 207 offers. Right now, the number of offers is between 185 and 189. |
This issue is stale because it has been open for 30 days with no activity. |
@peterschmidt85 The solution is to implement Runpod as an online provider. However, that requires an API which returns all machine types across all data centers, and Runpod does not offer such an API. We do have a workaround to implement Runpod as an online provider, but it comes with a performance cost: it takes 2s to respond with all the offers. |
Currently dstack uses the gpuhunt runpod catalog, which is collected daily. It includes only the offers available at the time of catalog generation. Since runpod availability changes throughout the day, some offers may appear/disappear when the user runs A potentially good and simple solution could be to start collecting the runpod catalog more frequently (e.g. every hour). Some offers might still be missing, but it won't be critical. The specific interval is to be determined. Making runpod an online provider is not an option at the moment. |
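To make the staleness concern concrete, here is a minimal sketch that compares two catalog snapshots and reports which offers appeared or disappeared between them. The `(gpu, region)` tuple shape and the sample values are assumptions for illustration only, not gpuhunt's actual catalog format.

```python
def diff_catalogs(old_offers, new_offers):
    """Return (appeared, disappeared) offers between two catalog snapshots."""
    old_set, new_set = set(old_offers), set(new_offers)
    appeared = sorted(new_set - old_set)
    disappeared = sorted(old_set - new_set)
    return appeared, disappeared


# Hypothetical snapshots: each offer is a (gpu, region) tuple.
daily_snapshot = [("gpu-a", "region-1"), ("gpu-b", "region-1"), ("gpu-c", "region-2")]
hourly_snapshot = [("gpu-a", "region-1"), ("gpu-c", "region-2"), ("gpu-d", "region-2")]

appeared, disappeared = diff_catalogs(daily_snapshot, hourly_snapshot)
print("appeared:", appeared)
print("disappeared:", disappeared)
```

Running a comparison like this across snapshots taken at different intervals is one way to estimate how many offers a given collection interval would miss.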
@r4victor This means we need to modify |
@Bihan, yeah, one of the possible solutions would be to separate backend catalogs. This will require refactoring of gpuhunt and also means introducing a new catalog version (v2) since the catalogs will be stored differently. We can also trigger Collect and publish catalogs workflow more frequently for all providers (e.g. every hour). @peterschmidt85, this solution won't cost us much and I'd recommend it since it's trivial to start with. |
Agree! |
Once RunPod was added, we observed a few issues with its functionality.
In two issues, the integrity tests were adjusted to match the current availability (dstackai/gpuhunt#56, dstackai/gpuhunt#58).