It would be great to add something that can sanity-check the bot configuration. For example, when creating a cluster with Magic Castle, it would be quite easy to attach the wrong type of instance to a partition name, and the mistake would not be easy to notice: the Zen2 nodes are called `Standard_HB120-16rs_v2` and the Zen4 nodes are `Standard_HB176-24rs_v4`, so there are no obvious hints in the names.
I can see how that would be a bit tricky though, given how you define `arch_target_map`. Perhaps a sanity-check command could be part of that definition? Or maybe allow users to define a sanity-check script that, when run via `srun` with the options from an `arch_target_map` value, should return the corresponding key.
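To make the second idea concrete, here is a minimal sketch in Python of what such a check could look like. It is not the bot's actual code. It assumes `arch_target_map` is a dict that maps an architecture key to a string of `srun` options, and that a site-provided script (the hypothetical `./detect_arch.sh`) prints the architecture key of the node it runs on. The function name `check_arch_target_map` and the key and partition names in the usage example are made up for illustration.

```python
import shlex
import subprocess
from typing import Callable, Dict, List

# Hypothetical detection script: assumed to print the architecture key
# (in the same format as the arch_target_map keys) on stdout.
DETECT_CMD = ["./detect_arch.sh"]


def run_srun(slurm_opts: str, command: List[str]) -> str:
    """Run `command` via srun with the given options; return stripped stdout."""
    result = subprocess.run(
        ["srun", *shlex.split(slurm_opts), *command],
        capture_output=True,
        text=True,
        check=True,    # raise CalledProcessError on a non-zero exit code
        timeout=600,   # do not wait forever for a node to become available
    )
    return result.stdout.strip()


def check_arch_target_map(
    arch_target_map: Dict[str, str],
    runner: Callable[[str, List[str]], str] = run_srun,
    detect_cmd: List[str] = DETECT_CMD,
) -> Dict[str, str]:
    """Return {expected_key: reported_value} for every entry that does not match."""
    mismatches = {}
    for expected, slurm_opts in arch_target_map.items():
        try:
            reported = runner(slurm_opts, list(detect_cmd))
        except (subprocess.SubprocessError, OSError) as err:
            # A failed or timed-out job is also reported as a mismatch.
            reported = f"<job failed: {err}>"
        if reported != expected:
            mismatches[expected] = reported
    return mismatches
```

An empty result means every partition reported the architecture it is mapped to. The `runner` parameter exists only so the logic can be tried without a Slurm cluster, for example with a stand-in that pretends both partitions are backed by Zen2 nodes:

```python
example_map = {"zen2": "--partition=part-a", "zen4": "--partition=part-b"}  # made-up names
print(check_arch_target_map(example_map, runner=lambda opts, cmd: "zen2"))
```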
One way to verify the architecture is
This would also check that the bot can successfully submit jobs.
You'd also want to verify that the bot can talk to the target repository with the correct permissions.