Possible issue in launching large number of instances and then destroying the cluster #341
Are you able to reproduce this issue consistently, or is it intermittent? Is there something special about requesting 25 instances, or do you also see the issue with smaller clusters?
I tried to reproduce this issue by running the same command 4 more times. However, it did not happen again. I did encounter issue #340 one of those times. As for the number 25, there is nothing special about it other than that I was trying to reproduce issue #340 with more than 20 nodes, since it was mentioned there that launching more than 20 nodes can trigger that issue. I did not encounter any issues with smaller clusters.
Strange. If this happens to you again and you get some more detail on what the cause is, please post it here. From your original report, I can see that the initial problem is this:
I'm guessing that it happened only once because the cause is related to AWS resource consistency. Maybe Flintrock deleted something that … That's just a guess. But if that was the issue, then I suppose a general solution would be to adjust some boto settings to retry operations more times before failing (though I don't think those settings apply to …).
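The retry idea above can be sketched generically. This is purely illustrative, not Flintrock's actual code, and `flaky_describe` is a hypothetical stand-in for an AWS call that fails until the service reaches consistency:

```python
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=1.0):
    """Retry an operation that may fail transiently (e.g. because AWS
    has not yet reached consistency after a resource was created or
    deleted), waiting exponentially longer between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))

# Illustrative use: pretend the first two calls hit a consistency error.
calls = {"n": 0}
def flaky_describe():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("InvalidInstanceID.NotFound")  # simulated transient error
    return "ok"

print(retry_with_backoff(flaky_describe, base_delay=0.01))  # → ok
```

For boto specifically, botocore exposes retry behavior through its client `Config` (a `retries` dictionary), which is roughly the kind of setting the comment above is alluding to.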
Thanks for the report!
When launching a 25-instance t2.micro Spark cluster to check whether issue #340 also happens in our case, I ran into a different problem. Even though I specified 24 slaves, Flintrock initially launched only 23 nodes. When I tried to destroy the cluster immediately, it hit an error, and subsequent calls to destroy reported that the cluster had 2 running nodes and also failed with errors. I terminated the instances from the AWS console, after which flintrock destroy no longer threw an error.
Note that my current running-instance vCPU limit is 32; however, that should not be an issue, as the launched cluster used only 25 vCPUs.
log