Storm framework consumes all mesos resources #50
This is correct; what you see is expected behavior.
@dlaidlaw: yeah... so what you're witnessing is one of the biggest problems with Storm's Mesos scheduler: it hoards resources. It does release those resources back to the cluster on a rotating basis, but it still holds onto them for a while. The reason it needs to hoard resources at all is that Storm's scheduling model normally expects resources to be statically present (as "slots" into which it can assign Storm worker processes). When Storm runs on Mesos, however, there are no preexisting slots. Instead we need to fabricate slots out of the resource offers, using each topology's configured resource demands (CPU, mem) to size the "slots".

I've worked a bit with @tnachen on some changes to potentially avoid hoarding (only holding resources when a topology is in need of assignment). Unfortunately, when I first tested them in my company's environment they seemed to lead to very odd failure cases where the Storm Nimbus got confused about what was actually running. I need to come back to those changes at some point, but it's honestly not a very high priority for me right now.
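To make the slot-fabrication idea concrete, here is a minimal, hypothetical sketch (not mesos-storm's actual code; the class, method, and parameter names are illustrative) of checking whether an offer can back one worker slot sized by a topology's configured demands:

```java
import java.util.List;

import org.apache.mesos.Protos.Offer;
import org.apache.mesos.Protos.Resource;

// Hypothetical helper for sizing "slots" out of resource offers.
public final class SlotSizing {
  /** Sum a named scalar resource (e.g. "cpus", "mem") across an offer. */
  static double scalar(List<Resource> resources, String name) {
    double total = 0.0;
    for (Resource r : resources) {
      if (name.equals(r.getName()) && r.hasScalar()) {
        total += r.getScalar().getValue();
      }
    }
    return total;
  }

  /** True if this offer can back one worker slot of the given size. */
  static boolean fitsOneSlot(Offer offer, double workerCpu, double workerMemMb) {
    return scalar(offer.getResourcesList(), "cpus") >= workerCpu
        && scalar(offer.getResourcesList(), "mem") >= workerMemMb;
  }
}
```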
@brndnmtthws has made some changes in #65 that enable filtering of declined resources for a period of 2 minutes (by default) after those resources have sat idle in the MesosNimbus's offer cache for roughly 75 seconds.
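For reference, the underlying Mesos mechanism is the `refuse_seconds` filter passed when declining an offer; here is a minimal sketch, where the 120-second constant is illustrative and mirrors the 2-minute default mentioned above:

```java
import org.apache.mesos.SchedulerDriver;
import org.apache.mesos.Protos.Filters;
import org.apache.mesos.Protos.OfferID;

// Declining an offer with a refuse_seconds filter asks the Mesos master
// not to re-offer those resources to this framework for that long.
public final class DeclineWithFilter {
  static void decline(SchedulerDriver driver, OfferID offerId) {
    Filters filters = Filters.newBuilder()
        .setRefuseSeconds(120) // ~2 minutes, matching the default above
        .build();
    driver.declineOffer(offerId, filters);
  }
}
```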
So I believe this will greatly mitigate the hoarding problem, and it might obviate the need for the "avoid hoarding until there are pending topologies" change I referenced above.
After our work on PR #154 we now have a much better understanding of how Storm ends up scheduling work, and we actually have control over whether workers are scheduled. So I believe we can work on preventing the hoarding of resources, i.e., we will experiment with only starting to accumulate offers when we have a topology that needs more slots (see the sketch below).
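A hedged sketch of that experiment, assuming a hypothetical `needsMoreSlots()` check for pending topologies (none of these names are the real code):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.mesos.SchedulerDriver;
import org.apache.mesos.Protos.Offer;
import org.apache.mesos.Protos.OfferID;

// Only accumulate offers while some topology still needs slots;
// otherwise hand resources straight back to the cluster.
public class OfferAccumulationSketch {
  private final Map<OfferID, Offer> cachedOffers = new ConcurrentHashMap<>();

  /** Placeholder: would return true when a topology awaits assignment. */
  boolean needsMoreSlots() {
    return false;
  }

  public void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
    for (Offer offer : offers) {
      if (needsMoreSlots()) {
        cachedOffers.put(offer.getId(), offer); // hold until slots are built
      } else {
        driver.declineOffer(offer.getId()); // release immediately, no hoarding
      }
    }
  }
}
```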
Another idea, proposed by @tnachen: limit the total number of CPUs (and maybe other resources) that the framework will use. We'd need to store that state somewhere; it's unclear where. A rough sketch of the accounting follows.
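As an illustration only (all names hypothetical, and sidestepping the open question of where the cap and usage would be persisted), the accounting could look like:

```java
import org.apache.mesos.Protos.Offer;

// Framework-wide CPU cap: accept an offer only while total held CPUs
// stay under the configured limit. In-memory only; real code would need
// to persist both the cap and the usage somewhere durable.
public class CpuCapSketch {
  private final double maxCpus;
  private double cpusInUse = 0.0;

  public CpuCapSketch(double maxCpus) {
    this.maxCpus = maxCpus;
  }

  /** CPUs carried by an offer, summed across its "cpus" scalar resources. */
  static double offeredCpus(Offer offer) {
    return offer.getResourcesList().stream()
        .filter(r -> "cpus".equals(r.getName()) && r.hasScalar())
        .mapToDouble(r -> r.getScalar().getValue())
        .sum();
  }

  /** True if taking this offer keeps the framework under its CPU cap. */
  synchronized boolean mayAccept(Offer offer) {
    return cpusInUse + offeredCpus(offer) <= maxCpus;
  }

  synchronized void onAccepted(Offer offer) {
    cpusInUse += offeredCpus(offer);
  }
}
```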
This is fixed by the change in #200. Thanks @JessicaLHartog & @srishtyagrawal for fixing this long-standing issue that was making it hard to use Storm in a general Mesos cluster.
v0.2.4+ has this fix. For v0.1.x (storm-0.x support) we haven't done a release yet.
Whenever I start the storm-mesos Nimbus framework in Mesos, it grabs all the available resources, without a single topology submitted. In my little 3-node mesos-slave cluster I have 3.3 CPUs and 13.5 GB of memory free before I start the storm-mesos Nimbus, and 0 CPUs and 0 memory immediately after, as reported on the Mesos console.
Switching to the Frameworks tab on the Mesos console shows Storm with 0 active tasks, 3.1 CPUs, and 11.5 GB of memory.
(Screenshots: Mesos console before and after starting Storm.)
The Nimbus log does report that it is declining offers every 10 seconds or so ...
Is this normal? I expected the framework to consume no resources until a topology is submitted, and then to consume only whatever the topology required.