Storm framework consumes all mesos resources #50
This is correct; what you see is expected behavior.
@dlaidlaw: yeah... so what you're witnessing is one of the biggest problems with Storm's Mesos scheduler: it hoards resources. It does release those resources back to the cluster on a rotating basis, but it still holds onto them for a while. The reason it needs to hoard resources at all is that Storm's scheduling model normally expects resources to be statically present (as "slots" into which it can assign Storm worker processes). When Storm runs on Mesos, however, there are no preexisting slots. Instead we need to fabricate slots out of the resource offers, using each topology's configured resource demands (CPU, mem) to size the "slots".

I've worked a bit with @tnachen on some changes to potentially avoid hoarding (only holding resources when a topology is in need of assignment). Unfortunately, when I first tested them in my company's environment they seemed to lead to very odd failure cases where the Storm Nimbus got confused about what was actually running. I need to come back to those changes at some point, but it's honestly not a very high priority for me right now.
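To make the slot-fabrication idea concrete, here is a minimal, hypothetical sketch (not mesos-storm's actual code; the class, method, and parameter names are illustrative) of checking whether an offer can back one worker slot sized by a topology's configured demands:

```java
import java.util.List;

import org.apache.mesos.Protos.Offer;
import org.apache.mesos.Protos.Resource;

// Hypothetical helper for sizing "slots" out of resource offers.
public final class SlotSizing {
  /** Sum a named scalar resource (e.g. "cpus", "mem") across an offer. */
  static double scalar(List<Resource> resources, String name) {
    double total = 0.0;
    for (Resource r : resources) {
      if (name.equals(r.getName()) && r.hasScalar()) {
        total += r.getScalar().getValue();
      }
    }
    return total;
  }

  /** True if this offer can back one worker slot of the given size. */
  static boolean fitsOneSlot(Offer offer, double workerCpu, double workerMemMb) {
    return scalar(offer.getResourcesList(), "cpus") >= workerCpu
        && scalar(offer.getResourcesList(), "mem") >= workerMemMb;
  }
}
```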
@brndnmtthws has made some changes in #65 that enable filtering of declined resources for a period of 2 minutes (by default) after those resources have sat idle in the MesosNimbus's offer cache for roughly 75 seconds.
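For reference, the underlying Mesos mechanism is the `refuse_seconds` filter passed when declining an offer; here is a minimal sketch, where the 120-second constant is illustrative and mirrors the 2-minute default mentioned above:

```java
import org.apache.mesos.SchedulerDriver;
import org.apache.mesos.Protos.Filters;
import org.apache.mesos.Protos.OfferID;

// Declining an offer with a refuse_seconds filter asks the Mesos master
// not to re-offer those resources to this framework for that long.
public final class DeclineWithFilter {
  static void decline(SchedulerDriver driver, OfferID offerId) {
    Filters filters = Filters.newBuilder()
        .setRefuseSeconds(120) // ~2 minutes, matching the default above
        .build();
    driver.declineOffer(offerId, filters);
  }
}
```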
So I believe this will greatly mitigate the hoarding problem, and it might obviate the need for the "avoid hoarding until there are pending topologies" change I referenced above.
After our work on PR #154 we now have a much better understanding of how Storm ends up scheduling work, and we actually have control over whether workers are scheduled. So I believe we can work on preventing the hoarding of resources, i.e., we will experiment with only starting to accumulate offers when we have a topology that needs more slots (see the sketch below).
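A hedged sketch of that experiment, assuming a hypothetical `needsMoreSlots()` check for pending topologies (none of these names are the real code):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.mesos.SchedulerDriver;
import org.apache.mesos.Protos.Offer;
import org.apache.mesos.Protos.OfferID;

// Only accumulate offers while some topology still needs slots;
// otherwise hand resources straight back to the cluster.
public class OfferAccumulationSketch {
  private final Map<OfferID, Offer> cachedOffers = new ConcurrentHashMap<>();

  /** Placeholder: would return true when a topology awaits assignment. */
  boolean needsMoreSlots() {
    return false;
  }

  public void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
    for (Offer offer : offers) {
      if (needsMoreSlots()) {
        cachedOffers.put(offer.getId(), offer); // hold until slots are built
      } else {
        driver.declineOffer(offer.getId()); // release immediately, no hoarding
      }
    }
  }
}
```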
Another idea, proposed by @tnachen: limit the total number of CPUs (and maybe other resources) that the framework will use. We'd need to store that state somewhere; it's unclear where. A rough sketch of the accounting follows.
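As an illustration only (all names hypothetical, and sidestepping the open question of where the cap and usage would be persisted), the accounting could look like:

```java
import org.apache.mesos.Protos.Offer;

// Framework-wide CPU cap: accept an offer only while total held CPUs
// stay under the configured limit. In-memory only; real code would need
// to persist both the cap and the usage somewhere durable.
public class CpuCapSketch {
  private final double maxCpus;
  private double cpusInUse = 0.0;

  public CpuCapSketch(double maxCpus) {
    this.maxCpus = maxCpus;
  }

  /** CPUs carried by an offer, summed across its "cpus" scalar resources. */
  static double offeredCpus(Offer offer) {
    return offer.getResourcesList().stream()
        .filter(r -> "cpus".equals(r.getName()) && r.hasScalar())
        .mapToDouble(r -> r.getScalar().getValue())
        .sum();
  }

  /** True if taking this offer keeps the framework under its CPU cap. */
  synchronized boolean mayAccept(Offer offer) {
    return cpusInUse + offeredCpus(offer) <= maxCpus;
  }

  synchronized void onAccepted(Offer offer) {
    cpusInUse += offeredCpus(offer);
  }
}
```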
This is fixed by the change in #200. Thanks @JessicaLHartog & @srishtyagrawal for fixing this long-standing issue that was making it hard to use Storm in a general Mesos cluster.
v0.2.4+ has this fix. For v0.1.x (storm-0.x support) we haven't done a release yet.
Whenever I start the storm-mesos Nimbus framework in Mesos, it grabs all the available resources, without a single topology submitted. In my little 3-node mesos-slave cluster I have 3.3 CPUs and 13.5 GB of memory free before I start the storm-mesos Nimbus, and 0 CPUs and 0 memory immediately after, as reported on the Mesos console.
Switching to the Frameworks tab on the Mesos console shows Storm with 0 active tasks, 3.1 CPUs, and 11.5 GB of memory.
(Screenshots: Mesos console before and after starting Storm.)
The Nimbus log does report that it is declining offers every 10 seconds or so ...
Is this normal? I expected the framework to consume no resources until a topology is submitted, and then to consume only whatever the topology required.