[NEW] Schedule workers on consolidated offers #154
Conversation
```
@@ -64,6 +64,8 @@ public String getHostName() {
   }
 
+  public void add(Protos.Offer offer) {
+    // We are unable to aggregate offers if they are from different workers
```
different ~~workers~~ slaves. Note that one can run multiple slaves on the same host. Not sure how we should handle the scenario when we see offers coming in from different slaves running on the same host - abort? Or emit an error log and move on?
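For illustration, here is a minimal sketch of the "emit an error log and move on" option raised above; the constructor and logging details are assumptions, not the code actually merged in this PR:

```java
import org.apache.mesos.Protos;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AggregatedOffers {
  private static final Logger LOG = LoggerFactory.getLogger(AggregatedOffers.class);
  private final String hostName;

  public AggregatedOffers(String hostName) {
    this.hostName = hostName;
  }

  public void add(Protos.Offer offer) {
    // We are unable to aggregate offers if they are from different slaves;
    // hypothetically, log an error and skip the offer rather than aborting.
    if (!offer.getHostname().equals(hostName)) {
      LOG.error("Dropping offer {} from host {}; this aggregator is bound to {}",
          offer.getId().getValue(), offer.getHostname(), hostName);
      return;
    }
    // ... fold the offer's resources into the per-node aggregate here ...
  }
}
```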
… and use it to schedule and assign workers with consolidated offers per node. Addresses the following problems.

Problem 1: Types of OfferResources were limited - CPU, MEM and PORTS. Each of them was either RESERVED or UNRESERVED, and we treated RESERVED and UNRESERVED resources the same way, so the logic in OfferResources was simple. But we now have three different types - RESERVED, STATICALLY_RESERVED and DYNAMICALLY_RESERVED - and we are starting to have special handling for each type of resource. One example of such code is https://github.com/mesos/storm/pull/111/files#diff-9e83ab25db3f6d262627383d8aa8f815.

Problem 2: Currently, offer-related logic is spread across OfferResources and RotatingMap. What we ultimately need is an interface which could be used across various parts of the framework.

This commit introduces a 'storm.mesos.resources' package, which is the first step in achieving the aforementioned goal. This package enables us to define an interface like 'storm.mesos.resources.Aggregator' and have various aggregators implement it, thereby abstracting the aggregation logic (including locking) from the rest of the packages.
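A minimal sketch of what such an interface might look like; only the fully-qualified name `storm.mesos.resources.Aggregator` comes from the commit message, while the methods and the `ResourceType` enum below are assumptions:

```java
package storm.mesos.resources;

import org.apache.mesos.Protos;

// Hypothetical resource kinds; the names are illustrative only.
enum ResourceType { CPU, MEM, PORTS }

public interface Aggregator {
  // Fold a new offer into the per-node aggregate.
  void add(Protos.Offer offer);

  // Atomically (i.e. under the aggregator's own locking) claim `value`
  // units of a resource, returning false if not enough is available.
  boolean reserve(ResourceType type, double value);
}
```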
following error while building java7 binary:

```
testGetTasksToLaunchForOneTopologyWithOneOffer(storm.mesos.MesosNimbusTest)  Time elapsed: 0.158 sec  <<< ERROR!
java.lang.NoSuchMethodError: java.lang.String.join(Ljava/lang/CharSequence;Ljava/lang/Iterable;)Ljava/lang/String;
```

Fixing ^^ by using org.apache.commons.lang3.StringUtils.join instead of java.lang.String.join.
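For context, a self-contained sketch of the swap (the `ports` list is a made-up example, not data from the PR): `String.join` was added in Java 8, while `StringUtils.join` from commons-lang3 works on Java 7.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.commons.lang3.StringUtils;

public class JoinCompat {
  public static void main(String[] args) {
    List<String> ports = Arrays.asList("31000", "31001", "31002");

    // Java 8+ only; running this on a Java 7 JRE throws the
    // NoSuchMethodError shown above:
    // String joined = String.join(",", ports);

    // Java 7-compatible replacement from commons-lang3:
    String joined = StringUtils.join(ports, ",");
    System.out.println(joined); // prints: 31000,31001,31002
  }
}
```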
… Jessica Hartog. Still TODO:
1. Refactor MesosNimbus.getTasksToLaunch as per Erik's comment
2. Refactor SchedulerUtils.getPorts as per Jessica's comment
Also improve readability by removing unnecessary verbosity of the ReservationType enum identifiers, and then defining their `toString()` method to use the lowercase version of their id. These new logs retain the same information, but in half the characters. We can do even better by stripping out unnecessary trailing 0s from the resources.

Example of original log:

```
2016-06-19T07:55:05.924+0000 s.m.u.MesosCommon [INFO] Available resources at workerhostname: cpu : Resource cpus - Total available : 19.500000 Total available by reservation type : [ , [DYNAMICALLY_RESERVED : 0.0, STATICALLY_RESERVED : 0.0, UNRESERVED : 19.5] ], memory: Resource mem - Total available : 88561.000000 Total available by reservation type : [ , [DYNAMICALLY_RESERVED : 0.0, STATICALLY_RESERVED : 0.0, UNRESERVED : 88561.0] ], ports : [31003-31003,31005-32000]
```

Example of new log:

```
2016-06-19T09:21:43.075+0000 s.m.u.MesosCommon [INFO] Available resources at workerhostname: cpus: 19.500000 (dynamic: 0.0, static: 0.0, unreserved: 19.5), mem: 72402.000000 (dynamic: 0.0, static: 0.0, unreserved: 72402.0), ports: [31000-31001,31006-32000]
```
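A sketch of the described enum cleanup, assuming shortened identifiers that match the lowercase names visible in the new log; the actual identifiers in the patch may differ:

```java
public enum ReservationType {
  STATIC,
  DYNAMIC,
  UNRESERVED;

  // Render as "static", "dynamic", "unreserved" so log lines carry the
  // same information in far fewer characters.
  @Override
  public String toString() {
    return name().toLowerCase();
  }
}
```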
* avoid rescheduling executors when all workers are already running
* throw a RuntimeException for the nonsensical case where all workers are running but we have unassigned executors (a sketch of this check follows this list)
* improve variable naming to be more concise without losing fidelity, and thus improve readability
* remove a call from schedule, since we're pretty sure we don't need it
* fix the number of slots available for each topology
* spread executors by fixing an unnecessary assignment to executors
* spread executors by not scaling workers down when there are fewer slots available than assigned slots
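A sketch of the sanity check mentioned above; the method and variable names are hypothetical, not the ones used in the PR:

```java
import java.util.Set;

import backtype.storm.scheduler.ExecutorDetails;

public class AssignmentSanity {
  static void checkAssignmentSanity(String topologyId,
                                    boolean allWorkersRunning,
                                    Set<ExecutorDetails> unassignedExecutors) {
    // If every worker is already running, there should be nothing left to
    // assign; anything else indicates a bug in the scheduling logic.
    if (allWorkersRunning && !unassignedExecutors.isEmpty()) {
      throw new RuntimeException(
          "All workers for topology " + topologyId + " are running, but "
              + unassignedExecutors.size() + " executors are still unassigned");
    }
  }
}
```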
…ns, delete a few unnecessary/irrelevant methods
…e being:

1. Our floating point tabulations in AggregatedOffers can lead to confusing results where a 1.0 becomes 0.999999999996, for example (illustrated in the sketch below). We add a fudge factor of 0.01 more CPU resources in one test to work around this until the following issue is fixed: mesos#161
2. Since we are disabling the ability of allSlotsAvailableForScheduling to account for supervisors already existing when calculating resource needs, we disable the test that is validating that behavior. That will be re-enabled once we fix this issue: mesos#160

Also a few cosmetic-ish changes:

1. Use more standard hostnames with a domain of "example.org", a standard domain for documentation. See http://example.org
2. Rearrange/renumber some of the offers to prevent confusion.
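The floating-point drift in point 1 is easy to reproduce; a self-contained illustration (not code from the PR) of why summing fractional CPU amounts needs an epsilon or fudge factor:

```java
public class FloatingPointDrift {
  public static void main(String[] args) {
    // Simulate tabulating CPU resources across many small offers: ten
    // offers of 0.1 CPUs should add up to exactly 1.0, but don't.
    double totalCpus = 0.0;
    for (int i = 0; i < 10; i++) {
      totalCpus += 0.1;
    }
    System.out.println(totalCpus);        // 0.9999999999999999
    System.out.println(totalCpus == 1.0); // false

    // Hence the 0.01 CPU fudge factor in the test: until mesos#161 is
    // fixed, comparisons effectively need an epsilon.
    final double EPSILON = 0.01;
    System.out.println(Math.abs(totalCpus - 1.0) < EPSILON); // true
  }
}
```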
0952c4b to 1032e58
🚢 🇮🇹 🏁 😓
@erikdw do these changes apply to the storm 0.9.6 shim?
@salimane : yup, this is a fundamental change to the core scheduling logic which applies to both 0.9.6 & 0.10.x. The shims are just slim "adapters" that allow the same storm-mesos codebase to work with different versions of storm's LocalState class.
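A hypothetical shape for such a shim, just to illustrate the adapter idea described above; the real interface in storm-mesos may differ:

```java
// Each supported Storm version gets its own implementation of this
// interface, wrapping that version's LocalState class.
public interface ILocalStateShim {
  String get(String key);
  void put(String key, String value);
  void remove(String key);
}
```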
Since @dsKarthick is unavailable and the `master` branch is fundamentally broken, I am taking over #146 in his absence. At the moment, this PR includes:
* a `reserveAndGet` method to prevent us from modifying a list we are iterating over
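A sketch of the "don't mutate a list while iterating it" idea behind a `reserveAndGet`-style method; the class, field, and element types here are illustrative, not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class RangeReserver {
  private final List<Long> availablePorts = new ArrayList<>();

  // Returns up to `count` ports, removing them from the available pool.
  // Using an explicit Iterator avoids the ConcurrentModificationException
  // you get when calling availablePorts.remove(...) inside a for-each
  // loop over the same list.
  public synchronized List<Long> reserveAndGet(int count) {
    List<Long> reserved = new ArrayList<>();
    Iterator<Long> it = availablePorts.iterator();
    while (it.hasNext() && reserved.size() < count) {
      reserved.add(it.next());
      it.remove(); // safe structural removal during iteration
    }
    return reserved;
  }
}
```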