By default, Aurora will pick any random agent with sufficient resources in order to schedule a task. This scheduling choice can be further restricted with the help of constraints.
Data centers are often organized with hierarchical failure domains. Common failure domains include hosts, racks, rows, and PDUs. If you have this information available, it is wise to tag the Mesos agent with them as attributes.
The Mesos agent --attributes
command line argument can be used to mark agents with
static key/value pairs, so called attributes (not to be confused with --resources
, which are
dynamic and accounted).
For example, consider the host cluster1-aaa-03-sr2
and its following attributes (given in
key:value format): host:cluster1-aaa-03-sr2
and rack:aaa
.
Aurora makes these attributes available for matching with scheduling constraints.
Limit constraints allow to control machine diversity using constraints. The below constraint ensures that no more than two instances of your job may run on a single host. Think of this as a "group by" limit.
Service(
name = 'webservice',
role = 'www-data',
constraints = {
'host': 'limit:2',
}
...
)
Likewise, you can use constraints to control rack diversity, e.g. at most one task per rack:
constraints = {
'rack': 'limit:1',
}
Use these constraints sparingly as they can dramatically reduce Tasks' schedulability. Further details are available in the reference documentation on Scheduling Constraints.
Value constraints can be used to express that a certain attribute with a certain value
should be present on a Mesos agent. For example, the following job would only be
scheduled on nodes that claim to have an SSD
as their disk.
Service(
name = 'webservice',
role = 'www-data',
constraints = {
'disk': 'SSD',
}
...
)
Further details are available in the reference documentation on Scheduling Constraints.
Aurora is best suited to run stateless applications, but it also accommodates for stateful services like databases, or services that otherwise need to always run on the same machines.
Most of the Mesos attributes arbitrary and available for custom use. There is one exception,
though: the dedicated
attribute. Aurora treats this specially, and only allows matching jobs to
run on these machines, and will only schedule matching jobs on these machines.
The dedicated attribute has semantic meaning. The format is $role(/.*)?
. When a job is created,
the scheduler requires that the $role
component matches the role
field in the job
configuration, and will reject the job creation otherwise. The remainder of the attribute is
free-form. We've developed the idiom of formatting this attribute as $role/$job
, but do not
enforce this. For example: a job devcluster/www-data/prod/hello
with a dedicated constraint set as
www-data/web.multi
will have its tasks scheduled only on Mesos agents configured with:
--attributes=dedicated:www-data/web.multi
.
A wildcard (*
) may be used for the role portion of the dedicated attribute, which will allow any
owner to elect for a job to run on the host(s). For example: tasks from both
devcluster/www-data/prod/hello
and devcluster/vagrant/test/hello
with a dedicated constraint
formatted as */web.multi
will be scheduled only on Mesos agents configured with
--attributes=dedicated:*/web.multi
. This may be useful when assembling a virtual cluster of
machines sharing the same set of traits or requirements.
Consider the following agent command line:
mesos-slave --attributes="dedicated:db_team/redis" ...
And this job configuration:
Service(
name = 'redis',
role = 'db_team',
constraints = {
'dedicated': 'db_team/redis'
}
...
)
The job configuration is indicating that it should only be scheduled on agents with the attribute
dedicated:db_team/redis
. Additionally, Aurora will prevent any tasks that do not have that
constraint from running on those agents.