Introduction to Apache YARN

Zubair Nabi edited this page Mar 24, 2014 · 1 revision

Apache YARN exposes a resource pull model in which frameworks must explicitly request resources from its scheduler. (This is in contrast to Apache Mesos, where the scheduler pushes resources to frameworks in the form of resource offers.) Architecturally, it consists of a central Resource Manager, which controls cluster-wide state and resources, and a per-node daemon, the Node Manager, which arbitrates node-local resources. The unit of resource allocation is a Container, which is a Linux process whose limits can be enforced via cgroups. At present, each container's resources are described by a two-dimensional vector: CPU (in number of virtual CPUs) and memory (in MB). In addition, container requests can optionally specify locality constraints at the node and rack level.

To join the YARN world, each framework first registers a framework-specific Application Master with the Resource Manager. It is the job of this Application Master to negotiate containers with the Resource Manager on behalf of the framework. For instance, in the case of MapReduce, its Application Master acquires containers from the Resource Manager to execute map and reduce tasks.

Once a container has been allocated to an Application Master, the Application Master can directly contact the Node Manager in charge of that container to set up the execution environment and the binary executable for the task that it intends to execute. For instance, to execute a map or a reduce task, the MapReduce Application Master would need to set up the Java classpath, put any required files on the local filesystem, and provide the executable JAR.
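The pull model described above can be illustrated with a small, self-contained simulation. This is only a sketch and does not use the YARN API: the names `ToyResourceManager`, `ToyApplicationMaster`, `ContainerRequest` and `Resource`, the host and rack names, and the capacities are all hypothetical. It also assumes a simple "node-local, then rack-local, then anywhere" placement order and synchronous allocation, whereas a real scheduler is considerably more involved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Resource:
    """Two-dimensional resource vector: virtual CPUs and memory in MB."""
    vcores: int
    memory_mb: int

    def fits_in(self, other: "Resource") -> bool:
        return self.vcores <= other.vcores and self.memory_mb <= other.memory_mb


@dataclass(frozen=True)
class ContainerRequest:
    """What an application master asks for; locality hints are optional."""
    capability: Resource
    nodes: Tuple[str, ...] = ()   # preferred hosts
    racks: Tuple[str, ...] = ()   # preferred racks


@dataclass(frozen=True)
class Container:
    container_id: int
    node: str
    capability: Resource


class ToyResourceManager:
    """Hypothetical stand-in for the central scheduler: tracks free capacity."""

    def __init__(self, nodes: Dict[str, Tuple[str, Resource]]):
        # nodes maps host -> (rack, total capacity)
        self._rack = {host: rack for host, (rack, _) in nodes.items()}
        self._free = {host: cap for host, (_, cap) in nodes.items()}
        self._next_id = 1

    def free(self, host: str) -> Resource:
        return self._free[host]

    def allocate(self, request: ContainerRequest) -> Optional[Container]:
        """Grant a container on the first suitable node, or None if none fits."""
        for host in self._candidates(request):
            free = self._free[host]
            if request.capability.fits_in(free):
                self._free[host] = Resource(
                    free.vcores - request.capability.vcores,
                    free.memory_mb - request.capability.memory_mb,
                )
                container = Container(self._next_id, host, request.capability)
                self._next_id += 1
                return container
        return None

    def _candidates(self, request: ContainerRequest) -> List[str]:
        # Assumed placement order: node-local, then rack-local, then any node.
        node_local = [h for h in request.nodes if h in self._free]
        rack_local = [h for h in self._free
                      if self._rack[h] in request.racks and h not in node_local]
        rest = [h for h in self._free
                if h not in node_local and h not in rack_local]
        return node_local + rack_local + rest


class ToyApplicationMaster:
    """Hypothetical framework-side negotiator: it pulls, nothing is offered."""

    def __init__(self, rm: ToyResourceManager):
        self.rm = rm
        self.containers: List[Container] = []

    def request_containers(self, requests: List[ContainerRequest]) -> List[Container]:
        for request in requests:
            container = self.rm.allocate(request)
            if container is not None:
                self.containers.append(container)
        return self.containers


rm = ToyResourceManager({
    "node1": ("rack1", Resource(4, 8192)),
    "node2": ("rack1", Resource(2, 4096)),
    "node3": ("rack2", Resource(8, 16384)),
})
am = ToyApplicationMaster(rm)
granted = am.request_containers([
    ContainerRequest(Resource(1, 1024), nodes=("node2",)),   # node-level hint
    ContainerRequest(Resource(2, 2048), racks=("rack2",)),   # rack-level hint
    ContainerRequest(Resource(16, 1024)),                    # too large for any node
])
```

In this toy cluster the first two requests are granted on `node2` and `node3` respectively, while the third is refused because no node has 16 virtual CPUs free. The step that follows allocation, in which the Application Master contacts the Node Manager to launch the task, is deliberately left out of the sketch.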

More details about YARN can be found at its homepage.