Apache Nemo (incubating) v0.1
Introduction
Apache Nemo is an in-memory distributed data processing framework that supports flexible optimization of scheduling and communication according to resource and data characteristics. This release includes an implementation of the policy layer, a modular runtime, and several example policies.
Main Features
Policy Layer
The optimizer package includes compiler passes that can be used to compose a policy. In this release, we provide annotation passes, which allow policy writers to annotate metadata on the IR-level DAG, and reshaping passes, which modify the structure of the DAG.
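To make the idea of composing a policy from passes concrete, here is a minimal, self-contained sketch. It does not use Nemo's actual classes: `Vertex`, `Dag`, `Pass`, `annotate`, `insertAfter`, and `applyPolicy` are hypothetical names invented for illustration, and the DAG is simplified to a linear chain of vertices.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative only: none of these types are part of the Nemo API.
public class PolicySketch {
  /** Hypothetical IR vertex: a name plus free-form metadata annotations. */
  static final class Vertex {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();

    Vertex(String name) {
      this.name = name;
    }
  }

  /** Hypothetical IR-level DAG, simplified to an ordered chain of vertices. */
  static final class Dag {
    final List<Vertex> vertices = new ArrayList<>();
  }

  /** A pass takes a DAG and returns the (possibly modified) DAG. */
  interface Pass extends UnaryOperator<Dag> { }

  /** Annotation pass: attaches metadata to every vertex, leaves structure untouched. */
  static Pass annotate(String key, String value) {
    return dag -> {
      for (Vertex v : dag.vertices) {
        v.properties.put(key, value);
      }
      return dag;
    };
  }

  /** Reshaping pass: inserts a new vertex after each vertex named {@code target}. */
  static Pass insertAfter(String target, String newName) {
    return dag -> {
      Dag out = new Dag();
      for (Vertex v : dag.vertices) {
        out.vertices.add(v);
        if (v.name.equals(target)) {
          out.vertices.add(new Vertex(newName));
        }
      }
      return out;
    };
  }

  /** A policy is modeled as an ordered list of passes applied one after another. */
  static Dag applyPolicy(List<Pass> policy, Dag dag) {
    Dag current = dag;
    for (Pass pass : policy) {
      current = pass.apply(current);
    }
    return current;
  }

  /** Builds a two-vertex DAG and applies a reshaping pass followed by an annotation pass. */
  static Dag demo() {
    Dag dag = new Dag();
    dag.vertices.add(new Vertex("map"));
    dag.vertices.add(new Vertex("reduce"));
    List<Pass> policy = new ArrayList<>();
    policy.add(insertAfter("map", "metricCollection"));
    policy.add(annotate("placement", "transient"));
    return applyPolicy(policy, dag);
  }
}
```

In this sketch the order of passes matters: the reshaping pass runs first, so the vertex it inserts is also seen by the annotation pass that follows.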
Example Policies
The examples package includes policies that optimize scheduling and communication according to resource and data characteristics, such as TransientResourcePolicy and DataSkewPolicy.
Modular Runtime
- The Nemo runtime is modular: each module can be configured according to the applied policy. The IR-level DAG is translated into a physical-level DAG, which is launched by a single Master and executed in parallel by multiple Executors. The scheduling and communication modules of the runtime are configured automatically according to the optimizations encoded in the applied policy, and the optimized physical-level DAG is then executed.
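The flow described above (logical DAG translated to a physical DAG, then launched by one master and run by several executors) can be sketched roughly as follows. This is an assumption-laden toy, not Nemo's runtime: `IrVertex`, `toPhysical`, and `run` are hypothetical names, stages are executed strictly one after another, and a plain Java thread pool stands in for the Executors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: none of these types are part of the Nemo API.
public class RuntimeSketch {
  /** Hypothetical IR vertex carrying the parallelism requested for it. */
  static final class IrVertex {
    final String name;
    final int parallelism;

    IrVertex(String name, int parallelism) {
      this.name = name;
      this.parallelism = parallelism;
    }
  }

  /** Translation step: expand each IR vertex into one stage of parallel task ids. */
  static List<List<String>> toPhysical(List<IrVertex> ir) {
    List<List<String>> stages = new ArrayList<>();
    for (IrVertex v : ir) {
      List<String> tasks = new ArrayList<>();
      for (int i = 0; i < v.parallelism; i++) {
        tasks.add(v.name + "-" + i);
      }
      stages.add(tasks);
    }
    return stages;
  }

  /** Master role: submit each stage's tasks to the pool and wait before starting the next stage. */
  static List<String> run(List<List<String>> stages, int executors) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(executors);
    try {
      List<String> completed = new ArrayList<>();
      for (List<String> stage : stages) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String taskId : stage) {
          tasks.add(() -> taskId + ":done");
        }
        // invokeAll blocks until every task in the stage has finished and
        // returns futures in the same order as the submitted tasks.
        for (Future<String> f : pool.invokeAll(tasks)) {
          completed.add(f.get());
        }
      }
      return completed;
    } finally {
      pool.shutdown();
    }
  }
}
```

A policy-driven runtime would choose values such as the executor count or the stage boundaries from the annotations on the DAG; here they are passed in directly to keep the sketch short.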