Skip to content

Apache Nemo (incubating) v0.1

Compare
Choose a tag to compare
@jeongyooneo jeongyooneo released this 27 Oct 11:30
· 22 commits to master since this release
3d46caf

Introduction

Apache Nemo is an in-memory distributed data processing framework that supports flexible optimization of scheduling and communication according to resource and data characteristics. This release includes implementation of policy layer, modular runtime and several example policies.

Main Features

Policy Layer

  • optimizer package includes compiler passes that can be used to compose a policy. In this release, we provide annotation passes that allow policy writers to annotate metadata in IR-level DAG, reshaping passes that modifies the structure of the DAG.

Example Policies

  • examples package include policies that optimize scheduling and communication according to resource and data characteristics, such as TransientResourcePolicy and DataSkewPolicy.

Modular Runtime

  • Runtime of Nemo has a modular nature, where each module can be configured according to the applied policy. IR-level DAG is translated to physical-level DAG, which is launched by a single Master and executed in parallel with multiple Executors. According to the optimization encoded in the applied policy, scheduling and communication module of the Runtime is auto-configured, and optimized physical-level DAG is executed.