Nodes and Arrows Model

Albert Schimpf edited this page May 19, 2020 · 6 revisions

The nodes and arrows model is based on the thesis.

Flows

We call an execution a flow. Each flow consists of multiple work steps. Each work step receives data from previous steps, processes it, and passes the result data on to following steps. We call a work step a node. A flow consists of one or more nodes which are connected by data flows.
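The idea of a flow as a chain of work steps can be illustrated with a minimal sketch. It assumes that a node is a plain function from a data dictionary to a data dictionary and that nodes run strictly in sequence. The names `run_flow`, `fetch`, `clean`, and `shout` are hypothetical and are not taken from Scraper or the thesis.

```python
from typing import Callable, List

# Assumption: a node is modeled as a function that maps data to data.
Node = Callable[[dict], dict]


def fetch(data: dict) -> dict:
    # Hypothetical first work step: adds a raw payload to the data.
    return {**data, "raw": "  hello flow  "}


def clean(data: dict) -> dict:
    # Receives the previous step's data and adds a cleaned value.
    return {**data, "clean": data["raw"].strip()}


def shout(data: dict) -> dict:
    # Last work step: produces the final result of the flow.
    return {**data, "result": data["clean"].upper()}


def run_flow(nodes: List[Node], initial: dict) -> dict:
    """Run the nodes in order; each node receives the previous node's output."""
    data = initial
    for node in nodes:
        data = node(data)
    return data


final = run_flow([fetch, clean, shout], {})
print(final["result"])  # prints "HELLO FLOW"
```

Each node returns a new dictionary instead of mutating its input, which keeps the data passed between steps easy to follow.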

A flow graph (FG) consists of nodes and arrows. There are two types of arrows: dependent arrows, drawn solid, and dispatched arrows, drawn dashed. A crossed arrow represents a variable number of arrows (zero or more).

Dependent Arrows

The final result (F1) of a flow connected by dependent arrows depends on all previously executed nodes that are connected by solid arrows.

image

A ForkJoin node can be implemented using only dependent arrows. The node creates the flows that produce the output data F2 and F3. In contrast to the dispatch node, the output data of the ForkJoin node depends on F2 and F3. This is visualized by nesting the yellow and green flows inside the blue one. Still, dispatched flows are independent of the flow they are created in. The processing order of the dependent arrows is not specified in the graph; it is implementation-specific. The labels indicate the usage.

image
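A ForkJoin node along these lines could be sketched as follows. This is one possible reading, not the Scraper implementation: it assumes the branch flows run on a thread pool and that a caller-supplied `merge` function combines their results. The names `fork_join`, `double`, `square`, and `merge_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flow(nodes, data):
    """Run the nodes in order; each node receives the previous node's output."""
    for node in nodes:
        data = node(data)
    return data


def fork_join(branches, merge):
    """Build a node whose output depends on the results of all branch flows."""
    def node(data):
        with ThreadPoolExecutor() as pool:
            # Fork: start one nested flow per branch, each with its own copy of the data.
            futures = [pool.submit(run_flow, branch, dict(data)) for branch in branches]
            # Join: wait for every nested flow (the dependent arrows).
            results = [future.result() for future in futures]
        return merge(data, results)
    return node


def double(data):
    return {**data, "value": data["value"] * 2}


def square(data):
    return {**data, "value": data["value"] ** 2}


def merge_sum(data, results):
    # Hypothetical merge step: sums the branch results (F2 and F3 in the figure).
    return {**data, "value": sum(r["value"] for r in results)}


f1 = run_flow([fork_join([[double], [square]], merge_sum)], {"value": 3})
print(f1)  # prints {'value': 15}
```

The order in which the branches finish does not matter here, because the node only continues once all of them have returned.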

Dispatched Arrows

Dashed arrows indicate creation of new flows, which have their own final result data.

image
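A dispatching node could look like the sketch below. It assumes that a dispatched flow runs on its own thread and that the dispatching node passes its input on unchanged. The names `dispatch`, `record`, and `dispatched_results` are hypothetical; the `threads` list exists only so the demo can wait for the dispatched flow before inspecting its result.

```python
import threading


def run_flow(nodes, data):
    """Run the nodes in order; each node receives the previous node's output."""
    for node in nodes:
        data = node(data)
    return data


def dispatch(target_nodes, threads):
    """Build a node that starts a new, independent flow and continues immediately."""
    def node(data):
        thread = threading.Thread(target=run_flow, args=(target_nodes, dict(data)))
        thread.start()
        threads.append(thread)  # kept only so the demo can join later
        return data  # the creating flow does not wait for the dispatched flow
    return node


dispatched_results = []


def record(data):
    # Final node of the dispatched flow; stores that flow's own final result.
    dispatched_results.append(data["value"] + 1)
    return data


threads = []
f1 = run_flow([dispatch([record], threads)], {"value": 1})

for thread in threads:
    thread.join()

print(f1)                  # prints {'value': 1}
print(dispatched_results)  # prints [2]
```

The final result `f1` is computed without reading anything the dispatched flow produces, which mirrors the independence expressed by the dashed arrow.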

Crossed Arrows

Crossed arrows can be used to indicate data parallelism. A node could implement the data-parallelism concept Map with crossed arrows. The map node implementation creates multiple (crossed arrow) dispatched (dashed arrow) flows depending on Fi. Since the node creates dispatched flows, the node continues immediately and F1 does not depend on any of the created flows.

image
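The Map node described above can be sketched under the assumption that each crossed dispatched arrow becomes one new thread. The names `map_node`, `store_square`, and `collected` are hypothetical, and the input is assumed to carry its elements under an `items` key.

```python
import threading


def run_flow(nodes, data):
    """Run the nodes in order; each node receives the previous node's output."""
    for node in nodes:
        data = node(data)
    return data


def map_node(target_nodes, threads):
    """Build a node that dispatches one new flow per input element."""
    def node(data):
        for item in data["items"]:
            thread = threading.Thread(target=run_flow, args=(target_nodes, {"item": item}))
            thread.start()
            threads.append(thread)  # kept only so the demo can join later
        return data  # continues immediately; the result ignores the created flows
    return node


lock = threading.Lock()
collected = []


def store_square(data):
    # Final node of each dispatched flow; the lock guards the shared list.
    with lock:
        collected.append(data["item"] ** 2)
    return data


threads = []
f1 = run_flow([map_node([store_square], threads)], {"items": [1, 2, 3]})

for thread in threads:
    thread.join()

print(f1)                 # prints {'items': [1, 2, 3]}
print(sorted(collected))  # prints [1, 4, 9]
```

The results are sorted before printing because the dispatched flows may finish in any order.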

Alternatively, MapJoin can be implemented with crossed dispatched arrows.

image

Node in Multiple Flows

A node can be part of multiple flows at the same time. As long as all nodes which participate in multiple flows are stateless, the flows do not interfere with each other and F1 and F2 are independent. Stateful nodes are not prohibited by the model. However, they should be used with care because they could introduce side effects.

image
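The difference between stateless and stateful nodes shared by several flows can be shown with a small sketch. The flows run one after another here to keep the output deterministic. The names `stateless_increment` and `StatefulCounter` are hypothetical.

```python
def run_flow(nodes, data):
    """Run the nodes in order; each node receives the previous node's output."""
    for node in nodes:
        data = node(data)
    return data


def stateless_increment(data):
    # Output depends only on the input, so flows sharing this node stay independent.
    return {**data, "value": data["value"] + 1}


class StatefulCounter:
    """A node that remembers how often it was called (a side effect)."""

    def __init__(self):
        self.seen = 0

    def __call__(self, data):
        self.seen += 1
        # Output depends on earlier calls, possibly made by other flows.
        return {**data, "value": data["value"] + self.seen}


# Two flows share the stateless node: identical inputs give identical results.
f1 = run_flow([stateless_increment], {"value": 10})
f2 = run_flow([stateless_increment], {"value": 10})
print(f1["value"], f2["value"])  # prints "11 11"

# Two flows share the stateful node: the second result is affected by the first flow.
counter = StatefulCounter()
g1 = run_flow([counter], {"value": 10})
g2 = run_flow([counter], {"value": 10})
print(g1["value"], g2["value"])  # prints "11 12"
```

If such flows ran concurrently, a stateful node would also need its own synchronization, which is one more reason to use it with care.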

Implementation Decisions

What data flows along the arrows, how multiple dependent arrows are handled, and how crossed arrows are handled is up to the implementation of the nodes and the framework.

Scraper handles crossed arrows as new concurrent threads.