Nodes and Arrows Model
The nodes and arrows model is based on the thesis.
We call an execution a flow. Each flow consists of multiple work steps. Each work step receives data from previous steps, processes it, and passes the result data on to following steps. We call a work step a node. A flow consists of one or more nodes which are connected by data flows.
A flow graph (FG) consists of nodes and arrows.
There are two types of arrows: solid arrows are dependent, and dashed arrows are dispatched.
A crossed arrow represents a variable number of arrows, 0 or more.
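The vocabulary above can be captured in a small data structure. The following is a minimal sketch, not part of Scraper: the names `ArrowKind`, `Arrow`, `FlowGraph`, `connect`, and `outgoing` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ArrowKind(Enum):
    DEPENDENT = "solid"    # drawn as a solid arrow
    DISPATCHED = "dashed"  # drawn as a dashed arrow


@dataclass(frozen=True)
class Arrow:
    source: str
    target: str
    kind: ArrowKind
    crossed: bool = False  # True: stands for a variable number of arrows (0 or more)


@dataclass
class FlowGraph:
    nodes: set = field(default_factory=set)
    arrows: list = field(default_factory=list)

    def connect(self, source, target, kind, crossed=False):
        """Add an arrow and register both of its end points as nodes."""
        self.nodes.update((source, target))
        arrow = Arrow(source, target, kind, crossed)
        self.arrows.append(arrow)
        return arrow

    def outgoing(self, node, kind):
        """Return the arrows of the given kind that leave the given node."""
        return [a for a in self.arrows if a.source == node and a.kind == kind]


# Hypothetical example graph: read -> parse (dependent), parse -> log (crossed, dispatched)
fg = FlowGraph()
fg.connect("read", "parse", ArrowKind.DEPENDENT)
fg.connect("parse", "log", ArrowKind.DISPATCHED, crossed=True)
```

Keeping `crossed` as a flag on the arrow, rather than storing several arrows, mirrors the graph notation: the number of arrows is only known when the flow runs.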
The final result (F1) of a flow connected by dependent arrows depends on all previously executed nodes that are connected by solid arrows.
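As an illustration, a chain of nodes connected by dependent arrows behaves like function composition: every node contributes to the final result. This is a sketch under the assumption that flow data is a plain dictionary; `run_flow` and the three example nodes are hypothetical names.

```python
def run_flow(nodes, initial):
    """Pass the data through nodes connected by dependent arrows, in order."""
    data = initial
    for node in nodes:
        data = node(data)
    return data


# Three hypothetical nodes; each returns a new dictionary instead of mutating its input.
def strip(data):
    return {**data, "text": data["text"].strip()}


def upper(data):
    return {**data, "text": data["text"].upper()}


def count(data):
    return {**data, "length": len(data["text"])}


f1 = run_flow([strip, upper, count], {"text": "  hello "})
# f1 == {"text": "HELLO", "length": 5}
```

Removing any of the three nodes changes `f1`, which is what "depends on all previously executed nodes" means here.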
A ForkJoin node can also be implemented using dispatched arrows.
The flows which result in output data F2 and F3 are created.
In contrast to the dispatch node, the output data of the ForkJoin node depends on F2 and F3.
This is visualized by nesting the yellow and green flows inside the blue one.
Still, dispatched flows are independent of the flow they are created in.
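The fork-join behaviour can be sketched as follows: the node starts one flow per target, waits for their final results, and only then produces its own output. This is an assumption-laden illustration, not Scraper's implementation; `fork_join`, `merge`, and the two target functions are hypothetical, and the thread pool is just one possible way to run the forked flows.

```python
from concurrent.futures import ThreadPoolExecutor


def fork_join(flow_input, targets, join):
    """Start one flow per target, wait for all of them, then merge the results."""
    with ThreadPoolExecutor() as pool:
        # Each forked flow gets its own copy of the input, so the flows stay independent.
        futures = [pool.submit(target, dict(flow_input)) for target in targets]
        # The join step: block until every forked flow has produced its final result.
        results = [future.result() for future in futures]
    return join(flow_input, results)


def word_count(data):
    return {"words": len(data["text"].split())}


def char_count(data):
    return {"chars": len(data["text"])}


def merge(original, results):
    """Combine the forked flows' results with the original flow data."""
    merged = dict(original)
    for result in results:
        merged.update(result)
    return merged


f1 = fork_join({"text": "nodes and arrows"}, [word_count, char_count], merge)
# f1 == {"text": "nodes and arrows", "words": 3, "chars": 16}
```

The forked flows never see each other's data; only the joining node combines their results.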
The processing order of the dependent arrows is not specified in the graph; it is implementation-specific.
The labels indicate the usage.
Dashed arrows indicate creation of new flows, which have their own final result data.
Crossed arrows can be used to indicate data parallelism.
A node could implement the data-parallelism concept Map with crossed arrows.
The map node implementation creates multiple (crossed arrow) dispatched (dashed arrow) flows depending on Fi. Since the node creates dispatched flows, the node continues immediately and F1 does not depend on any of the created flows.
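A possible shape for such a map node is sketched below. It assumes that the flow data is a dictionary with an `items` list; `map_node` and `square` are hypothetical names. The node hands the work to a thread pool and returns at once, so its own output contains nothing computed by the dispatched flows.

```python
from concurrent.futures import ThreadPoolExecutor


def map_node(data, target, pool):
    """Dispatch one new flow per element and return without waiting for any of them."""
    dispatched = [pool.submit(target, {"item": item}) for item in data["items"]]
    # The node's own output carries no results of the dispatched flows.
    return {**data, "dispatched": len(dispatched)}, dispatched


def square(flow):
    return {"item": flow["item"], "square": flow["item"] ** 2}


with ThreadPoolExecutor(max_workers=4) as pool:
    f1, handles = map_node({"items": [1, 2, 3]}, square, pool)
    # Collected here only to show that each dispatched flow has its own final result.
    finals = [handle.result() for handle in handles]

# f1 == {"items": [1, 2, 3], "dispatched": 3}
```

Returning the handles is a convenience of this sketch; a map node that truly ignores its dispatched flows would not need them.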
Alternatively, MapJoin can be implemented with crossed dispatched arrows.
A node can be part of multiple flows at the same time.
As long as all nodes which participate in multiple flows are stateless, the flows do not interfere with each other and F1 and F2 are independent.
Stateful nodes are not prohibited by the model.
However, they should be used with care because they could introduce side effects.
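The difference can be seen in a small example. Both node implementations below are hypothetical: `stateless_double` only reads its input, while `StatefulCounter` keeps a counter that is shared by every flow passing through it.

```python
def stateless_double(data):
    """Stateless node: the output depends only on the input."""
    return {**data, "value": data["value"] * 2}


class StatefulCounter:
    """Stateful node: remembers how many flows have passed through it."""

    def __init__(self):
        self.seen = 0

    def __call__(self, data):
        self.seen += 1
        return {**data, "position": self.seen}


counter = StatefulCounter()

# Two flows share both nodes.
f1 = counter(stateless_double({"value": 1}))
f2 = counter(stateless_double({"value": 10}))

# f1 == {"value": 2, "position": 1}
# f2 == {"value": 20, "position": 2}
```

The `value` fields are independent of each other, but `position` in `f2` is 2 only because the first flow ran before it. That coupling is the kind of side effect a stateful node can introduce.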
What data flows along the arrows, how to handle multiple dependent arrows, and how to handle crossed arrows is up to the implementation of the nodes and the framework.
Scraper handles crossed arrows as new concurrent threads.
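To illustrate the general idea of one thread per arrow, here is a sketch in Python. It is not Scraper's code and makes its own assumptions: `follow_crossed_arrow` is a hypothetical helper, and it waits for all threads to finish, which corresponds to the dependent case.

```python
import threading


def follow_crossed_arrow(target, inputs):
    """Start one thread per arrow instance and wait for all of them to finish."""
    results = [None] * len(inputs)

    def run(index, data):
        # Each thread writes to its own slot, so no lock is needed.
        results[index] = target(data)

    threads = [
        threading.Thread(target=run, args=(index, data))
        for index, data in enumerate(inputs)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


outputs = follow_crossed_arrow(lambda d: d["n"] + 1, [{"n": n} for n in range(3)])
# outputs == [1, 2, 3]
```

An empty `inputs` list starts no threads and returns an empty list, which matches a crossed arrow standing for zero arrows.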