-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NEMO-332] Refactor RunTimePass #191
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've left some comments.
/** | ||
* DataSkewMetric ExecutionProperty. | ||
* Number of partitions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on where this 'number of partitions' info used?
Maybe mentioning some use cases(ex. skew handling) would be helpful.
public final class PartitionerProperty extends EdgeExecutionProperty<PartitionerProperty.Value> { | ||
public final class PartitionerProperty | ||
extends EdgeExecutionProperty<Pair<PartitionerProperty.Type, Integer>> { | ||
public static final int AUTO_NUMBER_OF_PARTITIONS = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does 'auto' mean here? How about DESTINATION_PARALLELISM
?
* | ||
* @param value value of the new execution property. | ||
* @return the newly created execution property. | ||
* Use the automatic number of partitions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
@@ -23,13 +23,13 @@ | |||
/** | |||
* Parallelism ExecutionProperty. | |||
*/ | |||
public final class ParallelismProperty extends VertexExecutionProperty<Integer> { | |||
public final class MinParallelismProperty extends VertexExecutionProperty<Integer> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What differs 'MinParallelism' and 'Parallelism'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added class-level comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that this property would be meaningful even when it doesn't show the accurate values of the parallelism of each vertex? Why not update the value of the ParallelismProperty
instead when the cases described above occur?
@@ -16,28 +16,26 @@ | |||
* specific language governing permissions and limitations | |||
* under the License. | |||
*/ | |||
package org.apache.nemo.common; | |||
package org.apache.nemo.common.ir.vertex.system; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about having basic
, optimization
(or specialized
?) 2-package structure?
optimization
package forMessageAggregatorVertex
andMessageBarrierVertex
, as they're used in DAG optimizationbasic
package forStreamVertex
,OperatorVertex
, ... etc other thanIRVertex
, since they all extend IRVertex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've explicitly added the SystemIRVertex
abstract class to make it clearer that Stream/MessageBarrier/MessageAggregate vertices are provided by Nemo and allowed to be inserted into an IRDAG to optimize execution. (hence the "system" vertices)
@@ -28,14 +28,14 @@ | |||
* This transform can be used for merging input data into the {@link OutputCollector}. | |||
* @param <T> input/output type. | |||
*/ | |||
public final class RelayTransform<T> implements Transform<T, T> { | |||
public final class StreamTransform<T> implements Transform<T, T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StreamTransform
intuitively sounds like transform used in stream operation(other than batch).
How about ShuffleDataStreamTransform
or something like that to keep the nuance of 'relaying promptly'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this vertex can be decoupled from Shuffle
, as it can be inserted regardless of the communication pattern, but I can't think of a better name at the moment. I do think that Stream
is better than Relay
in capturing the behavior of "immediately forwarding the input element without modifying it".
How about we revisit this later, after we add more system vertices and streaming functionalities?
@@ -33,7 +33,7 @@ | |||
* Constructor. | |||
*/ | |||
public DedicatedKeyPerElementPartitioner() { | |||
// TODO #288: DedicatedKeyPerElementPartitioner should not always return 0 | |||
// TODO #288: DedicatedKeyPerElement should not always return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DedicatedKeyPerElement
type Partitioner should not always return 0?
partitioner = new DedicatedKeyPerElementPartitioner(); | ||
break; | ||
case Hash: | ||
final int numOfPartitions = edgeProperties.get(PartitionerProperty.class).get().right(); | ||
// If AUTO, use the number of destination parallelism to minimize the number of partitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe copy this comment to the definition of AUTO_NUMBER_OF_PARTITIONS?
i'll take a look after #190 |
Thanks @jeongyooneo, I've addressed the comments. |
common/src/main/java/org/apache/nemo/common/ir/vertex/system/SystemIRVertex.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/nemo/common/ir/vertex/executionproperty/MinParallelismProperty.java
Outdated
Show resolved
Hide resolved
...in/java/org/apache/nemo/compiler/optimizer/pass/compiletime/reshaping/SkewReshapingPass.java
Outdated
Show resolved
Hide resolved
Thanks @wonook, I've addressed the comments. |
@jeongyooneo @wonook Ping. 😄 |
LGTM on my side. @jeongyooneo anything else you want to take a look? |
Thanks! I'll merge this. |
JIRA: NEMO-332: Refactor RunTimePass
Major changes:
Minor changes to note:
Tests for the changes:
Closes #191