-
Notifications
You must be signed in to change notification settings - Fork 104
feat(amber): Add Basic Ramen Support for UDF Operators #3674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR is a bit too large. better to split it into two PRs: one for user interface change, one for new allocator implementation.
} | ||
|
||
|
||
private def readStatsFromUri(uriStr: String): Map[String, (Double, Int)] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method is not clear at all. what stats? what is the uri pointing to? please clarify by renaming and add comments.
val document = DocumentFactory.openDocument(uri) | ||
|
||
document._1.get().foldLeft(Map.empty[String, (Double, Int)]) { (acc, tuple) => | ||
val record = tuple.asInstanceOf[Tuple] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is a record
? please give meaningful naming.
* represented as a Double value (currently set to 0, but will be | ||
* updated in the future). | ||
* @param region Region to allocate. | ||
* @return (updated Region, estimated cost) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per comments in #3660, we hope to only return resourceConfig instead of the updated region.
operatorConfigs: Map[PhysicalOpIdentity, OperatorConfig], | ||
seedLinkPartitions: Map[PhysicalLink, PartitionInfo] = Map.empty | ||
): Map[PhysicalLink, PartitionInfo] = { | ||
val linkPartitionInfos = mutable.HashMap[PhysicalLink, PartitionInfo]() ++= seedLinkPartitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you saving a copy of the link partitions inside this method? the return type is already a map of partition infos. why do you need to pass an input map seedLinkPartitions
?
|
||
schedule-generator { | ||
max-concurrent-regions = 1 | ||
max-concurrent-regions = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we change this default value?
@JsonProperty(required = true, defaultValue = "true") | ||
@JsonSchemaTitle("Parallelizable?") | ||
@JsonPropertyDescription("Default: True") | ||
@JsonSchemaInject(json = """{"toggleHidden" : ["advanced"]}""") | ||
val parallelizable: Boolean = Boolean.box(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit against this three-step design. Why do we ask users to click a check box (parallelizible
), then click another one (advanced
), then provide a number? This is way too complicated. Can we simplify it?
PhysicalOp | ||
.oneToOnePhysicalOp( | ||
workflowId, | ||
executionId, | ||
operatorIdentifier, | ||
OpExecWithCode(code, "java") | ||
) | ||
.withDerivePartition(_ => UnknownPartition()) | ||
.withInputPorts(operatorInfo.inputPorts) | ||
.withOutputPorts(operatorInfo.outputPorts) | ||
.withPartitionRequirement(partitionRequirement) | ||
.withIsOneToManyOp(true) | ||
.withParallelizable(true) | ||
.withSuggestedWorkerNum(workers) | ||
.withPropagateSchema(SchemaPropagationFunc(propagateSchema)) | ||
} else { | ||
PhysicalOp | ||
.oneToOnePhysicalOp( | ||
workflowId, | ||
executionId, | ||
operatorIdentifier, | ||
OpExecWithCode(code, "java") | ||
) | ||
.withDerivePartition(_ => UnknownPartition()) | ||
.withInputPorts(operatorInfo.inputPorts) | ||
.withOutputPorts(operatorInfo.outputPorts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge the common code. only apply a difference part (i.e., .withParallelizable(true)
) to different cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see pythonUDFSourceOpDescV2 for example.
val physicalOp = if (parallelizable) { | ||
if (advanced) { | ||
PhysicalOp | ||
.oneToOnePhysicalOp( | ||
workflowId, | ||
executionId, | ||
operatorIdentifier, | ||
OpExecWithCode(code, "python") | ||
) | ||
.withSuggestedWorkerNum(workers) | ||
} else { | ||
PhysicalOp | ||
.oneToOnePhysicalOp( | ||
workflowId, | ||
executionId, | ||
operatorIdentifier, | ||
OpExecWithCode(code, "python") | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
val physicalOp = if (parallelizable) { | ||
if (advanced) { | ||
PhysicalOp | ||
.oneToOnePhysicalOp( | ||
workflowId, | ||
executionId, | ||
operatorIdentifier, | ||
OpExecWithCode(code, "python") | ||
) | ||
.withSuggestedWorkerNum(workers) | ||
} else { | ||
PhysicalOp | ||
.oneToOnePhysicalOp( | ||
workflowId, | ||
executionId, | ||
operatorIdentifier, | ||
OpExecWithCode(code, "python") | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
if (parallelizable) { | ||
if (advanced) { | ||
PhysicalOp | ||
.oneToOnePhysicalOp( | ||
workflowId, | ||
executionId, | ||
operatorIdentifier, | ||
OpExecWithCode(code, r_operator_type) | ||
) | ||
} else { | ||
PhysicalOp | ||
.oneToOnePhysicalOp( | ||
workflowId, | ||
executionId, | ||
operatorIdentifier, | ||
OpExecWithCode(code, r_operator_type) | ||
) | ||
.withSuggestedWorkerNum(workers) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
This PR implements basic operator-level parallelism optimization by modifying the GUI interface for UDFs (User Defined Functions). It corresponds to [PR 2] in the Basic Ramen plan. The details and context are discussed in issue #3605.
The Basic Ramen strategy assumes that between two executions of the same workflow, the workflow structure remains unchanged. This allows us to reuse past runtime statistics for optimizing operator-level resource allocation (e.g., worker count).
The full implementation will be split into two PRs:
Key Changes in This PR
Updated UDF UI:
Backend Modifications:
Configuration Support: