feat(amber): Add Basic Ramen Support for UDF Operators #3674

yunyad · 2025-08-19T06:31:43Z

This PR implements basic operator-level parallelism optimization by modifying the GUI for UDFs (User-Defined Functions). It corresponds to [PR 2] in the Basic Ramen plan. The details and context are discussed in issue #3605.

The Basic Ramen strategy assumes that between two executions of the same workflow, the workflow structure remains unchanged. This allows us to reuse past runtime statistics for optimizing operator-level resource allocation (e.g., worker count).

The full implementation will be split into two PRs:

PR 1 (this PR): Add UI and backend support for operator-level parallelism in UDFs
PR 2: Extend support to all other parallelizable operators

Key Changes in This PR

Updated UDF UI:

Added a number-of-workers input field to the UDF operator panel
Lets users configure parallelism directly through the UI

Backend Modifications:

Refactored ResourceAllocator to support configurable parallelism
Implemented GreedyResourceAllocator to select parallelism level based on historical runtime
Integrated GreedyResourceAllocator with the UDF operator execution logic
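To make "select parallelism level based on historical runtime" concrete, here is a minimal sketch of one possible greedy policy. It rests on assumptions not taken from the PR: the historical statistic is each operator's total processing time from the previous run, speedup is linear in the worker count, and a fixed worker budget is shared across operators. `GreedyWorkerSketch` and `allocate` are hypothetical names, and the PR's actual selection logic may differ.

```scala
// Hypothetical sketch of a greedy worker-count policy; not the PR's GreedyResourceAllocator.
object GreedyWorkerSketch {

  /** Splits `totalWorkers` across operators. Every operator starts with one worker, and
    * each remaining worker goes to the operator whose estimated time per worker
    * (historical time / current workers) is largest, i.e. the current bottleneck.
    * Assumes linear speedup, which real UDFs rarely achieve.
    */
  def allocate(historicalTime: Map[String, Double], totalWorkers: Int): Map[String, Int] = {
    require(historicalTime.nonEmpty, "no operators to allocate")
    require(totalWorkers >= historicalTime.size, "need at least one worker per operator")
    var workers: Map[String, Int] = historicalTime.map { case (opId, _) => opId -> 1 }
    for (_ <- historicalTime.size until totalWorkers) {
      val bottleneck = workers.keys.maxBy(opId => historicalTime(opId) / workers(opId))
      workers = workers.updated(bottleneck, workers(bottleneck) + 1)
    }
    workers
  }
}
```

For example, with previous times of 8.0 for `a` and 2.0 for `b` and a budget of 4 workers, the sketch assigns 3 workers to `a` and 1 to `b`.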

Configuration Support:

Added workflow-level flags to enable/disable GreedyResourceAllocator
Allows flexible toggling of Basic Ramen mode
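A workflow-level on/off flag could gate which allocator the scheduler uses. The sketch below assumes a simple boolean flag and two hypothetical allocator classes; none of these names (`enableBasicRamen`, `WorkerAllocator`, `HistoryBasedAllocator`, `DefaultAllocator`) come from the PR or from Amber's configuration.

```scala
// Hypothetical sketch: the trait, class names, and flag name are illustrative only.
object AllocatorToggleSketch {

  trait WorkerAllocator {
    def workersFor(opId: String): Int
  }

  /** Ignores history and always uses a fixed default worker count. */
  final class DefaultAllocator(defaultWorkers: Int) extends WorkerAllocator {
    def workersFor(opId: String): Int = defaultWorkers
  }

  /** Uses worker counts derived from a previous run, falling back to the default. */
  final class HistoryBasedAllocator(fromHistory: Map[String, Int], defaultWorkers: Int)
      extends WorkerAllocator {
    def workersFor(opId: String): Int = fromHistory.getOrElse(opId, defaultWorkers)
  }

  /** Picks the allocator according to a workflow-level flag. */
  def chooseAllocator(
      enableBasicRamen: Boolean,
      history: Map[String, Int],
      defaultWorkers: Int
  ): WorkerAllocator =
    if (enableBasicRamen) new HistoryBasedAllocator(history, defaultWorkers)
    else new DefaultAllocator(defaultWorkers)
}
```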

Yicong-Huang

I think the PR is a bit too large. It would be better to split it into two PRs: one for the user interface change and one for the new allocator implementation.

Yicong-Huang · 2025-08-20T06:11:27Z

.../uci/ics/amber/engine/architecture/scheduling/resourcePolicies/GreedyResourceAllocator.scala

+  }
+
+
+  private def readStatsFromUri(uriStr: String): Map[String, (Double, Int)] = {


This method is not clear at all. What stats? What is the URI pointing to? Please clarify by renaming it and adding comments.
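One way to address this is to name the tuple's fields through a small case class, give the method and its loop variables descriptive names, and document the data's origin. In this sketch, `StatsRow` stands in for the `Tuple` read from the document at the URI, and reading `(Double, Int)` as previous execution time and worker count is an assumption; all names, and the "last row wins" rule for duplicates, are hypothetical.

```scala
// Hypothetical sketch of the requested renaming; field meanings are assumptions.
object RuntimeStatsSketch {

  /** Runtime statistics recorded for one operator in a previous execution. */
  final case class OperatorRuntimeStats(executionTimeSec: Double, workerCount: Int)

  /** One row of the stored statistics table (stand-in for the Tuple read from the URI). */
  final case class StatsRow(operatorId: String, executionTimeSec: Double, workerCount: Int)

  /** Builds a lookup from operator id to the stats of its previous run.
    * If an operator appears in several rows, the last row wins.
    */
  def loadPreviousRunStats(statsRows: Iterable[StatsRow]): Map[String, OperatorRuntimeStats] =
    statsRows.foldLeft(Map.empty[String, OperatorRuntimeStats]) { (statsByOperator, statsRow) =>
      statsByOperator.updated(
        statsRow.operatorId,
        OperatorRuntimeStats(statsRow.executionTimeSec, statsRow.workerCount)
      )
    }
}
```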

Yicong-Huang · 2025-08-20T06:11:54Z

.../uci/ics/amber/engine/architecture/scheduling/resourcePolicies/GreedyResourceAllocator.scala

+    val document = DocumentFactory.openDocument(uri)
+
+    document._1.get().foldLeft(Map.empty[String, (Double, Int)]) { (acc, tuple) =>
+      val record = tuple.asInstanceOf[Tuple]


What is a record? Please use a meaningful name.

Yicong-Huang · 2025-08-20T06:14:08Z

...la/edu/uci/ics/amber/engine/architecture/scheduling/resourcePolicies/ResourceAllocator.scala

-    *         represented as a Double value (currently set to 0, but will be
-    *         updated in the future).
+    * @param region Region to allocate.
+    * @return (updated Region, estimated cost)


Per the comments in #3660, we want to return only the resourceConfig instead of the updated region.
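A minimal sketch of the suggested signature, where `allocate` returns the resource configuration (plus the estimated cost) and leaves the region untouched. `Region`, `ResourceConfig`, `OperatorConfig`, and `FixedAllocator` here are simplified hypothetical stand-ins rather than Amber's real classes, and keeping the cost in the return value is an assumption.

```scala
// Hypothetical sketch: simplified stand-ins, not Amber's real Region/ResourceConfig classes.
object AllocateReturnsConfigSketch {
  final case class OperatorConfig(workerCount: Int)
  final case class ResourceConfig(operatorConfigs: Map[String, OperatorConfig])
  final case class Region(id: Int, operatorIds: Set[String])

  trait ResourceAllocator {

    /** Returns the resource configuration for `region` and its estimated cost,
      * without producing an updated copy of the region.
      */
    def allocate(region: Region): (ResourceConfig, Double)
  }

  /** Gives every operator in the region the same worker count. */
  final class FixedAllocator(workers: Int) extends ResourceAllocator {
    def allocate(region: Region): (ResourceConfig, Double) = {
      val configs = region.operatorIds.map(opId => opId -> OperatorConfig(workers)).toMap
      (ResourceConfig(configs), 0.0) // cost estimation is left at 0 in this sketch
    }
  }
}
```

The caller then attaches the returned `ResourceConfig` to the region itself, so the allocator stays free of region-mutation logic.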

Yicong-Huang · 2025-08-20T06:21:45Z

...la/edu/uci/ics/amber/engine/architecture/scheduling/resourcePolicies/ResourceAllocator.scala

+      operatorConfigs: Map[PhysicalOpIdentity, OperatorConfig],
+      seedLinkPartitions: Map[PhysicalLink, PartitionInfo] = Map.empty
+  ): Map[PhysicalLink, PartitionInfo] = {
+    val linkPartitionInfos = mutable.HashMap[PhysicalLink, PartitionInfo]() ++= seedLinkPartitions


Why are you saving a copy of the link partitions inside this method? The return type is already a map of partition infos, so why do you need to pass in an input map, seedLinkPartitions?
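To illustrate the point, the partition map can be built as a fresh immutable value from the method's inputs, so neither a seed map nor a mutable copy is needed; a caller that already holds partitions can merge them with `existing ++ derived`. The types and the rule for choosing a partition are hypothetical simplifications, not Amber's partitioning logic.

```scala
// Hypothetical sketch: simplified types and an illustrative partitioning rule.
object LinkPartitionSketch {
  final case class PhysicalLink(fromOp: String, toOp: String)

  sealed trait PartitionInfo
  case object OneToOnePartition extends PartitionInfo
  final case class HashPartition(numPartitions: Int) extends PartitionInfo

  /** Derives partition info for every link from the worker counts of its endpoints,
    * returning a new map instead of reading or copying a pre-existing one.
    */
  def propagateLinkPartitions(
      links: Seq[PhysicalLink],
      workerCounts: Map[String, Int]
  ): Map[PhysicalLink, PartitionInfo] =
    links.map { link =>
      val upstreamWorkers = workerCounts.getOrElse(link.fromOp, 1)
      val downstreamWorkers = workerCounts.getOrElse(link.toOp, 1)
      // Illustrative rule: equal worker counts allow one-to-one; otherwise hash-partition.
      val partition: PartitionInfo =
        if (upstreamWorkers == downstreamWorkers) OneToOnePartition
        else HashPartition(downstreamWorkers)
      link -> partition
    }.toMap
}
```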

Yicong-Huang · 2025-08-20T06:23:22Z

core/config/src/main/resources/application.conf


 schedule-generator {
-    max-concurrent-regions = 1
+    max-concurrent-regions = 2


Why do we change this default value?

Yicong-Huang · 2025-08-20T06:26:47Z

core/workflow-operator/src/main/scala/edu/uci/ics/amber/operator/udf/java/JavaUDFOpDesc.scala

+  @JsonProperty(required = true, defaultValue = "true")
+  @JsonSchemaTitle("Parallelizable?")
+  @JsonPropertyDescription("Default: True")
+  @JsonSchemaInject(json = """{"toggleHidden" : ["advanced"]}""")
+  val parallelizable: Boolean = Boolean.box(true)


I am a bit against this three-step design. Why do we ask users to tick one checkbox (parallelizable), then tick another (advanced), and then provide a number? This is far too complicated. Can we simplify it?
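One possible simplification is a single optional worker-count field in place of two checkboxes plus a number. In this hypothetical sketch, an empty value lets the engine decide, `1` means the UDF runs serially, and any larger value is passed on as the suggested worker count; the field name and semantics are illustrative, not a decision recorded in this PR.

```scala
// Hypothetical sketch of a single-field design; names and semantics are illustrative.
object WorkerFieldSketch {

  /** `workers`: None lets the engine decide; Some(1) runs serially; Some(n > 1) suggests n workers. */
  final case class UdfSettings(workers: Option[Int]) {
    require(workers.forall(_ >= 1), "workers must be at least 1")
  }

  /** Only an explicit worker count of 1 disables parallelism. */
  def isParallelizable(settings: UdfSettings): Boolean = !settings.workers.contains(1)

  /** A suggested worker number is forwarded only when the user asked for more than one. */
  def suggestedWorkerNum(settings: UdfSettings): Option[Int] = settings.workers.filter(_ > 1)
}
```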

Yicong-Huang · 2025-08-20T06:28:01Z

core/workflow-operator/src/main/scala/edu/uci/ics/amber/operator/udf/java/JavaUDFOpDesc.scala

+        PhysicalOp
+          .oneToOnePhysicalOp(
+            workflowId,
+            executionId,
+            operatorIdentifier,
+            OpExecWithCode(code, "java")
+          )
+          .withDerivePartition(_ => UnknownPartition())
+          .withInputPorts(operatorInfo.inputPorts)
+          .withOutputPorts(operatorInfo.outputPorts)
+          .withPartitionRequirement(partitionRequirement)
+          .withIsOneToManyOp(true)
+          .withParallelizable(true)
+          .withSuggestedWorkerNum(workers)
+          .withPropagateSchema(SchemaPropagationFunc(propagateSchema))
+      } else {
+        PhysicalOp
+          .oneToOnePhysicalOp(
+            workflowId,
+            executionId,
+            operatorIdentifier,
+            OpExecWithCode(code, "java")
+          )
+          .withDerivePartition(_ => UnknownPartition())
+          .withInputPorts(operatorInfo.inputPorts)
+          .withOutputPorts(operatorInfo.outputPorts)


Merge the common code and apply only the differing part (i.e., .withParallelizable(true)) to each case.

See pythonUDFSourceOpDescV2 for an example.
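The suggested refactoring builds the shared part of the operator once and branches only on what differs. `OpSpec` below is a minimal hypothetical stand-in for `PhysicalOp`'s immutable `with...` builder so that the example compiles on its own; it does not reproduce PhysicalOp's real API.

```scala
// Hypothetical sketch: OpSpec is a tiny stand-in for PhysicalOp's immutable builder.
object MergeCommonBuilderSketch {
  final case class OpSpec(
      language: String,
      parallelizable: Boolean = false,
      suggestedWorkerNum: Option[Int] = None
  ) {
    def withParallelizable(flag: Boolean): OpSpec = copy(parallelizable = flag)
    def withSuggestedWorkerNum(n: Int): OpSpec = copy(suggestedWorkerNum = Some(n))
  }

  def buildOp(language: String, parallelizable: Boolean, workers: Int): OpSpec = {
    // Shared configuration is written once (ports, partitioning, schema, ... in the real code).
    val base = OpSpec(language)
    // Only the parts that differ are applied per branch.
    if (parallelizable) base.withParallelizable(true).withSuggestedWorkerNum(workers)
    else base.withParallelizable(false)
  }
}
```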

Yicong-Huang · 2025-08-20T06:28:52Z

...r/src/main/scala/edu/uci/ics/amber/operator/udf/python/DualInputPortsPythonUDFOpDescV2.scala

+    val physicalOp = if (parallelizable) {
+      if (advanced) {
+        PhysicalOp
+          .oneToOnePhysicalOp(
+            workflowId,
+            executionId,
+            operatorIdentifier,
+            OpExecWithCode(code, "python")
+          )
+          .withSuggestedWorkerNum(workers)
+      } else {
+        PhysicalOp
+          .oneToOnePhysicalOp(
+            workflowId,
+            executionId,
+            operatorIdentifier,
+            OpExecWithCode(code, "python")
+          )
+      }


Yicong-Huang · 2025-08-20T06:29:02Z

...rkflow-operator/src/main/scala/edu/uci/ics/amber/operator/udf/python/PythonUDFOpDescV2.scala

+    val physicalOp = if (parallelizable) {
+      if (advanced) {
+        PhysicalOp
+          .oneToOnePhysicalOp(
+            workflowId,
+            executionId,
+            operatorIdentifier,
+            OpExecWithCode(code, "python")
+          )
+          .withSuggestedWorkerNum(workers)
+      } else {
+        PhysicalOp
+          .oneToOnePhysicalOp(
+            workflowId,
+            executionId,
+            operatorIdentifier,
+            OpExecWithCode(code, "python")
+          )
+      }


Yicong-Huang · 2025-08-20T06:29:19Z

core/workflow-operator/src/main/scala/edu/uci/ics/amber/operator/udf/r/RUDFOpDesc.scala

+    if (parallelizable) {
+      if (advanced) {
+        PhysicalOp
+          .oneToOnePhysicalOp(
+            workflowId,
+            executionId,
+            operatorIdentifier,
+            OpExecWithCode(code, r_operator_type)
+          )
+      } else {
+        PhysicalOp
+          .oneToOnePhysicalOp(
+            workflowId,
+            executionId,
+            operatorIdentifier,
+            OpExecWithCode(code, r_operator_type)
+          )
+          .withSuggestedWorkerNum(workers)
+      }


initial commit

5ba6133

yunyad changed the title ~~feat(amber): add basic operator-level resource allocator~~ feat(amber): [PR 1/2] Add Basic Ramen Support for UDF Operators Aug 19, 2025

header

bc6b0eb

aglinxinyuan requested a review from Yicong-Huang August 19, 2025 06:51

yunyad self-assigned this Aug 20, 2025

Yicong-Huang requested changes Aug 20, 2025

yunyad changed the title ~~feat(amber): [PR 1/2] Add Basic Ramen Support for UDF Operators~~ feat(amber): Add Basic Ramen Support for UDF Operators Aug 20, 2025