[SPARK-40883][CONNECT] Support Range in Connect proto #38347
Changes to the Connect protobuf relation definitions:

```diff
@@ -44,6 +44,7 @@ message Relation
     Sample sample = 12;
     Offset offset = 13;
     Deduplicate deduplicate = 14;
+    Range range = 15;

     Unknown unknown = 999;
   }

@@ -217,3 +218,23 @@ message Sample
     int64 seed = 1;
   }
 }
+
+// Relation of type [[Range]] that generates a sequence of integers.
+message Range {
+  // Optional. Default value = 0
+  int32 start = 1;
+  int32 end = 2;
+  // Optional. Default value = 1
+  Step step = 3;
```
Review thread on `Step step = 3;`:

Contributor: (raised a question about the integer types used for these fields)

Author: Yes, let me follow up. I guess I was looking at the Python-side API and somehow confused myself on the types.

Author: Updating in #38460.
The proto diff continues:

```diff
+
+  // Optional. Default value is assigned by 1) SQL conf "spark.sql.leafNodeDefaultParallelism"
+  // if it is set, or 2) Spark default parallelism.
+  NumPartitions num_partitions = 4;
+
+  message Step {
+    int32 step = 1;
+  }
+
+  message NumPartitions {
+    int32 num_partitions = 1;
+  }
+}
```
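The wrapper messages `Step` and `NumPartitions` exist because proto3 scalar fields carry no presence information: an unset `int32` reads back as 0, indistinguishable from an explicit 0. Wrapping the scalar in a single-field message restores a `has*` check on the generated bindings. A minimal sketch of the resulting semantics, assuming the standard Java protobuf bindings generated from this file:

```scala
import org.apache.spark.connect.proto

// Build a Range with only `end` set; `start`, `step`, and `num_partitions`
// are left untouched.
val range = proto.Range.newBuilder().setEnd(10).build()

// Message-typed fields expose presence, so the server can tell "unset"
// from "explicitly set":
assert(!range.hasStep)          // never set; the documented default (1) applies
assert(!range.hasNumPartitions) // falls back to the SQL conf / default parallelism

// Scalar fields do not: an unset int32 is indistinguishable from 0.
assert(range.getStart == 0)     // unset, or explicitly 0? The proto cannot say.
```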
Changes to the Connect DSL in connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala:
```diff
@@ -19,6 +19,7 @@ package org.apache.spark.sql.connect
 import scala.collection.JavaConverters._
 import scala.language.implicitConversions

+import org.apache.spark.connect.proto
 import org.apache.spark.connect.proto._
 import org.apache.spark.connect.proto.Join.JoinType
 import org.apache.spark.connect.proto.SetOperation.SetOpType

@@ -34,6 +35,8 @@ import org.apache.spark.sql.connect.planner.DataTypeProtoConverter

 package object dsl {

+  class MockRemoteSession {}
+
   object expressions { // scalastyle:ignore
     implicit class DslString(val s: String) {
       def protoAttr: Expression =

@@ -175,6 +178,28 @@ package object dsl {
   }

+  object plans { // scalastyle:ignore
+    implicit class DslMockRemoteSession(val session: MockRemoteSession) {
+      def range(
+          start: Option[Int],
+          end: Int,
+          step: Option[Int],
+          numPartitions: Option[Int]): Relation = {
+        val range = proto.Range.newBuilder()
```
Review thread on `val range = proto.Range.newBuilder()`:

Author: Note that I need to keep the `proto.` prefix here.

Contributor: I've been explicitly requesting this a couple of times already as a coding style: always prefix the proto-generated classes with their `proto` package.

Author: It makes sense for code that also deals with Catalyst classes. However, this is the Connect DSL, which only deals with protos; no Catalyst is included in this package (see connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala, line 17 at 9fc3aa0).

Contributor: As long as no Catalyst is in this package, this is good with me. Thanks for clarifying.
The DSL diff continues:

```diff
+        if (start.isDefined) {
+          range.setStart(start.get)
+        }
+        range.setEnd(end)
+        if (step.isDefined) {
+          range.setStep(proto.Range.Step.newBuilder().setStep(step.get))
+        }
+        if (numPartitions.isDefined) {
+          range.setNumPartitions(
+            proto.Range.NumPartitions.newBuilder().setNumPartitions(numPartitions.get))
+        }
+        Relation.newBuilder().setRange(range).build()
+      }
+    }
+
   implicit class DslLogicalPlan(val logicalPlan: Relation) {
     def select(exprs: Expression*): Relation = {
       Relation
```
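For illustration, here is how the new DSL method might be exercised in a test; the argument values are hypothetical:

```scala
import org.apache.spark.connect.proto
import org.apache.spark.sql.connect.dsl._
import org.apache.spark.sql.connect.dsl.plans._

val session = new MockRemoteSession()

// Roughly the Connect equivalent of spark.range(0, 10, step = 2),
// leaving the partition count to the server-side defaults.
val rel: proto.Relation = session.range(Some(0), 10, Some(2), None)
assert(rel.getRange.getStep.getStep == 2)
assert(!rel.getRange.hasNumPartitions)
```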
Review thread on the proto field `int32 end = 2;`:

Contributor: `end` is not optional, but how do we know if the client forgot to set it? 0 is a valid `end` value as well.

Author: Yeah, this becomes tricky. Ultimately we could wrap every such field in a message so we always know whether it is set, but that might complicate the entire proto too much. Let's have a discussion on that.
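A hedged sketch of how the server side could apply the documented defaults when translating the proto into a Catalyst plan; `transformRange` is an illustrative name, not necessarily the PR's actual planner code:

```scala
import org.apache.spark.connect.proto
import org.apache.spark.sql.catalyst.plans.logical

// Illustrative only: the real planner wiring in the PR may differ.
def transformRange(rel: proto.Range): logical.LogicalPlan = {
  // proto3's scalar default (0) happens to coincide with the documented
  // default for `start`, so no presence check is needed there.
  val start = rel.getStart
  val end = rel.getEnd
  // The wrapper messages make "unset" detectable, so defaults apply cleanly.
  val step = if (rel.hasStep) rel.getStep.getStep else 1
  // Leaving numSlices as None lets Spark fall back to
  // spark.sql.leafNodeDefaultParallelism / the default parallelism.
  val numSlices =
    if (rel.hasNumPartitions) Some(rel.getNumPartitions.getNumPartitions) else None
  logical.Range(start, end, step, numSlices)
}
```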