@beliefer
Contributor

What changes were proposed in this pull request?

Spark SQL does not yet support creating a function from an Aggregator, and UserDefinedAggregateFunction is deprecated.
If we want to remove UserDefinedAggregateFunction, Spark SQL should provide a replacement.

Why are the changes needed?

We need to provide a new way to create user-defined aggregate functions so that UserDefinedAggregateFunction can be removed in the future.

Does this PR introduce any user-facing change?

Yes. Users will be able to create a user-defined aggregate function by implementing Aggregator.
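For illustration, here is a minimal sketch of what implementing Aggregator can look like. The names `AvgBuffer`, `MyAverage`, `MyAverageDemo` and `my_avg` are hypothetical and do not come from this PR. The sketch registers the function through the existing `functions.udaf` and `spark.udf.register` API (available since Spark 3.0), not through the `CREATE FUNCTION` path this PR proposes.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf

// Hypothetical buffer type: running sum and row count.
case class AvgBuffer(sum: Double, count: Long)

// Hypothetical Aggregator that computes the mean of DOUBLE inputs.
class MyAverage extends Aggregator[Double, AvgBuffer, Double] {
  // Empty buffer: nothing summed, nothing counted.
  def zero: AvgBuffer = AvgBuffer(0.0, 0L)
  // Fold one input value into the buffer.
  def reduce(b: AvgBuffer, a: Double): AvgBuffer = AvgBuffer(b.sum + a, b.count + 1)
  // Combine two partial buffers (for example, from different partitions).
  def merge(b1: AvgBuffer, b2: AvgBuffer): AvgBuffer =
    AvgBuffer(b1.sum + b2.sum, b1.count + b2.count)
  // Produce the final result; NaN when no rows were seen.
  def finish(r: AvgBuffer): Double = if (r.count == 0) Double.NaN else r.sum / r.count
  def bufferEncoder: Encoder[AvgBuffer] = Encoders.product[AvgBuffer]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

object MyAverageDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("my-avg").getOrCreate()
    try {
      // Existing registration path; the SQL-level CREATE FUNCTION path is what this PR is about.
      spark.udf.register("my_avg", udaf(new MyAverage))
      spark.sql("SELECT my_avg(CAST(id AS DOUBLE)) AS avg_id FROM range(1, 4)").show()
    } finally {
      spark.stop()
    }
  }
}
```

Writing the aggregator as a class rather than an object is a deliberate choice in this sketch, on the assumption that a class-name-based `CREATE FUNCTION ... AS '...'` needs something it can instantiate.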

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label Oct 17, 2021

SparkQA commented Oct 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48809/

SparkQA commented Oct 17, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48809/

SparkQA commented Oct 17, 2021

Test build #144330 has finished for PR 34303 at commit 3702b9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…veSQLViewSuite.scala

Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
@beliefer
Contributor Author

ping @cloud-fan

SparkQA commented Oct 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48824/

SparkQA commented Oct 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48824/

SparkQA commented Oct 18, 2021

Test build #144346 has finished for PR 34303 at commit 7b11c6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// ScalaAggregator or Hive UDF/UDAF/UDTF with function definition. Otherwise,
// we just throw it earlier.
// Unfortunately we need to use reflection here because Aggregator
// and ScalaAggregator are defined in sql/core module.
Contributor

Classes in sql/core are available in sql/hive. What problem did you hit?

Contributor

Or do you want to move the code to SessionCatalog? Then reflection makes sense.

Contributor Author

Oh, thank you for the reminder.

SparkQA commented Oct 18, 2021

Test build #144365 has finished for PR 34303 at commit 136b8f5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

retest this please

SparkQA commented Oct 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48840/

SparkQA commented Oct 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48840/

SparkQA commented Oct 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48841/

SparkQA commented Oct 18, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48841/

SparkQA commented Oct 18, 2021

Test build #144366 has finished for PR 34303 at commit 136b8f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

deserializer,
ClassTag(cls))

val e = classOf[ScalaAggregator[_, _, _]].getConstructor(classOf[Seq[Expression]],
Contributor

Can't we just call `new ScalaAggregator` directly?

Contributor Author

Thanks for your reminder.

val baseClassType = typeOf[Aggregator[_, _, _]].typeSymbol.asClass
val baseType = internal.thisType(classType).baseType(baseClassType)
val tpe = baseType.typeArgs.head
val cls = mirror.runtimeClass(tpe)
Contributor

Did you copy the code above from somewhere?

Contributor Author

The code references

val serializer = ScalaReflection.serializerForType(tpe)

functionName -> true) {
// create a function in default database
sql("USE DEFAULT")
sql(s"CREATE FUNCTION $functionName AS '$avgFuncClass'")
Contributor

Can we add some basic tests to make sure the function can be called? Also test compatible input types to exercise implicit casts, and incompatible input types to make sure the type check works.
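A rough sketch of the three checks requested above, for illustration only. It is not the test added in this PR: `LongSum`, `AggregatorCallChecks` and `long_sum` are hypothetical names, and the function is registered through the existing `functions.udaf` API rather than `CREATE FUNCTION`. The third check assumes the aggregate's input type check rejects an ARRAY argument; the sketch prints that outcome instead of asserting it.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.udaf
import scala.util.Try

// Hypothetical aggregator used only for this sketch: sums LONG inputs.
object LongSum extends Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Long): Long = b + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(r: Long): Long = r
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object AggregatorCallChecks {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("agg-checks").getOrCreate()
    try {
      spark.udf.register("long_sum", udaf(LongSum))

      // 1. The function can be called with its declared input type (LONG).
      val exact = spark.sql("SELECT long_sum(id) FROM range(5)").head().getLong(0)
      assert(exact == 10L) // 0 + 1 + 2 + 3 + 4

      // 2. Compatible input type: INT should be implicitly cast to LONG.
      val widened =
        spark.sql("SELECT long_sum(CAST(id AS INT)) FROM range(5)").head().getLong(0)
      assert(widened == 10L)

      // 3. Incompatible input type: an ARRAY argument is expected to be rejected
      //    by the type check (assumption; printed rather than asserted).
      val rejected = Try(spark.sql("SELECT long_sum(array(id)) FROM range(5)").collect())
      println(s"incompatible input rejected: ${rejected.isFailure}")
    } finally {
      spark.stop()
    }
  }
}
```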

SparkQA commented Oct 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48867/

SparkQA commented Oct 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48867/

SparkQA commented Oct 19, 2021

Test build #144393 has finished for PR 34303 at commit 815c65c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan added a commit that referenced this pull request Oct 21, 2021
… sql/core

### What changes were proposed in this pull request?

This PR adds a new internal interface `FunctionExpressionBuilder`, to replace `SessionCatalog.makeFunctionExpression`. Then we can put the interface implementation in sql/core, to avoid using reflection in `SessionCatalog.makeFunctionExpression`, because the class `UserDefinedAggregateFunction` is not available in sql/catalyst.

### Why are the changes needed?

code cleanup, and make it easier to support using `Aggregator` as UDAF later (#34303).

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #34340 from cloud-fan/function.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@beliefer
Contributor Author

Because #34340 refactored the architecture for registering user-defined functions, I opened #34352 to replace this one.

@beliefer beliefer closed this Oct 21, 2021
yhcast0 pushed a commit to yhcast0/spark that referenced this pull request Apr 5, 2024
…sql/core
