
@maropu
Member

maropu commented Aug 1, 2017

What changes were proposed in this pull request?

This PR adds documentation about the unsupported functions in Hive UDF/UDTF/UDAF.
It relates to #18768 and #18527.

How was this patch tested?

N/A

@maropu
Member Author

maropu commented Aug 1, 2017

@gatorsmile If you get time, could you check this? Thanks!

Some of them are meaningless in Spark and the others are rarely used by users.
Below is a list of major APIs we don't support in Spark SQL:

* `getRequiredJars` and `getRequiredFiles` (`UDF` and `GenericUDF`) are functions to to automatically
Member

to to -> to

* `initialize(StructObjectInspector)` in `GenericUDTF` is not supported yet. Spark SQL currently uses
a deprecated interface `initialize(ObjectInspector[])` only.
* `configure` (`GenericUDF`, `GenericUDTF`, and `GenericUDAFEvaluator`) is a function to initialize
functions with `MapredContext`. But, Spark SQL does not use `MapredContext` internally.
Member

functions with MapredContext, which is inapplicable to Spark.

* `configure` (`GenericUDF`, `GenericUDTF`, and `GenericUDAFEvaluator`) is a function to initialize
functions with `MapredContext`. But, Spark SQL does not use `MapredContext` internally.
* `close` (`GenericUDF` and `GenericUDAFEvaluator`) is a function to release associated resources.
Spark SQL does not call this function when tasks finished.
Member

finished -> finish

* `reset` (`GenericUDAFEvaluator`) is a function to re-initialize aggregation for reusing the same aggregation.
Spark SQL currently does not support the reuse of aggregation.
* `getWindowingEvaluator` (`GenericUDAFEvaluator`) is a function to optimize aggregation by evaluating
an aggregate over a fixed window. Spark SQL does not support this optimization yet.
Member

Please remove "Spark SQL does not support this optimization yet".

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80103 has finished for PR 18792 at commit 1434bde.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


Spark SQL implements the basic functionality of the Hive UDF/UDTF/UDAF, but does not support all the APIs for users.
Some of them are meaningless in Spark and the others are rarely used by users.
Below is a list of major APIs we don't support in Spark SQL:
Member

gatorsmile commented Aug 1, 2017

How about simplifying the whole paragraph to the following?

Not all the APIs of the Hive UDF/UDTF/UDAF are supported by Spark SQL. Below are the unsupported APIs:

@gatorsmile
Member

Thanks for working on it! Just left some minor comments.

@maropu
Member Author

maropu commented Aug 1, 2017

@gatorsmile OK, fixed.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80106 has finished for PR 18792 at commit 7d07e6b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80107 has finished for PR 18792 at commit c703d57.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* `initialize(StructObjectInspector)` in `GenericUDTF` is not supported yet. Spark SQL currently uses
a deprecated interface `initialize(ObjectInspector[])` only.
* `configure` (`GenericUDF`, `GenericUDTF`, and `GenericUDAFEvaluator`) is a function to initialize
functions with `MapredContext`, which is inapplicable to Spark. But, Spark SQL does not use `MapredContext` internally.
Member

nit: "But" looks redundant here, because "inapplicable" already appears before it. It reads like a double negative...

Member Author

Removed. Thanks!

@gatorsmile
Member

LGTM pending Jenkins

@SparkQA

SparkQA commented Aug 1, 2017

Test build #80109 has finished for PR 18792 at commit 29f1108.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit closed this in 110695d on Aug 1, 2017