Skip to content

Conversation

@bdrillard
Copy link

What changes were proposed in this pull request?

This PR is a working followup to the expression work begun in #20085. It provides necessary Expression definitions to support custom encoders (see this discussion in the Spark-Avro project).

It adds the following expressions:

  • ObjectCast - performs explicit casting of an Expression result to a DataType
  • StaticField - retrieves a static field against a class that otherwise has no accessor method
  • InstanceOf - an Expression for the Java instanceof operation

Modifies NewInstance to take a sequence of method-name and arguments initialization tuples, which are executed against the newly constructed object instance.

Removes InitializeJavaBean, as the generalized NewInstance subsumes its use-case.

How was this patch tested?

Adds unit test for NewInstance supporting post-constructor initializations. All previous "JavaBean" tests were refactored to use NewInstance.

Additional examples of working encoders that would use these new expressions can be seen in the Spark-Avro and Bunsen projects.

cls: Class[_],
arguments: Seq[Expression],
dataType: DataType,
initializations: Seq[(String, Seq[Expression])] = Nil,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan, see here an implementation of the modifications you described previously.

@SparkQA
Copy link

SparkQA commented May 17, 2018

Test build #90699 has finished for PR 21348 at commit f8643e3.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • s\"in any enclosing class nor any supertype of $

@SparkQA
Copy link

SparkQA commented May 17, 2018

Test build #90700 has finished for PR 21348 at commit cd7e6f3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 17, 2018

Test build #90701 has finished for PR 21348 at commit ab51b9f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bdrillard bdrillard changed the title [SPARK-22739][Catalyst Additional Expression Support for Objects [SPARK-22739][Catalyst] Additional Expression Support for Objects May 17, 2018
override def children: Seq[Expression] = value :: Nil

override def eval(input: InternalRow): Any =
throw new UnsupportedOperationException("Only code-generated evaluation is supported.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any problem on implementing none code-generated evaluation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a pattern I'd seen for eval at the time of writing this PR. Is there another expression that has an example? I could refactor and find out.

@xuanyuanking
Copy link
Member

The jira SPARK-25789 guide me here, thanks for @bdrillard your great job, we also meet the requirement on supporting dataset of avro during Structure Streaming. I'm adapting your code in databricks/spark-avro#217 to newer spark version, it seems work well. Could you give me credit to give a following up work of dataset of avro support? Look forward to your reply :)

@bdrillard
Copy link
Author

bdrillard commented Oct 26, 2018

Hi @xuanyuanking, if you'd like to take on the work to fold my prior work in databricks/spark-avro#217 into Spark, that sounds good to me. Please do include me in pull-requests on this topic. Our work with an Encoder for Avro fulfills particularly heavy use-cases. I'm happy to review the work and I'd also be able to test the feature branch against our workflows as it's being discussed.

The parent ticket for this PR was previously marked as Resolved. Should we go ahead and re-open it and follow through with this PR, since it's required work for SPARK-25789? cc: @cloud-fan

@xuanyuanking
Copy link
Member

@bdrillard Thanks for our reply, of cause I'll request your review and make sure I understand your implement correctly.

Should we go ahead and re-open it and follow through with this PR, since it's required work for SPARK-25789?

As avro has been included in Spark, maybe it's ok to just add these new-added expression in external?

@mazeboard
Copy link

The PR #24299 proposes another solution to add support for Avro objects; your opinions matter, we are eager to have a working solution to create Avro datasets

@bdrillard
Copy link
Author

Hi @mazeboard, thanks for bringing your work to my attention.

To clarify, this PR could be considered follow-up work for another PR addressing adding support for Dataset of arbitrary Avro Objects, see #22878 opened by @xuanyuanking.

I'll take the time to read your request as well. It'd be good for us to compare/contrast the approaches in both #24299 and #22878.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 11, 2020
@github-actions github-actions bot closed this Jan 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants