[SPARK-16318][SQL] Implement all remaining xpath functions #13991
|
Can you guys take a look at this? @cloud-fan @dongjoon-hyun @gatorsmile |
|
Test build #61530 has finished for PR 13991 at commit
|
It seems the boolean and string variants don't share the same implementation? (xpathUtil.evalBoolean and xpathUtil.evalString)
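To make the question concrete, here is a minimal sketch of how a boolean and a string evaluator could share one code path using only the JDK's `javax.xml.xpath` API. The class name `XPathEvalSketch` and the helper `eval` are hypothetical and are not taken from this patch; only the method names `evalBoolean` and `evalString` come from the comment above, and their bodies here are an assumption.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathEvalSketch {
    // Hypothetical shared helper: the requested return type is the only thing that varies.
    static Object eval(String xml, String path, QName returnType) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(path, new InputSource(new StringReader(xml)), returnType);
        } catch (XPathExpressionException e) {
            // Rethrown unchecked to keep the sketch short; not the patch's error handling.
            throw new IllegalArgumentException("Could not evaluate '" + path + "'", e);
        }
    }

    // True when the path selects at least one node (or the expression is truthy).
    static boolean evalBoolean(String xml, String path) {
        return (Boolean) eval(xml, path, XPathConstants.BOOLEAN);
    }

    // String value of the first selected node, or "" when nothing matches.
    static String evalString(String xml, String path) {
        return (String) eval(xml, path, XPathConstants.STRING);
    }
}
```

In this sketch the two public methods differ only in the `XPathConstants` value they pass and the cast they apply.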
|
Could you add some test cases for the boundary values? That means, trying the maximum and minimum values for each different data type. Also, the values that are out of the boundary. |
Which Exception will we throw? What is the message we issue? When users see the message, can they understand the cause?
|
Could you add more test cases for corrupted xml and illegal inputs? So far, the test suite |
|
Normally, for each function or data type we implement in an enterprise product, we need to document the limit/restriction. That is why I am asking for the boundary values, negative cases and error handling. |
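As an illustration of the negative cases being asked about, the sketch below separates two failure modes in the JDK's `javax.xml.xpath` API: an XPath expression that does not compile, and an XML document that does not parse. The class name `XPathErrorSketch` and the message wording are assumptions made for this example; they are not the messages this patch emits.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathErrorSketch {
    static String evalString(String xml, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression compiled;
        try {
            // Fails here when the path itself is malformed, before any XML is read.
            compiled = xpath.compile(path);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath '" + path + "'", e);
        }
        try {
            // Fails here when the document cannot be parsed (e.g. mismatched tags).
            return compiled.evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XML document: " + xml, e);
        }
    }
}
```

Splitting the two `try` blocks lets the message tell the user which input was at fault, which is the kind of question raised above about whether users can understand the cause.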
|
+1 for @gatorsmile's advice |
|
Thanks for the feedback. I will update this later. |
|
For this one I think we should consider supporting only foldable literals for the path component. It would probably satisfy 99.999% of the use cases and simplify the code. |
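One way to see why a constant path simplifies things: the expression can be compiled once and reused for every row. The sketch below shows that pattern with the JDK's `XPathExpression`; the class name `FoldablePathSketch` is hypothetical, and this is not a claim about how the patch caches anything.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class FoldablePathSketch {
    // Compiled exactly once, because the path is assumed to be a constant.
    private final XPathExpression compiled;

    FoldablePathSketch(String constantPath) {
        try {
            this.compiled = XPathFactory.newInstance().newXPath().compile(constantPath);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath '" + constantPath + "'", e);
        }
    }

    // Called per input document; only the XML varies between calls.
    double evalNumber(String xml) {
        try {
            return (Double) compiled.evaluate(
                new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XML document: " + xml, e);
        }
    }
}
```

Note that the JDK documents `XPathExpression` as not thread-safe, so an instance like this would need to be confined to one thread.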
|
Also - rather than having concrete implementations for all of these, why don't we use RuntimeReplaceable? |
|
Test build #61947 has finished for PR 13991 at commit
|
|
I pushed a new change to this. We now have better error messages and test coverage for those. These expressions also now require foldable paths. I also changed the test values to make sure we are not restricting the range of values an expression can return (e.g. for XPathInt the test case makes sure it can return a value larger than short). However, I don't think it'd make sense to test overflow behavior because those are mostly undefined throughout Spark SQL. Re: @rxin's suggestion on using RuntimeReplaceable -- I did try that, but I'm afraid the current implementation of RuntimeReplaceable isn't meant for stacking multiple expressions together, and type checking actually breaks if XPathInt replaces itself with Cast(XPathDouble). I looked into fixing that, but it seemed more complicated and it doesn't actually save much space (it actually takes more code to use RuntimeReplaceable). |
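The range check described above can be sketched outside Spark with the JDK's XPath API: evaluate the expression as a number (a `double` in XPath 1.0) and narrow it to the target type. The class `XPathTypedSketch` and its methods are hypothetical stand-ins and are not the `XPathInt` or `XPathDouble` expressions from the patch. Overflow is deliberately left out, in line with the comment above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathTypedSketch {
    // XPath 1.0 numbers are doubles, so every typed variant starts from this value.
    static double evalNumber(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate '" + path + "'", e);
        }
    }

    // Narrowing conversions; in-range values are preserved exactly.
    static int evalInt(String xml, String path) {
        return (int) evalNumber(xml, path);
    }

    static long evalLong(String xml, String path) {
        return (long) evalNumber(xml, path);
    }
}
```

A test in the spirit of the comment would use a sum such as 40000 + 1, which exceeds `Short.MAX_VALUE`, and check that the int variant still returns 40001.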
|
Test build #61953 has finished for PR 13991 at commit
|
|
Test build #61954 has finished for PR 13991 at commit
|
|
I just added the general xpath function that returns an array of string too. |
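For readers unfamiliar with the array-returning form, here is a rough sketch of collecting every matching node's text into a list with the JDK's `XPathConstants.NODESET`. The class `XPathListSketch` and the use of `getTextContent` are choices made for this example; the patch may extract node values differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathListSketch {
    // Returns the text of every node the path selects, in document order.
    static List<String> evalStrings(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate '" + path + "'", e);
        }
    }
}
```

With the input `<a><b>b1</b><b>b2</b><c>c1</c></a>` and the path `a/b/text()`, this sketch yields a two-element list containing `b1` and `b2`; a path that matches nothing yields an empty list.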
|
Test build #61957 has finished for PR 13991 at commit
|
|
Test build #61958 has finished for PR 13991 at commit
|
|
Test build #61967 has finished for PR 13991 at commit
|
|
Test build #61975 has finished for PR 13991 at commit
|
|
@hvanhovell can you take a look at this? |
|
Test build #61992 has finished for PR 13991 at commit
|
|
Test build #62000 has finished for PR 13991 at commit
|
|
Test build #3175 has finished for PR 13991 at commit
|
|
retest this please |
|
LGTM, pending jenkins |
|
@cloud-fan Jenkins already ran twice successfully before. |
|
The latest result is |
|
I guess whatever generates that message is buggy? |
|
Test build #62070 has finished for PR 13991 at commit
|
|
As a follow-up task, can you take a look at the following query files and add useful tests to your test suite? Thanks. |
|
Actually I created the unit tests based on those. |
|
BTW I have to say Hive's test coverage in this area is very spotty, so I don't actually think it's great to follow, but I used those. |
|
OK. Thanks. Then, it will be good to add more tests for cases that are not covered by those hive tests. |
|
Thanks, merging to master! This doesn't merge cleanly to 2.0. @petermaxlee, can you submit a new PR against 2.0? Thanks! |
|
@cloud-fan thanks for merging! @yhuai I think the degree to which we want to add more tests also depends on how much we trust the library we are using. XPath (with Query) is almost as complicated as SQL itself. |
This patch implements all remaining xpath functions that Hive supports and that are not natively supported in Spark: xpath_int, xpath_short, xpath_long, xpath_float, xpath_double, xpath_string, and xpath. Added unit tests and end-to-end tests. Author: petermaxlee <petermaxlee@gmail.com> Closes apache#13991 from petermaxlee/SPARK-16318.
|
Here it is for branch-2.0. |
## What changes were proposed in this pull request? This patch implements all remaining xpath functions that Hive supports and that are not natively supported in Spark: xpath_int, xpath_short, xpath_long, xpath_float, xpath_double, xpath_string, and xpath. This is based on #13991 but for branch-2.0. ## How was this patch tested? Added unit tests and end-to-end tests. Author: petermaxlee <petermaxlee@gmail.com> Closes #14131 from petermaxlee/xpath-branch-2.0.
What changes were proposed in this pull request?
This patch implements all remaining xpath functions that Hive supports and that are not natively supported in Spark: xpath_int, xpath_short, xpath_long, xpath_float, xpath_double, xpath_string, and xpath.
How was this patch tested?
Added unit tests and end-to-end tests.