[SPARK-11980] [SQL] Fix json_tuple and add test cases for SPARK-10621 #9977
Conversation
Review comment on `python/pyspark/sql/functions.py` (outdated):
can you test both string names and columns? e.g.

```python
df.select(isnan("a").alias("r1"), isnan(df.a).alias("r2")).collect()
```
and do the same thing for the rest of the functions
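The reason both call forms need coverage is that PySpark functions typically accept either a string column name or a `Column` object and normalize the argument internally. A minimal sketch of that pattern (the `Column` class and `_to_column` helper below are illustrative stand-ins, not PySpark's actual internals):

```python
# Illustrative sketch of the string-or-Column pattern used by PySpark functions.
# `Column` and `_to_column` here are simplified stand-ins, not the real PySpark API.
class Column:
    def __init__(self, name):
        self.name = name

def _to_column(col):
    # Accept either a column name (str) or a Column object.
    return col if isinstance(col, Column) else Column(col)

def isnan_expr(col):
    # Build the same expression regardless of which form the caller passed.
    return f"isnan({_to_column(col).name})"

# Both call forms should produce the same expression:
print(isnan_expr("a"))          # isnan(a)
print(isnan_expr(Column("a")))  # isnan(a)
```

Testing only one of the two forms would leave the normalization path for the other form unexercised, which is why the reviewer asks for both.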
Test build #46709 has finished for PR 9977 at commit

Test build #46713 has finished for PR 9977 at commit
cc @davies for a final look. The changes LGTM.
In the next few days, I will look at the implementation of … If you compare the results of … Thank you!
Just did a quick check. I can confirm this is not caused by Python. I reproduced it using the Scala API.
Narrowed down to the following code in jsonExpressions.scala:

```scala
val output = new ByteArrayOutputStream()
val matched = Utils.tryWithResource(
  jsonFactory.createGenerator(output, JsonEncoding.UTF8)) { generator =>
  parser.nextToken()
  evaluatePath(parser, generator, RawStyle, parsed.get)
}
```

So far, our parser returns the same results for the two cases below:

```scala
val tuple: Seq[(String, String)] = ("5", """{"f1": null}""") :: Nil
val df: DataFrame = tuple.toDF("key", "jstring")
val res = df.select(functions.get_json_object($"jstring", "$.f1")).collect()

val tuple2: Seq[(String, String)] = ("5", """{"f1": "null"}""") :: Nil
val df2: DataFrame = tuple2.toDF("key", "jstring")
val res3 = df2.select(functions.get_json_object($"jstring", "$.f1")).collect()
```
Found a discussion about this issue: … Please let me know what I should do next. Thanks! @rxin @davies @marmbrus @cloud-fan
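The underlying ambiguity is that a JSON `null` value and the four-character string `"null"` are different things, and `get_json_object` should distinguish them. Spark's parser is Jackson-based, but the same distinction can be illustrated with Python's standard `json` module:

```python
import json

# A JSON null value and the literal string "null" parse to different results.
doc1 = json.loads('{"f1": null}')    # f1 is a JSON null
doc2 = json.loads('{"f1": "null"}')  # f1 is the string "null"

print(doc1["f1"] is None)  # True
print(doc2["f1"])          # null  (a real str, not None)
```

If the extraction path collapses both cases to the same output, callers can no longer tell a missing/null field apart from a field whose value happens to be the string `"null"`.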
Review comment on `python/pyspark/sql/functions.py` (outdated):
nit: I think one simple case should be enough for the Python tests; other corner cases should be tested in Scala.
The Python doctests become part of the API docs, so they should be reader-friendly.
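For context on why readability matters here: Python doctests are executed as tests and also rendered verbatim in the generated API documentation, so each example does double duty. A minimal, generic illustration (the `add` function is just a placeholder, not a Spark API):

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    The example below is both documentation and an executable test:

    >>> add(1, 2)
    3
    """
    return a + b

# Running the module's doctests verifies that the documented example still holds.
results = doctest.testmod()
print(results.failed)  # 0
```

This is why the reviewer prefers one clear case in the Python docstring and pushes corner cases into the Scala test suite, where readability of generated docs is not a concern.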
Sure, will do. I will move the test case for get_json_object to the Scala test file, and simplify the existing test cases for get_json_object and json_tuple. Thanks!
I'd just simplify the test case as Davies suggested, and then merge this in. In parallel you can work on a patch to fix whatever bugs you find.
Test build #46750 has finished for PR 9977 at commit
Thanks - I'm going to merge this.
Added Python test cases for the functions `isnan`, `isnull`, `nanvl` and `json_tuple`. Fixed a bug in the function `json_tuple`.

@rxin, could you help me review my changes? Please let me know if anything is missing. Thank you! Have a good Thanksgiving day!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #9977 from gatorsmile/json_tuple.

(cherry picked from commit 068b643)

Signed-off-by: Reynold Xin <rxin@databricks.com>