-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add missing json reader options for JsonScanRetrySuite #11898
Conversation
Signed-off-by: Jihoon Son <ghoonson@gmail.com>
858b6f0
to
66e3abf
Compare
@@ -29,7 +29,7 @@ class JsonScanRetrySuite extends RmmSparkRetrySuiteBase { | |||
|
|||
val cudfSchema = GpuColumnVector.from(StructType(Seq(StructField("a", IntegerType), | |||
StructField("b", IntegerType)))) | |||
val opts = JSONOptions.builder().withLines(true).build() | |||
val opts = GpuJsonReadCommon.baseCudfJsonOptionsBuilder().withLines(true).build() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line(true)
is default and the plugin relies on it. So probably we should better set withLines(true)
explicitly in baseCudfJsonOptionsBuilder
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do that in a separate PR. I'd like to keep the change of this PR minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you already changed the cudfJsonOptions
function right? So adding one more change like that would not be different IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am modifying the code in this function without functionality change. Your suggestion changes its behavior which may or may not impact other tests. Given that this test failure is blocking all PRs, I think it's better to make that change in a different PR without unnecessarily blocking others further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #11899.
@@ -193,18 +193,22 @@ object GpuJsonReadCommon { | |||
val allowUnquotedControlChars = options.buildJsonFactory() | |||
.isEnabled(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS) | |||
|
|||
baseCudfJsonOptionsBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this name can be something better but don't know what 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate? From what perspective can it be better? Like readability? brevity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "base" and why? The builder is not final and not a "base" for something. It is just the builder, with some options set, and before we pass in more options. So I don't like to have "base" in it. But I don't have a better name for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I used "base" to imply that it has default options. How about cudfJsonOptionsBuilderWithDefaults()
? Is that better? It's little long though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultJsonOptionsBuilder
? No need to have such long name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I like the name, but want to have cudf
in it as there are two different JSONOptions
, one for cudf and another for catalyst. I will rename it to defaultCudfJsonOptionsBuilder
in #11899.
…into fix-JsonScanRetrySuite
build |
The main context that I am missing is why we need to disable features in the json reader at all (and all the features were disabled are allowing something to happen, but it seems they allow exceptions???). I was expecting to see disallowing "throwing exceptions". If we do need this special case for the test, we should put document it in the test/builder to say why it is configured this way. I'll +1 this for now to unblock CI. I think we just need to have a follow on to clean up docs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jihoonson @ttnghia thanks for looking at this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to unblock CI thanks
Fixes #11897.
JsonScanRetrySuite
currently creates a JSON reader options that is not compatible to what the plugin uses. This PR fixes it by consolidating the logic to create a base builder for the options.