Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing json reader options for JsonScanRetrySuite #11898

Merged
merged 2 commits into from
Dec 20, 2024

Conversation

jihoonson
Copy link
Collaborator

@jihoonson jihoonson commented Dec 19, 2024

Fixes #11897. JsonScanRetrySuite currently creates a JSON reader options that is not compatible to what the plugin uses. This PR fixes it by consolidating the logic to create a base builder for the options.

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
@jihoonson jihoonson force-pushed the fix-JsonScanRetrySuite branch from 858b6f0 to 66e3abf Compare December 19, 2024 23:43
@@ -29,7 +29,7 @@ class JsonScanRetrySuite extends RmmSparkRetrySuiteBase {

val cudfSchema = GpuColumnVector.from(StructType(Seq(StructField("a", IntegerType),
StructField("b", IntegerType))))
val opts = JSONOptions.builder().withLines(true).build()
val opts = GpuJsonReadCommon.baseCudfJsonOptionsBuilder().withLines(true).build()
Copy link
Collaborator

@ttnghia ttnghia Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line(true) is default and the plugin relies on it. So probably we should better set withLines(true) explicitly in baseCudfJsonOptionsBuilder instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do that in a separate PR. I'd like to keep the change of this PR minimized.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you already changed the cudfJsonOptions function right? So adding one more change like that would not be different IMO.

Copy link
Collaborator Author

@jihoonson jihoonson Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am modifying the code in this function without functionality change. Your suggestion changes its behavior which may or may not impact other tests. Given that this test failure is blocking all PRs, I think it's better to make that change in a different PR without unnecessarily blocking others further.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #11899.

@@ -193,18 +193,22 @@ object GpuJsonReadCommon {
val allowUnquotedControlChars = options.buildJsonFactory()
.isEnabled(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS)

baseCudfJsonOptionsBuilder()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this name can be something better but don't know what 😃

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate? From what perspective can it be better? Like readability? brevity?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "base" and why? The builder is not final and not a "base" for something. It is just the builder, with some options set, and before we pass in more options. So I don't like to have "base" in it. But I don't have a better name for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I used "base" to imply that it has default options. How about cudfJsonOptionsBuilderWithDefaults()? Is that better? It's little long though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaultJsonOptionsBuilder? No need to have such long name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I like the name, but want to have cudf in it as there are two different JSONOptions, one for cudf and another for catalyst. I will rename it to defaultCudfJsonOptionsBuilder in #11899.

@sameerz sameerz added the bug Something isn't working label Dec 20, 2024
@jihoonson
Copy link
Collaborator Author

build

@abellina
Copy link
Collaborator

The main context that I am missing is why we need to disable features in the json reader at all (and all the features were disabled are allowing something to happen, but it seems they allow exceptions???). I was expecting to see disallowing "throwing exceptions". If we do need this special case for the test, we should put document it in the test/builder to say why it is configured this way.

I'll +1 this for now to unblock CI. I think we just need to have a follow on to clean up docs.

Copy link
Collaborator

@abellina abellina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jihoonson @ttnghia thanks for looking at this

Copy link
Collaborator

@pxLi pxLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to unblock CI thanks

@pxLi pxLi merged commit 01f9fd2 into NVIDIA:branch-25.02 Dec 20, 2024
49 of 50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] JsonScanRetrySuite is failing in the CI.
5 participants