Add work around for string split with empty input. #11292
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
minor nits, lgtm.
else:
    return StringGen("", nullable=True)
Nit: defensive programming
- else:
-     return StringGen("", nullable=True)
+ elif (empty_type == EmptyStringType.MIXED):
+     return StringGen("", nullable=True)
+ else:
+     raise AssertionError("unexpected empty type " + str(empty_type))
or alternatively make it a map lookup, e.g.:
empty_string_gens_map = {
    EmptyStringType.ALL_NULL: lambda: NullGen(StringType()),
    EmptyStringType.ALL_EMPTY: lambda: StringGen("", nullable=False),
    EmptyStringType.MIXED: lambda: StringGen("", nullable=True)
}

def mk_empty_str_gen(empty_type):
    return empty_string_gens_map[empty_type]()
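A minimal, self-contained sketch of the map-lookup variant, combined with the explicit failure from the first suggestion. The `EmptyStringType`, `StringGen`, and `NullGen` definitions here are hypothetical stand-ins so the snippet runs on its own; they are not the project's real test helpers, and the `"string"` argument stands in for `StringType()`.

```python
from enum import Enum, auto


class EmptyStringType(Enum):
    # Hypothetical stand-in for the enum used in the PR's tests.
    ALL_NULL = auto()
    ALL_EMPTY = auto()
    MIXED = auto()


class StringGen:
    # Hypothetical stand-in: only records its arguments.
    def __init__(self, pattern, nullable=True):
        self.pattern = pattern
        self.nullable = nullable


class NullGen:
    # Hypothetical stand-in: only records the requested data type.
    def __init__(self, data_type):
        self.data_type = data_type


# Lambdas defer construction until a generator is actually requested.
empty_string_gens_map = {
    EmptyStringType.ALL_NULL: lambda: NullGen("string"),
    EmptyStringType.ALL_EMPTY: lambda: StringGen("", nullable=False),
    EmptyStringType.MIXED: lambda: StringGen("", nullable=True),
}


def mk_empty_str_gen(empty_type):
    # Fail loudly on an unknown value instead of silently picking a default.
    try:
        factory = empty_string_gens_map[empty_type]
    except KeyError:
        raise AssertionError("unexpected empty type " + str(empty_type)) from None
    return factory()
```

With the dictionary, a newly added enum value that has no entry fails at lookup time, which gives the same protection as the explicit `else: raise` branch.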
@pytest.mark.parametrize('empty_type', [
    EmptyStringType.ALL_NULL,
    EmptyStringType.ALL_EMPTY,
    EmptyStringType.MIXED])
- @pytest.mark.parametrize('empty_type', [
-     EmptyStringType.ALL_NULL,
-     EmptyStringType.ALL_EMPTY,
-     EmptyStringType.MIXED])
+ @pytest.mark.parametrize('empty_type', list(EmptyStringType.__members__))
@pytest.mark.parametrize('empty_type', [
    EmptyStringType.ALL_NULL,
    EmptyStringType.ALL_EMPTY,
    EmptyStringType.MIXED])
- @pytest.mark.parametrize('empty_type', [
-     EmptyStringType.ALL_NULL,
-     EmptyStringType.ALL_EMPTY,
-     EmptyStringType.MIXED])
+ @pytest.mark.parametrize('empty_type', list(EmptyStringType.__members__))
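One caveat on the suggested parametrization, assuming `EmptyStringType` is an ordinary `enum.Enum` subclass (the stand-in below is hypothetical): `__members__` is a mapping from member name to member, so wrapping it in `list(...)` yields the names as strings rather than the enum members. Iterating the enum class itself yields the members, as this sketch shows.

```python
from enum import Enum, auto


class EmptyStringType(Enum):
    # Hypothetical stand-in for the enum used in the PR's tests.
    ALL_NULL = auto()
    ALL_EMPTY = auto()
    MIXED = auto()


# list() over the __members__ mapping returns its keys, i.e. the member names.
names = list(EmptyStringType.__members__)

# Iterating the enum class returns the members themselves, in definition order.
members = list(EmptyStringType)

# A plain Enum member never compares equal to its name string.
name_matches_member = names[0] == members[0]
```

If the test body compares `empty_type` against `EmptyStringType.ALL_NULL` and friends, `list(EmptyStringType)` is likely the form that keeps those comparisons working.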
build
@jlowe please take another look
This fixes #11287
I traced down all of the string split calls mentioned in the original CUDF issue rapidsai/cudf#16453.
I added some tests for GpuStringToMap, but I could not trigger the issue there. I also added a test for array because I saw some similar odd behavior for GpuStringSplit when the number of splits = 1, which didn't call stringSplit* at all, but I also could not trigger any issues there.

I didn't touch HiveTextFile because an empty file felt invalid, but I can go back and try to test it if we want to.