Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rows.objectToNumber: Accept decimals with output type LONG. #15999

Merged
merged 4 commits into from
Mar 4, 2024

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Feb 28, 2024

PR #15615 added an optimization to avoid parsing numbers twice in cases where we know that they should definitely be longs or definitely be doubles. Rather than try parsing as long first, and then try parsing as double, it would use only the parsing routine specific to the requested outputType.

This caused a bug: previously, we would accept decimals like "1.0" or "1.23" as longs, by truncating them to "1". After that patch, we would treat such decimals as nulls when the outputType is set to LONG.

This patch retains the short-circuit for doubles: if outputType is DOUBLE, we only parse the string as a double. But for outputType LONG, this patch restores the old behavior: try to parse as long first, then double.

PR apache#15615 added an optimization to avoid parsing numbers twice in cases
where we know that they should definitely be longs or
definitely be doubles. Rather than try parsing as long first, and then
try parsing as double, it would use only the parsing routine specific to
the requested outputType.

This caused a bug: previously, we would accept decimals like "1.0" or
"1.23" as longs, by truncating them to "1". After that patch, we would
treat such decimals as nulls when the outputType is set to LONG.

This patch retains the short-circuit for doubles: if outputType is
DOUBLE, we only parse the string as a double. But for outputType LONG,
this patch restores the old behavior: try to parse as long first,
then double.
@gianm gianm added this to the 29.0.1 milestone Feb 28, 2024
@gianm
Copy link
Contributor Author

gianm commented Feb 28, 2024

This bug is triggered when Rows.objectToNumber is called with outputType set to LONG and with a decimal string input.

The only call to Rows.objectToNumber that uses nonnull outputType is RowBasedColumnSelectorFactory, in a situation where its columnInspector has type info. Most usages of RowBasedColumnSelectorFactory don't satisfy both conditions for this bug: they either don't have type info, or they do have type info but they would not provide a string with type LONG.

AFAICT, the only call sites that meet both these criteria are ExternalSegment (used for EXTERN in SQL-based ingest) and InlineSegmentWrangler (used for inline datasources). So this bug could be triggered by an EXTERN call with a column that is declared as LONG or BIGINT but is naturally string-typed (such as a column from a TSV or CSV file); or an inline datasource where a field is declared as LONG but is provided as a string in the JSON.

@pranavbhole
Copy link
Contributor

need to fix tests: "notanumber" is now 0.0 instead of 0 which should fine i guess.

Error: org.apache.druid.data.input.impl.RowsTest.test_objectToNumber_typeDouble_noThrow Error: Run 1: RowsTest.test_objectToNumber_typeDouble_noThrow:224 notanumber (nothrow) expected:<0> but was:<0.0> [INFO] Error: org.apache.druid.data.input.impl.RowsTest.test_objectToNumber_typeFloat_noThrow Error: Run 1: RowsTest.test_objectToNumber_typeFloat_noThrow:189 notanumber (nothrow) expected:<0> but was:<0.0>

@gianm
Copy link
Contributor Author

gianm commented Feb 29, 2024

Yes, right, the test cases in replace-with-default mode were not looking for the right thing. That's fixed now. Thanks.

} else if (outputType == ValueType.DOUBLE) {
return asNumber.doubleValue();
} else {
throw new ISE("Cannot read number as type[%s]", outputType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new ISE("Cannot read number as type[%s]", outputType);
throw new ISE("Cannot read number as type[%s] for field [%s]", outputType, name);

@abhishekagarwal87 abhishekagarwal87 merged commit 376a41f into apache:master Mar 4, 2024
83 checks passed
@abhishekagarwal87
Copy link
Contributor

merged it since the comment was not a blocker.

@gianm gianm deleted the otn-lenient branch March 4, 2024 16:58
gianm added a commit to gianm/druid that referenced this pull request Mar 6, 2024
…5999)

* Rows.objectToNumber: Accept decimals with output type LONG.

PR apache#15615 added an optimization to avoid parsing numbers twice in cases
where we know that they should definitely be longs or
definitely be doubles. Rather than try parsing as long first, and then
try parsing as double, it would use only the parsing routine specific to
the requested outputType.

This caused a bug: previously, we would accept decimals like "1.0" or
"1.23" as longs, by truncating them to "1". After that patch, we would
treat such decimals as nulls when the outputType is set to LONG.

This patch retains the short-circuit for doubles: if outputType is
DOUBLE, we only parse the string as a double. But for outputType LONG,
this patch restores the old behavior: try to parse as long first,
then double.
abhishekagarwal87 pushed a commit that referenced this pull request Mar 8, 2024
…16062)

* Rows.objectToNumber: Accept decimals with output type LONG.

PR #15615 added an optimization to avoid parsing numbers twice in cases
where we know that they should definitely be longs or
definitely be doubles. Rather than try parsing as long first, and then
try parsing as double, it would use only the parsing routine specific to
the requested outputType.

This caused a bug: previously, we would accept decimals like "1.0" or
"1.23" as longs, by truncating them to "1". After that patch, we would
treat such decimals as nulls when the outputType is set to LONG.

This patch retains the short-circuit for doubles: if outputType is
DOUBLE, we only parse the string as a double. But for outputType LONG,
this patch restores the old behavior: try to parse as long first,
then double.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants