feat: add rounding logic and scale zero fix parse_decimal to match parse_string_to_decimal_native behavior #7179

himadripal · 2025-02-23T08:16:21Z

Which issue does this PR close?

Closes #7355 (add rounding logic and scale zero fix in parse_decimal to match parse_string_to_decimal_native behavior).
Related to datafusion#10315 (DataFusion should support casting strings such as "4e7" to decimal).
Follows up on #6905 (Fix: Support for e notation using parse_decimal in string to decimal conversion), based on a review comment on that PR.

A few important considerations:

The existing string-to-decimal conversion uses parse_string_to_decimal_native.
parse_string_to_decimal_native does not support e-notation.
parse_string_to_decimal_native rounds at the target scale rather than truncating.
parse_decimal, an existing method, supports e-notation and is used elsewhere.
#6905 (Fix: Support for e notation using parse_decimal in string to decimal conversion) added rounding support to parse_decimal.
It also moved string-to-decimal conversion to parse_decimal to gain e-notation support.

This is the second PR split out of #6905; it adds rounding logic to parse_decimal to match the existing behavior of parse_string_to_decimal_native.

Closes #.

Rationale for this change

At present, string-to-decimal conversion in arrow does not support e-notation: the generic string-to-decimal cast calls parse_string_to_decimal_native. parse_decimal, on the other hand, is called from the generic parse method and already supports e-notation. This PR adds rounding and scale-0 handling to parse_decimal so that it matches the behavior of parse_string_to_decimal_native. We can then replace the parse_string_to_decimal_native call with parse_decimal and get e-notation support as well.

What changes are included in this PR?

Are there any user-facing changes?

alamb

The title of this PR now says it adds rounding logic and makes the logic consistent which sounds good. However, I don't see a ticket that describes the problem.

This PR's description still says it

Closes apache/datafusion#10315

and I did see code changes in the e parsing code but I didn't see any tests 🤔 I don't think we can merge code without tests.

I am sorry to be so pedantic, but arrow-rs is used by many projects now and so evaluating and minimizing downstream impacts is very important. I am trying to avoid the overhead of dealing with releasing regressions like

[Regression in 54.0.0]. Decimal cast to smaller precision gives invalid (off-by-one) result in some cases #7069

And I also apologize for the length between review cycles, but as we have mentioned many times, our review bandwidth is very limited.

alamb · 2025-03-17T19:12:44Z

arrow-cast/src/parse.rs

@@ -850,7 +850,16 @@ fn parse_e_notation<T: DecimalType>(
    }

    if exp < 0 {
-        result = result.div_wrapping(base.pow_wrapping(-exp as _));
+        let result_with_scale = result.div_wrapping(base.pow_wrapping(-exp as _));


does this change the behavior of parsing e notation? If so I didn't see any tests

Yes. I missed porting the tests while splitting the large PR. It now rounds instead of truncating, which is the current behavior. I'll add the tests.

alamb · 2025-03-17T19:18:06Z

arrow-cast/src/cast/decimal.rs

@@ -598,7 +599,20 @@ mod tests {
            0_i128
        );
        assert_eq!(
+            parse_decimal::<Decimal128Type>("0", 38, 0)?,
+            parse_string_to_decimal_native::<Decimal128Type>("0", 0)?,


Having the same behavior in these two functions seems like a reasonable change to me

Once we are able to move casting to parse_decimal and deprecate parse_string_to_decimal_native, these tests will be changed to assert the value currently given in the message section of the assert.

himadripal · 2025-03-18T01:22:54Z

arrow-csv/src/reader/mod.rs

@@ -1286,7 +1286,7 @@ mod tests {
        assert_eq!("53.002666", lat.value_as_string(1));
        assert_eq!("52.412811", lat.value_as_string(2));
        assert_eq!("51.481583", lat.value_as_string(3));
-        assert_eq!("12.123456", lat.value_as_string(4));
+        assert_eq!("12.123457", lat.value_as_string(4));


@alamb you can see the behavior change in this arrow-csv reader test, which uses parse_decimal.
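
To make the behavior change concrete, here is a minimal sketch of truncating versus rounding when a value carries one more fractional digit than the target scale. The input 12.1234567 is a hypothetical value chosen to reproduce the two strings in the test above (the actual CSV fixture is not shown in this thread), and `truncate_digits` and `round_digits` are illustrative helpers, not arrow-rs functions. The rounding helper compares the discarded remainder against half of the divisor, which is a different formulation from the digit-based check in the PR but gives the same half-up result for non-negative values.

```rust
/// Drop `drop` trailing decimal digits from a non-negative unscaled value by truncation.
fn truncate_digits(unscaled: i128, drop: u32) -> i128 {
    unscaled / 10_i128.pow(drop)
}

/// Drop `drop` trailing decimal digits, rounding half up.
/// This sketch only handles non-negative inputs.
fn round_digits(unscaled: i128, drop: u32) -> i128 {
    if drop == 0 {
        return unscaled;
    }
    let divisor = 10_i128.pow(drop);
    let quotient = unscaled / divisor;
    let remainder = unscaled % divisor;
    // Round up when the discarded part is at least half of the divisor.
    if remainder * 2 >= divisor {
        quotient + 1
    } else {
        quotient
    }
}
```

With the hypothetical input stored as 121234567 at scale 7, `truncate_digits(121_234_567, 1)` yields 12123456 (rendered as 12.123456 at scale 6), while `round_digits(121_234_567, 1)` yields 12123457 (rendered as 12.123457).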

himadripal · 2025-03-28T16:10:24Z

The title of this PR now says it adds rounding logic and makes the logic consistent which sounds good. However, I don't see a ticket that describes the problem.

This PR's description still says it

Closes apache/datafusion#10315

Added an issue in arrow-rs #7355

himadripal · 2025-03-28T16:11:20Z

and I did see code changes in the e parsing code but I didn't see any tests 🤔 I don't think we can merge code without tests.

Added e-notation tests

himadripal · 2025-03-28T16:12:57Z

I am sorry to be so pedantic, but arrow-rs is used by many projects now and so evaluating and minimizing downstream impacts is very important. I am trying to avoid the overhead of dealing with releasing regressions like

[Regression in 54.0.0]. Decimal cast to smaller precision gives invalid (off-by-one) result in some cases #7069

I apologize for creating this extra overhead. I will be more careful in the future.

himadripal · 2025-03-28T16:14:19Z

And I also apologize for the length between review cycles, but as we have mentioned many times, our review bandwidth is very limited.

I understand, and will keep this in mind in the future.

kazuyukitanimura

Thanks @himadripal

kazuyukitanimura · 2025-03-31T17:02:46Z

arrow-cast/src/parse.rs

-        result = result.div_wrapping(base.pow_wrapping(-exp as _));
+        let result_with_scale = result.div_wrapping(base.pow_wrapping(-exp as _));
+        let result_with_one_scale_up =
+            result.div_wrapping(base.pow_wrapping(-exp.add_wrapping(1) as _));


I assume this logic is correct, but I would like to understand it. For example, for 12345e-5, would exp be -5? Why does this add 1?

exp in the parse_e_notation method is overridden a couple of times, depending on which direction the decimal point needs to shift and whether the original string has a fractional part (e.g. 1.23e-2 has 2 fractional digits).

Before this check, exp represents the number of digits to be removed or added. In this case, exp = -3.

Now, result_with_scale = 12
result_with_one_scale_up = 123

To round up or down, we need to capture the digit that follows the last digit of the result, in this case 3. We get it as follows:
rounding_digit = result_with_one_scale_up - result_with_scale * 10
rounding_digit = 123 - 12 * 10 = 3

If rounding_digit >= 5, we add 1 to the result;
otherwise the result is left unchanged.

I added a debugging screenshot to help understand it more.
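
The arithmetic above can be sketched with plain `i128` values. This is an illustration that reuses the variable names from the diff; the function `round_after_shift` and its `digits_to_drop` parameter are hypothetical, and the actual code goes through the wrapping operations of the decimal native type rather than `i128` operators.

```rust
/// Divide `result` by 10^digits_to_drop, rounding half up based on the first dropped digit.
/// Sketch for non-negative `result` and `digits_to_drop >= 1`.
fn round_after_shift(result: i128, digits_to_drop: u32) -> i128 {
    assert!(digits_to_drop >= 1);
    let base = 10_i128;
    // Value at the target scale, e.g. 123 -> 12 when one digit is dropped.
    let result_with_scale = result / base.pow(digits_to_drop);
    // Same value with one extra digit kept, e.g. 123 -> 123.
    let result_with_one_scale_up = result / base.pow(digits_to_drop - 1);
    // The extra digit is the first digit being discarded, e.g. 123 - 12 * 10 = 3.
    let rounding_digit = result_with_one_scale_up - result_with_scale * base;
    if rounding_digit >= 5 {
        result_with_scale + 1
    } else {
        result_with_scale
    }
}
```

With `result = 123` and one digit dropped, the rounding digit is 3 and the function returns 12. With `result = 1265`, the rounding digit is 5 and it returns 127.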

kazuyukitanimura · 2025-03-31T17:19:27Z

arrow-cast/src/parse.rs

+            result = result.div_wrapping(base.pow_wrapping(fractionals as u32))
+        }
+        //add one if >=5
+        if rounding_digit >= 5 {


Wondering where >= 5 came from?

First we determine the rounding_digit, the digit that follows the last digit of the final result (before any rounding is applied). If the rounding_digit is >= 5, we add 1 to round the result up; otherwise it stays the same.

"1265E-4" with scale 3 -> 0.127: truncated to scale 3 the number would be 0.126 and the rounding digit is 5; since the rounding digit is >= 5, the result becomes 0.127.
"1264E-4" with scale 3 -> 0.126: here the rounding_digit is 4, which is less than 5, so there is no need to add 1.

>= 5 is used to round to the nearest value at the target scale.

With scale 1:
2.47 -> 2.5
2.44 -> 2.4

Ah ok, makes sense

Perhaps we could add a comment in the code to explain this point to future readers who may have the same question
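
As one way to picture such a commented rounding step end to end, here is a small self-contained sketch that parses a decimal string (optionally in e-notation) into an unscaled integer at a given scale. The function `parse_scaled` is hypothetical and is not the arrow-rs implementation; in particular, applying the sign after rounding the magnitude (round half away from zero) is an assumption of this sketch.

```rust
/// Parse `s` into an unscaled integer at `scale`, rounding half away from zero.
/// Hypothetical helper; returns None on malformed input or overflow.
fn parse_scaled(s: &str, scale: u32) -> Option<i128> {
    let (negative, body) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    // Split off the exponent, accepting both `e` and `E`.
    let (mantissa, exp) = match body.split_once(|c: char| c == 'e' || c == 'E') {
        Some((m, e)) => (m, e.parse::<i32>().ok()?),
        None => (body, 0),
    };
    let (int_part, frac_part) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    let digits = format!("{int_part}{frac_part}");
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let magnitude: i128 = digits.parse().ok()?;
    // Power of ten that moves the digit string to the target scale.
    let shift = exp as i64 - frac_part.len() as i64 + scale as i64;
    let rounded = if shift >= 0 {
        let factor = 10_i128.checked_pow(u32::try_from(shift).ok()?)?;
        magnitude.checked_mul(factor)?
    } else {
        let drop = u32::try_from(-shift).unwrap_or(u32::MAX);
        match 10_i128.checked_pow(drop) {
            // Dropping more digits than an i128 can hold always leaves zero.
            None => 0,
            Some(divisor) => {
                let truncated = magnitude / divisor;
                // First discarded digit: 5 or more means the discarded part is
                // at least half a unit at the target scale, so round up.
                let rounding_digit = (magnitude / (divisor / 10)) % 10;
                if rounding_digit >= 5 { truncated + 1 } else { truncated }
            }
        }
    };
    Some(if negative { -rounded } else { rounded })
}
```

Under these assumptions, `parse_scaled("1265E-4", 3)` gives `Some(127)` and `parse_scaled("1264E-4", 3)` gives `Some(126)`, matching the examples in this thread, while `parse_scaled("-2.47", 1)` gives `Some(-25)`.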

comphead

Thanks @himadripal, I do like the tests. Perhaps we can also add tests for rounding negative decimals?

Tests for very big or very small numbers would also be beneficial. Here is what comes to mind, with the help of ChatGPT:

// **Very Large Numbers**
assert_eq!(round_to_places(1e15, 2), 1e15); // Large integer should remain unchanged
assert_eq!(round_to_places(9999999999999.987, 2), 9999999999999.99); // Rounds up at the second decimal place
assert_eq!(round_to_places(-1e15, 3), -1e15);

// **Very Small Numbers (Near Zero)**
assert_eq!(round_to_places(1e-15, 10), 0.0000000000); // Rounds to zero at precision 10
assert_eq!(round_to_places(-1e-15, 10), -0.0000000000); // Rounds to zero
assert_eq!(round_to_places(0.000000000123456, 12), 0.000000000123); // Should retain up to 12 decimal places

// **Extreme Edge Cases**
assert_eq!(round_to_places(f64::MAX, 2), f64::MAX); // Maximum f64 value should remain the same
assert_eq!(round_to_places(f64::MIN, 2), f64::MIN); // Minimum f64 value should remain the same
assert!(round_to_places(f64::NAN, 2).is_nan()); // NaN should remain NaN
assert_eq!(round_to_places(f64::INFINITY, 2), f64::INFINITY); // Infinity should remain Infinity
assert_eq!(round_to_places(f64::NEG_INFINITY, 2), f64::NEG_INFINITY); // Negative Infinity should remain unchanged
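
The assertions above call a `round_to_places` helper that is not defined in the comment. A minimal sketch of such a helper for `f64`, written here purely as an assumption so the cases can be exercised, might look as follows. It is independent of parse_decimal, which works on strings and integer representations rather than floats, so these cases would still need to be translated into decimal string inputs for this PR.

```rust
/// Round `value` to `places` decimal places (hypothetical helper).
/// Non-finite inputs, and inputs too large to scale without overflowing,
/// are returned unchanged.
fn round_to_places(value: f64, places: u32) -> f64 {
    if !value.is_finite() {
        return value;
    }
    // Build 10^places by repeated multiplication.
    let mut factor = 1.0_f64;
    for _ in 0..places {
        factor *= 10.0;
    }
    let scaled = value * factor;
    if !scaled.is_finite() {
        // For example, f64::MAX * 100 overflows to infinity; leave such values alone.
        return value;
    }
    // f64::round rounds half-way cases away from zero.
    scaled.round() / factor
}
```

The overflow guard is a design choice of this sketch: without it, `f64::MAX` would scale to infinity and the helper would return infinity instead of the original value.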

alamb · 2025-04-01T12:01:48Z

@himadripal -- I am preparing to create a new release hopefully tomorrow. Can you please address @comphead 's testing suggestions soon so we can get this PR into that release?

himadripal · 2025-04-01T14:23:08Z

@himadripal -- I am preparing to create a new release hopefully tomorrow. Can you please address @comphead 's testing suggestions soon so we can get this PR into that release?

@comphead and @alamb there are existing edge case tests here

I'll add more from @comphead list today.

One more clarifying point: although the two do not have to be merged together, my goal for this change was to enable scientific notation support in DataFusion (datafusion#10315). For that we also need to merge #7191, which moves the cast from parse_string_to_decimal_native to parse_decimal.

github-actions bot added the arrow Changes to the arrow crate label Feb 23, 2025

himadripal force-pushed the fix_parse_decimal_for_rounding_scale_zero branch from bbd54d4 to 7b33527 Compare February 23, 2025 08:25

himadripal mentioned this pull request Feb 23, 2025

Fix: Support for e notation using parse_decimal in string to decimal conversion #6905

Closed

tustvold added api-change Changes to the arrow API next-major-release the PR has API changes and it waiting on the next major version labels Feb 24, 2025

himadripal mentioned this pull request Feb 25, 2025

feat:use parse_decimal for generic_string_to_decimal conversion. #7191

Draft

alamb mentioned this pull request Feb 26, 2025

Weekly Plan (Andrew Lamb) Feb 24, 2025 apache/datafusion#14850

Closed

10 tasks

himadripal mentioned this pull request Feb 26, 2025

Test : move tests for parse_string_decimal_native to parse_decimal #7177

Closed

add rounding logic and scale zero fix

bef2992

himadripal force-pushed the fix_parse_decimal_for_rounding_scale_zero branch from 7e598c9 to bef2992 Compare February 27, 2025 07:53

alamb mentioned this pull request Mar 3, 2025

Weekly Plan (Andrew Lamb) March 3, 2025 apache/datafusion#14978

Closed

12 tasks

alamb changed the title ~~feat: add rounding logic and scale zero fix fro parse_decimal to match parse_string_to_decimal_native behavior~~ feat: add rounding logic and scale zero fix parse_decimal to match parse_string_to_decimal_native behavior Mar 17, 2025

alamb reviewed Mar 17, 2025

alamb mentioned this pull request Mar 17, 2025

Weekly Plan (Andrew Lamb) March 17, 2025 apache/datafusion#15274

Closed

10 tasks

himadripal commented Mar 18, 2025

add the tests for e-notation rounding and scale zero fix.

8b81039

kazuyukitanimura reviewed Mar 31, 2025

comphead reviewed Apr 1, 2025

kazuyukitanimura approved these changes Apr 1, 2025

add more tests and convert digits to u16

7bc8ca2
