
@martin-augment (Owner) commented Oct 24, 2025

8700: To review by AI

Summary by CodeRabbit

  • Bug Fixes

    • Fixed decimal parsing to correctly handle fractional input when scale is set to zero, ensuring fractional portions are properly discarded instead of affecting the final result.
  • Tests

    • Added validation tests for zero-scale decimal parsing with various input formats.
    • Expanded overflow test coverage with additional edge cases.

@github-actions github-actions bot added the arrow label Oct 24, 2025
coderabbitai bot commented Oct 24, 2025

Walkthrough

The parse_decimal function now correctly handles zero-scale decimals by discarding fractional digits from the parsed result through division by 10^fractionals when scale is zero. Updated and new test cases validate this behavior for Decimal128 and Decimal256 types.
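To make the accumulate-then-divide idea concrete, here is a minimal Rust sketch. It is not the arrow-rs `parse_decimal` (which is generic over `T::Native` and uses `pow_wrapping`/`div_wrapping`); the name `parse_zero_scale_sketch`, the fixed `i128` type, and the simplified grammar (optional sign, digits, optional `.` and digits, no e-notation) are assumptions for illustration. Like the PR, it folds every digit into one accumulator with wrapping arithmetic and then divides by 10^fractionals.

```rust
/// Hypothetical, simplified model of the zero-scale path (not arrow-rs code):
/// fold integer and fractional digits into one magnitude, then divide by
/// 10^fractionals to drop the fraction. Wrapping ops mirror the PR, so very
/// long inputs can wrap silently.
fn parse_zero_scale_sketch(s: &str) -> Option<i128> {
    // Strip an optional leading sign; the sign is applied at the very end.
    let (negative, body) = match *s.as_bytes().first()? {
        b'-' => (true, &s[1..]),
        b'+' => (false, &s[1..]),
        _ => (false, s),
    };
    let (int_part, frac_part) = body.split_once('.').unwrap_or((body, ""));
    if int_part.is_empty() && frac_part.is_empty() {
        return None; // "-", "." and similar carry no digits
    }

    // Accumulate every digit, integer and fractional, into one value.
    let mut result: i128 = 0;
    for b in int_part.bytes().chain(frac_part.bytes()) {
        if !b.is_ascii_digit() {
            return None;
        }
        result = result.wrapping_mul(10).wrapping_add(i128::from(b - b'0'));
    }

    // Eject the fractional digits: integer division truncates toward zero.
    let fractionals = frac_part.len() as u32;
    if fractionals > 0 {
        result = result.wrapping_div(10i128.wrapping_pow(fractionals));
    }
    Some(if negative { result.wrapping_neg() } else { result })
}
```

For example, `"123.45"` accumulates to 12345 and is divided by 10^2 to give 123. Because the sketch applies the sign only after the division, the division always operates on a non-negative magnitude.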

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Parse logic fix | arrow-cast/src/parse.rs | Modified parse_decimal to eject fractional digits when scale is zero by dividing the accumulated result by 10^fractionals; cleared the fractionals counter and updated digits tracking |
| Test expansion | arrow-cast/src/parse.rs | Added a new overflow test case "12345678900.0" for zero-scale fractional handling; added zero-scale validation tests for Decimal128Type and Decimal256Type verifying that fractional portions are discarded when scale=0 |

Sequence Diagram

sequenceDiagram
    participant Parser as parse_decimal()
    participant Accum as accumulated result
    participant Scale as scale parameter
    
    rect rgb(240, 248, 255)
    Note over Parser: Parse decimal string
    Parser->>Parser: Accumulate digits (including fractional)
    end
    
    alt is_e_notation
        Parser->>Parser: Handle exponential notation
    else not is_e_notation AND scale == 0 AND fractionals > 0
        rect rgb(255, 240, 245)
        Note over Parser: Zero-scale fractional handling
        Parser->>Accum: Divide by 10^fractionals
        Parser->>Parser: Clear fractionals counter
        Parser->>Parser: Update digits tracking
        end
    else default
        Parser->>Parser: Keep fractional precision
    end
    
    Parser->>Parser: Return parsed decimal value

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The change modifies core parsing logic in a single file combined with multiple test additions. Review requires understanding the decimal parsing flow, verifying the fractional digit ejection logic is correct, and confirming test cases adequately cover the new zero-scale behavior across different decimal types.

Poem

🐰 Fractional fragments now take their flight,
When scale is zero, we set things right!
Divide and discard what shouldn't stay,
Zero-scale parsing works today! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The PR description "8700: To review by AI" is extremely minimal and does not follow the required template structure. The description template specifies several required sections including issue closure reference, rationale, summary of changes, testing information, and user-facing changes documentation. The provided description contains none of these elements and reads as a placeholder rather than a substantive pull request description that would help reviewers understand the purpose, scope, and testing of the changes. | The PR description should be substantially expanded to follow the repository template. At minimum, add a section closing the relevant issue (e.g., "Closes apache#8700"), provide the rationale for the zero-scale decimal parsing changes, summarize what modifications were made to parse.rs, confirm that the new validation tests cover the fix, and clarify whether there are any user-facing changes or breaking API modifications. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The PR title "8700: fix: liberal parsing of zero scale decimals" clearly and specifically describes the main change in the pull request. It references the issue number and captures the core purpose of the fix: addressing how decimals with zero scale are parsed. The title is concise, avoids vague language, and accurately reflects the modifications made to the parse_decimal function and the addition of corresponding validation tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


claude bot commented Oct 24, 2025

PR Review: Fix Liberal Parsing of Zero Scale Decimals

Summary

This PR adds support for liberal parsing of decimal values with scale=0, allowing fractional digits to be present in the input string (e.g., "123.45") and automatically truncating them to produce the integer value (123).


Code Quality & Best Practices ✅

Strengths:

  • Clean, focused implementation that addresses a specific use case
  • Code follows the existing patterns in the parse_decimal function
  • Good placement of the logic after the main parsing loop but before scaling adjustments
  • Comment clearly explains the intent of the code block

Suggestions:

  1. Variable naming consistency: The variable fractionals is used as a counter (type i8), which works but could be clearer as fractional_digits or similar to match the comment's terminology.

  2. Code organization: The new block at lines 966-972 is well-positioned in the control flow, handling the zero-scale case before the general scaling logic.


Potential Bugs & Issues ⚠️

Critical Concerns:

  1. Integer underflow risk (line 970):

    digits -= fractionals as u8;

    If fractionals > digits, this will underflow: digits is a u8, and fractionals (an i8) is cast to u8 before the subtraction. While this scenario seems unlikely given the parsing logic (fractionals are counted as digits are parsed), it's worth validating this invariant.

    Recommendation: Add an assertion or validation:

    debug_assert!(fractionals as u8 <= digits, "fractionals should never exceed digits");
    digits -= fractionals as u8;
  2. Overflow detection: The overflow test at line 2647 ("12345678900.0", 0) expects this to overflow with precision=10, scale=0. After the fix, the number "12345678900.0" becomes 12345678900 (11 digits), which correctly exceeds precision=10. However, the logic needs to ensure that the precision check happens AFTER the digit adjustment. Looking at line 983-986, the check digits > precision happens after your code block, so this should work correctly. ✅

  3. Edge case - all fractional input: What happens with input like "0.123" with scale=0 and precision=1?

    • Initial parsing: result=123, digits=3, fractionals=3
    • After truncation: result=0, digits=0, fractionals=0
    • This appears to work correctly but worth testing explicitly.
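The debug assertion suggested above can also be expressed as checked arithmetic, which keeps the invariant enforced in release builds too. The sketch below models only the ejection step and the later `digits > precision` check; the helper name `eject_fractionals`, the `i128` accumulator, and the use of `u8` (rather than arrow's `i8`) for the fractional count are assumptions, not arrow-rs code. The tests replay the two traces from this review: `"0.123"` and the `"12345678900.0"` overflow case.

```rust
/// Hypothetical helper modelling the PR's ejection step plus the later
/// `digits > precision` check. Checked arithmetic turns the
/// `fractionals <= digits` invariant into an explicit error instead of a
/// silent u8 wrap.
fn eject_fractionals(
    result: i128,
    digits: u8,
    fractionals: u8,
    precision: u8,
) -> Result<(i128, u8), String> {
    // 10^fractionals; None (turned into an error) if it would overflow i128.
    let divisor = 10i128
        .checked_pow(u32::from(fractionals))
        .ok_or("fractional digits overflow the divisor")?;
    // Remove the ejected digits from the digit count without wrapping.
    let digits = digits
        .checked_sub(fractionals)
        .ok_or("invariant violated: fractionals > digits")?;
    // Precision is checked only after the fractional digits are gone.
    if digits > precision {
        return Err(format!(
            "parse decimal overflow ({digits} digits > precision {precision})"
        ));
    }
    Ok((result / divisor, digits))
}
```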

Performance Considerations ⚡

Impact: Minimal

  1. Division operation (line 969): The div_wrapping with pow_wrapping is performed only when scale == 0 && fractionals > 0, which is a relatively uncommon case. The performance impact is acceptable.

  2. Power computation: base.pow_wrapping(fractionals as _) computes 10^fractionals. For typical fractional digit counts (1-10), this is fast. For pathological cases with many fractional digits, this could be slower, but:

    • The input would need to pass precision checks first
    • The operation happens at most once per parse
    • No better alternative exists without maintaining a lookup table

Verdict: Performance is appropriate for the use case. ✅
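The review mentions a lookup table as the alternative to computing the power at runtime. A minimal sketch, assuming an `i128` native type (arrow-rs is generic over `T::Native`, so this is not how the crate does it) and the hypothetical names `POW10_I128` and `pow10`. A side benefit is that an out-of-range exponent returns `None` instead of a wrapped value.

```rust
/// Hypothetical lookup table of powers of ten for an i128 native type:
/// POW10_I128[n] == 10^n for n in 0..=38. 10^39 does not fit in i128.
const POW10_I128: [i128; 39] = {
    let mut table = [1i128; 39];
    let mut i = 1;
    while i < 39 {
        table[i] = table[i - 1] * 10;
        i += 1;
    }
    table
};

/// Returns 10^n, or None when n is past the end of the table (i.e. 10^n
/// would overflow i128), so callers cannot silently wrap.
fn pow10(n: u32) -> Option<i128> {
    POW10_I128.get(n as usize).copied()
}
```

The table is built at compile time, so each lookup is a bounds-checked array access; the trade-off is one table per native integer width.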


Security Concerns 🔒

Status: No significant concerns identified

  1. Integer overflow: The use of wrapping operations (div_wrapping, pow_wrapping) is consistent with the rest of the codebase. The function relies on precision validation to prevent actual overflows from occurring.

  2. Input validation: The fractional digits are validated during parsing (lines 898-929), so malicious input won't reach this code block without being rejected first.

  3. Type safety: All type conversions use explicit casts and are bounded by the precision parameter (u8), limiting the range of values.

Verdict: Security posture is maintained. ✅


Test Coverage 🧪

Strengths:

  1. Good variety of zero-scale test cases (lines 2765-2779)
  2. Tests cover:
    • Single vs. multiple fractional digits
    • Various integer magnitudes
    • Consistent truncation behavior (not rounding)

Gaps & Suggestions:

  1. Missing negative number tests: All zero_scale_tests use positive numbers. Consider adding:

    ("-1.5", -1),
    ("-123.999", -123),
  2. Missing edge cases:

    • Input "0.0" with scale=0
    • Input "0.999" with scale=0 (should be 0)
    • Very large numbers: "999999999.99" with appropriate precision
    • Input with only fractional part: "0.123" → 0
  3. Overflow test enhancement: The test at line 2647 ("12345678900.0", 0) validates overflow still works correctly. Consider adding a comment explaining that this tests the interaction between the new truncation logic and overflow detection.

  4. Missing Decimal256 tests: The new test only validates Decimal128. Consider testing Decimal256 as well for consistency.

  5. Integration with e-notation: While your code correctly excludes e-notation (line 965 guard), consider adding a test case like:

    // Verify e-notation bypasses zero-scale truncation
    parse_decimal::<Decimal128Type>("1.5e2", 3, 0) // Should be 150, not 1

Additional Observations

  1. Semantic behavior: This implementation truncates rather than rounds (1.9 → 1, not 2). This is consistent with how the parser handles excess fractional digits for non-zero scales (lines 919-923), so it's correct. Consider documenting this behavior in the function's doc comment.

  2. Consistency with fractionals < scale case: The existing code at lines 974-982 handles padding when there aren't enough fractional digits. Your code handles the opposite case (too many fractional digits with scale=0). These are symmetric and well-structured. ✅

  3. Alternative approaches: An alternative would be to reject fractional input when scale=0, but liberal parsing (accepting and truncating) is more user-friendly and aligns with the PR's intent.


Recommendations Summary

Must Fix:

  • None (code is functionally correct)

Should Consider:

  1. Add debug assertion for fractionals <= digits invariant
  2. Expand test coverage with negative numbers and additional edge cases
  3. Add Decimal256 tests for consistency

Nice to Have:

  1. Update function documentation to mention truncation behavior for scale=0
  2. Add comment explaining overflow test interaction with new logic
  3. Consider renaming fractionals to fractional_digits for clarity

Overall Assessment

Verdict: APPROVE with suggestions

This is a well-implemented fix that addresses a real use case (liberal parsing of zero-scale decimals). The code is clean, follows existing patterns, and includes reasonable test coverage. The suggestions above are primarily for robustness and documentation rather than correctness issues.

The implementation correctly:

  • Truncates fractional digits for zero-scale decimals
  • Maintains overflow detection
  • Preserves existing behavior for non-zero scales
  • Uses appropriate wrapping operations

Great work overall! 🎉


Review generated by Claude Code


@augmentcode augmentcode bot left a comment


Review completed. 1 suggestion posted.

if scale == 0 && fractionals > 0 {
// The input string contained some fractional digits after the decimal point despite
// the scale being zero. Eject all the fractional digits from the number.
result = result.div_wrapping(base.pow_wrapping(fractionals as _));
With scale == 0, this builds the entire fractional part into result and only then divides it out. For inputs with many fractional digits this can overflow/wrap the intermediate value (and counters), yielding an incorrect final value even though the post-division digits check passes.
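The wrap described above can be reproduced with plain `i128` wrapping arithmetic. This sketch (hypothetical function name `wrapped_zero_scale_result`, not arrow-rs code) folds the 39 fractional digits of "0.1000…0" (a 1 followed by 38 zeros) into an accumulator the way the zero-scale branch does, then divides by a wrapped 10^39. The truncated answer should be 0, but the arithmetic yields -4.

```rust
/// Reproduces, with plain i128 wrapping arithmetic, what happens when all 39
/// fractional digits of "0.1000...0" are folded into the accumulator and then
/// divided by 10^39, which does not fit in i128 and therefore wraps.
fn wrapped_zero_scale_result() -> i128 {
    let fractional_digits = format!("1{}", "0".repeat(38)); // 39 digits
    let mut acc: i128 = 0;
    for b in fractional_digits.bytes() {
        acc = acc.wrapping_mul(10).wrapping_add(i128::from(b - b'0'));
    }
    // acc == 10^38 still fits, but 10^39 wraps to a negative i128 value.
    let divisor = 10i128.wrapping_pow(fractional_digits.len() as u32);
    acc.wrapping_div(divisor)
}
```

Calling `wrapped_zero_scale_result()` returns -4 rather than 0, while `10i128.checked_pow(39)` returns `None`, which is the signal a checked implementation could turn into an error.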


Owner Author


value:delightful; category:bug; feedback:The AI assistant is correct that the wrapping would lead to problems if there are 10 or more fractional digits when the base is a 32-bit integer, or 19 or more fractional digits when the base is a 64-bit integer.
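The divisor thresholds can be checked mechanically. This sketch (hypothetical macro `first_overflowing_pow10!` and function `overflow_thresholds`) finds, for each signed width, the smallest exponent n for which 10^n no longer fits. It covers only the divisor: the accumulator can wrap even sooner when the input also has integer digits.

```rust
/// Smallest exponent n for which 10^n overflows the given signed integer type,
/// i.e. the fractional-digit count at which the 10^fractionals divisor first wraps.
macro_rules! first_overflowing_pow10 {
    ($t:ty) => {
        (0u32..)
            .find(|&n| (10 as $t).checked_pow(n).is_none())
            .expect("some power of ten always overflows")
    };
}

/// Thresholds for i32, i64 and i128 (the natives of 32-, 64- and 128-bit decimals).
fn overflow_thresholds() -> (u32, u32, u32) {
    (
        first_overflowing_pow10!(i32),
        first_overflowing_pow10!(i64),
        first_overflowing_pow10!(i128),
    )
}
```

This yields (10, 19, 39): 10^9, 10^18 and 10^38 still fit in i32, i64 and i128 respectively, and the next power of ten does not.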


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
arrow-cast/src/parse.rs (1)

2765-2780: Expand test coverage for zero-scale decimals.

The tests validate the zero-scale handling logic, but there are two coverage gaps:

  1. Missing Decimal256Type tests: Earlier tests in this function consistently validate both Decimal128Type and Decimal256Type, but zero_scale_tests only tests Decimal128Type.
  2. Missing negative number tests: No validation for negative inputs like "-1.23" or "-123.45" with scale 0.

Consider enhancing the test coverage:

 let zero_scale_tests = [
     ("1.0", 1),
     ("1.2", 1),
     ("1.00", 1),
     ("1.23", 1),
     ("1.000", 1),
     ("1.123", 1),
     ("123.0", 123),
     ("123.4", 123),
     ("123.00", 123),
     ("123.45", 123),
+    ("-1.0", -1),
+    ("-1.23", -1),
+    ("-123.45", -123),
 ];
 for (s, i) in zero_scale_tests {
     let result_128 = parse_decimal::<Decimal128Type>(s, 3, 0).unwrap();
     assert_eq!(i, result_128);
+    let result_256 = parse_decimal::<Decimal256Type>(s, 3, 0).unwrap();
+    assert_eq!(i256::from_i128(i), result_256);
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b3b262 and 25bc45c.

📒 Files selected for processing (1)
  • arrow-cast/src/parse.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Test
  • GitHub Check: Build wasm32
  • GitHub Check: Test
  • GitHub Check: Check Compilation
  • GitHub Check: PySpark Integration Test (stable)
  • GitHub Check: Verify MSRV (Minimum Supported Rust Version)
  • GitHub Check: Test on Windows
  • GitHub Check: Clippy
  • GitHub Check: MIRI
  • GitHub Check: Test
  • GitHub Check: Test on Mac
  • GitHub Check: Archery test With other arrows
  • GitHub Check: Rustdocs are clean
  • GitHub Check: codex
  • GitHub Check: claude-review

Comment on lines +966 to +972
if scale == 0 && fractionals > 0 {
// The input string contained some fractional digits after the decimal point despite
// the scale being zero. Eject all the fractional digits from the number.
result = result.div_wrapping(base.pow_wrapping(fractionals as _));
digits -= fractionals as u8;
fractionals = 0;
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Core logic looks correct for zero-scale decimal handling.

The implementation correctly discards fractional digits by dividing the result by 10^fractionals when scale is zero. The integer division properly truncates toward zero for both positive and negative values.

Please verify that the new behavior works correctly with negative numbers (e.g., "-1.23" with scale 0 should parse to -1). Consider adding test cases for negative numbers in the zero_scale_tests section to ensure comprehensive coverage.


🏁 Script executed:

#!/bin/bash
# Verify zero-scale decimal parsing with negative numbers
cat << 'EOF' | python
from decimal import Decimal

test_cases = [
    ("-1.0", -1),
    ("-1.2", -1),
    ("-1.9", -1),
    ("-123.45", -123),
    ("-0.99", 0),
]

print("Expected behavior for negative zero-scale decimals:")
for s, expected in test_cases:
    d = Decimal(s)
    actual = int(d)
    status = "✓" if actual == expected else "✗"
    print(f"{status} '{s}' -> {actual} (expected {expected})")
EOF

Length of output: 202



---

🏁 Script executed:

#!/bin/bash
# Check the actual test cases in zero_scale_tests section
cat arrow-cast/src/parse.rs | sed -n '2765,2800p'

Length of output: 1236


🌐 Web query:

Rust div_wrapping integer division truncation toward zero

💡 Result:

Short answer: Rust's integer division (/) truncates toward zero. wrapping_div just computes self / rhs; for unsigned types that's identical to normal division, and for signed types it only "wraps" in the single overflow case MIN / -1 (it returns MIN). [1][2][3]

Examples:

  • 7i32 / 2 == 3 (truncates toward zero). [1]
  • For unsigned: 100u32.wrapping_div(10) == 10 (same as /). [3]
  • For signed edge case: (-128i8).wrapping_div(-1) == -128 (wrap/return MIN instead of overflowing). [2]

Sources:

  • Div behavior (rounds toward zero). [1]
  • i32.wrapping_div documentation (MIN / -1 behavior). [2]
  • u32.wrapping_div documentation (wrapping == normal division for unsigned). [3]
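The division behaviors cited above are easy to confirm directly. This small sketch (hypothetical helper names `truncate_div` and `floor_div`) contrasts Rust's `/`, which truncates toward zero, with `div_euclid`, which for a positive divisor rounds toward negative infinity and would turn -1.9 into -2 rather than -1.

```rust
/// Integer division in Rust rounds toward zero, which is what makes
/// "divide by 10^fractionals" behave as truncation.
fn truncate_div(value: i32, divisor: i32) -> i32 {
    value / divisor
}

/// For contrast: Euclidean division rounds toward negative infinity when the
/// divisor is positive, so -19 / 10 would become -2 instead of -1.
fn floor_div(value: i32, divisor: i32) -> i32 {
    value.div_euclid(divisor)
}
```

So `truncate_div(-19, 10)` is -1 (the truncating behavior the zero-scale tests expect), while `floor_div(-19, 10)` is -2.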

Add test cases for negative numbers and Decimal256Type to zero_scale_tests.

The core logic correctly truncates toward zero for negative numbers (confirmed: Rust's div_wrapping truncates toward zero, matching the expected behavior). However, the test coverage at lines 2765-2780 is incomplete:

  • Missing negative test cases (e.g., "-1.2" should parse to -1, "-123.45" to -123)
  • Only tests Decimal128Type, but the implementation supports both Decimal128Type and Decimal256Type

Expand the test cases to include:

  • Negative numbers: ("-1.0", -1), ("-1.2", -1), ("-123.45", -123), ("-0.99", 0)
  • Both decimal types
🤖 Prompt for AI Agents
In arrow-cast/src/parse.rs, around line 966, and in the tests at lines
~2765-2780: add the missing negative-number test cases to zero_scale_tests and
run them for both Decimal128Type and Decimal256Type. Specifically, add the cases
("-1.0", -1), ("-1.2", -1), ("-123.45", -123), and ("-0.99", 0), and ensure the
test harness asserts parsing/truncation toward zero for each decimal type so
both Decimal128Type and Decimal256Type are covered.

@github-actions

  • arrow-cast/src/parse.rs:865 – The new zero-scale branch divides by base.pow_wrapping(fractionals) after ingesting the fractional digits. For Decimal128 this overflows once fractionals >= 39 (10^39 doesn’t fit in i128) and, because both the accumulator and divisor have already wrapped, the final quotient becomes a bogus non-zero value instead of the expected truncated integer (e.g. parse_decimal::<Decimal128Type>("0.100000000000000000000000000000000000000", 38, 0) now yields -4). Previously this input raised parse decimal overflow. Please either keep rejecting strings where fractionals exceeds the precision or use checked arithmetic and return an error on overflow so the result cannot silently wrap.
  • arrow-json/src/writer/encoder.rs:301 – make_encoder no longer matches DataType::BinaryView, so attempting to write a BinaryViewArray now falls through to the default case and raises Unsupported data type for JSON encoding: BinaryView. This regresses functionality that existed before (and the accompanying test was deleted). Please restore a BinaryView encoder.
  • arrow-json/src/writer/encoder.rs:698 – MapEncoder::try_new used to allow Utf8View map keys; now it rejects them, again because the test and support code were removed. Existing MapArray values with StringViewArray keys will start erroring with “Only UTF8 keys supported …”. Please keep accepting DataType::Utf8View keys (and reinstate the test).
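The first bullet above suggests using checked arithmetic so an over-long input returns an error. A minimal sketch of that option, assuming a fixed `i128` native and the hypothetical name `parse_zero_scale_checked` (the error strings are illustrative, not arrow-rs messages); it keeps the accumulate-then-divide shape but never wraps, and it leaves the precision check out for brevity.

```rust
/// Hypothetical checked variant of zero-scale parsing: same
/// accumulate-then-divide shape, but every step uses checked arithmetic so an
/// over-long input is reported as an error instead of silently wrapping.
fn parse_zero_scale_checked(s: &str) -> Result<i128, String> {
    let overflow = || format!("parse decimal overflow ({s})");
    let invalid = || format!("invalid decimal ({s})");

    // Optional sign, then integer and fractional parts.
    let (negative, body) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let (int_part, frac_part) = body.split_once('.').unwrap_or((body, ""));
    if int_part.is_empty() && frac_part.is_empty() {
        return Err(invalid());
    }

    // Accumulate all digits; any overflow becomes an error.
    let mut result: i128 = 0;
    for b in int_part.bytes().chain(frac_part.bytes()) {
        if !b.is_ascii_digit() {
            return Err(invalid());
        }
        result = result
            .checked_mul(10)
            .and_then(|r| r.checked_add(i128::from(b - b'0')))
            .ok_or_else(overflow)?;
    }

    // 10^fractionals, computed without wrapping.
    let divisor = (0..frac_part.len())
        .try_fold(1i128, |acc, _| acc.checked_mul(10))
        .ok_or_else(overflow)?;
    let result = result / divisor; // truncates toward zero
    Ok(if negative { -result } else { result })
}
```

With this approach the 39-fractional-digit input from the first bullet is rejected instead of producing -4. The trade-off is that long fractional tails are rejected even when the integer part is small; not accumulating fractional digits at all when scale is zero would avoid that, at the cost of a separate code path.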
