Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow Cast: Fixed Point Arithmetic for Interval Parsing #4291

Merged
merged 24 commits into from
Jun 5, 2023

Conversation

mr-brobot
Copy link
Contributor

@mr-brobot mr-brobot commented May 26, 2023

Which issue does this PR close?

Closes #3809

Rationale for this change

See #3809

What changes are included in this PR?

  • Renamed IntervalType to IntervalUnit to disambiguate with interval precision types (e.g., MonthDayNanos, YearMonths)
  • Added IntervalUnit w/ trait implementation for string parsing
  • Added Interval with methods for parsing, adding, and converting to other interval types
  • Added parse_interval_components function for parsing interval strings into amount/unit pairs
  • Added IntervalParseConfig to house future configuration options we may want to add to interval parsing

Are there any user-facing changes?

  • Some slightly different error messages (see changes to tests in cast.rs)

@github-actions github-actions bot added the arrow Changes to the arrow crate label May 26, 2023
@tustvold
Copy link
Contributor

tustvold commented May 30, 2023

Thank you for this, I realise the issue I filed did you a disservice as it was perhaps a touch underspecified.

I think this PR would be dramatically simplified if instead of implementing variable FixedPoint arithmetic as this PR does, instead did something along the lines of

/// Chosen based on the number of decimal digits in 1 week in nanoseconds
const INTERVAL_PRECISION: u32 = 15;

struct IntervalComponent {
    /// The whole interval component
    integer: i64,
    /// The non-integer component multiplied by 10^INTERVAL_PRECISION
    frac: i64,
}

impl FromStr for IntervalComponent {
    type Err = Error;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s.split_once('.') {
            Some((integer, frac)) => {
                if frac.len() > INTERVAL_PRECISION {
                    return Err(..)
                }
                // TODO: Handle sign
                let frac_s: i64 = frac.parse()?;
                let frac = frac_s * 10.pow(INTERVAL_PRECISION - frac.len());
                Self {
                    integer: integer.parse()?,
                    frac,
                }
            }
            None => {
                Self {
                    integer: s.parse()?,
                    frac: 0,
                }
            }
        })
    }
}

#[derive(Default)]
struct Interval {
    months: i32,
    days: i32,
    nanos: i64,
}

impl Interval {
    fn to_year_months(self) -> i32 {
        self.months
    }

    fn to_day_time(self) -> Result<(i32, i32)> {
        let days = self.months.checked_mul(30)?.checked_add(self.days);
        let millis = self.nanos / 1_000_000;
        Ok((days, millis))
    }

    fn to_month_day_nanos(self) -> (i32, i32, i64) {
        (self.months, self.days, self.nanos)
    }

    /// Follow postgres logic
    ///
    /// Fractional parts of units greater than months are rounded to be an integer number of months
    /// Fractional parts of units less than months are computed to be an integer number of days and microseconds
    fn add(&mut self, component: IntervalComponent, unit: IntervalUnit) -> Result<()> {
        match unit {
            IntervalUnit::Second => {
                let nanos = component.integer.checked_mul(1_000_000_000)?;
                let nanos_frac = component.frac / 10.pow(INTERVAL_PRECISION - 9);
                self.nanos = self.nanos.checked_add(nanos)?.checked_add(nanos_frac)?;
            }
            IntervalUnit::Week => {
                self.days = self.days.checked_add(component.integer)?;
                let nanos_frac = component.frac * 7 * 36 / 10.pow(INTERVAL_PRECISION - 7);
                self.nanos = self.nanos.checked_add(nanos_frac)?;

            }
            IntervalUnit::Day => {
                self.days = self.days.checked_add(component.integer)?;
                let nanos_frac = component.frac * 36 / 10.pow(INTERVAL_PRECISION - 7);
                self.nanos = self.nanos.checked_add(nanos_frac)?;
            }
            IntervalUnit::Century => {
                let month_frac = component.frac_digits * 100 * 12 / 10.pow(component.frac_digits);
                let months = component.integer.checked_mul(12)?;
                self.months = self.months.checked_add(months)?.checked_add(month_frac)?;
            }
        }
    }
}

Where the precision is fixed.

This is not only less code but avoids expensive integer division and modulus operations, which are by far the slowest arithmetic operations on modern processors, with LLVM automatically converting division by a constant to a fixed point multiplication. It also allows eliding some overflow checks, as the fractional component must be less than 10^INTERVAL_PRECISION

@mr-brobot
Copy link
Contributor Author

Apologies for the confusion on my part. I really appreciate the clarification and your patience. Will have this PR updated by the weekend.

@tustvold
Copy link
Contributor

tustvold commented Jun 1, 2023

I'm marking this as draft to make it clear that it isn't waiting for a review imminently. Please feel free to mark it ready for review when you'd like me to take another look 😄

@tustvold tustvold marked this pull request as draft June 1, 2023 11:48
@mr-brobot mr-brobot marked this pull request as ready for review June 4, 2023 19:18
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, looks good to me 👍

@alamb
Copy link
Contributor

alamb commented Jun 16, 2023

FWI I believe this PR introduced a regression: #4424 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use Fixed Point Arithmetic in Interval Parsing
3 participants