Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: scientific notation in BigDecimal::to_string() #135

Open
findepi opened this issue Sep 25, 2024 · 7 comments
Open

Regression: scientific notation in BigDecimal::to_string() #135

findepi opened this issue Sep 25, 2024 · 7 comments

Comments

@findepi
Copy link

findepi commented Sep 25, 2024

to_scientific_notation() is supposed to produce scientific notation, to_string() used to (and IMO should continue) to produce decimal form without scientific notation.

The following test

#[test]
fn test() {
    let big_decimal = BigDecimal::from_str("0.00000000000000000000000000000000000001").unwrap()
        .normalized();
    assert_eq!(big_decimal.to_string(), "0.00000000000000000000000000000000000001");
}

Used to pass with v 0.4.2 and no longer passes with 0.4.3:

assertion `left == right` failed
  left: "1E-38"
 right: "0.00000000000000000000000000000000000001"
@akubera
Copy link
Owner

akubera commented Sep 26, 2024

Always using full decimal representation in the default formatter is actually not what users typically want, and can lead to attacks on memory (e.g. parsing and printing the short string "1e-1000000000" would be amplified to 1GB worth of zeros, and increases exponentially per exponential digit).

So there has to be a threshold somewhere.

The default is 5 leading zeros, which is the same as Python's decimal.Decimal:

>>> from decimal import Decimal
>>> Decimal("0.000001")  # five leading zeros is decimal formatted
Decimal('0.000001')
>>> Decimal("0.0000001")  # six+ leading zeros switches to scientific notation
Decimal('1E-7')
>>> Decimal("0.00000000000000000000000000000000000001")
Decimal('1E-38')

But you can set this at compile time with environment variables

# fractional leading zeros (default 5)
RUST_BIGDECIMAL_FMT_EXPONENTIAL_LOWER_THRESHOLD=10000

# integer trailing zeros (default 15)
RUST_BIGDECIMAL_FMT_EXPONENTIAL_UPPER_THRESHOLD=10000

@findepi
Copy link
Author

findepi commented Sep 26, 2024

Thank you @akubera for quick answer! I didn't think about DOS attack vector.
My use-case is in the SQL context where the value is known to be have at most 38 digits, so I am not concerned about to_string blowing up memory.

Thanks for the tip about the env variable. I am, however, worried a bit what happens if the library is used in my code directly and also as a transitive dependency, and portion of the code that I do not control becomes dependent on the out-of-the-box behavior. Is there an option to restore original exact to_string behavior?

FWIW Java behavior is following

var tiny = new java.math.BigDecimal("0.00000000000000000000000000000000000001");
System.out.println(tiny.toString());
System.out.println(tiny.toPlainString());

prints

1E-38
0.00000000000000000000000000000000000001

@akubera
Copy link
Owner

akubera commented Sep 26, 2024

I could expose the fn format_full_scale() function as the methods:

fn to_plain_string(&self) -> String
fn write_plain_string<W: fmt::Write>(&self, wtr: &mut W) -> fmt::Result;

if the name "plain_string" makes sense (which is fine by me, alternative off the top of my head would be to_decimal_string()).

Python doesn't look like it has an equivalent, but numpy format_float_positional (and format_float_scientific).
Ruby has (deprecated) method to_digits (which I don't like; they're all "digits", right?)


I'm not sure how the transitive dependencies work, but I have a feeling there's no way to "pass" an environment variable from one dependency to another. The top level library would need to know to set the RUST_DECIMAL_* vars.

One option would be add a feature to disable the safe formatting:

bigdecimal  = { ... , features = [ "full_decimal_format" ] }   // or scarier name like "unprotected_decimal_format"

But even then I'm not sure how transitive dependencies work: if the top level project includes bigdecimal without the feature, but a dependency does, would it be compiled twice and the numbers are formatted differently depending on their source?

I've been told the environment variables are bad for reproducible builds, with transitive dependencies, but I'm not sure what my options are.

@akubera
Copy link
Owner

akubera commented Sep 26, 2024

🤔

Maybe you could put a check in your build.rs which requires the environment variable to be present and >=38 to correctly build?

@akubera
Copy link
Owner

akubera commented Sep 26, 2024

That's a big hoop for users to jump through.

Maybe a new method where you explicitly set the leading/leading zeros?
Are you SURE there's no potential for attack?

Something like:

use bigdecimal::{BigDecimal, BigDecimalRef};

struct MyBigDecimal {
   value: BigDecimal,
}

impl Display for MyBigDecimal {
   fn fmt(&self, f: ...) -> ... {
       let max_leading_zeros = 38;  // or 100 or 1000 or whatever.
       let max_trailing_zeros = 100;

       // new method overriding to_string()
       self.value.fmt_plain_decimal(f, max_leading_zeros, max_trailing_zeros);
   }
}

// This should make casting your objects to a form that most functions 
// in bigdecimal accept, if you're doing any math.
impl<'a> From<&'a MyBigDecimal> for BigDecimalRef<'a> {
  fn from(d: &MyBigDecimal) -> Self {
    d.value.to_ref()
  }
}

@findepi
Copy link
Author

findepi commented Sep 27, 2024

I could expose the fn format_full_scale() function as the methods:

fn to_plain_string(&self) -> String
fn write_plain_string<W: fmt::Write>(&self, wtr: &mut W) -> fmt::Result;

That would be awesome, thank you!

@akubera
Copy link
Owner

akubera commented Sep 29, 2024

I should mention I’m not able to work on it this weekend. Maybe next weekend or the following. Don't think I have forgotten.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants