Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check iterator length in from_base_*e methods to prevent consuming large iters #286

Closed
wants to merge 2 commits into from

Conversation

prestwich
Copy link
Collaborator

@prestwich prestwich commented Jul 29, 2023

Draft PR for discussion purposes

max_digits known to be bugged right now. not a high prio to fix unless we're confident this PR will go in

Motivation

closes #279

Currently both from_base_be and from_base_le accept more digits then can fit in the target Uint. This leads to extra iteration/recursion. If all extra digits are 0, there is unbounded extra iteration/recursion, which results in degenerate cases:

// infinite loop
Uint::<4, 1>::from_base_be(10, std::iter::repeat(0u64))

// overflows the stack due to recursion
Uint::<4, 1>::from_base_le(10, std::iter::repeat(0u64))

Solution

Compute max_digits and bound the from_base_*e logic to operating on that many digits. Overflow if more digits are supplied.

However, this is a breaking change, as from_base_be is used in from_str_radix and from_str, and currently permits non-minimal string encodings.

// currently allowed. Broken by this PR
Uint::<16, 1>::from_str("0x00000000000001").unwrap();

Open Question:
Can we both

  • allow non-minimal byte encodings (where all extra bytes are 0)
  • avoid infinite iteration

Possible solutions:

  • use size_hint on the iterators, and error if no upper bound established
  • allow up to N extra 0s in addition to the meaningful digits. Error if more than N

These would still be breaking, but only in much more rare cases. And we would probably be comfortable doing that as a patch

PR Checklist

  • Added Tests
  • Added Documentation
  • Updated the changelog

@prestwich prestwich added enhancement New feature or request help wanted Extra attention is needed question Further information is requested refactor Can be written more cleanly labels Jul 29, 2023
@prestwich prestwich requested a review from gakonst July 29, 2023 23:50
@prestwich prestwich self-assigned this Jul 29, 2023
@prestwich prestwich changed the title feat: check from base len to prevent consuming large iters feat: check iterator length in from_base_*e methods to prevent consuming large iters Jul 29, 2023
@recmo
Copy link
Owner

recmo commented Aug 3, 2023

Currently both from_base_be and from_base_le accept more digits then can fit in the target Uint. This leads to extra iteration/recursion. If all extra digits are 0, there is unbounded extra iteration/recursion, which results in degenerate cases:

I'd consider this intentional. Leading zeros are supported by the base convertor as they make mathematical sense.

If you feed it an infinite string of zeros than I'd expect it to take infinite time. That's IMO not a bug but a consequence of the inversion of control created by Rust's iterators. If you give it an iterator that panics it will panic. If you give it an iterator that launches rockets, it will launch rockets.

We can look into optimizing the leading zero case. Handling them should be O(n) time, O(1) space.

@prestwich
Copy link
Collaborator Author

It makes sense from a mathematical perspective. On the other hand, we can infer that it is never a sane user's intention to trigger this behavior. We also know that after the max digits is reached, any further non-zero digit is guaranteed to cause an overflow. It seems like we can/should use this knowledge to prevent or handle invalid use of the API

@recmo
Copy link
Owner

recmo commented Aug 3, 2023

There are sane usecase where this would be useful, as it's pretty common to store numbers with an excessive amount of leading zeros as a way of padding. For example Solidity's ABI likes to store U160 padded up to U256. It would be nice if we can parse the U160 directly from the whole hex string and have it handle overflow correctly. The alternative would be an intermediate U256 and try_into, or messing with substrings and forgetting to check that the lead is indeed zero.

I'm proposing an alternate implementation of from_base_le that won't run out of memory/stack in #293 .

@prestwich
Copy link
Collaborator Author

The sane usecases are bounded at (generously) 100 extra digits, and there's no sane usecase for infinitely looping. It's never good behavior for a library to hang indefinitely on valid input

Maybe this is a place to use size_hint and error out if no upper bound is established?

@recmo
Copy link
Owner

recmo commented Aug 6, 2023

Can you give me an example of a well-written rust library that puts similar bounds on user provided iterators? For example min from std also hangs:

fn main() {
    println!("{:?}", core::iter::repeat(0_u64).min());
}

And here it doesn't even need to hang, because it could short-circuit as soon as it sees u64::MIN. The functions in this PR have a more valid reason to continue as it can actually affect the outcome.

Generally I don't consider this infinite looping in the library. This is entirely on the user of the API and expectations should be that iterators will be consumed in full unless noted otherwise. This is not Haskell with lazy execution.

I do appreciate making APIs idiot-proof, but would not go as far as limit power users in what they can do.

@prestwich
Copy link
Collaborator Author

closing as there's no pressing need and I don't have a good plan anyway :)

@prestwich prestwich closed this Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed question Further information is requested refactor Can be written more cleanly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OPT:from_base_le recursion depth check
2 participants