feat: check iterator length in `from_base_*e` methods to prevent consuming large iters #286

prestwich · 2023-07-29T23:50:14Z

Draft PR for discussion purposes

max_digits known to be bugged right now. not a high prio to fix unless we're confident this PR will go in

Motivation

closes #279

Currently both from_base_be and from_base_le accept more digits then can fit in the target Uint. This leads to extra iteration/recursion. If all extra digits are 0, there is unbounded extra iteration/recursion, which results in degenerate cases:

// infinite loop
Uint::<4, 1>::from_base_be(10, std::iter::repeat(0u64))

// overflows the stack due to recursion
Uint::<4, 1>::from_base_le(10, std::iter::repeat(0u64))

Solution

Compute max_digits and bound the from_base_*e logic to operating on that many digits. Overflow if more digits are supplied.

However, this is a breaking change, as from_base_be is used in from_str_radix and from_str, and currently permits non-minimal string encodings.

// currently allowed. Broken by this PR
Uint::<16, 1>::from_str("0x00000000000001").unwrap();

Open Question:
Can we both

allow non-minimal byte encodings (where all extra bytes are 0)
avoid infinite iteration

Possible solutions:

use size_hint on the iterators, and error if no upper bound established
allow up to N extra 0s in addition to the meaningful digits. Error if more than N

These would still be breaking, but only in much more rare cases. And we would probably be comfortable doing that as a patch

PR Checklist

Added Tests
Added Documentation
Updated the changelog

recmo · 2023-08-03T09:48:46Z

Currently both from_base_be and from_base_le accept more digits then can fit in the target Uint. This leads to extra iteration/recursion. If all extra digits are 0, there is unbounded extra iteration/recursion, which results in degenerate cases:

I'd consider this intentional. Leading zeros are supported by the base convertor as they make mathematical sense.

If you feed it an infinite string of zeros than I'd expect it to take infinite time. That's IMO not a bug but a consequence of the inversion of control created by Rust's iterators. If you give it an iterator that panics it will panic. If you give it an iterator that launches rockets, it will launch rockets.

We can look into optimizing the leading zero case. Handling them should be O(n) time, O(1) space.

prestwich · 2023-08-03T14:58:37Z

It makes sense from a mathematical perspective. On the other hand, we can infer that it is never a sane user's intention to trigger this behavior. We also know that after the max digits is reached, any further non-zero digit is guaranteed to cause an overflow. It seems like we can/should use this knowledge to prevent or handle invalid use of the API

recmo · 2023-08-03T17:07:08Z

There are sane usecase where this would be useful, as it's pretty common to store numbers with an excessive amount of leading zeros as a way of padding. For example Solidity's ABI likes to store U160 padded up to U256. It would be nice if we can parse the U160 directly from the whole hex string and have it handle overflow correctly. The alternative would be an intermediate U256 and try_into, or messing with substrings and forgetting to check that the lead is indeed zero.

I'm proposing an alternate implementation of from_base_le that won't run out of memory/stack in #293 .

prestwich · 2023-08-05T18:57:34Z

The sane usecases are bounded at (generously) 100 extra digits, and there's no sane usecase for infinitely looping. It's never good behavior for a library to hang indefinitely on valid input

Maybe this is a place to use size_hint and error out if no upper bound is established?

recmo · 2023-08-06T16:16:47Z

Can you give me an example of a well-written rust library that puts similar bounds on user provided iterators? For example min from std also hangs:

fn main() {
    println!("{:?}", core::iter::repeat(0_u64).min());
}

And here it doesn't even need to hang, because it could short-circuit as soon as it sees u64::MIN. The functions in this PR have a more valid reason to continue as it can actually affect the outcome.

Generally I don't consider this infinite looping in the library. This is entirely on the user of the API and expectations should be that iterators will be consumed in full unless noted otherwise. This is not Haskell with lazy execution.

I do appreciate making APIs idiot-proof, but would not go as far as limit power users in what they can do.

prestwich · 2023-08-17T14:19:05Z

closing as there's no pressing need and I don't have a good plan anyway :)

feat: check from base len to prevent consuming large iters

26f8a87

prestwich added enhancement New feature or request help wanted Extra attention is needed question Further information is requested refactor Can be written more cleanly labels Jul 29, 2023

prestwich requested a review from gakonst July 29, 2023 23:50

prestwich self-assigned this Jul 29, 2023

prestwich changed the title ~~feat: check from base len to prevent consuming large iters~~ feat: check iterator length in from_base_*e methods to prevent consuming large iters Jul 29, 2023

doc: better commenting

6a2e2dc

prestwich closed this Aug 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: check iterator length in `from_base_*e` methods to prevent consuming large iters #286

feat: check iterator length in `from_base_*e` methods to prevent consuming large iters #286

prestwich commented Jul 29, 2023 •

edited

Loading

recmo commented Aug 3, 2023 •

edited

Loading

prestwich commented Aug 3, 2023

recmo commented Aug 3, 2023 •

edited

Loading

prestwich commented Aug 5, 2023

recmo commented Aug 6, 2023 •

edited

Loading

prestwich commented Aug 17, 2023

feat: check iterator length in from_base_*e methods to prevent consuming large iters #286

feat: check iterator length in from_base_*e methods to prevent consuming large iters #286

Conversation

prestwich commented Jul 29, 2023 • edited Loading

Motivation

Solution

PR Checklist

recmo commented Aug 3, 2023 • edited Loading

prestwich commented Aug 3, 2023

recmo commented Aug 3, 2023 • edited Loading

prestwich commented Aug 5, 2023

recmo commented Aug 6, 2023 • edited Loading

prestwich commented Aug 17, 2023

feat: check iterator length in `from_base_*e` methods to prevent consuming large iters #286

feat: check iterator length in `from_base_*e` methods to prevent consuming large iters #286

prestwich commented Jul 29, 2023 •

edited

Loading

recmo commented Aug 3, 2023 •

edited

Loading

recmo commented Aug 3, 2023 •

edited

Loading

recmo commented Aug 6, 2023 •

edited

Loading