Text is wrapped at small and inconsistent widths when attempting to wrap at a very large width. #416

olson-sean-k · 2021-11-09T21:39:12Z

When wrapping text at a very large width (e.g., usize::MAX) via textwrap::wrap, it is instead wrapped at small and inconsistent widths. This code...

fn main() {
    let text = "Hello there! This is some English text. It should not be wrapped given the extents \
                below.";
    for line in textwrap::wrap(text, usize::MAX) {
        println!("{}", line);
    }
}

...prints this:

Hello there! This is some
English text. It should
not be wrapped given the extents below.

Note that the wrapped width is not consistent, as the word "English" could fit on the first line without exceeding the width of the last line. I would not expect the text to be wrapped here at all, since it never reaches a width of usize::MAX. I'm not sure at what large widths this behavior begins to manifest.

The text was updated successfully, but these errors were encountered:

olson-sean-k · 2021-11-10T22:23:40Z

This line in the optimized-fit algorithm is suspicious. It subtracts the target width from the line width where the target width is derived from the width given to textwrap::wrap. IIUC, that's usize::MAX in this case, so underflow occurs. Indeed the bug does not occur when the smawk feature is disabled and the first-fit algorithm is used instead.

I believe that subtraction could underflow for inputs that aren't as pathological as usize::MAX. Perhaps a saturating operation should be used there instead?

mgeisler · 2021-11-13T22:54:36Z

Hi @olson-sean-k, thanks for the test case and the excellent analysis. I have a test cases for wrapping with the maximum line width, but perhaps I only checked the first-fit algorithm... I'll have to check tomorrow.

As for fixing this, I'm not sure exactly how, unfortunately. The algorithm does not lend itself well to saturated addition/subtraction (unfortunately!) and so I've been fighting with corner cases like this one for a while.

Basically, the original algorithm assumed infinite precision math (I ported it from Python) and this breaks down in Rust.

mgeisler · 2021-11-14T13:13:18Z

I found the test mentioned above and while it does use the optimal-fit algorithm, it only tested a very short string:

    #[test]
    fn max_width() {
        assert_eq!(wrap("foo bar", usize::max_value()), vec!["foo bar"]);
    }

That is of course rather unfortunate and something I will dig into.

olson-sean-k · 2021-11-20T21:00:17Z

Thanks for taking a look!

Basically, the original algorithm assumed infinite precision math (I ported it from Python) and this breaks down in Rust.

I'm not sure this would be ideal, but maybe the optimized-fit algorithm could use (or fall back to) something like num_bigint::BigUInt for intermediate computations. I believe num-bigint's integer types behave similarly to Python's integers and will grow dynamically as needed.

That may be overkill though and may decrease performance for common use cases that work properly today. If the bounds are known, maybe the algorithm could accept a fixed-sized width (e.g., u64 instead of usize) and use larger integers for intermediate computations (e.g., u128). u128 may use a software implementation on some targets, but I'm guessing that any performance impact would be smaller than using a big integer like BigUint. Since the APIs are already backed by conversions, it should be possible to keep things ergonomic despite using fixed-sized integers.

mgeisler · 2021-11-21T19:28:53Z

Yeah, you're right: we could go all-in and use a big-int crate of some sort... but I feel it's overkill as you suggest.

For "normal" terminal widths, the wrapping works fine from what I can tell, so I would rather limit the size of the input widths. I've been playing with that in the past, but I didn't finish it.

mgeisler · 2021-11-30T22:21:17Z

I've done more testing with a build where line widths are limited to u16 and where the optimal-fit computations are done with u64. The fuzz test seems to no longer crash with that setup!

I will have to clean it up a bit and try to understand the consequences of limiting the line width in this way, especially since I'm also trying to make the library work outside of the console (as the Wasm demo). See also #247 for related discussions.

The new `wrap_first_fit.rs` and `wrap_optimal_fit.rs` fuzz tests generate completely random fragments with arbitrary widths. They then check that the fragments can be wrapped without overflow. The tests currently fail in less than a second and triggers the overflows mentioned in #247 and #416.

This changes the type used for internal width computations in the wrap algorithms. Before, we used `usize` to represent the fragment widths and for the line widths. This could make the optimal-fit wrapping algorithm overflow when it tries to compute the optimal wrapping cost. The problem is that the algorithm computes a cost using integer values formed by (line_width - target_width)**2 When `line_width` is near `usize::MAX`, this computation can easily overflow. By using an `f64` for the cost computation, we achieve two things: * A much larger range for the cost computation: `f64::MAX` is about 1.8e308 whereas `u64::MAX` is only 1.8e19. Computing the cost with a fragment width in the range of `u64`, will thus not exceed 3e38, something which is easily represented with a `f64`. This means that wrapping fragments derived from a `&str` cannot overflow. Overflows can still be triggered when fragments with extreme proportions are formed directly. The boundary seems to be around 1e170 with fragment widths above this limit triggering overflows. * Applications which wrap text using proportional fonts will already be operating with widths measured in floating point units. Using such units internally makes life easier for such applications, as shown by the changes in the Wasm demo. Fixes #247 Fixes #416

mgeisler · 2022-01-03T01:15:03Z

I've been testing this a lot more and I believe the solution in #421 will work well: use f64 internally when computing the wrapping cost. With that change, I can no longer trigger overflows in my tests.

The optimization problem solved by the optimal-fit algorithm is fundamentally a minimization problem. It is therefore not sensible to allow negative penalties since all penalties are there to discourage certain features: * `nline_penalty` discourages breaks with more lines than necessary, * `overflow_penalty` discourages lines longer than the line width, * `short_last_line_penalty` discourages short last lines, * `hyphen_penalty` discourages hyphenation Making this change surfaces the overflow bug behind #247 and #416. This will be fixed next via #421 and this commit can be seen as a way of simplifying that PR.

This changes the type used for internal width computations in the wrap algorithms. Before, we used `usize` to represent the fragment widths and for the line widths. This could make the optimal-fit wrapping algorithm overflow when it tries to compute the optimal wrapping cost. The problem is that the algorithm computes a cost using integer values formed by (line_width - target_width)**2 When `line_width` is near `usize::MAX`, this computation can easily overflow. By using an `f64` for the cost computation, we achieve two things: * A much larger range for the cost computation: `f64::MAX` is about 1.8e308 whereas `u64::MAX` is only 1.8e19. Computing the cost with a fragment width in the range of `u64`, will thus not exceed 3e38, something which is easily represented with a `f64`. This means that wrapping fragments derived from a `&str` cannot overflow. Overflows can still be triggered when fragments with extreme proportions are formed directly. The boundary seems to be around 1e170 with fragment widths above this limit triggering overflows. * Applications which wrap text using proportional fonts will already be operating with widths measured in floating point units. Using such units internally makes life easier for such applications, as shown by the changes in the Wasm demo. Fixes #247 Fixes #416

mgeisler added the bug label Dec 12, 2021

mgeisler mentioned this issue Dec 12, 2021

Use u16 instead of usize for line width #418

Closed

mgeisler mentioned this issue Jan 2, 2022

Add fuzz tests with fully arbitrary fragments #423

Merged

mgeisler mentioned this issue Jan 3, 2022

Use f64 instead of usize for fragment widths #421

Merged

mgeisler mentioned this issue Jan 9, 2022

Change penalties to non-negative numbers #424

Merged

mgeisler closed this as completed in #421 Jan 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Text is wrapped at small and inconsistent widths when attempting to wrap at a very large width. #416

Text is wrapped at small and inconsistent widths when attempting to wrap at a very large width. #416

olson-sean-k commented Nov 9, 2021

olson-sean-k commented Nov 10, 2021 •

edited

Loading

mgeisler commented Nov 13, 2021

mgeisler commented Nov 14, 2021

olson-sean-k commented Nov 20, 2021

mgeisler commented Nov 21, 2021

mgeisler commented Nov 30, 2021

mgeisler commented Jan 3, 2022

Text is wrapped at small and inconsistent widths when attempting to wrap at a very large width. #416

Text is wrapped at small and inconsistent widths when attempting to wrap at a very large width. #416

Comments

olson-sean-k commented Nov 9, 2021

olson-sean-k commented Nov 10, 2021 • edited Loading

mgeisler commented Nov 13, 2021

mgeisler commented Nov 14, 2021

olson-sean-k commented Nov 20, 2021

mgeisler commented Nov 21, 2021

mgeisler commented Nov 30, 2021

mgeisler commented Jan 3, 2022

olson-sean-k commented Nov 10, 2021 •

edited

Loading