Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text is wrapped at small and inconsistent widths when attempting to wrap at a very large width. #416

Closed
olson-sean-k opened this issue Nov 9, 2021 · 7 comments · Fixed by #421
Labels

Comments

@olson-sean-k
Copy link

When wrapping text at a very large width (e.g., usize::MAX) via textwrap::wrap, it is instead wrapped at small and inconsistent widths. This code...

fn main() {
    let text = "Hello there! This is some English text. It should not be wrapped given the extents \
                below.";
    for line in textwrap::wrap(text, usize::MAX) {
        println!("{}", line);
    }
}

...prints this:

Hello there! This is some
English text. It should
not be wrapped given the extents below.

Note that the wrapped width is not consistent, as the word "English" could fit on the first line without exceeding the width of the last line. I would not expect the text to be wrapped here at all, since it never reaches a width of usize::MAX. I'm not sure at what large widths this behavior begins to manifest.

@olson-sean-k
Copy link
Author

olson-sean-k commented Nov 10, 2021

This line in the optimized-fit algorithm is suspicious. It subtracts the target width from the line width where the target width is derived from the width given to textwrap::wrap. IIUC, that's usize::MAX in this case, so underflow occurs. Indeed the bug does not occur when the smawk feature is disabled and the first-fit algorithm is used instead.

I believe that subtraction could underflow for inputs that aren't as pathological as usize::MAX. Perhaps a saturating operation should be used there instead?

@mgeisler
Copy link
Owner

Hi @olson-sean-k, thanks for the test case and the excellent analysis. I have a test cases for wrapping with the maximum line width, but perhaps I only checked the first-fit algorithm... I'll have to check tomorrow.

As for fixing this, I'm not sure exactly how, unfortunately. The algorithm does not lend itself well to saturated addition/subtraction (unfortunately!) and so I've been fighting with corner cases like this one for a while.

Basically, the original algorithm assumed infinite precision math (I ported it from Python) and this breaks down in Rust.

@mgeisler
Copy link
Owner

I found the test mentioned above and while it does use the optimal-fit algorithm, it only tested a very short string:

    #[test]
    fn max_width() {
        assert_eq!(wrap("foo bar", usize::max_value()), vec!["foo bar"]);
    }

That is of course rather unfortunate and something I will dig into.

@olson-sean-k
Copy link
Author

Thanks for taking a look!

Basically, the original algorithm assumed infinite precision math (I ported it from Python) and this breaks down in Rust.

I'm not sure this would be ideal, but maybe the optimized-fit algorithm could use (or fall back to) something like num_bigint::BigUInt for intermediate computations. I believe num-bigint's integer types behave similarly to Python's integers and will grow dynamically as needed.

That may be overkill though and may decrease performance for common use cases that work properly today. If the bounds are known, maybe the algorithm could accept a fixed-sized width (e.g., u64 instead of usize) and use larger integers for intermediate computations (e.g., u128). u128 may use a software implementation on some targets, but I'm guessing that any performance impact would be smaller than using a big integer like BigUint. Since the APIs are already backed by conversions, it should be possible to keep things ergonomic despite using fixed-sized integers.

@mgeisler
Copy link
Owner

Yeah, you're right: we could go all-in and use a big-int crate of some sort... but I feel it's overkill as you suggest.

For "normal" terminal widths, the wrapping works fine from what I can tell, so I would rather limit the size of the input widths. I've been playing with that in the past, but I didn't finish it.

@mgeisler
Copy link
Owner

I've done more testing with a build where line widths are limited to u16 and where the optimal-fit computations are done with u64. The fuzz test seems to no longer crash with that setup!

I will have to clean it up a bit and try to understand the consequences of limiting the line width in this way, especially since I'm also trying to make the library work outside of the console (as the Wasm demo). See also #247 for related discussions.

@mgeisler mgeisler added the bug label Dec 12, 2021
mgeisler added a commit that referenced this issue Jan 2, 2022
The new `wrap_first_fit.rs` and `wrap_optimal_fit.rs` fuzz tests
generate completely random fragments with arbitrary widths. They then
check that the fragments can be wrapped without overflow.

The tests currently fail in less than a second and triggers the
overflows mentioned in #247 and #416.
mgeisler added a commit that referenced this issue Jan 3, 2022
This changes the type used for internal width computations in the wrap
algorithms. Before, we used `usize` to represent the fragment widths
and for the line widths. This could make the optimal-fit wrapping
algorithm overflow when it tries to compute the optimal wrapping cost.
The problem is that the algorithm computes a cost using integer values
formed by

    (line_width - target_width)**2

When `line_width` is near `usize::MAX`, this computation can easily
overflow.

By using an `f64` for the cost computation, we achieve two things:

* A much larger range for the cost computation: `f64::MAX` is about
  1.8e308 whereas `u64::MAX` is only 1.8e19. Computing the cost with a
  fragment width in the range of `u64`, will thus not exceed 3e38,
  something which is easily represented with a `f64`. This means that
  wrapping fragments derived from a `&str` cannot overflow.

  Overflows can still be triggered when fragments with extreme
  proportions are formed directly. The boundary seems to be around
  1e170 with fragment widths above this limit triggering overflows.

* Applications which wrap text using proportional fonts will already
  be operating with widths measured in floating point units. Using
  such units internally makes life easier for such applications, as
  shown by the changes in the Wasm demo.

Fixes #247
Fixes #416
@mgeisler
Copy link
Owner

mgeisler commented Jan 3, 2022

I've been testing this a lot more and I believe the solution in #421 will work well: use f64 internally when computing the wrapping cost. With that change, I can no longer trigger overflows in my tests.

mgeisler added a commit that referenced this issue Jan 9, 2022
The optimization problem solved by the optimal-fit algorithm is
fundamentally a minimization problem. It is therefore not sensible to
allow negative penalties since all penalties are there to discourage
certain features:

* `nline_penalty` discourages breaks with more lines than necessary,

* `overflow_penalty` discourages lines longer than the line width,

* `short_last_line_penalty` discourages short last lines,

* `hyphen_penalty` discourages hyphenation

Making this change surfaces the overflow bug behind #247 and #416.
This will be fixed next via #421 and this commit can be seen as a way
of simplifying that PR.
mgeisler added a commit that referenced this issue Jan 9, 2022
The optimization problem solved by the optimal-fit algorithm is
fundamentally a minimization problem. It is therefore not sensible to
allow negative penalties since all penalties are there to discourage
certain features:

* `nline_penalty` discourages breaks with more lines than necessary,

* `overflow_penalty` discourages lines longer than the line width,

* `short_last_line_penalty` discourages short last lines,

* `hyphen_penalty` discourages hyphenation

Making this change surfaces the overflow bug behind #247 and #416.
This will be fixed next via #421 and this commit can be seen as a way
of simplifying that PR.
mgeisler added a commit that referenced this issue Jan 9, 2022
The optimization problem solved by the optimal-fit algorithm is
fundamentally a minimization problem. It is therefore not sensible to
allow negative penalties since all penalties are there to discourage
certain features:

* `nline_penalty` discourages breaks with more lines than necessary,

* `overflow_penalty` discourages lines longer than the line width,

* `short_last_line_penalty` discourages short last lines,

* `hyphen_penalty` discourages hyphenation

Making this change surfaces the overflow bug behind #247 and #416.
This will be fixed next via #421 and this commit can be seen as a way
of simplifying that PR.
mgeisler added a commit that referenced this issue Jan 9, 2022
The optimization problem solved by the optimal-fit algorithm is
fundamentally a minimization problem. It is therefore not sensible to
allow negative penalties since all penalties are there to discourage
certain features:

* `nline_penalty` discourages breaks with more lines than necessary,

* `overflow_penalty` discourages lines longer than the line width,

* `short_last_line_penalty` discourages short last lines,

* `hyphen_penalty` discourages hyphenation

Making this change surfaces the overflow bug behind #247 and #416.
This will be fixed next via #421 and this commit can be seen as a way
of simplifying that PR.
mgeisler added a commit that referenced this issue Jan 9, 2022
This changes the type used for internal width computations in the wrap
algorithms. Before, we used `usize` to represent the fragment widths
and for the line widths. This could make the optimal-fit wrapping
algorithm overflow when it tries to compute the optimal wrapping cost.
The problem is that the algorithm computes a cost using integer values
formed by

    (line_width - target_width)**2

When `line_width` is near `usize::MAX`, this computation can easily
overflow.

By using an `f64` for the cost computation, we achieve two things:

* A much larger range for the cost computation: `f64::MAX` is about
  1.8e308 whereas `u64::MAX` is only 1.8e19. Computing the cost with a
  fragment width in the range of `u64`, will thus not exceed 3e38,
  something which is easily represented with a `f64`. This means that
  wrapping fragments derived from a `&str` cannot overflow.

  Overflows can still be triggered when fragments with extreme
  proportions are formed directly. The boundary seems to be around
  1e170 with fragment widths above this limit triggering overflows.

* Applications which wrap text using proportional fonts will already
  be operating with widths measured in floating point units. Using
  such units internally makes life easier for such applications, as
  shown by the changes in the Wasm demo.

Fixes #247
Fixes #416
mgeisler added a commit that referenced this issue Jan 9, 2022
This changes the type used for internal width computations in the wrap
algorithms. Before, we used `usize` to represent the fragment widths
and for the line widths. This could make the optimal-fit wrapping
algorithm overflow when it tries to compute the optimal wrapping cost.
The problem is that the algorithm computes a cost using integer values
formed by

    (line_width - target_width)**2

When `line_width` is near `usize::MAX`, this computation can easily
overflow.

By using an `f64` for the cost computation, we achieve two things:

* A much larger range for the cost computation: `f64::MAX` is about
  1.8e308 whereas `u64::MAX` is only 1.8e19. Computing the cost with a
  fragment width in the range of `u64`, will thus not exceed 3e38,
  something which is easily represented with a `f64`. This means that
  wrapping fragments derived from a `&str` cannot overflow.

  Overflows can still be triggered when fragments with extreme
  proportions are formed directly. The boundary seems to be around
  1e170 with fragment widths above this limit triggering overflows.

* Applications which wrap text using proportional fonts will already
  be operating with widths measured in floating point units. Using
  such units internally makes life easier for such applications, as
  shown by the changes in the Wasm demo.

Fixes #247
Fixes #416
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants