Replace go-runewidth with uniseg #71

mikelorant · 2024-01-26T10:32:17Z

Replace the use of RuneWidth and StringWidth from mattn/go-runewidth with equivalent functions from rivo/uniseg.

It is important to be aware that RuneWidth cannot be fully accurate, because the display width of a rune cannot always be determined in isolation: combining marks, emoji modifiers, and zero-width joiners change how the surrounding characters are rendered. This requires a shift to thinking about grapheme clusters instead.

Unfortunately, due to the complexity of identifying grapheme clusters, there have been some significant performance regressions in two functions:

- PrintableRuneWidth: 10x slower
- TruncateString: 4x slower

Two other functions have had performance improvements:

- MarginString: 2x faster
- PaddingString: 2x faster

The documentation for rivo/uniseg states that Step and StepString are "orders of magnitude faster" than the NewGraphemes method. However, switching to them only yielded a 10% performance increase.

mikelorant · 2024-01-26T10:43:41Z

@rivo Would you be able to help provide some guidance on how to boost the performance of these two regressed functions?

I switched the truncate package to Step as follows:

diff --git a/truncate/truncate.go b/truncate/truncate.go
index 6905a6c..4a4558f 100644
--- a/truncate/truncate.go
+++ b/truncate/truncate.go
@@ -79,20 +79,23 @@ func (w *Writer) Write(b []byte) (int, error) {
    w.width -= uint(tw)
    var curWidth uint

-   gr := uniseg.NewGraphemes(string(b))
-   for gr.Next() {
-       rs := gr.Runes()
+   by := b
+   state := -1
+   var c []byte
+   for len(by) > 0 {
+       var boundaries int
+       c, by, boundaries, state = uniseg.Step(by, state)

        switch {
-       case len(rs) == 1 && rs[0] == ansi.Marker:
+       case len(c) == 1 && rune(c[0]) == ansi.Marker:
            // ANSI escape sequence
            w.ansi = true
-       case len(rs) == 1 && w.ansi && ansi.IsTerminator(rs[0]):
+       case len(c) == 1 && w.ansi && ansi.IsTerminator(rune(c[0])):
            // ANSI sequence terminated
            w.ansi = false
        case w.ansi:
        default:
-           curWidth += uint(gr.Width())
+           curWidth += uint(boundaries>>uniseg.ShiftWidth)
        }

        if curWidth > w.width {
@@ -103,7 +106,7 @@ func (w *Writer) Write(b []byte) (int, error) {
            return n, err
        }

-       _, err := w.ansiWriter.Write([]byte(gr.Str()))
+       _, err := w.ansiWriter.Write(c)
        if err != nil {
            return 0, err
        }

Is this a mistake in my refactor or is it a case that identifying grapheme clusters just requires more CPU cycles?

rivo · 2024-01-27T13:06:56Z

> Would you be able to help provide some guidance on how to boost the performance of these two regressed functions?

(I only see one function here, Write.)

I'm not quite sure what you're asking. Your patch looks like you switched from using the Graphemes class to the Step function. In the past, the benchmarks suggested that this would vastly improve performance. Until someone pointed out a bug in the benchmark... Now they are about the same. (I will adjust the documentation.)

Under the hood, the Graphemes class uses the Step function (actually the StepString function) and I was under the impression that the overhead of the Graphemes class will result in a performance penalty. That does not seem to be the case, however.

So using Step is not significantly slower than Graphemes, it's almost the same. I'm surprised that you're seeing major differences here. Maybe you can post the original code and the new code, both in a way I can run them without any other dependencies. Then I can check what the issue is.

Anyway, Step doesn't just calculate graphemes and widths; it also calculates word boundaries, sentence boundaries, and line breaks. It seems to me that you're not interested in all of that. I would suggest that you use FirstGraphemeCluster instead. It only determines grapheme clusters, and it produces a width at the same time. The benchmarks show that this is about three times faster than using Step.

It should be quite simple to replace Step with FirstGraphemeCluster. Here is an example. FirstGraphemeClusterInString also works fine, if that's more convenient for you. (It's also slightly faster than FirstGraphemeCluster.)

I haven't run comparison benchmarks with go-runewidth's functions so I can't comment on that. From a quick glance, they also have to look up properties in tables (also using binary search). The overhead in uniseg shouldn't be so big.

But again, if you're really just interested in the width and don't care about other types of boundaries, don't use the Graphemes class or the Step function, use FirstGraphemeCluster or FirstGraphemeClusterInString if you need to manually step through a string.
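A minimal, stdlib-only sketch of the stepping pattern described above, applied to width-based truncation. `clusterFunc` is modeled on the documented signature of `uniseg.FirstGraphemeClusterInString` (`cluster, rest, width, newState`), so the real function should be passable directly in its place; `toyFirstCluster` is a hypothetical, deliberately crude stand-in (not UAX #29) so the example runs without dependencies. Unlike reflow's Writer, it ignores ANSI escape sequences.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// clusterFunc is modeled on uniseg.FirstGraphemeClusterInString:
// split off the first cluster and report its display width.
type clusterFunc func(s string, state int) (cluster, rest string, width, newState int)

// toyFirstCluster is a HYPOTHETICAL stand-in and NOT a UAX #29 implementation.
// It only joins a base rune with following combining marks (U+0300..U+036F),
// skin-tone modifiers, and ZWJ-joined runes.
func toyFirstCluster(s string, state int) (string, string, int, int) {
	r, i := utf8.DecodeRuneInString(s)
	width := 1
	if r >= 0x1F300 && r <= 0x1FAFF { // roughly "most emoji"
		width = 2
	}
	for i < len(s) {
		next, n := utf8.DecodeRuneInString(s[i:])
		switch {
		case next >= 0x0300 && next <= 0x036F, next >= 0x1F3FB && next <= 0x1F3FF:
			i += n // combining mark or skin-tone modifier extends the cluster
		case next == 0x200D:
			i += n // ZWJ glues the following rune onto the cluster
			if i < len(s) {
				_, m := utf8.DecodeRuneInString(s[i:])
				i += m
			}
		default:
			return s[:i], s[i:], width, state
		}
	}
	return s[:i], s[i:], width, state
}

// truncate keeps whole clusters for as long as their summed width fits.
func truncate(s string, maxWidth int, first clusterFunc) string {
	var b strings.Builder
	state, used := -1, 0 // -1 is the initial state, as in the Step diff above
	for len(s) > 0 {
		var cluster string
		var w int
		cluster, s, w, state = first(s, state)
		if used+w > maxWidth {
			break // the next cluster would overflow: stop rather than split it
		}
		used += w
		b.WriteString(cluster)
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", truncate("ab\U0001F44D\U0001F3FDc", 3, toyFirstCluster)) // prints "ab"
}
```

Because truncation works on whole clusters, a 2-cell emoji with its skin-tone modifier is either kept intact or dropped, never cut between its runes.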

rivo · 2024-01-27T22:26:47Z

See also my follow-up comment here: charmbracelet/bubbletea#831 (comment)

mikelorant · 2024-01-28T00:34:37Z

@rivo As usual, thanks again for the very detailed explanation as well as recommendations.

A quick clarification: while I only provided the patch for a refactor of the Write function, I was focused on the benchmarks for PrintableRuneWidth and TruncateString, the two functions that were slower. TruncateString relies on the Write function, and that seemed the easiest place to switch over to using StepString.

Benchmarks

I have taken the opportunity to rerun all the benchmarks in this package to determine how the changes have impacted them.

master

Benchmark_PrintableRuneWidth-4     47716898          26.84 ns/op         0 B/op         0 allocs/op
BenchmarkWriter_Write-4             3094910          331.6 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8797719          157.9 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             4818718          331.9 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4              601867           2054 ns/op      1563 B/op        35 allocs/op
BenchmarkPaddingString-4             750488           1339 ns/op      1007 B/op        23 allocs/op
BenchmarkTruncateString-4           6223754          247.8 ns/op       291 B/op         3 allocs/op

refactor/go-runewidth-to-uniseg

Benchmark_PrintableRuneWidth-4      4120680         264.7 ns/op        23 B/op         1 allocs/op
BenchmarkWriter_Write-4             4554146         291.1 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8991362         143.2 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             5173473         238.7 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4             1192938         923.9 ns/op       763 B/op        11 allocs/op
BenchmarkPaddingString-4            2209376         550.5 ns/op       406 B/op         5 allocs/op
BenchmarkTruncateString-4           1531975         790.8 ns/op       295 B/op         4 allocs/op

refactor/go-runewidth-to-uniseg-go-1.21-uniseg-0.4.6

Benchmark_PrintableRuneWidth-4      6293344         174.3 ns/op        23 B/op         1 allocs/op
BenchmarkWriter_Write-4             4174240         282.5 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8070469         163.0 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             4410526         255.4 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4             1293753         838.6 ns/op       763 B/op        11 allocs/op
BenchmarkPaddingString-4            2722042         438.5 ns/op       407 B/op         5 allocs/op
BenchmarkTruncateString-4           2388880         500.8 ns/op       295 B/op         4 allocs/op

Performance

Comparing the results from before the switch to uniseg with those after, including the most recent uniseg performance improvements:

master -> refactor/go-runewidth-to-uniseg-go-1.21-uniseg-0.4.6

Benchmark            Speed
PrintableRuneWidth   6.5x slower
Writer_Write         15% faster
Dedent               Same
IndentString         25% faster
MarginString         60% faster
PaddingString        65% faster
TruncateString       2x slower
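The Speed column can be recomputed from the ns/op columns. A small sketch, assuming "N% faster" means N% less time per operation and using a hypothetical ±5% band for "Same":

```go
package main

import "fmt"

// describe summarizes an "after" ns/op value against a "before" ns/op value.
// Assumptions: "N% faster" means N% less time per op, and changes within
// +/-5% (a hypothetical threshold) count as "same".
func describe(beforeNs, afterNs float64) string {
	r := afterNs / beforeNs // > 1 means each op now takes longer
	switch {
	case r > 1.05:
		return fmt.Sprintf("%.1fx slower", r)
	case r < 0.95:
		return fmt.Sprintf("%.0f%% faster", (1-r)*100)
	default:
		return "same"
	}
}

func main() {
	// ns/op pairs from the master and uniseg-0.4.6 runs above.
	fmt.Println(describe(26.84, 174.3)) // PrintableRuneWidth: 6.5x slower
	fmt.Println(describe(2054, 838.6))  // MarginString: 59% faster
	fmt.Println(describe(247.8, 500.8)) // TruncateString: 2.0x slower
}
```

The results match the table to within rounding. Run-to-run noise can easily exceed a 5% band, so repeating each benchmark (for example `go test -bench . -benchmem -count 10`) and comparing with benchstat gives a more trustworthy picture.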

Since PrintableRuneWidth is poorly named (it measures the width of a whole string, not a single rune), we should update its documentation to clearly state the problems and why performance has regressed. A new function, called either Width or PrintableWidth, should then be added that does the same thing.

The only major concern is the performance issues with TruncateString. I plan to look into this further and see what might be improvable.

mikelorant · 2024-01-29T07:01:00Z

@rivo Here are the updated benchmarks for the truncate function.

BenchmarkTruncateString-4-NewGraphemes          2514505         499.0 ns/op       279 B/op         3 allocs/op
BenchmarkTruncateString-4-FirstGraphemeCluster  5609870         211.8 ns/op       275 B/op         2 allocs/op

This has improved the truncate function to now perform faster than the original version 😄

mikelorant · 2024-01-29T09:07:42Z

Benchmarks

Before

Benchmark_PrintableRuneWidth-4     51635997          23.67 ns/op         0 B/op         0 allocs/op
BenchmarkWriter_Write-4             4415070          264.4 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8735187          141.7 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             5119086          249.7 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4              612964           1645 ns/op      1563 B/op        35 allocs/op
BenchmarkPaddingString-4            1045546           1150 ns/op      1007 B/op        23 allocs/op
BenchmarkTruncateString-4           6143715          188.8 ns/op       291 B/op         3 allocs/op

After

Benchmark_PrintableRuneWidth-4      9631588          122.1 ns/op        95 B/op         0 allocs/op
BenchmarkWriter_Write-4             4528722          260.0 ns/op        48 B/op         1 allocs/op
BenchmarkDedent-4                   8445904          139.2 ns/op        87 B/op         1 allocs/op
BenchmarkIndentString-4             5190386          244.7 ns/op       295 B/op         4 allocs/op
BenchmarkMarginString-4             1612345          746.7 ns/op       763 B/op        11 allocs/op
BenchmarkPaddingString-4            2926327          444.6 ns/op       407 B/op         5 allocs/op
BenchmarkTruncateString-4           5670631          210.9 ns/op       275 B/op         2 allocs/op

Performance

Benchmark            Speed
PrintableRuneWidth   400% slower
Writer_Write         Same
Dedent               Same
IndentString         Same
MarginString         50% faster
PaddingString        60% faster
TruncateString       10% slower

Outcome

All functions except PrintableRuneWidth and TruncateString have either stayed the same or improved in performance. TruncateString is roughly 10% slower than master in this run (188.8 vs 210.9 ns/op), although it now makes one fewer allocation. PrintableRuneWidth now returns accurate results, whereas before it calculated the width incorrectly. This is a reasonable compromise, and it is something we need to make clear to users.

All unit tests are passing, which helps confirm we are getting the same results. Naturally, some emojis now produce a different width; that is expected, and it is the driver for this work.

Thanks

A special thanks to @rivo for taking so much of his time to explain and help with this refactor. I doubt this pull request would have gotten to this state without him. Improving the performance of uniseg went well above and beyond expectations. You're the real ⭐ in this work.

Next Steps

I am hoping @muesli can help review this pull request and provide guidance on any changes necessary. I do think these changes need to be mentioned in the documentation, and some thought should be given to bumping the major version to indicate that there are breaking changes.

muesli · 2024-01-29T12:13:42Z

We'll have to bump the required Go version to 1.18.

mikelorant · 2024-01-29T22:16:11Z

Bumped go.mod to 1.18 as well as updated the GitHub workflow to use the correct version when building.

mikelorant · 2024-01-30T08:48:36Z

@muesli It would be great to get your sign-off on this work. This is going to have some pretty big performance impacts on many of the bubbles components. I'd like to get started on the code changes for Lipgloss and Bubbles.

mikelorant · 2024-02-02T02:02:41Z

@muesli Any chance you can take a look at this? Until this is merged into the master branch, I can't begin the changes to switch Lipgloss and Bubbles over to it.

To drive home how much of a benefit this will be: we will likely see a 10x performance improvement in Bubbles components, especially textarea, which relies heavily on this package.

meowgorithm · 2024-02-27T02:29:48Z

@mikelorant just a heads up that I talked to muesli and he's been looking into this. so far so good, from what I hear.

mikelorant · 2024-02-27T02:43:17Z

All good. This one should take a while, because it is going to have a huge ripple effect as it is updated in the other Charm tools.

We likely need a test repository to verify the suite of tools works as expected. But we also need a reference terminal to go with it (crossing fingers Ghostty is near perfect for rendering emojis).

Personally, I believe that if this is accepted, it may be worth bumping the major version (per semantic versioning) to indicate breaking changes. That would make it very clear and require applications depending on it to opt in explicitly. The risk factor would drop to zero, as developers would be aware that something major has changed.

This implements an ANSI- and wide-character-aware truncation algorithm that uses the newly merged [ANSI parser state machine][statemachine] and the fantastic uniseg library. Since it uses the ANSI state machine, it's compatible with `CSI m` (SGR) style sequences, `OSC 8` (hyperlinks), and basically any other escape sequence supported by the state machine (DCS, ESC, SOS, PM, APC).

Related: muesli/reflow#71

[statemachine]: https://github.com/charmbracelet/x/blob/main/exp/term/ansi/parser/transition_table.go

* feat: switch to term/ansi for text manipulation
  Use the ANSI-aware, wide-character-aware, uniseg-backed term/ansi package to calculate string widths and to truncate and wrap strings.
  Related: muesli/reflow#71
  Fixes: #258
  Fixes: #220
* fix: combining both conditional and unconditional wrapping
  Uses `ansi.SmartWrap` charmbracelet/x#57
  Fixes: muesli/reflow#43
* chore: update deps
* Update get.go

mikelorant · 2024-05-31T03:50:03Z

Withdrawing this pull request as this issue has been resolved for the Charm suite of tools by the creation of the new ansi package. This package has already replaced most references to reflow in Bubble Tea, Lip Gloss and Bubbles.

mikelorant mentioned this pull request Jan 26, 2024

perf: use uniseg.StringWidth #69

Closed

mikelorant force-pushed the refactor/go-runewidth-to-uniseg branch from 2ec4ca0 to e59a5fa on January 26, 2024 10:37

mikelorant force-pushed the refactor/go-runewidth-to-uniseg branch from e59a5fa to 053c46d on January 26, 2024 11:22

meowgorithm mentioned this pull request Jan 26, 2024

fix: Align lines with emojis correctly charmbracelet/lipgloss#163

Closed

mikelorant mentioned this pull request Jan 27, 2024

Replace go-runewidth with uniseg #73

Closed

mikelorant force-pushed the refactor/go-runewidth-to-uniseg branch 3 times, most recently from 484ab73 to fa505b9 on January 28, 2024 07:27

mikelorant force-pushed the refactor/go-runewidth-to-uniseg branch 2 times, most recently from a5a63a6 to 26d85b8 on January 29, 2024 08:39

mikelorant mentioned this pull request Jan 29, 2024

Textarea is slow when pasting things from the clipboard charmbracelet/bubbletea#831

Closed

mikelorant force-pushed the refactor/go-runewidth-to-uniseg branch from 26d85b8 to 7edce3e on January 29, 2024 22:15

mikelorant mentioned this pull request Feb 13, 2024

"nonspacing mark" unicode causes container's width to shift charmbracelet/lipgloss#258

Closed

meowgorithm mentioned this pull request Mar 8, 2024

feat(textarea) Add multiline placeholder charmbracelet/bubbles#302

Merged

aymanbagabas mentioned this pull request Mar 13, 2024

feat(term): ansi: implement ANSI aware truncation charmbracelet/x#50

Merged

aymanbagabas mentioned this pull request Mar 14, 2024

feat(term): ansi: implement wrap and wordwrap charmbracelet/x#51

Merged

aymanbagabas mentioned this pull request Mar 14, 2024

feat: switch to term/ansi for text manipulation charmbracelet/lipgloss#268

Merged

mikelorant closed this May 31, 2024
