Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utf8 input #1904

Merged
merged 24 commits into from
May 12, 2019
Merged

Utf8 input #1904

merged 24 commits into from
May 12, 2019

Conversation

jerch
Copy link
Member

@jerch jerch commented Jan 20, 2019

This PR adds UTF8 input to the terminal.

API is as simple as this:

Terminal.writeUtf8(data: Uint8Array);

The typed array can be picked directly from the websocket (set binaryType to 'arraybuffer') or from node-pty for local apps. "Binary strings" are not supported as kinda all tooling in JS world expects buffers for raw UTF8 data.

To actually see this in action with the demo several changes are needed:

PR relies on #1878, changes start at commit 6bdd380.

Edit:
Note that the UTF8 input currently does no flow control, thus can easily been flooded and "stall" the terminal. Fixed.

@jerch
Copy link
Member Author

jerch commented Jan 20, 2019

The last commit (e6e5ecc) applies the changes to the demo. This will most likely not be part of this PR.

Here are some numbers:

  • without UTF8
   Context "lib/xterm_perfcases/input.js"
      Context "Terminal: ls -lR /usr/lib"
         Case "#1" : 1 - runtime: 2063.41 ms
         Case "#1" : 1 - throughput: 23.36 MB/s
         Case "#1" : 2 - runtime: 1884.72 ms
         Case "#1" : 2 - throughput: 25.57 MB/s
         Case "#1" : 3 - runtime: 1853.08 ms
         Case "#1" : 3 - throughput: 26.01 MB/s
         Case "#1" : 3 runs - average runtime: 1933.73 ms
         Case "#1" : 3 runs - average throughput: 24.98 MB/s
  • with UTF8
   Context "lib/xterm_perfcases/input_utf8.js"
      Context "Terminal: ls -lR /usr/lib"
         Context "input throughtput"
            Case "#1" : 1 - runtime: 1443.94 ms
            Case "#1" : 1 - throughput: 33.41 MB/s
            Case "#1" : 2 - runtime: 1301.88 ms
            Case "#1" : 2 - throughput: 37.05 MB/s
            Case "#1" : 3 - runtime: 1279.51 ms
            Case "#1" : 3 - throughput: 37.70 MB/s
            Case "#1" : 3 runs - average runtime: 1341.78 ms
            Case "#1" : 3 runs - average throughput: 36.06 MB/s

To make sense of those numbers: The test measures the time taken for 50MB of ls -lR /usr/lib data to be written to an offscreen terminal object, means this includes all input parsing and storing in the terminal buffer, but no rendering. An input speed of >30 MB/s is quite impressive and actually faster than many native emulators.
This speedup relies heavily on the buffer reworks, as this table shows:

Input throughput
JS Array buffer (v3.8) ~7 MB/s
TypedArray buffer (v3.9) ~18 MB/s
+ UTF32 buffer for parser (#1878, pending) ~25 MB/s
+ UTF8 input (this PR) ~35 MB/s

😸😸

One might wonder why raw UTF8 input gives that much of a further speedup compared to string input. The reason is basically the string to charCode conversion, which happens to be way more expensive than a UTF8 to UTF32 conversion if the data is already a bunch of charCodes.

For integrators this might be relevant as it will save several forth and back conversions going on from pty to xterm.js, example for our demo with websocket with string input:

  • pty spits out UTF8 bytes
  • node-pty reads them and does a UTF8 to JS string conversion (happens in Socket)
  • server.js reads the string chunks from node-pty and forwards them to the websocket server (express-ws)
  • websocket creates a string frame and actually decodes the string back to UTF8 (lol)
  • browser gets the websocket frame as string message type and again converts the UTF8 data to JS strings
  • terminal gets the JS string and converts it to UTF32 for further processing (only this is covered by the test above!)

With raw UTF8 this is more like (without going into details):

  • pty spits out UTF8 bytes
  • node-pty reads them and emits a Buffer object created on the memory
  • server.js reads the buffer chunks from node-pty and forwards them to the websocket server (express-ws)
  • websocket copies memory into the outgoing socket with some binary framing (no conversion)
  • browser gets the websocket frame as binary message type and creates an ArrayBuffer
  • terminal gets the buffer, creates a Uint8Array instance and converts it to UTF32 for further processing

Lesser conversions, lesser data shuffling around in the memory. Even for remote transport UTF8 is clearly the winner, as terminal data typically contains a high percentage of ASCII chars which will save bandwidth/time. UTF8 for president. 🚀

@jerch jerch added type/enhancement Features or improvements to existing features area/performance labels Jan 31, 2019
@jerch jerch self-assigned this Feb 1, 2019
@Tyriar
Copy link
Member

Tyriar commented May 10, 2019

Remaining for this:

  • Merge conflict (probably just messed up imports)
  • Adding time-based limiting to writeUtf8

@jerch jerch closed this May 11, 2019
@jerch jerch reopened this May 11, 2019
@jerch
Copy link
Member Author

jerch commented May 11, 2019

@Tyriar
Now with input limit logic (taken from Terminal.write). Not yet happy with this, it doubles the buffering logic and makes it impossible to mix string and utf8 byte inputs during heavy input flow (while the buffers are not empty). Any idea how to solve this? A mixed type buffer would solve it but looks quite wrong to me.

Copy link
Member

@Tyriar Tyriar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just 2 comments then we can merge. I also did a little clean up and added an API test 🙂

src/core/input/TextDecoder.ts Outdated Show resolved Hide resolved
src/core/input/TextDecoder.ts Outdated Show resolved Hide resolved
@Tyriar Tyriar added this to the 3.14.0 milestone May 12, 2019
@jerch
Copy link
Member Author

jerch commented May 12, 2019

@Tyriar Any clue why this test times out? Happened yday too, but worked in the second run.

@jerch jerch closed this May 12, 2019
@jerch jerch reopened this May 12, 2019
@jerch
Copy link
Member Author

jerch commented May 12, 2019

Reminder to myself: Remove e6e5ecc before merging.

@Tyriar
Copy link
Member

Tyriar commented May 12, 2019

Any clue why this test times out? Happened yday too, but worked in the second run.

I saw that too, but rebuilding fixed it 🤔

@jerch jerch closed this May 12, 2019
@jerch jerch reopened this May 12, 2019
@jerch jerch closed this May 12, 2019
@jerch jerch reopened this May 12, 2019
@jerch jerch merged commit eef8556 into xtermjs:master May 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance type/enhancement Features or improvements to existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants