
custom flow control and discard limit #2122

Closed
wants to merge 22 commits

Conversation

@jerch (Member) commented May 24, 2019

This PR introduces a new flow control mechanism based on recent changes in node-pty. The flow control works similarly to the current XON/XOFF approach: special messages are written to the backend to indicate whether data streaming should be paused or resumed. The main difference is that the PAUSE/RESUME messages are customizable. They are filtered out in node-pty, which pauses/resumes the pty slave program via buffer back pressure.

Changes:

  • track write buffer state with a watermark
  • sanity check for fast incoming data with discard option (fixing the possible OOM)
  • high and low watermarks to pause/resume data flow
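The listed mechanics can be sketched roughly as follows. This is a minimal, hypothetical illustration only: the class name, the concrete limits, and the default PAUSE/RESUME strings are all assumptions, not the PR's actual code.

```typescript
// Hypothetical sketch of watermark-based flow control (all names and
// values are illustrative, not the actual xterm.js implementation).
const HIGH_WATERMARK = 131072;              // pause above this many buffered bytes
const LOW_WATERMARK = 32768;                // resume below this
const DISCARD_WATERMARK = 10 * 1024 * 1024; // safety cap: drop data beyond this

class FlowControlledBuffer {
  private _watermark = 0;
  private _paused = false;
  private _queue: string[] = [];

  constructor(
    private _sendToBackend: (msg: string) => void,
    private _pauseMsg = '\x13',   // XOFF-style default, customizable
    private _resumeMsg = '\x11',  // XON-style default, customizable
  ) {}

  write(data: string): void {
    // Safety measure: don't let a backend that ignores PAUSE exhaust memory.
    if (this._watermark + data.length > DISCARD_WATERMARK) {
      return; // discard
    }
    this._queue.push(data);
    this._watermark += data.length;
    if (!this._paused && this._watermark > HIGH_WATERMARK) {
      this._paused = true;
      this._sendToBackend(this._pauseMsg);
    }
  }

  // Called after a chunk has been parsed/rendered.
  consumed(bytes: number): void {
    this._watermark = Math.max(0, this._watermark - bytes);
    if (this._paused && this._watermark < LOW_WATERMARK) {
      this._paused = false;
      this._sendToBackend(this._resumeMsg);
    }
  }
}
```

The two-level watermark (high to pause, low to resume) gives hysteresis, so the terminal does not flap between PAUSE and RESUME on every chunk.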

TODO:

  • determine default strings for PAUSE/RESUME (should work for all common platforms/envs)
  • API options, which should go there?
  • documentation / example usage / special notes for non node-pty backends
  • tests

Fixes #2077, #1918.

- adding watermark for write buffers
- low and high watermark to toggle flow control
- sanity watermark to discard data
@jerch jerch added the work-in-progress Do not merge label May 24, 2019
@vincentwoo (Contributor)

Very interesting stuff!

src/Terminal.ts
this.writeBufferUtf8.push(data);
// safety measure: don't allow the backend to crash
// the terminal by writing too much data too fast.
if (this._watermark > DISCARD_WATERMARK) {
Member:

I don't think we need this check here, jsdoc in API (and more docs when we improve that) should be enough

@jerch (Member Author), May 25, 2019:

You mean no DISCARD check at all? It is meant to stop malicious or faulty backends (those that ignore the PAUSE) from crashing people's browsers. Are we sure we don't want that?

Member:

This could discard some important things and corrupt the terminal; for example, ls -lR / && vim may not enter the alt buffer. The main reason I don't think it's needed is that I haven't seen the demo crash for a long time, and we could avoid this critical-path check if people implement the protocol correctly.

@jerch (Member Author):

What I see with ./fast_producer in the demo on master:

  • FF: +2.5 GB / min, kernel eventually kills it for being this memory hungry (happens for me around minute 3 to 5 in several runs)
  • Chrome: +1 GB / min in the beginning, after 2-3 min this drops back to ~250 MB and is stable there, devtools die after 30s

No clue what's going on with Chrome; I think it cheats once a process hits some memory limit (killing the WebSocket? - idk, did not investigate further). Firefox shows the expected linear memory exhaustion.

I am a bit uneasy about not having this safety setting, since it might kill the browser (under FF the kernel killed the whole browser with all tabs). Maybe we should set DISCARD_WATERMARK even higher to avoid losing data in your example?

@jerch (Member Author) commented May 26, 2019

Did some fine tuning of the limits in the demo with the last commits (still contains debug logs).

There is a tradeoff between raw throughput and response to interrupts like Ctrl-C. Furthermore, fast producers behave differently than slower ones due to different buffer fill states.

The numbers I have found are a compromise: no/low negative impact on frequently used semi-fast producers like ls, while keeping an acceptable response time to Ctrl-C with fast ones (yes and ./fast_producer).

Currently I see the following:

  • ls string throughput slightly decreased (~ -7%)
  • ls utf8 throughput slightly increased (~ +10%)
  • Ctrl-C on ls is immediate
  • Ctrl-C on yes shows a tiny bit of latency
  • Ctrl-C on ./fast_producer shows a bigger but still acceptable latency

To get there I also had to introduce a buffer cap in the server.js pty --> xterm buffer, otherwise chunks from a running ./fast_producer get too big within 5 ms and the terminal shows lag (decreased FPS).
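Such a cap on the pty -> xterm path could look roughly like this. A hypothetical sketch only: the demo's actual server.js code differs, and all names and values here are assumptions.

```typescript
// Hypothetical sketch of capping the pty -> xterm buffer: data is batched
// for a short interval, but flushed early once the batch grows too large.
const MAX_CHUNK = 65536;   // flush early once this many bytes are buffered
const FLUSH_INTERVAL = 5;  // ms batching window

function makeBufferedSender(send: (data: string) => void) {
  let buffer = '';
  let timer: ReturnType<typeof setTimeout> | null = null;

  const flush = () => {
    if (timer) { clearTimeout(timer); timer = null; }
    if (buffer.length) { send(buffer); buffer = ''; }
  };

  // Call this for every chunk coming from the pty.
  return (data: string) => {
    buffer += data;
    if (buffer.length >= MAX_CHUNK) {
      flush(); // cap: don't let the 5 ms window accumulate huge chunks
    } else if (!timer) {
      timer = setTimeout(flush, FLUSH_INTERVAL);
    }
  };
}
```

Without the size cap, a fast producer can accumulate megabytes inside a single 5 ms window, and the terminal then has to parse one giant chunk at once.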

@Tyriar Could you test whether these numbers will do on macOS? With its bigger pty buffer, yes and ./fast_producer might show a big Ctrl-C latency again.

@Tyriar (Member) commented May 26, 2019

With Ctrl+C on yes on macOS I see the server pause immediately (logs stop), and then it takes about 2 seconds before the renderer catches up.

@jerch (Member Author) commented May 26, 2019

@Tyriar Hmm, 2 s is quite long, but I cannot lower that much further without sacrificing throughput. Can we live with it taking this long? IMHO yes is already an extreme example of a fast producer; things that typically have longer run times with quite noisy output are build tools, and those should work pretty well with these settings.

@Tyriar (Member) commented May 27, 2019

@jerch I wasn't actually testing it right; it works great when I toggle useFlowControl on after 34c3a77

@Tyriar Tyriar added this to the 3.14.0 milestone May 29, 2019
@Tyriar Tyriar removed the work-in-progress Do not merge label May 29, 2019
@Tyriar (Member) commented May 29, 2019

I don't think this is a WIP anymore, right?

fast_producer.c

#include <stdio.h>

int main(int argc, char **argv) {
  while (1) {
    putchar('#');
  }
  return 0;
}
Member:

yes seems to work fine, and I see the pausing and catching up in the log of yarn start. fast_producer, however, really starts to chug, and it's hard to tell if the pausing is working at all, as all the logs say 1024.

@jerch (Member Author):

Additionally, I placed a console.log(this._watermark) in write to test the flow with fast_producer.
I was expecting this to run less responsively on macOS: macOS has a dynamic pty buffer size depending on incoming data pressure (typically 16-64 kB). Thus a single message will be ~64 kB, a lot for xterm.js to chew in one step.
fast_producer is an extreme example and IMHO not worth optimizing for. Since yes runs ok for you, I think the numbers will do as they are?

@Tyriar Tyriar removed this from the 3.14.0 milestone May 30, 2019
@jerch (Member Author) commented May 31, 2019

@Tyriar Added a better fast producer snippet. This is the fastest I was able to find with some more structured output (almost twice as fast as yes).

Edit: Wow, the PAUSE signal takes really long to get through in the demo; the watermark easily goes up to 5 MB, although the signal gets sent at 120 kB, WTH. This means that with slightly longer connection latency the watermark will easily go over 10 MB.

Edit 2: The reason is fairly simple: server.js has already sent 2-7 MB before the PAUSE signal arrives in server.js. Once it's seen, the pty blocks rather fast. With worse latency the amount of already-sent data will explode. IMHO we can only circumvent this with the ACK idea.

src/Terminal.ts
/**
 * send ACK every ACK_WATERMARK-th byte
 */
const ACK_WATERMARK = 131072; // 2^17 (previously 524288 = 2^19)
Member:

I think we may want to instead send some request-ack sequence from the server; once it is hit in the parser, we can send an ack back. Extra awesome super-duper benefit: this could all be done via the xterm.js API with a custom handler 😮

So server asks '\x1b^reqack;1\x1b\\' (not sure if an ID, the 1, is needed or not yet), then our custom ESC handler sends back '\x1b^ack;1\x1b\\'.
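Server-side, the reqack/ack round trip could drive flow control roughly like this. A hypothetical sketch of the idea only, not code from this PR: the escape strings come from the comment above, while the class, handler names, and limits are all assumptions.

```typescript
// Hypothetical sketch of ACK-based flow control on the server side:
// request an ack every ACK_EVERY bytes and pause the pty when too much
// unacknowledged data is in flight.
const ACK_EVERY = 131072;     // request an ack every 128 kB
const MAX_PENDING_ACKS = 4;   // pause once this many acks are outstanding

class AckFlowController {
  private _bytesSinceAck = 0;
  private _pendingAcks = 0;
  private _nextId = 1;

  constructor(
    private _pty: { pause(): void; resume(): void },
    private _send: (data: string) => void,
  ) {}

  // Called for every chunk flowing pty -> browser.
  onPtyData(chunk: string): void {
    this._send(chunk);
    this._bytesSinceAck += chunk.length;
    if (this._bytesSinceAck >= ACK_EVERY) {
      this._bytesSinceAck = 0;
      this._pendingAcks++;
      this._send(`\x1b^reqack;${this._nextId++}\x1b\\`); // ask terminal to ack
      if (this._pendingAcks >= MAX_PENDING_ACKS) {
        this._pty.pause(); // too much unacknowledged data in flight
      }
    }
  }

  // Called when the terminal's '\x1b^ack;<id>\x1b\\' reply arrives.
  onAck(): void {
    if (this._pendingAcks > 0) this._pendingAcks--;
    if (this._pendingAcks < MAX_PENDING_ACKS) this._pty.resume();
  }
}
```

Because the server stops after a bounded number of unacknowledged windows, the amount of in-flight data stays capped regardless of connection latency, which is exactly what the plain PAUSE message cannot guarantee.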

@jerch (Member Author) commented Jul 6, 2019

Closing this PR for now. We still have to work out the details for the offscreen core lib (some downgraded Terminal.ts thingy), probably with some input-service-like thingy as well. Any attempts towards better flow control should go there once we have that.

@jerch jerch closed this Jul 6, 2019
@jerch jerch added the reference A closed issue/pr that is expected to be useful later as a reference label Jul 6, 2019
@Tyriar Tyriar removed this from the 4.0.0 milestone Aug 22, 2019
Linked issue: Flow control/back pressure