added `--disable-tty` flag for interactive mode #1491

biw · 2023-05-17T02:10:24Z

Since the merge of #1040, when llama.cpp is running in interactive mode, the user input is outputted to dev/tty on POSIX systems. This causes an issue when trying to run it as a subprocess on POSIX systems in Node.js (using spawn) since it is no longer included in stderr/stdout flow.

I read through the code and the PR, but didn't understand the reason for using dev/tty when running interactive mode via ./main on the command line. So rather than replace something that I didn't understand, I added a --disable-tty flag, which uses stderr to display the user output rather than /dev/tty. This allows for developers to handle the printing of the user input without clogging up the stdout (which happens if using something like node-pty), and keeps the current behavior of stdout being reserved for generation output.

Happy to make any changes; please let me know!

cc: @DannyDaemonic

DannyDaemonic · 2023-05-17T08:33:05Z

The input was written to the tty to avoid all the control sequences being written to stdout for people who wanted to directly read the results of stdout. In my tests the previous implementation using readline didn't output anything to stdout either. Is this something that's changed for you since #1040 or is this just the patch you've identified as the point where tty started being used?

I have thought about this though and had considered one of two things, either making a --clean-output flag that uses dev/tty only for control characters and otherwise just writes to stdout. Or simply writing to stdout and let people rely on non-interactive mode when they want to capture the output.

That said, I'm surprised it's picking up the tty for you. As far as I understand, when a program is run from Node.js using child_process.spawn, it's not connected to a terminal by default. However, it can be connected to a terminal if the stdio option is set to 'inherit', as in the following example:

const spawn = require('child_process').spawn;
const child = spawn('program', [], { stdio: 'inherit' });

This will cause the child process to inherit its stdin, stdout, and stderr from the parent process, effectively connecting it to the terminal.

If the child process is being connected to a terminal in this way, and you want to prevent it, you could change the stdio option to 'pipe'. This will create a pipe between the parent and child process for each of stdin, stdout, and stderr, which won't be connected to the terminal:

const spawn = require('child_process').spawn;
const child = spawn('program', [], { stdio: 'pipe' });
// should be the same as
//const child = spawn('program', []);

It would be useful to know. Can you verify this?

biw · 2023-05-17T09:07:26Z

Thanks for the quick response @DannyDaemonic!

The input was written to the tty to avoid all the control sequences being written to stdout for people who wanted to directly read the results of stdout. In my tests the previous implementation using readline didn't output anything to stdout either. Is this something that's changed for you since #1040 or has this patch you've identified as the point where tty started being used?

Yes, I was able to identify the issue happening after #1040. I was previously staying on an old build before #1040 because I couldn't figure out what the issue was, but after upgrading for the new model formats, I started digging.

I have thought about this though and had considered one of two things, either making a --clean-output flag that uses dev/tty only for control characters and otherwise just writes to stdout. Or simply writing to stdout and let people rely on non-interactive mode when they want to capture the output.

A --clean-output flag which has control characters only on dev/tty would be great for my use case. I would like to be able to read stdout with interactive mode because of the startup penalty of the model. The node process is long-running and passes in more inputs over time.

That said, I'm surprised it's picking up the tty for you. As far as I understand, when a program is run from Node.js using child_process.spawn, it's not connected to a terminal by default. However, it can be connected to a terminal if the stdio option is set to 'inherit', as in the following example:
const spawn = require('child_process').spawn;
const child = spawn('program', [], { stdio: 'inherit' });

You are correct; stdio is not picked up by default and is handled with a callback. Because llama.cpp writes directory to /dev/tty, it "escapes" out of the child.stdio.on('data') callback and shows in the user's terminal (via tty) on standard node processes. For a general node process, this is a minor cosmetic issue IMO. However, with Electron (only on Production builds) the behavior is similar to https://github.com/microsoft/node-pty and causes anything written to /dev/tty in a spawn process to be printed to stdio. This is an issue as it's non-trivial to detect the user's input vs. model response.

DannyDaemonic · 2023-05-17T09:36:11Z

Ah, I understand now. Opening the path directly skirts the tunneling.

A --clean-output flag which has control characters only on dev/tty would be great for my use case.

Now that I think about it, perhaps the behavior intended for --clean-output should just be how it always is? Are there downsides to this? Is the reason this flag would be great because only the control characters are going to the tty?

Edit: I see that your patch sends all the user input to stderr. This is a bigger change than disabling tty and could even help people who are using main as a child process in Windows (where there is no tty but would still benefit from a cleaner stdout). Perhaps the switch should contain the name stderr so that it makes sense to Windows users as well. --echo-stderr?

kaorahi · 2023-05-17T14:17:51Z

I also noticed that rlwrap does not work with llama.cpp after 41654ef (#1040).

rlwrap --ansi-colour-aware -t dumb ./main ...

Before #1040, I can type Helo, Ctrl-b (move the cursor backward), and l to correct it to Hello. After #1040, Ctrl-b does not change the display and I just see Helol by the same operations.

DannyDaemonic · 2023-05-17T19:56:11Z

@kaorahi I switched from reading lines as a whole to reading character at a time. This should allow us to do all kinds of things like things like hit Esc instead of Ctrl-C to interrupt generation and to backspace over generated text and edit it. That's still a bit further down the road. As it is I find myself trying to compete with things like rlwrapper and Windows' built in getline stuff. I managed to program in some decent editing. I think it matches readline fairly well, but it's a rather large amount of code so I'm worried it's going to scare people away from the pull request. Maybe I should make a draft so people who are missing out on their rlwrapper or windows terminal stuff can help me compare.

biw · 2023-05-17T23:44:31Z

Ah, I understand now. Opening the path directly skirts the tunneling.

Yes, but that's actually fine for me. The issue is using electron production builds it doesn't, which requires parsing the stdout to figure out what's a re-print of the input vs. the output. It has turned out to be non-trivial in my experience.

A --clean-output flag which has control characters only on dev/tty would be great for my use case.

Now that I think about it, perhaps the behavior intended for --clean-output should just be how it always is? Are there downsides to this? Is the reason this flag would be great because only the control characters are going to the tty?

I think I misunderstood you last night :), I would like only the model response to be in stdout. If the control characters go to tty (and then onwards to stdout in electron), I can parse those out pretty easy, but having the input go to tty/stdout make it very tricky to run as a subprocess.

Edit: I see that your patch sends all the user input to stderr. This is a bigger change than disabling tty and could even help people who are using main as a child process in Windows (where there is no tty but would still benefit from a cleaner stdout). Perhaps the switch should contain the name stderr so that it makes sense to Windows users as well. --echo-stderr?

In my testing, user input already goes to stderr on Windows but I'm not well-versed in the APIs.

DannyDaemonic · 2023-05-18T00:59:33Z

In my testing, user input already goes to stderr on Windows but I'm not well-versed in the APIs.

In Windows we write the input directly to the console rather than sending to stdout. Writing to the tty is close to equivalent on POSIX, and my tests did show that when I used the tty, stdout didn't have any of the user input mixed in. I wonder if there may be a misunderstanding on what's happening; I still feel like there's a piece missing from this puzzle.

The issue is using electron production builds it doesn't, which requires parsing the stdout to figure out what's a re-print of the input vs. the output. It has turned out to be non-trivial in my experience.

Either way, it sounds like the problem with electron is the input mixing into the output (which would certainly happen when it can't open the tty to write to). For example, this is what happens when I run gdb on main.
I would be curious what perror() would say here:

con_st.tty = fopen("/dev/tty", "w+");
perror("fopen");

I think I misunderstood you last night :), I would like only the model response to be in stdout.

There may be other things that also want it completely silent on input. I'm wondering if maybe we want something like --no-echo that just hides all input entirely (sending it to /dev/null). Or I suppose I could implement a --clean-output that pipes all the input to stderr if it can't open the tty if it turns out to be the stdout failover that's causing the issue.

DannyDaemonic · 2023-05-22T23:32:03Z

Did you want to try the --simple-io from #1558 to see if that works with node-pty? If you don't know how (or prefer not) to checkout the branch, you can use this patch and apply it with patch -p1 < 1558.patch.

biw · 2023-05-26T18:54:24Z

Hey @DannyDaemonic, I can confirm that #1558 works as expected If that gets merged in, I no longer need this PR. Thank you for making that and being so responsive in this thread 😄

I'm going to close this in favor of #1558!

added --disable-tty flag for interactive mode

a13e998

DannyDaemonic mentioned this pull request May 22, 2023

Add --simple-io option for subprocesses and separate console to it's own header and cpp #1558

Merged

Merge branch 'master' of github.com:biw/llama.cpp into added-disable-tty

046def2

biw closed this May 26, 2023

DannyDaemonic mentioned this pull request Aug 3, 2023

Replaced the usage of ReadConsoleInputW and fgetwc #2477

Closed

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

added `--disable-tty` flag for interactive mode #1491

added `--disable-tty` flag for interactive mode #1491

biw commented May 17, 2023 •

edited

Loading

DannyDaemonic commented May 17, 2023 •

edited

Loading

biw commented May 17, 2023

DannyDaemonic commented May 17, 2023 •

edited

Loading

kaorahi commented May 17, 2023

DannyDaemonic commented May 17, 2023

biw commented May 17, 2023

DannyDaemonic commented May 18, 2023

DannyDaemonic commented May 22, 2023

biw commented May 26, 2023 •

edited

Loading

added --disable-tty flag for interactive mode #1491

added --disable-tty flag for interactive mode #1491

Conversation

biw commented May 17, 2023 • edited Loading

DannyDaemonic commented May 17, 2023 • edited Loading

biw commented May 17, 2023

DannyDaemonic commented May 17, 2023 • edited Loading

kaorahi commented May 17, 2023

DannyDaemonic commented May 17, 2023

biw commented May 17, 2023

DannyDaemonic commented May 18, 2023

DannyDaemonic commented May 22, 2023

biw commented May 26, 2023 • edited Loading

added `--disable-tty` flag for interactive mode #1491

added `--disable-tty` flag for interactive mode #1491

biw commented May 17, 2023 •

edited

Loading

DannyDaemonic commented May 17, 2023 •

edited

Loading

DannyDaemonic commented May 17, 2023 •

edited

Loading

biw commented May 26, 2023 •

edited

Loading