Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interface improvements and --multiline-input (previously --author-mode) #1040

Merged
merged 5 commits into from
May 9, 2023

Conversation

DannyDaemonic
Copy link
Contributor

@DannyDaemonic DannyDaemonic commented Apr 18, 2023

I wanted to make it easier for authors to use this in their writing process.

I've tested the code on both Linux and Windows. I don't see why it wouldn't also work on a Mac, but it would be nice to have confirmation of that.

I also apologize for the volume of lines changed but each change seemed tied to the next so I ended up putting it through as one commit. With any code changes, my goal was to make main.cpp simpler to read. There's still a lot that can be done, but I do believe the input loop is a lot cleaner now.

Changes:

  • Adds --author-mode where we read from input until the user ends their line with a \
  • Allows the user to end with / such that inference starting from that position (without the \n)
  • Highlights the above-mentioned control characters (when --color is enabled) so it's harder to accidentally end a line with \ or /
  • Cleans up superfluous control characters and line breaks

Before:
warning

After:
A New Kind of Cheese
ai overlord

Detailed changes

Author mode allows one to write (or paste) multiple lines of text without ending each line in a \. LLM models do better the more context you give them, but the original input flow made this difficult. (Before pasting your writing you'd have to edit a \ into each line or paste it to a file, save that file, and then pass that path in with -f. If you're doing multiple revisions, that can become overwhelming.

Another useful feature for writing is the ability to start inference in the middle of a line. It’s not uncommon for writers to get stuck in the middle of a paragraph or piece of dialogue. I wanted to provide a way for the language model to generate text from that point, but previously, the best one could do was to hit enter and have the code append a \n to the buffer before starting inference.

I considered making author mode the default and creating a --chat-mode option with the previous behavior (and implying chat mode with --instruct). However, I decided against changing the default interaction people were familiar with. As a compromise, I allow the / operator to work in both modes.

The other quality of life improvement I really wanted for writing with the AI was to get rid of all the superfluous newlines and control characters that jumble up your text while working. To do this and have it display properly, I had to change the code to read one char at a time. By reading one character at a time, we don't have to print the newline when the user hits enter. This allows the use of / and \ to flow more smoothly and lets us clean up any control characters from the input before we start text inference. This also makes the Windows and Linux versions behave the same in regards to input buffering, for better and for worse. (For example, if you hit a key while it's inferring, it won't jumble up the text, but you also won't see your text while the model is loading.)

Another improvement I made for writing with the AI was to eliminate superfluous newlines and control characters that can clutter up text while working. To achieve this and display text properly, I changed the code to read one character at a time. To start with, when the user hits enter, we don’t have to print a newline. It allows for smoother use of / and and lets us remove any control characters from the input before starting text inference. This change also makes the Windows and Linux versions behave consistently in terms of input buffering, with both advantages and disadvantages. For instance, hitting a key while the model is inferring won’t mix it into the text on Linux anymore, but you also won’t see any of the input you type while the model is loading.

The other code changes are to console_state. It now holds the previous termios settings so we can safely reset the terminal on exit for POSIX systems. This also means including termios.h. The other change was to add the author_mode boolean. I thought about calling it input_mode and making an enum but unless we want to have different input modes for chat, instruct, etc, this is just simpler (and it's easily changed in the future).

I also considered finding a way of hiding that first space that's generated when priming the llama models since the original llama tokenizer also starts each prompt with a single space. Are we sure this is necessary? If so, would it be acceptable to just hide that first space?

Future possible changes

Further quality of life enhancements:

  • Full support for left, right, ins, del, home, and end keys
    • I don't foresee any issues implementing this other than there just being a larger code base, which may be unwanted
    • At some point using a library like ncurses is better, but to keep the requirements to a minimum, adding this ourselves doesn't feel unreasonable
  • Use color by default for interactive terminals (that aren't being piped)
    • Implemented with isatty on POSIX systems and _isatty on Windows
    • Programs like grep already do this
    • I would suggest adding --color=always,never,auto just like grep has but perhaps switch from color=never to color=auto on any interactive mode (for compatibility --color without an option would just indicate always)
  • Change the cursor to indicate when the user is waiting on inference
    • We could probably only do this when use_color is true since changing the cursor will leave artifacts in stdout if piped
  • Choose a different color for control characters \ and /
    • I think the current colors are clear enough but if someone feels otherwise, it would be easy to use a different color
  • Read input on another thread
    • This would give us the ability to process each line as it is added as opposed to waiting until all input has been entered. This will also allow us to take input a little earlier as the model is being primed with " "
    • Would make the program "feel" a little faster

Further code improvements:

  • Replacing last_n_tokens with a ring-buffer
  • Abstract away signal handling code to common to remove the last of the #ifdef blocks in main.cpp code
  • Evaluate conversions from 'time_t' to 'int32_t' and 'time_t' to 'int32_t' and static cast them

Of the future possible changes, I think using the other control keys is most appealing to me. Followed by putting the input on a separate thread to allow context chugging on previous lines as the user writes subsequent lines. I'm not very active on github so I have no way of proving it, but I'm quite experience with threading and synchronization. Both of these (particularly the multithreading) may add more complexity to the code than is desired for an "example" program.

Edit: This is also a first step towards moving away from Ctrl-C to interrupt inference if that's desired. Once we are handling keys one a time like this, we can more easily make Esc (or any other key) interrupt inference.

@TheBloke
Copy link
Contributor

Compiles and appears to work fine on Intel macOS

@DannyDaemonic
Copy link
Contributor Author

Looking through the pull requests I've noticed others wanting to add their own options. If we're going to keep adding options, it might be nice to separate common options from advanced options. When someone runs main without any parameters or we get an invalid_param, we could display the common options and when someone uses -h or --help we could show both the regular and advanced options. Programs like grep and wget do this.

The "advanced" options would be things your average user isn't likely to use. Things such as --ignore-eos, --perplexity, and --mtest. I'd even go as far as saying --author-mode belongs there just to keep things shorter and less overwhelming.

I was even hesitant myself in adding another command switch when making this pull request. If this goes through, I'll submit another pull request to divide these options up. Although perhaps there's already enough options for such a separation.

@DannyDaemonic
Copy link
Contributor Author

DannyDaemonic commented Apr 23, 2023

I think it's best to rename this to something that more clearly describes what it does. VIM has a paste mode. My two alternative names are --paste-mode and --multiline-input. I'm going to go with the later since that's the most descriptive. I'm still open to other ideas.

And this commit still contains quality of life improvements authors may like. I'm worried the size of this pull request is keeping reviewers away. I may also try to separate the console changes into its own pull request but it's less justified without the different line ending support added here.

@DannyDaemonic DannyDaemonic changed the title Add author mode and other related quality of life improvements Add --multiline-input (previously --author-mode) and other related quality of life improvements Apr 23, 2023
@DannyDaemonic DannyDaemonic changed the title Add --multiline-input (previously --author-mode) and other related quality of life improvements Add --multiline-input (previously --author-mode) Apr 23, 2023
@DannyDaemonic DannyDaemonic force-pushed the author-mode branch 2 times, most recently from e3e5f8c to 5d4c019 Compare April 25, 2023 08:51
@Senemu
Copy link
Contributor

Senemu commented Apr 29, 2023

This prevents any non‐ASCII input for me on GNU/Linux. I tried with various terminals, shells, locales, and input methods. I verified that the input is not simply invisible (not echoed), but that it is not being put into the input buffer.

Here are some strings that become impossible to enter:

  • français
  • non‐ASCII
  • 👱🏻‍♀️

@DannyDaemonic
Copy link
Contributor Author

I did have this working, at least partially. This was the bug:

if (input_char < 32)

It should have been:

if ((unsigned)input_char < 32)

It's working with unicode for me if you want to try again @Senemu.

@Senemu
Copy link
Contributor

Senemu commented Apr 30, 2023

It is working for me now too. 👏

@Senemu
Copy link
Contributor

Senemu commented Apr 30, 2023

I can enter all characters now, but when deleting them, the terminal output does not match if the characters are fullwidth or combining.

For example, if I type Test and press Backspace, is deleted, but the cursor goes back only half its width. Then, if I press Backspace again is not deleted from the terminal, but it is deleted from the input buffer, so there is a mismatch between what I see and what LLaMA sees.

Something similar happens when using combining characters: If I type and press Backspace once, the character is visually deleted, but, in reality, a remains. This even allows me to seemingly alter text I didnʼt type, since I can then press Backspace again, putting the cursor behind its initial position.

@ggerganov
Copy link
Member

@DannyDaemonic

This change is great - merge it when it is ready

@DannyDaemonic
Copy link
Contributor Author

DannyDaemonic commented May 5, 2023

Ok, pretty sure I got it this time.

I tested it with "Čüßtøm åçćéñtš ğḥḯṯ ḵḷṃñṓṕ ŕśẗŭṿẅẋýź Жизнь Волга ありがとう 你好".

This string includes Latin-based characters with diacritics, Cyrillic characters ("Жизнь Волга" means "Life Volga"), and phrases in Japanese ("ありがとう" means "Thank you") and Chinese ("你好" means "Hello"). These characters should display correctly in most modern terminals and fonts. No problems in Linux but the default Windows command prompt has an issue with a couple characters (the newer Windows Terminal shows them all). Either way, the code seems to handle everything correctly.

@Senemu Did you want to try breaking it again? 😄 Pretty sure I got it this time.

@DannyDaemonic DannyDaemonic changed the title Add --multiline-input (previously --author-mode) Interface improvements and --multiline-input (previously --author-mode) May 5, 2023
@DannyDaemonic DannyDaemonic changed the title Interface improvements and --multiline-input (previously --author-mode) Interface improvements and --multiline-input (previously --author-mode) May 5, 2023
@Senemu
Copy link
Contributor

Senemu commented May 6, 2023

Did you want to try breaking it again? 😄 Pretty sure I got it this time.

I think that you did! 😃

The only thing I have found is that if I type a combining character by itself at the start of an input, it applies to the preceding character instead of being treated as an isolated combining character. As such, it is impossible to (visually) delete it. Ideally, a combining character without a corresponding base character (which is what conceptually happens in this case, even if there happens to be a character before it in the terminal output) should display by itself (as if combined with a blank space). But note that wanting to do this and expecting it to work is weird, and that the unpatched main does not handle it any better.

Combining characters work as expected in any other circumstance I tested. I couldnʼt make what I see and what LLaMA sees mismatch. 👏

@DannyDaemonic
Copy link
Contributor Author

Oh, good catch. That's something else I didn't consider. The original getline doesn't handle this any better, but it would be easy to prevent the user from starting with a combining character since they are clearly defined.

I guess the bigger question is, is there any utility in starting your input with a combining character? If the text ended on "cafe", you could turn it into "café", and it does seem LLaMA models can tokenize the combining character separately, but when tokenizing, it doesn't break it out automatically, it's got to be written like that. I wonder if the text used for training was normalized in some way or if it's been trained on both representations of the letters.

text: 'cafe'
number of tokens = 2
  1113 -> 'ca'
  1725 -> 'fe'

text: 'café' (without combining character)
number of tokens = 3
  1113 -> 'ca'
 29888 -> 'f'
 29948 -> 'é'

text: 'café' (with combining character)
number of tokens = 3
  1113 -> 'ca'
  1725 -> 'fe'
 30103 -> '́'

I guess since it "breaks" the output on the screen, I feel like the correct thing to do is to prevent user input from starting with a combining character. But at the same time I'm hesitant to take that away if someone depends on that use case. Any thoughts?

@DannyDaemonic DannyDaemonic merged commit 41654ef into ggml-org:master May 9, 2023
@mirek190 mirek190 mentioned this pull request May 9, 2023
@newTomas
Copy link
Contributor

After this update, the program began to hang after entering the input. I'm using w64devkit to run.

@DannyDaemonic
Copy link
Contributor Author

Can you explain hang? Like it just waits? Is your CPU going? Can you put more input in?

@newTomas
Copy link
Contributor

I enter an input, press enter and for some reason the cursor does not go to a new line. After that, my 6 core 6 thread processor is loaded by 20% by this process, although 3 cores are specified in the arguments. I can’t add anything more to the input, all my additional inputs are simply ignored. I tried to wait for 5 minutes, nothing changes, the load on the processor remains the same, the RAM is filled in the same way. Changing the model does not change anything.
If you press ctrl + c, then it will display the timings as before.
I compiled the program with make LLAMA_OPENBLAS=1, if you just build it through make without arguments, then for some reason it complains that it cannot find cblas_sgemm, although it used to compile normally without openblas, but I can’t say for sure when it stopped compiling without openblas, maybe even more before your commit.

@Senemu
Copy link
Contributor

Senemu commented May 10, 2023

@newTomas: Try make clean before compiling again without OpenBLAS.

You could also pass the argument --batch_size 8 (or any other value below 32) to prevent the compiled program from using BLAS.

@newTomas
Copy link
Contributor

@Senemu make clean helped build the version without blas, also added --batch_size 8 just in case. The problem did not go away, besides, I was able to run it in a regular cmd and powershell and the same problem was there.

@newTomas
Copy link
Contributor

I added a few lines to the code to find where the problem might be. Looks like getwchar hangs when hitting enter.
image
image

@DannyDaemonic
Copy link
Contributor Author

It's strange getwchar would get stuck. Can you make it also print directly before the call as well? If it's really stopping in getwchar you should always just be able to press another character to get things moving.

I see "Ярлык" in your title bar, so your system is configured for Russian? I wonder if it's related to that or if it could be due to something misconfigured in w64devkit. I wonder if it would still get stuck with a different keyboard layout.

@ejones ejones mentioned this pull request May 11, 2023
@newTomas
Copy link
Contributor

Yes, I have a Russian version of Windows. Here are screenshots with output to the console before calling getwchar. From the layout and the console, nothing changes.
image
image
image
image
image
image
image

@DannyDaemonic
Copy link
Contributor Author

Well, it looks like you've narrowed it down to the wgetchar call. There's no reason it should be hanging on Enter.

w64devkit uses Mingw, so my suspicion is there's a bug in their version of getwchar. All these wchar functions are kludgy. Again, the other difference is that you're running the Russian version of Windows. Have you tried an official build to see if it happens there?

Either way, I may be able to replace getwchar in Windows with _getwch which is similar but designed to work with non-blocking functions like _kbhit so it may work better.

@newTomas
Copy link
Contributor

Indeed, the official build works correctly. If you make a version for w64devkit with mingw, I'll test it.

@DannyDaemonic
Copy link
Contributor Author

DannyDaemonic commented May 11, 2023

If it's only happening in w64devkit then this has to be a mingw bug. More precisely, w64devkit uses the mingw-w64 toolchain, which is a fork of the original mingw project. Are you able to try other versions of w64devkit/mingw-w64? It might be fixed in a later version or not be there in an earlier version.

I'll try to swap out getwchar for a similar function to see if we can work around the bug, but we're pretty limited when it comes to Unicode support in Windows. You are forced to use the wide character functions.

Are you a programmer or developer yourself? We could file a bug report with w64devkit/mingw-w64, but you typically want the smallest example code that features the bug. I don't have time to test it right now but I believe I've boiled down the bug to just this code:

#include <windows.h>
#include <winnls.h>
#include <fcntl.h>
#include <wchar.h>
#include <stdio.h>
#include <io.h>

int main() {
    // Initialize for reading wchars and writing out UTF-8
    DWORD dwMode = 0;
    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    if (hConsole == INVALID_HANDLE_VALUE || !GetConsoleMode(hConsole, &dwMode)) {
        hConsole = GetStdHandle(STD_ERROR_HANDLE);
        if (hConsole != INVALID_HANDLE_VALUE && (!GetConsoleMode(hConsole, &dwMode))) {
            hConsole = NULL;
        }
    }
    if (hConsole) {
        SetConsoleMode(hConsole, dwMode | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
        SetConsoleOutputCP(CP_UTF8);
    }
    HANDLE hConIn = GetStdHandle(STD_INPUT_HANDLE);
    if (hConIn != INVALID_HANDLE_VALUE && GetConsoleMode(hConIn, &dwMode)) {
        _setmode(_fileno(stdin), _O_WTEXT);
        dwMode &= ~(ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT);
        SetConsoleMode(hConIn, dwMode);
    }

    // Echo input
    while (1) {
        // Read
        wchar_t wcs[2] = { getwchar(), L'\0' };
        if (wcs[0] == WEOF) break;
        if (wcs[0] >= 0xD800 && wcs[0] <= 0xDBFF) { // If we have a high surrogate...
            wcs[1] = getwchar(); // Read the low surrogate
            if (wcs[1] == WEOF) break;
        }
        // Write
        char utf8[5] = {0};
        int result = WideCharToMultiByte(CP_UTF8, 0, wcs, (wcs[1] == L'\0') ? 1 : 2, utf8, 4, NULL, NULL);
        if (result > 0) {
            printf("%s", utf8);
        }
    }
    return 0;
}

There might be other settings we change that trigger the bug in mingw-w64, but I'm guessing it's in this snippet. (We might even be able to make it smaller by cutting out some of that initialization.) If you're able to test that and see if the bug still happens, we could consider opening a bug report with mingw-w64.

@newTomas
Copy link
Contributor

I have quite a bit of experience with C/C++, my main language is typescript. I created a folder with the main.cpp file, put your code into it and compiled it with g++ main.cpp -o main
The bug is reproducing!
g++ --version returns
g++ (GCC) 12.2.0

I also have the latest version of cygwin64, the problem is also reproduced here.
g++ --version in cygwin64
g++.exe (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 12.2.0

I also found a separate mingw64 and the same problem there.
g++.exe --version in mingw64
g++.exe (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 12.2.0

I also found mingw64 in msys64 and the same problem there.
g++.exe --version in msys64
g++.exe (Rev4, Built by MSYS2 project) 12.2.0

Then I downloaded the latest version of w64devkit, but this bug is still there.
g++.exe --version in lastest w64devkit
g++ (GCC) 13.1.0

Is this a bug in the compiler itself and has been for a long time?

@DannyDaemonic
Copy link
Contributor Author

DannyDaemonic commented May 11, 2023

The bug is reproducing!

That's great for two reasons. First, there's nothing crazy we're doing. This is a perfectly acceptable way to read Unicode in Windows. And second, that means we can open a bug report with that small snippet. You spent the time isolating the function at large, so if you'd like you can take credit and open a bug report. (I edited the code in my previous reply to use CP_UTF8 and _O_WTEXT directly. I wasn't sure where they were defined but it's probably included in windows.h and will look less hacky if we just use those values directly.)

Is this a bug in the compiler itself and has been for a long time?

It even happens in cygwin? Is that compiler even based on mingw? Either way it could be the compiler itself or their implementation (or wrapper around) Windows' getwchar. If they are expecting/waiting for a particular key it could even be a \r\n\ vs \n issue.

Edit: Oh, in your cygwin output it says "Built by MinGW-W64". For some reason I thought they used regular GCC internally or something.

@newTomas
Copy link
Contributor

If you remove the line _setmode(_fileno(stdin), _O_WTEXT); then the Cyrillic alphabet stops working, but after pressing enter it does not freeze, but moves the cursor to the beginning of the line and the subsequent input overwrites the previous one.
I tried this code:

setlocale(LC_ALL, "UTF-8");
wint_t wc;
while (WEOF != (wc = getwc(stdin)))
  printf("wc = %lc\n", wc);

There are no freezes when pressing enter, but Cyrillic is not displayed when outputting.
image
Google gives such a simple code for reading unicode, but whichever I took, none of them worked. Why c++ programming is such a pain? I'll probably rewrite main.cpp to nodejs.

@DannyDaemonic
Copy link
Contributor Author

_setmode(_fileno(stdin), _O_WTEXT); is how we tell Windows we want to read Unicode characters. So if you take it out, you won't be able to read Cyrillic.

I like to test with these two strings:
Hello, こんにちは, Привет, Bonjour, 你好, Hola, السلام عليكم, Guten Tag, 안녕하세요!
And
å̄b̈çd̦ȩḟğḥḯṯḵḷṃñṓṕŕśẗŭṿẅẋýź

You may have to use Windows Terminal for some of these characters to show up properly. Limitations in the Windows Command Prompt prevent some of them from displaying there.

@newTomas
Copy link
Contributor

newTomas commented May 12, 2023

It's strange, I installed windows terminal and the code that I provided suddenly worked with Cyrillic in both cmd and windows terminal, but other languages (こんにちは, 你好, السلام عليكم, 안녕하세요) still don't work either in cmd or in windows terminal.
I also tried the following code:

setlocale(LC_ALL, "UTF-8");
wint_t wc[2] = {0, 0};
while (WEOF != (wc[0] = getwc(stdin))){
    if (wc[0] >= 0xD800 && wc[0] <= 0xDBFF){
        wc[1] = getwc(stdin);
    }
    printf("wc = %ls\n", wc);
    wc[1] = 0;
}

Everything is the same in it.
cmd:
image
windows terminal with powershell:
image|
windows terminal with cmd:
image

What do we do? Any ideas how to make it work with all characters and new line?

@DannyDaemonic
Copy link
Contributor Author

I think the best thing to do is open a bug report. We aren't using it wrong. It shouldn't be hanging.

@newTomas
Copy link
Contributor

Okay, will you open it?

@DannyDaemonic
Copy link
Contributor Author

I've tried it in Debian on my Linux system, in an Ubuntu WSL install, and in Windows with the MS compilers. I haven't been able to reproduce the bug. I believe you, but when filling out the bug report is asks you about your system and setup and versions and stuff. Also, if they ask me questions about something or suggest I try something different, I want to be in a position where I can check. So I'll probably get around to it, but it's going to take a little time.

@newTomas
Copy link
Contributor

As I understand it, you don’t really want to deal with this bug report, can you then help open the bug report itself, and then I will test possible fixes myself? Can we discuss this, for example, in a telegram?

@DannyDaemonic
Copy link
Contributor Author

Don't worry, I haven't abandoned you. I caught COVID. I'm sure I'll be fine but it's wiping me out a bit.

I'm just going to rewrite the console reads to process the Windows key events. We already have #define block for Windows so we might as well just use the Windows functions. The wchar_t functions aren't used much in Windows since they are different from pretty much everywhere else.

@DannyDaemonic
Copy link
Contributor Author

@newTomas Can you try out #1462 and verify that it fixes the problem for you?

If you don't know how to checkout a pull request you can apply this patch with patch -p1 < 1462.patch.

@newTomas
Copy link
Contributor

@DannyDaemonic Get well. Let me know when you can start creating a bug report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants