Unable to get a response in interactive mode #1423

Closed
re11ding opened this issue May 13, 2023 · 7 comments · Fixed by #1462

re11ding commented May 13, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I am expecting AI responses to my responses, allowing for a 2-way conversation.

Current Behavior

Once it's my turn to provide a prompt and I press Enter, CPU usage reaches around 30% and no response is ever generated, no matter how long it's left to run. I'm always forced to send SIGINT with Ctrl+C to terminate llama.cpp.

I've also tried it with 7B, but the result is sadly still the same.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz with 32GB of RAM at 2400MHz

  • Operating System, e.g. for Linux:

Windows 10 v1909

  • SDK version, e.g. for Linux:
Python 3.10.4
GNU Make 4.4
G++ (GCC) 13.1.0

Steps to Reproduce

Run with the usual parameters, attempt to respond, and simply wait. --keep is not necessary; it is only there from my last test run, to see if it changed anything.

Failure Logs

E:\LLaMA\llama.cpp>main -m models/30B/ggml-model-q4_0.bin -n -1 -c 2048 -i -r "User:" --color --keep -1 --prompt "Hello!How are you? Please answer in less than 5 words."
main: build = 0 (unknown)
main: seed  = 1683935758
llama.cpp: loading model from models/30B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 127.27 KB
llama_model_load_internal: mem required  = 21695.48 MB (+ 3124.00 MB per state)
llama_init_from_file: kv self size  = 3120.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = -1, n_keep = 16


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Hello!How are you? Please answer in less than 5 words.
I'm ok,how are you? Answer please in less than five words.
Good question. Here is my answer: 'How am I ? '
i'm doing great, what about u?
Not good at all! How are you?
Hi, whats up? how are you?
I am fine thanks! And you?
User:I'm doing great, thank you!

llama_print_timings:        load time = 19669.43 ms
llama_print_timings:      sample time =    57.24 ms /    75 runs   (    0.76 ms per run)
llama_print_timings: prompt eval time = 17329.88 ms /    16 tokens ( 1083.12 ms per token)
llama_print_timings:        eval time = 86146.71 ms /    75 runs   ( 1148.62 ms per run)
llama_print_timings:       total time = 850995.00 ms

@Folko-Ven (Contributor)

Hi. I also encountered this error (on Windows). As a temporary workaround I use main.exe from Releases, which works fine.

wro52 commented May 13, 2023

The same here, using Win10 64-bit, Code::Blocks, and TDM64-gcc. I never had a problem until the last major changes. Using the latest format, main freezes after I press Return; Ctrl+C works as described above. main.exe from Releases works as expected.

wro52 commented May 14, 2023

Dirty patch for issue (#1423) and (#1432)

Line numbers refer to file main.cpp version 13c351a and 79b2d5b

========
Step 1

line 63 comment out

63 // console_init(con_st);

========
Step 2

line 543 and 544 comment out
543 // another_line = console_readline(con_st, line);
544 // buffer += line;

========
Step 3

insert the patch between line 544 and 546

This is between
do {
and
} while (another_line);

//--------- Patch Start ------------------------
#if defined(_WIN32)
    // Read a wide line from the console and convert it to UTF-8
    std::wstring wline;
    if (!std::getline(std::wcin, wline)) {
        // input stream is bad or EOF received
        return 0;
    }
    win32_utf8_encode(wline, line);
#else
    if (!std::getline(std::cin, line)) {
        // input stream is bad or EOF received
        return 0;
    }
#endif
    if (!line.empty()) {
        if (line.back() == '\\') {
            line.pop_back(); // Remove the continue character
        } else {
            another_line = false;
        }
        buffer += line + '\n'; // Append the line to the result
    }
//---------------- Patch End --------------------------------

========
Step 4

Insert the following function before int main(.....

// Convert a wide Unicode string to a UTF-8 string
void win32_utf8_encode(const std::wstring & wstr, std::string & str) {
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
    std::string strTo(size_needed, 0);
    WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
    str = strTo;
}

========
Step 5

Recompile

That's it: quick and dirty. Color is missing, but it helps to check out models and is a starting point for the
bug hunters.

wro52 commented May 15, 2023

Less dirty and faster patch for issue (#1423) and (#1432)

Line numbers refer to file common.cpp version 63d2046

525 HANDLE hConIn = GetStdHandle(STD_INPUT_HANDLE);
526 if (hConIn != INVALID_HANDLE_VALUE && GetConsoleMode(hConIn, &dwMode)) {
527 // Set console input codepage to UTF16
528 _setmode(_fileno(stdin), _O_WTEXT);
529
530 // Turn off ICANON (ENABLE_LINE_INPUT) and ECHO (ENABLE_ECHO_INPUT)
531 // dwMode &= ~(ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT);
532 // SetConsoleMode(hConIn, dwMode);
533 }

So comment out lines 531 and 532.
Recompile: color is back, and llama.cpp echoes the input.

@DannyDaemonic (Contributor)

I think this is related to the Windows version of GCC (mingw?). When built with the MS or Intel compilers the code works as expected.

I'm working on a fix that just deals with the Windows key events when on Windows. Windows already has its own #define block, so we might as well just use regular (well-tested) Windows functions.

If anyone wants to open a bug report with the mingw maintainers, here's the code that should work but gets stuck on Enter when built with mingw compilers:

#include <windows.h>
#include <winnls.h>
#include <fcntl.h>
#include <wchar.h>
#include <stdio.h>
#include <io.h>

int main() {
    // Initialize for reading wchars and writing out UTF-8
    DWORD dwMode = 0;
    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    if (hConsole == INVALID_HANDLE_VALUE || !GetConsoleMode(hConsole, &dwMode)) {
        hConsole = GetStdHandle(STD_ERROR_HANDLE);
        if (hConsole != INVALID_HANDLE_VALUE && (!GetConsoleMode(hConsole, &dwMode))) {
            hConsole = NULL;
        }
    }
    if (hConsole) {
        SetConsoleMode(hConsole, dwMode | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
        SetConsoleOutputCP(CP_UTF8);
    }
    HANDLE hConIn = GetStdHandle(STD_INPUT_HANDLE);
    if (hConIn != INVALID_HANDLE_VALUE && GetConsoleMode(hConIn, &dwMode)) {
        _setmode(_fileno(stdin), _O_WTEXT);
        dwMode &= ~(ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT);
        SetConsoleMode(hConIn, dwMode);
    }

    // Echo input
    while (1) {
        // Read
        wchar_t wcs[2] = { getwchar(), L'\0' };
        if (wcs[0] == WEOF) break;
        if (wcs[0] >= 0xD800 && wcs[0] <= 0xDBFF) { // If we have a high surrogate...
            wcs[1] = getwchar(); // Read the low surrogate
            if (wcs[1] == WEOF) break;
        }
        // Write
        char utf8[5] = {0};
        int result = WideCharToMultiByte(CP_UTF8, 0, wcs, (wcs[1] == L'\0') ? 1 : 2, utf8, 4, NULL, NULL);
        if (result > 0) {
            printf("%s", utf8);
        }
    }
    return 0;
}
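
The surrogate handling in the echo loop above (read a high surrogate, then read the low one before converting) can also be sketched without any Windows calls. This is only an illustration under my own assumptions: utf16_to_utf8 and append_utf8 are hypothetical names, the conversion is hand-rolled instead of using WideCharToMultiByte, and unpaired surrogates are replaced with U+FFFD, which is a choice of this sketch and not something the code above does.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point.
void append_utf8(std::uint32_t cp, std::string & out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts UTF-16 code units to UTF-8.
std::string utf16_to_utf8(const std::u16string & in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // A high surrogate must be combined with the low surrogate after it
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        // Assumption of this sketch: unpaired surrogates become U+FFFD
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        append_utf8(cp, out);
    }
    return out;
}
```

This explains the 5-byte utf8 buffer in the repro: a surrogate pair encodes one code point above U+FFFF, which takes four UTF-8 bytes plus the terminator.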

@DannyDaemonic (Contributor)

Can you try out #1462 and verify that it fixes the problem for you?

If you don't know how to check out a pull request, you can apply this patch with patch -p1 < 1462.patch.

wro52 commented May 15, 2023

Hello,

patch #1462 works with code::blocks and TDM64-gcc.

Thank you
