Main.exe from release VS Main.exe build locally => different behaviour #926

sergedc · 2023-04-12T22:02:11Z

Expected Behavior

I expect to get the same behavior when I use the main.exe from llama-master-e7f6997-bin-win-avx2-x64.zip and when I download the source code and build the main myself.

FYI, I run the main.exe with these parameters:
main -m ../../model/ggml-gpt4-x-13b-q4_1.bin --color -f ./alpaca.txt -ins -b 512 -c 2048 -n 2048 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7

Current Behavior

On the main.exe from the zip: after I ask a question, the AI answers, and I am presented with ">" for my next question
On the main.exe that I build myself (windows, cmake, gcc and g++), following the exact instructions provided: after I ask a question, the AI answers, and ">" appears and then the AI keep writing (usually about a sport event , e.g. fifa, that happened in 2018)

Environment and Context

Windows 11

32 GB ram
Ryzen CPU
GCC/G++:
-- The C compiler identification is GNU 11.2.0
-- The CXX compiler identification is GNU 11.2.0
cmake version 3.26.3

rabidcopy · 2023-04-12T22:22:57Z

--seed?

sergedc · 2023-04-12T22:33:31Z

--seed?

I just tried both again with "--seed 5" and same behaviour.
I tried on another build from 5 days ago, and same behaviour.

I should also mentioned that the downloaded main.exe from the zip is 216kb, while the one I build myself is 3,262kb....

tcristo · 2023-04-12T23:38:59Z

The size of the exe makes me think you are building a debug version and not a release version.

With regards to it continuing to output after it should stop, I have noticed all the models seem to have a sweet spot for the parameters depending upon what you are trying to get it to do. For yours, try changing n to -1

sergedc · 2023-04-13T00:06:09Z

Hi, regarding exe file size my instructions were, from the build folder:
cmake -G "MinGW Makefiles" ..
cmake --build . --config Release

With regards to "--n -1" , it did not change anything. It still keeps talking after I got the answer.

sergedc · 2023-04-13T18:41:19Z

What is the expected way of building this?
Would installing visual studio and using that possibly solve the problem?

I am trying to debug the problem: in the main.cpp:

 if (!std::getline(std::wcin, wline)) {
                        // input stream is bad or EOF received
                        return 0;
                    }
                    win32_utf8_encode(wline, line);
                    std::ofstream outfile2("test.csv");
                    outfile2 << line.c_str();
                    outfile2.close();

In the cvs file, there is not thing! No space, just 1 line 1 column with nothing.

sergedc · 2023-04-13T18:59:54Z

Interesting: the flow goes like this:

1st question
1st answer
Random answer
2nd question
2nd answer
Random answer (the exact same random answer as in 3.)

tcristo · 2023-04-13T19:03:42Z

What is the expected way of building this? Would installing visual studio and using that possibly solve the problem?

I am trying to debug the problem: in the main.cpp:
 if (!std::getline(std::wcin, wline)) {
                        // input stream is bad or EOF received
                        return 0;
                    }
                    win32_utf8_encode(wline, line);
                    std::ofstream outfile2("test.csv");
                    outfile2 << line.c_str();
                    outfile2.close();
In the cvs file, there is not thing! No space, just 1 line 1 column with nothing.

I use vs2022 to build my windows executable. My release version is around 200k and my debug is around 1M. My release version is considerably faster than my debug version.

I don't currently use the same model as you. I'm using variants of both Vicuna and Koala right now. Once the parameters are fine tuned, they are as consistent as the current models of LLMs allow them to be.

sergedc · 2023-04-13T20:31:32Z

Thanks. I will try with VS2022

If I might ask, what parameters are you using for what purpose?

Me, I am using it to summarize 5 to 10 lines of text. As well as comparing short pieces of text for similarities and difference. What parameters would you recommend?

Could you point me to the Vicuna and Koala variants you are using? Are they unfiltered? I think that also makes an impact on quality.

Thanks!

sergedc · 2023-04-22T16:10:28Z

FYI, using VS2022, this problem disappeared.

sergedc closed this as completed Apr 17, 2023

Bearsaerker mentioned this issue Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Main.exe from release VS Main.exe build locally => different behaviour #926

Main.exe from release VS Main.exe build locally => different behaviour #926

sergedc commented Apr 12, 2023

rabidcopy commented Apr 12, 2023

sergedc commented Apr 12, 2023 •

edited

Loading

tcristo commented Apr 12, 2023

sergedc commented Apr 13, 2023

sergedc commented Apr 13, 2023 •

edited

Loading

sergedc commented Apr 13, 2023

tcristo commented Apr 13, 2023

sergedc commented Apr 13, 2023

sergedc commented Apr 22, 2023

Main.exe from release VS Main.exe build locally => different behaviour #926

Main.exe from release VS Main.exe build locally => different behaviour #926

Comments

sergedc commented Apr 12, 2023

Expected Behavior

Current Behavior

Environment and Context

rabidcopy commented Apr 12, 2023

sergedc commented Apr 12, 2023 • edited Loading

tcristo commented Apr 12, 2023

sergedc commented Apr 13, 2023

sergedc commented Apr 13, 2023 • edited Loading

sergedc commented Apr 13, 2023

tcristo commented Apr 13, 2023

sergedc commented Apr 13, 2023

sergedc commented Apr 22, 2023

sergedc commented Apr 12, 2023 •

edited

Loading

sergedc commented Apr 13, 2023 •

edited

Loading