Closed
Description
As per https://github.com/ggerganov/llama.cpp/blob/da5303c1ea68aa19db829c634f1e10d08d409680/main.cpp#L1066, sampling the EOS token in interactive mode simply sets is_interacting to true, so it serves as a way to end the current series of tokens and wait for user input. Given that, is there any reason to avoid sampling it in the first place?
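
To make the question concrete, here is a minimal standalone sketch of the behaviour as I understand it. It is not the actual code from main.cpp: kEosTokenId, on_token, run_turn and TurnResult are hypothetical names made up for illustration, and the EOS id of 2 is an assumption about the LLaMA vocabulary.

```cpp
#include <vector>

// Assumption for illustration: the LLaMA vocabulary uses id 2 for EOS.
constexpr int kEosTokenId = 2;

enum class EosAction { Continue, WaitForUser, Stop };

// Hypothetical helper: decide what a freshly sampled token means.
// In interactive mode EOS only hands control back to the user;
// otherwise it ends generation.
EosAction on_token(int token, bool interactive) {
    if (token != kEosTokenId) {
        return EosAction::Continue;
    }
    return interactive ? EosAction::WaitForUser : EosAction::Stop;
}

struct TurnResult {
    int  emitted;         // tokens produced before EOS was seen
    bool is_interacting;  // true if we are now waiting for user input
    bool finished;        // true if generation ended for good
};

// Walk a list of already-sampled tokens and stop at the first EOS.
TurnResult run_turn(const std::vector<int>& sampled, bool interactive) {
    TurnResult result{0, false, false};
    for (int token : sampled) {
        switch (on_token(token, interactive)) {
            case EosAction::Continue:
                ++result.emitted;
                break;
            case EosAction::WaitForUser:
                result.is_interacting = true;
                return result;
            case EosAction::Stop:
                result.finished = true;
                return result;
        }
    }
    return result;
}
```

With this sketch, run_turn({5, 7, kEosTokenId, 9}, true) stops after two tokens with is_interacting set, while the same input with interactive set to false stops with finished set. If that matches the real loop, letting the model sample EOS in interactive mode looks like a natural end-of-turn signal rather than something to suppress.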