Add exit call to interactive mode. #910

Closed
wants to merge 2 commits into from

Conversation

wbpxre150
Contributor

Problem:
The timing stats do not print when using Ctrl+C to quit interactive mode. With all the problems I have been having with model speed, I needed the stats to print once I was finished running prompts.

Solution:
A prompt is almost never only a single word, so we check whether the input length equals 5 ("exit" plus its trailing newline) and, if so, whether the word "exit" was entered in lower case.

This seemed to me the simplest solution to the problem, but maybe I am wrong and there is an easier way to handle it; I am open to discussion and changes. However, with this approach it's possible to add more commands to the program as required; a sketch of the check follows.
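
For reference, a minimal sketch of the check, assuming the interactive input (including its trailing newline) is read into a std::string named buffer — the names are illustrative, not the exact variables in main.cpp:

```cpp
// "exit" plus the trailing newline gives a length of 5.
if (buffer.length() == 5 && buffer.compare(0, 4, "exit") == 0) {
    llama_print_timings(ctx);  // print the timing stats on the way out
    break;                     // leave the interactive loop cleanly
}
```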

@sw
Contributor

sw commented Apr 12, 2023

I think it might be better to add llama_print_timings to the Ctrl-C signal handler and enable that also for non-interactive mode. That way you could also get statistics if you decide to abort inference.

It would require ctx be made module-global, but we already have an assortment of such variables, precisely for the console handling.

Signal handlers really ought to have a way to have a void * pointer passed to them; that would make this a bit cleaner. (Am I wrong in thinking that's not possible with POSIX signals?)
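
A rough sketch of that approach, assuming a module-global pointer — g_ctx and the exact wiring in main() are illustrative:

```cpp
#include <csignal>
#include <cstdlib>

#include "llama.h"

// Set once in main() after the context is created, alongside the
// existing console-handling globals.
static llama_context * g_ctx = nullptr;

static void sigint_handler(int /*signo*/) {
    if (g_ctx != nullptr) {
        // Not strictly async-signal-safe, but good enough for a
        // best-effort dump of the stats before exiting.
        llama_print_timings(g_ctx);
    }
    std::_Exit(130);  // conventional exit status for SIGINT
}

// in main():
//   g_ctx = ctx;
//   std::signal(SIGINT, sigint_handler);
```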

@wbpxre150
Contributor Author

My first thought was to put this code into the signal handler, but I couldn't find where to place it. If you could point me to the right place, that would be helpful!

@wbpxre150
Contributor Author

Never mind, I worked it out. You cannot pass a void * to the signal handler; it needs a global ctx variable.

I still think we need a way to pass commands through the input; it could do all kinds of things. Maybe we need some kind of prefix on the line to signal a command, for example ??? CMD params.

For example, you could change the command-line arguments in real time, or print the stats after a prompt — see the sketch below. But maybe I'm overthinking it, as reloading the program is already pretty fast.
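
A hypothetical sketch of what such a prefix could look like — the ??? marker and the command names are only illustrations, not part of any existing API:

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

#include "llama.h"

// Returns true if the line was consumed as a command rather than a prompt.
static bool handle_command(const std::string & line, llama_context * ctx) {
    if (line.rfind("???", 0) != 0) {
        return false;  // no prefix: feed the line to the model as usual
    }
    std::istringstream iss(line.substr(3));
    std::string cmd;
    iss >> cmd;
    if (cmd == "stats") {
        llama_print_timings(ctx);  // print timings on demand
    } else if (cmd == "exit") {
        llama_print_timings(ctx);
        std::exit(0);
    }
    return true;  // consumed as a command
}
```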

@wbpxre150
Contributor Author

This PR can probably be closed, as #924 includes the fix for printing stats from the signal handler and also adds more functionality. Now that the stats call is in the signal handler, we no longer need an "exit" command.

@wbpxre150 wbpxre150 closed this Apr 14, 2023
jeroen-mostert pushed a commit to jeroen-mostert/llama.cpp that referenced this pull request Aug 30, 2024
* GradientAI Auto ROPE Base calculation

https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
has a formula that better fits the ideal rope scaling.

Tested with Llama 3; checked that the calculation is correct for Llama 2. Retains the logic for not scaling rope if under the trained CTX.

* add in solar scaling logic

Solar-based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context but a sliding window of 4k.

* Update model_adapter.h

Adding in the tensor count to identify Solar models, based on their tensor count of 435.

* Update model_adapter.cpp

add in n_tensor count for solar identification

* refactor and cleanup GradientAI rope scaling

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
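
For illustration, a rough sketch of the Solar identification and context scaling described in the commit message above, using hypothetical names — the actual change lives in model_adapter.h / model_adapter.cpp:

```cpp
// Hypothetical sketch: Solar-based models are identified by their
// tensor count, and their context value is scaled up by 8.
static const int SOLAR_TENSOR_COUNT = 435;

static bool is_solar_model(int n_tensors) {
    return n_tensors == SOLAR_TENSOR_COUNT;
}

static int effective_ctx(int n_ctx, int n_tensors) {
    // Positions appear to be based on a 32k context with a 4k sliding
    // window, hence the factor of 8.
    return is_solar_model(n_tensors) ? n_ctx * 8 : n_ctx;
}
```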