
server : remove self-extend features #9860

Merged: 2 commits into master from gg/server-remove-self-extend on Oct 12, 2024
Conversation

ggerganov (Owner):
target #9857, fix #9859

Drop support for the self-extend related arguments:

| Argument | Description |
| --- | --- |
| `-gan, --grp-attn-n N` | group-attention factor (default: 1)<br/>(env: LLAMA_ARG_GRP_ATTN_N) |
| `-gaw, --grp-attn-w N` | group-attention width (default: 512.0)<br/>(env: LLAMA_ARG_GRP_ATTN_W) |

Comment on lines +1801 to +1807:

```cpp
if (!params.ctx_shift) {
    // this check is redundant (for good)
    // we should never get here, because generation should have already stopped in process_token()
    slot.release();
    send_error(slot, "context shift is disabled", ERROR_TYPE_SERVER);
    continue;
}
```
ggerganov (Owner, Author):

@ngxson I think the comment is not entirely correct, because in `process_token()` we check against the training context length (`n_ctx_train`), while the slot's context `slot.n_ctx` could be smaller. What do you think?

ngxson (Collaborator):

Hmm no, I did add a check against `slot.n_ctx`. Is this what you're looking for?

```cpp
// if context shift is disabled, we stop when it reaches the context limit
if (slot.n_decoded >= slot.n_ctx) {
    slot.truncated      = true;
    slot.stopped_limit  = true;
    slot.has_next_token = false;
    SLT_DBG(slot, "stopped due to running out of context capacity, n_decoded = %d, n_ctx = %d\n", slot.n_decoded, slot.n_ctx);
}
```

ggerganov (Owner, Author), Oct 12, 2024:

Ah, I missed that, thanks.

Shouldn't we actually check this:

    if (slot.n_prompt_tokens + slot.n_decoded >= n_ctx) {

Hmm, or maybe:

    if (slot.n_past + slot.n_decoded >= n_ctx) {

Anyway, I'll figure it out, as I'm currently looking into this logic.
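(Editorial note: here is a minimal, self-contained sketch of the token accounting being debated above. The values and the standalone `main()` are illustrative only, not the actual server.cpp code; the point is that prompt tokens already occupy part of the slot's context, so comparing `n_decoded` alone against `n_ctx` under-counts.)

```cpp
// Minimal sketch with hypothetical values, not the actual server.cpp logic:
// the slot's context holds the prompt plus everything decoded so far,
// so a limit check based on n_decoded alone would overshoot.
#include <cstdio>

int main() {
    const int n_ctx           = 512;  // per-slot context size (hypothetical)
    const int n_prompt_tokens = 400;  // tokens already in the KV cache from the prompt
    int       n_decoded       = 0;    // tokens generated so far

    while (true) {
        // the check discussed in the thread: prompt + generated vs. slot context
        if (n_prompt_tokens + n_decoded >= n_ctx) {
            printf("stopped: context full after %d generated tokens\n", n_decoded);
            break;
        }
        n_decoded++;  // pretend we decoded one more token
    }

    // with n_decoded alone, generation would have continued far past the limit
    printf("n_decoded alone would have allowed up to %d generated tokens\n", n_ctx);
    return 0;
}
```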

ngxson (Collaborator), Oct 12, 2024:

Ah yeah, I misunderstood `n_decoded`. Maybe we even need `(int) system_tokens.size() + slot.n_prompt_tokens`, because `system_tokens` is already in the KV cache before the first decode.

Thanks for looking into this.

ngxson (Collaborator):

No, sorry, I haven't seen #9811.

Base automatically changed from gg/server-remove-system-prompt to master on October 12, 2024 at 11:51.
ggerganov merged commit 1bde94d into master on Oct 12, 2024.
57 checks passed
ggerganov deleted the gg/server-remove-self-extend branch on October 12, 2024 at 13:06.
drollings pushed a commit to drollings/llama.cpp that referenced this pull request on Oct 18, 2024:
* server : remove self-extend

ggml-ci

* server : fix context limit check to use slot.n_past

ggml-ci
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request on Oct 29, 2024 (same commits as above).
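(Editorial note: the second commit's title says the context limit check was changed to use slot.n_past. The following is a rough, hedged sketch only, reusing the field names quoted earlier in the thread; the actual server.cpp code may differ in its details.)

```cpp
// Hedged sketch of a context-limit check keyed on n_past, per the commit title
// "server : fix context limit check to use slot.n_past". Not the actual
// server.cpp implementation; slot_t is a stand-in for the server's slot struct.
#include <cstdio>

struct slot_t {
    int  n_ctx          = 512;   // slot context size (hypothetical value)
    int  n_past         = 0;     // tokens currently held in the KV cache for this slot
    bool truncated      = false;
    bool stopped_limit  = false;
    bool has_next_token = true;
};

static void check_context_limit(slot_t & slot) {
    // once n_past (prompt + generated tokens in the cache) reaches the slot's
    // context size, stop generating instead of shifting the context
    if (slot.n_past >= slot.n_ctx) {
        slot.truncated      = true;
        slot.stopped_limit  = true;
        slot.has_next_token = false;
        printf("stopped: n_past = %d, n_ctx = %d\n", slot.n_past, slot.n_ctx);
    }
}

int main() {
    slot_t slot;
    slot.n_past = 512;          // pretend the KV cache just filled up
    check_context_limit(slot);  // -> generation stops for this slot
    return 0;
}
```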