
Conversation

@ServeurpersoCom
Collaborator

Add nosubs|optimize flags to std::regex constructors to prevent catastrophic backtracking when processing prompts with repeated identical characters (e.g., 'A' * 10000).

The nosubs flag disables subgroup capture, significantly reducing memory usage and backtracking on uniform token sequences.
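For reference, a minimal sketch of the flag usage, assuming a hypothetical pattern (this is not the actual pattern from llama.cpp's chat parsing):

```cpp
#include <iostream>
#include <regex>
#include <string>

int main() {
    // Hypothetical pattern for illustration only -- not the pattern
    // used by llama.cpp's chat-template parsing.
    const std::string pattern = "<\\|[a-z_]+\\|>|[A-Za-z]+";

    // The two flags from this PR:
    //  - nosubs:   marked sub-expressions behave like (?:...), so the
    //              engine skips sub-match bookkeeping entirely
    //  - optimize: spend more time constructing the regex in exchange
    //              for faster matching
    std::regex re(pattern, std::regex::nosubs | std::regex::optimize);

    const std::string input(10000, 'A');  // the repro input: 'A' * 10000
    std::smatch m;
    if (std::regex_search(input, m, re))
        std::cout << "matched " << m[0].length() << " chars\n";
}
```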


Before:

/root/llama.cpp.pascal/build/bin/llama-server --port 8088 -m /var/www/ia/models/lmstudio-community/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf

You are a helpful assistant<|end|><|start|>user<|message|>Hello<|end|><|start|>assistant<|channel|>final<|message|>Hi there<|end|><|start|>user<|message|>How are you?<|end|><|start|>assistant'
main: model loaded
main: server is listening on http://127.0.0.1:8088
main: starting the main loop...
srv  update_slots: all slots are idle
Segmentation fault

After:

(root|~/llama.cpp.pascal) curl -X POST http://localhost:8088/v1/chat/completions   -H "Content-Type: application/json"   -d '{"messages":[{"role":"user","content":"'"$(python3 -c "print('A'*10000)")"' Say OK"}]}'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","reasoning_content":"The user typed a long string of \"A\" and then says \"Say OK\". So they likely want the assistant to respond with \"OK\". The instruction: \"Say OK\". So we just reply \"OK\". But also must obey system instruction: We should not mention policy. Just reply \"OK\".","content":"OK"}}],"created":1764928092,"model":"gpt-oss-20b-MXFP4.gguf","system_fingerprint":"b7321-147310d71","object":"chat.completion","usage":{"completion_tokens":72,"prompt_tokens":1319,"total_tokens":1391},"id":"chatcmpl-i1GMuuGvb2X3irH73aTbLP0ZkwBYbJdf","timings":{"cache_n":0,"prompt_n":1319,"prompt_ms":171.43,"prompt_per_token_ms":0.12996967399545112,"prompt_per_second":7694.102549145423,"predicted_n":72,"predicted_ms":199.156,"predicted_per_token_ms":2.7660555555555555,"predicted_per_second":361.52563819317515}}

Close #17636

@aviallon
Contributor

aviallon commented Dec 5, 2025

Ah, so this is the bug I was hitting.

@ggerganov ggerganov merged commit 1be9783 into ggml-org:master Dec 5, 2025
70 of 78 checks passed
JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request Dec 7, 2025
…rg#17786)



Development

Successfully merging this pull request may close these issues.

Bug: llama-server crashes (segfault) when processing prompts with repeated identical characters
