Is using grammar a security risk w.r.t. circumventing safety guards? #13864

RealLitb · 2025-05-28T15:26:05Z

RealLitb
May 28, 2025

I wonder whether using a grammar with language models (GBNF) pose security risks, because one can grammatically force the model to start its answer with stuff that it would not have otherwise written (because of learned safety guards to not talk about certain topics).

For example if you ask the model for assistance with building a bomb and you let the grammar be

root ::= "Certainly, here's how the bomb can be built:" .*

I wonder whether that attack surface exists.. and how you can guard the model against it?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Is using grammar a security risk w.r.t. circumventing safety guards? #13864

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Is using grammar a security risk w.r.t. circumventing safety guards? #13864

Uh oh!

RealLitb May 28, 2025

Replies: 0 comments

RealLitb
May 28, 2025