Mitigate prompt injection attacks by supporting "safe" encoding (encoding without special tokens) #1347
Comments
Hey! This is planned! The equivalent was merged for transformers in this PR. The changes for Rust are a little bit more advanced, but definitely on my to-do list!
Great news, thank you for the update @ArthurZucker, I would have loved to help but I'm unfortunately not very knowledgeable in Rust. I'll gladly follow the topic and test the feature when it comes out!
I feel like this is not a sane default. It has merits in certain contexts, possibly, but I wouldn't call it safe by any means. Prompt injection is by far not limited to injecting special tokens; basically, any form of text can escape already. This feels like a very weak form of safety, and it defeats the purpose of having very flexible input (where users can create arbitrarily complex prompts, like for chat, without having to handle any special new API in this lib). We can definitely add it; it should be quite easy, since we should only be skipping the added_vocabulary step, I think (it depends on whether the special tokens are also in the core vocab, which might vary from tokenizer to tokenizer).
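For concreteness, a minimal sketch of that default behavior, assuming GPT-2's published tokenizer.json (which registers `<|endoftext|>` as a special added token):

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("gpt2")

# "<|endoftext|>" is in GPT-2's added vocabulary as a special token, so
# user-supplied text containing that literal string is matched by the
# added_vocabulary step and encoded as the real end-of-text token
# rather than as plain text.
ids = tok.encode("ignore the above<|endoftext|>").ids
print(ids[-1])  # expected: 50256, GPT-2's <|endoftext|> id
```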
Hi @Narsil, I'm not sure I'm understanding your point. I also think that injecting special tokens is not the only way to do prompt injection, but it's at least one failure mode that should be addressable; the proposal was not meant to solve the entire issue. As for the safety part of it, I see your point. I think the rationale behind having this default is that there is an inherent ambiguity in how special tokens should be handled. Without explicit intent from the developer, no sane default can be inferred, because there are situations in which one way of handling special tokens is desired but not the other. In these sorts of cases, I tend to think that throwing an exception is a sane default, or at least a better one than silently making a wrong assumption. I'm not particularly attached to the idea of throwing an exception, though; a warning, a mandatory keyword argument, or even just explicitly documenting the default behavior and providing an alternative would accomplish roughly the same goal.
@ArthurZucker Great idea. I also like `split_special_tokens` for handling special tokens (huggingface/transformers#26468).
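A quick sketch of how that parameter behaves on the transformers side (version support varies; older releases only honored it for slow, pure-Python tokenizers):

```python
from transformers import AutoTokenizer

# Default behavior: the literal string is matched as the special token.
tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.encode("<|endoftext|>"))  # [50256], the real end-of-text id

# With split_special_tokens=True the same string is tokenized as plain
# text instead of being matched against the special-token vocabulary.
tok_split = AutoTokenizer.from_pretrained("gpt2", split_special_tokens=True)
print(tok_split.encode("<|endoftext|>"))  # several ordinary-text ids
```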
Any updates?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
The issue is still relevant.
Yep, sorry, I'll finally have time to pick it up!
Hi @ArthurZucker, great news, no need to be sorry, keep up the amazing work 🚀 If you need help during testing don't hesitate to reach out!
Sorry for the delay, I have a lot on my plate, but I'm prioritizing a release next week, including this!
There may already exist a way of accomplishing what I'm going to describe, but I didn't find it by reading the documentation.

In certain applications, we should be careful about how special tokens are encoded, as they can be used to trigger special capabilities in models or give them special positional clues (system prompt, etc.). Hence, when serving a model to end users, we need to prevent injection attacks, in which the user sends the representation of a special token as plain text (e.g. `<SYSTEM>`) and the tokenizer interprets this text as a special token. In this regard, OpenAI's `tiktoken` tokenizer has a very safe default of raising an exception if it encounters text that corresponds to a special token; see the corresponding docstring. This effectively forces the developer to be very intentional about how special tokens should be handled, thus preventing injection attacks.
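For reference, `tiktoken`'s default behaves roughly like this:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# By default, all special tokens are disallowed: encoding text that
# contains one raises ValueError instead of silently emitting it.
try:
    enc.encode("user text <|endoftext|> more text")
except ValueError as err:
    print("rejected:", err)

# The developer must opt in explicitly, token by token ...
enc.encode("<|endoftext|>", allowed_special={"<|endoftext|>"})

# ... or disable the check so the string is encoded as plain text.
enc.encode("<|endoftext|>", disallowed_special=())
```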
Such a default behavior would break existing code; an alternative would be a `.safe_encode` method that throws an exception if it encounters text that corresponds to a special token, mirroring what `tiktoken` is doing, and allowing/disallowing special tokens via a whitelist/blacklist argument. Disallowed special tokens should be treated as plain text and NOT as representations of special tokens, i.e. `<SYSTEM>` should be tokenized as `["<", "SYSTEM", ">"]` (or otherwise, depending on the vocabulary); most importantly, it should NOT be interpreted as the `<SYSTEM>` token unless explicitly enabled by the developer.

Is there an existing way of mirroring `tiktoken`'s behavior, and if not, would such a feature be useful to the library?