Make Outlines' cache reusable across startup by making the cache's key as string #1129

Lap1n · 2024-09-04T19:55:27Z

When using vllm and outlines, in our use cases, it seems that the diskcache functionality is not working. Every time the server is started, it doesn't seem to be able to reuse the previously computed FSM cache.

One way that can fix this issue is to serialize the cache key object as a string.

See this issue.

stas00 · 2024-09-10T21:45:45Z

I second that with vllm the cache is being ignored and FSM recompiled on every use.

I manually applied your patch and validated it works. Shaves about 3 secs off from 4 secs now on the first call.

thank you for the fix, @Lap1n

cpfiffer · 2024-09-11T16:26:13Z

Is there a way to add a test for this issue?

stas00 · 2024-09-11T16:31:00Z

That's a great idea: run the request twice, capture the log, and check that the "Compiling FSM index for all state transitions" message appears only once.

cpfiffer · 2024-09-11T16:33:58Z

Seems like it's prone to failure if the message ever changes. I'm not super familiar with this part of the codebase, but could we just check the cache directly somehow?

Lap1n · 2024-09-11T16:49:34Z

Good idea, I am looking into adding a test case for it

stas00 · 2024-09-11T18:58:12Z

Seems like it's prone to failure if the message ever changes.

"Compiling FSM index for all state transitions" is printed by outlines so there should be no problem whatsoever to anchor on this string, since if it changes, the test will fail and it will be fixed in the same PR changing the info log.

But, of course, this is just a proposal, if there are other ways - that works for me.

cpfiffer · 2024-09-11T23:42:24Z

"Compiling FSM index for all state transitions" is printed by outlines so there should be no problem whatsoever to anchor on this string, since if it changes, the test will fail and it will be fixed in the same PR changing the info log.

Imo this is just kicking the can down the road. We should try and find a good testing case early so we don't run into this again way later. If it's super hard to do, it's super hard to do, that's fine -- but it is a band-aid test rather than a solid one.

Lap1n · 2024-09-12T15:00:06Z

@cpfiffer I added a test that fails on main and works on this PR and in vLLM. It seems that the issue is with using class method as the function to cache (vLLM use it similarly).

cpfiffer

Awesome, thank you! I think this should work. Let me get eyes from the outlines adults.

brandonwillard · 2024-09-12T17:04:40Z

It seems that the issue is with using class method as the function to cache (vLLM use it similarly).

What is it about a class method that's causing this? Is it adding the class instance argument to the cache information/key?

Lap1n · 2024-09-12T17:37:54Z

Yes, in addition, the reinstantiation of the object arguments also causes the issue. Probably an issue with the binary data being different (although the object attributes are the same). That is why it is most likely safer to serialize the args as a string instead of a binary key. So in short, I was able to reproduce the issue with using classmethod function and objects as arguments.

brandonwillard · 2024-09-12T17:47:36Z

That is why it is most likely safer to serialize the args as a string instead of a binary key. So in short, I was able to reproduce the issue with using classmethod function and objects as arguments.

From what I recall about the caching setup/dependencies, forcing conversion to a string is problematic in different ways. For example, it removes the ability to set and/or use the underlying serialization implementations for the objects involved, and it leans on __str__ implementations that aren't usually designed for serialization and the like.

Lap1n · 2024-09-12T18:00:13Z

I see, but it would still be better than the current implementation which is not using as well the underlying serialization implementations and is instead storing it as a binary python object (which is most likely unsafe), correct? Please correct me if I missed something, new to diskcache's internal working.

brandonwillard · 2024-09-12T18:07:57Z

I see, but it is still better than the current implementation which won't use as well the underlying serialization implementations and will instead store it as a binary python object (which is most likely unsafe), correct?

That's not clear. It could start failing again if a __str__ implementation changes in any of the objects involved, among other things.

It sounds like we need to directly address the issues involved with applying this to a method.

Lap1n · 2024-09-12T18:08:47Z

But if str impl change (and I guess it would be ok that it fails), I guess it would be failing as well with binary implementation as well?

Lap1n · 2024-09-12T18:16:57Z

I understand your point, we could indeed take a look at a better solution. I didn't see this fix as a final one as well, but mostly as an intermediate one as it currently supports all the features of the previous approach, with this additional benefit of using serialized data as key and making vLLM with Outlines usable in a production setting which is currently unusable without it (with autoscaling etc). FSM recompilation can take >1min for some prompts.

Although, another alternative to quickly fix could maybe be to modify the usage of outlines's cache in vLLM.

brandonwillard · 2024-09-12T18:35:04Z

Another alternative to quickly fix this would be to modify the usage of outlines's cache in vLLM.

That sounds like the best approach, because—otherwise—this becomes a feature request to make cache support [class]methods.

If we're specifically talking about vllm's use of *.outlines_logits_processors.*LogitsProcessor._get_guide, it seems like this change could be very straightforward, as well.

Lap1n · 2024-09-16T17:06:09Z

@brandonwillard seems like the error is not related to class methods after all. I've tried to extract the classmethod into an external function and it still doesn't work in vLLM (code here and here). It seems that it is very hard to reproduce with unit tests, but fairly easy when running and restarting any Python script using Outline's cache.

I wonder if it could be related to Diskcache's caveats of the default key serialization implementation.

Thoughts?

brandonwillard · 2024-09-16T17:42:51Z

@brandonwillard seems like the error is not related to class methods after all. I've tried to extract the classmethod into an external function and it still doesn't work in vLLM (code here and here). It seems that it is very hard to reproduce with unit tests, but fairly easy when running and restarting any Python script using Outline's cache.

We'll need to know exactly where the cache misses are. There's a nested application of the cache in the call to create_states_mapping from RegexGuide.__init__, so we need to know if the misses you're observing are there or in vllm's direct use of cache. Also, the latter changes the tokenizer object through _adapt_tokenizer, which could affect the hashing.

brandonwillard · 2024-09-16T17:49:15Z

I wonder if it could be related to Diskcache's caveats of the default key serialization implementation.

This mostly emphasises the need to implement consistent serialization for the objects involved; otherwise, I don't remember where the cloudpickle we're using stands with regard to the tuple-related issues.

brandonwillard · 2024-09-16T17:58:52Z

@Lap1n, I just noticed that cache is being applied to a tokenizer argument in vllm that's typed as PreTrainedTokenizerBase. In outlines, the tokenizer arguments are of Tokenizer type, and, more specifically, the TransformerTokenizer subtype that has custom serialization methods (i.e. __[get|set]state__ methods). That alone could be the issue (either from within outlines itself or vllm).

In other words, vllm may need to wrap those PreTrainedTokenizerBase arguments in a type that implements a cacheable serialization like TransformerTokenizer.

Are you able to reproduce this with only outlines, and, if so, which tokenizer? It could be that the transformers serialization we're effectively leaning on isn't working for this use case.

Lap1n · 2024-09-17T15:11:04Z

@brandonwillard I see, I previously tried Hugging Face's tokenizers as arguments (e.g. the tokenizer for llama3 8b) in the test case when I was trying to reproduce the issue, but it seems to work with them as well with the current implementation🤔

I will try to investigate on vLLM side now to see if it could be the tokenizer issue.

Thanks for investigating btw!

brandonwillard · 2024-09-17T17:54:59Z

Thanks for investigating btw!

Thank you for helping find and debug this! Also, don't hesitate to keep us in the loop regarding the vLLM side of things.

Lap1n added 2 commits September 4, 2024 15:49

fix: cache key as string

8ae4086

remove comment

8c881d2

Lap1n mentioned this pull request Sep 4, 2024

Outlines's cache not reusable across vllm startup #1130

Open

Lap1n marked this pull request as ready for review September 4, 2024 20:03

Lap1n changed the title ~~fix: cache key as string~~ Make Outlines' cache reusable across startup by making the cache's key as string Sep 9, 2024

Lap1n added 2 commits September 9, 2024 17:10

Merge branch 'main' into fix/cache-key-as-string

3123fbd

Merge branch 'main' into fix/cache-key-as-string

05ab96c

Lap1n mentioned this pull request Sep 10, 2024

[Performance]: guided generation is very slow in offline mode vllm-project/vllm#8313

Open

Merge branch 'main' into fix/cache-key-as-string

1d6ceb5

Lap1n and others added 3 commits September 12, 2024 10:47

add tests

058f505

Merge branch 'main' into fix/cache-key-as-string

ab4ddd1

remove not need line

9851ec6

cpfiffer approved these changes Sep 12, 2024

View reviewed changes

remove unused code section

7496ffe

Merge branch 'main' into fix/cache-key-as-string

af69e91

brandonwillard mentioned this pull request Sep 12, 2024

Make cache support methods #1145

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Outlines' cache reusable across startup by making the cache's key as string #1129

Make Outlines' cache reusable across startup by making the cache's key as string #1129

Lap1n commented Sep 4, 2024 •

edited

Loading

stas00 commented Sep 10, 2024 •

edited

Loading

cpfiffer commented Sep 11, 2024

stas00 commented Sep 11, 2024

cpfiffer commented Sep 11, 2024

Lap1n commented Sep 11, 2024

stas00 commented Sep 11, 2024 •

edited

Loading

cpfiffer commented Sep 11, 2024

Lap1n commented Sep 12, 2024

cpfiffer left a comment

brandonwillard commented Sep 12, 2024 •

edited

Loading

Lap1n commented Sep 12, 2024 •

edited

Loading

brandonwillard commented Sep 12, 2024

Lap1n commented Sep 12, 2024 •

edited

Loading

brandonwillard commented Sep 12, 2024 •

edited

Loading

Lap1n commented Sep 12, 2024

Lap1n commented Sep 12, 2024 •

edited

Loading

brandonwillard commented Sep 12, 2024

Lap1n commented Sep 16, 2024 •

edited

Loading

brandonwillard commented Sep 16, 2024

brandonwillard commented Sep 16, 2024

brandonwillard commented Sep 16, 2024 •

edited

Loading

Lap1n commented Sep 17, 2024 •

edited

Loading

brandonwillard commented Sep 17, 2024

Make Outlines' cache reusable across startup by making the cache's key as string #1129

Are you sure you want to change the base?

Make Outlines' cache reusable across startup by making the cache's key as string #1129

Conversation

Lap1n commented Sep 4, 2024 • edited Loading

stas00 commented Sep 10, 2024 • edited Loading

cpfiffer commented Sep 11, 2024

stas00 commented Sep 11, 2024

cpfiffer commented Sep 11, 2024

Lap1n commented Sep 11, 2024

stas00 commented Sep 11, 2024 • edited Loading

cpfiffer commented Sep 11, 2024

Lap1n commented Sep 12, 2024

cpfiffer left a comment

Choose a reason for hiding this comment

brandonwillard commented Sep 12, 2024 • edited Loading

Lap1n commented Sep 12, 2024 • edited Loading

brandonwillard commented Sep 12, 2024

Lap1n commented Sep 12, 2024 • edited Loading

brandonwillard commented Sep 12, 2024 • edited Loading

Lap1n commented Sep 12, 2024

Lap1n commented Sep 12, 2024 • edited Loading

brandonwillard commented Sep 12, 2024

Lap1n commented Sep 16, 2024 • edited Loading

brandonwillard commented Sep 16, 2024

brandonwillard commented Sep 16, 2024

brandonwillard commented Sep 16, 2024 • edited Loading

Lap1n commented Sep 17, 2024 • edited Loading

brandonwillard commented Sep 17, 2024

Lap1n commented Sep 4, 2024 •

edited

Loading

stas00 commented Sep 10, 2024 •

edited

Loading

stas00 commented Sep 11, 2024 •

edited

Loading

brandonwillard commented Sep 12, 2024 •

edited

Loading

Lap1n commented Sep 12, 2024 •

edited

Loading

Lap1n commented Sep 12, 2024 •

edited

Loading

brandonwillard commented Sep 12, 2024 •

edited

Loading

Lap1n commented Sep 12, 2024 •

edited

Loading

Lap1n commented Sep 16, 2024 •

edited

Loading

brandonwillard commented Sep 16, 2024 •

edited

Loading

Lap1n commented Sep 17, 2024 •

edited

Loading