-
Notifications
You must be signed in to change notification settings - Fork 461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Outlines' cache reusable across startup by making the cache's key as string #1129
base: main
Are you sure you want to change the base?
Conversation
I second that with vllm the cache is being ignored and FSM recompiled on every use. I manually applied your patch and validated it works. Shaves about 3 secs off from 4 secs now on the first call. thank you for the fix, @Lap1n |
Is there a way to add a test for this issue? |
That's a great idea: run the request twice, capture the log, and check that the "Compiling FSM index for all state transitions" message appears only once. |
Seems like it's prone to failure if the message ever changes. I'm not super familiar with this part of the codebase, but could we just check the cache directly somehow? |
Good idea, I am looking into adding a test case for it |
"Compiling FSM index for all state transitions" is printed by But, of course, this is just a proposal, if there are other ways - that works for me. |
Imo this is just kicking the can down the road. We should try and find a good testing case early so we don't run into this again way later. If it's super hard to do, it's super hard to do, that's fine -- but it is a band-aid test rather than a solid one. |
@cpfiffer I added a test that fails on main and works on this PR and in vLLM. It seems that the issue is with using class method as the function to cache (vLLM use it similarly). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you! I think this should work. Let me get eyes from the outlines adults.
What is it about a class method that's causing this? Is it adding the class instance argument to the cache information/key? |
Yes, in addition, the reinstantiation of the object arguments also causes the issue. Probably an issue with the binary data being different (although the object attributes are the same). That is why it is most likely safer to serialize the args as a string instead of a binary key. So in short, I was able to reproduce the issue with using classmethod function and objects as arguments. |
From what I recall about the caching setup/dependencies, forcing conversion to a string is problematic in different ways. For example, it removes the ability to set and/or use the underlying serialization implementations for the objects involved, and it leans on |
I see, but it would still be better than the current implementation which is not using as well the underlying serialization implementations and is instead storing it as a binary python object (which is most likely unsafe), correct? Please correct me if I missed something, new to diskcache's internal working. |
That's not clear. It could start failing again if a It sounds like we need to directly address the issues involved with applying this to a method. |
But if str impl change (and I guess it would be ok that it fails), I guess it would be failing as well with binary implementation as well? |
I understand your point, we could indeed take a look at a better solution. I didn't see this fix as a final one as well, but mostly as an intermediate one as it currently supports all the features of the previous approach, with this additional benefit of using serialized data as key and making vLLM with Outlines usable in a production setting which is currently unusable without it (with autoscaling etc). FSM recompilation can take >1min for some prompts. Although, another alternative to quickly fix could maybe be to modify the usage of outlines's cache in vLLM. |
That sounds like the best approach, because—otherwise—this becomes a feature request to make If we're specifically talking about |
@brandonwillard seems like the error is not related to class methods after all. I've tried to extract the classmethod into an external function and it still doesn't work in vLLM (code here and here). It seems that it is very hard to reproduce with unit tests, but fairly easy when running and restarting any Python script using Outline's cache. I wonder if it could be related to Diskcache's caveats of the default key serialization implementation. Thoughts? |
We'll need to know exactly where the cache misses are. There's a nested application of the cache in the call to |
This mostly emphasises the need to implement consistent serialization for the objects involved; otherwise, I don't remember where the |
@Lap1n, I just noticed that In other words, Are you able to reproduce this with only |
@brandonwillard I see, I previously tried Hugging Face's tokenizers as arguments (e.g. the tokenizer for llama3 8b) in the test case when I was trying to reproduce the issue, but it seems to work with them as well with the current implementation🤔 I will try to investigate on vLLM side now to see if it could be the tokenizer issue. Thanks for investigating btw! |
Thank you for helping find and debug this! Also, don't hesitate to keep us in the loop regarding the vLLM side of things. |
When using vllm and outlines, in our use cases, it seems that the diskcache functionality is not working. Every time the server is started, it doesn't seem to be able to reuse the previously computed FSM cache.
One way that can fix this issue is to serialize the cache key object as a string.
See this issue.