Description
🚀 The feature, motivation and pitch
I know this feature request more or less already exists as #5950 (and in older, loosely related requests: #3594, #1857).
This is a similar pitch, but I am opening a new issue because I noticed newer developments in the codebase. The pitch is to support returning hidden states when generating sequences. This would enable use cases such as output classification and guardrails. Whereas #5950 suggested a separate step for embedding, I would suggest building it in as an option on EngineArgs or as an option that can be passed in with each generation request.
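To make the two options concrete, here is a rough sketch of how an engine-wide default and a per-request override could interact. Everything in it is hypothetical: `SketchEngineArgs`, `SketchRequestParams`, `SketchOutput`, and `fake_generate` are illustrative names I made up, not existing classes or functions in the codebase, and the "hidden states" are dummy values standing in for real model activations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchEngineArgs:
    # Hypothetical engine-wide default; not an existing EngineArgs field.
    return_hidden_states: bool = False


@dataclass
class SketchRequestParams:
    # Hypothetical per-request override; None means "use the engine default".
    return_hidden_states: Optional[bool] = None


@dataclass
class SketchOutput:
    text: str
    # One vector per token, populated only when hidden states were requested.
    hidden_states: Optional[List[List[float]]] = None


def wants_hidden_states(engine_args: SketchEngineArgs,
                        params: SketchRequestParams) -> bool:
    # The per-request setting wins whenever it is set explicitly.
    if params.return_hidden_states is not None:
        return params.return_hidden_states
    return engine_args.return_hidden_states


def fake_generate(prompt: str,
                  engine_args: SketchEngineArgs,
                  params: SketchRequestParams) -> SketchOutput:
    # Stand-in for a real engine: echoes the prompt tokens and attaches
    # dummy two-dimensional vectors in place of real hidden states.
    tokens = prompt.split()
    output = SketchOutput(text=" ".join(tokens))
    if wants_hidden_states(engine_args, params):
        output.hidden_states = [[float(len(tok)), 0.0] for tok in tokens]
    return output
```

With this shape, a deployment that never needs hidden states pays nothing by default, while a guardrail or classification caller could opt in on a single request without restarting the engine.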
I see that v0.5.1 already has some new code in ModelDriverBase to support return_hidden_states. However, I don't see it supported in the LLM engine yet (it is not an input to EngineArgs), so it seems this feature is still under development. I am mainly wondering what the timeline is, and what approach is being taken, so that I and the community can develop accordingly.
Alternatives
No response
Additional context
No response