
Unify Logits Processors, Ensure Tokenizers Have Identical Interfaces #676

Closed
wants to merge 2 commits into from

Conversation

lapp0
Contributor

@lapp0 lapp0 commented Feb 16, 2024

Implements Step 1 of #678

Changes:

  • Ensure LlamaCpp and vLLM use the same logits processors
  • Ensure LlamaCpp and vLLM both use a tokenizer subclassing outlines.models.tokenizer.Tokenizer
    • don't monkeypatch vLLM tokenizer in the logits processor itself
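The tokenizer change above could be sketched as an adapter that wraps an engine's tokenizer behind one shared interface instead of monkeypatching it in the logits processor. This is a hypothetical illustration; `outlines.models.tokenizer.Tokenizer`'s real interface may differ, and `TokenizerAdapter` is an invented name:

```python
class TokenizerAdapter:
    """Hypothetical sketch: wrap an engine tokenizer (e.g. the
    HuggingFace tokenizer vLLM exposes) behind one shared interface,
    rather than monkeypatching it inside the logits processor."""

    def __init__(self, hf_tokenizer):
        self._tok = hf_tokenizer
        # Expose the attributes the shared logits processors rely on.
        self.eos_token_id = hf_tokenizer.eos_token_id
        self.vocabulary = hf_tokenizer.get_vocab()

    def encode(self, text):
        return self._tok.encode(text)

    def decode(self, token_ids):
        return self._tok.decode(token_ids)
```

A LlamaCpp adapter would implement the same methods, so the logits processors never need to know which engine they are running under.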

Benefits:

  • No more bespoke logits processors: a single canonical logits processor handles each operation whose logic is identical across inference engines.
    • Enables a path forward for easily implementing a repetition penalty logits processor, among others.
  • Provides a path forward for implementing outlines.models.vllm.

TODO

  • Unify vLLM and LlamaCpp Logits Processors
  • outlines.models.llamacpp using subclass of outlines.models.tokenizer.Tokenizer
  • New outlines.models.vllm with only the Tokenizer
  • Test cases for all interfaces of outlines.models.tokenizer.Tokenizer using all inference engines
  • smoke test vLLM
  • document new vLLM logits processor instantiation logic

Provides a single implementation of logits processors usable by any model, and enables a path forward for a repetition penalty logits processor, among others.
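The engine-agnostic shape described here could be sketched as follows. This is a minimal illustration, not the actual outlines API: the class name, the plain-list logits, and the `allowed_token_ids` constraint are all assumptions standing in for the real FSM-driven masking:

```python
class UnifiedLogitsProcessor:
    """Hypothetical sketch of one logits processor shared by all
    engines: it only sees token ids and logits, so the same instance
    works with vLLM and LlamaCpp alike."""

    def __init__(self, tokenizer, allowed_token_ids):
        # `tokenizer` would be the shared Tokenizer subclass; here it is
        # unused, kept only to mirror the intended constructor shape.
        self.tokenizer = tokenizer
        self.allowed = set(allowed_token_ids)

    def __call__(self, input_ids, logits):
        # Mask disallowed tokens to -inf; engines then sample as usual.
        return [
            logit if i in self.allowed else float("-inf")
            for i, logit in enumerate(logits)
        ]
```

Because the processor depends only on this narrow interface, adding a new engine (or a repetition penalty variant) means writing an adapter, not another bespoke processor.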

@rlouf
Member

rlouf commented Feb 17, 2024

Please open an issue before opening a PR when the change is about the design of the library. Here I don't think it is worth generalising the processors at this point. Let's wait until we have another integration (TensorRT?).

@lapp0 lapp0 mentioned this pull request Feb 17, 2024
@lapp0
Contributor Author

lapp0 commented Feb 17, 2024

Please open an issue before opening a PR when the change is about the design of the library. Here I don't think it is worth generalising the processors at this point. Let's wait until we have another integration (TensorRT?).

Created and linked an issue.

I disagree: unifying their implementation will make it easier to integrate more inference engines while ensuring they function properly with the rest of the system and don't accumulate additional technical debt.

@lapp0 lapp0 changed the title Unified outlines.processors Unify Logits Processors, Ensure Tokenizers Have Identical Interfaces Feb 17, 2024
@rlouf
Member

rlouf commented Feb 20, 2024

Beam search is broken with ExLlamaV2, will address in Step 3.

Unless it has implications for the redesign please open a separate issue and remove this comment. Issues are actionable items, comments imply that this needs to be addressed in the current PR.

As the number of contributions increases we have to be more disciplined when it comes to issues, discussions and PRs. I will add contribution guidelines.

@lapp0
Contributor Author

lapp0 commented Feb 20, 2024

Sounds good, I'll open a separate issue.

@rlouf
Member

rlouf commented Feb 21, 2024

Closing for now, until the discussion in #678 has converged.
