
[RFC]: vLLM plugin system #7131

Open
youkaichao opened this issue Aug 4, 2024 · 8 comments
@youkaichao
Member

Motivation.

There is an increasing need to customize vLLM.

Usually, the request is to swap out some functions or classes in vLLM, or to call some functions before vLLM runs the model. While implementing these in vLLM is not difficult, the maintenance burden grows.

To satisfy the growing need for customization, I propose introducing a vLLM plugin system.

It is inspired by the pytest community, where a plugin is a standalone PyPI package, e.g. https://pypi.org/project/pytest-forked/ .

#7130 is a draft implementation, where I added a new env var VLLM_PLUGINS. It works similarly to the operating system's LD_PRELOAD: a colon-separated list of Python modules to import.

One of the most important concerns is the risk of arbitrary code execution. When a user serves a model using vLLM, the endpoint user cannot activate plugins, so this does not suffer from a code-injection risk. However, there is a risk if the user runs vLLM in an untrusted environment. In that case:

  • we require that plugin package names start with vllm_ , so that vLLM users do not accidentally add irrelevant modules to execute;
  • we explicitly log the plugin modules vLLM is using, so that vLLM users can easily see whether any unexpected code is executed.

With these efforts, the security level should be the same as that of LD_PRELOAD. Since LD_PRELOAD has existed for many years, I think VLLM_PLUGINS should be acceptable in terms of security risk.
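
For illustration, a minimal sketch of this loading scheme with both safeguards; the function and logger names here are hypothetical, not taken from the actual draft in #7130:

```python
import importlib
import logging
import os

logger = logging.getLogger(__name__)

def load_plugins() -> None:
    """Import every module listed in VLLM_PLUGINS, LD_PRELOAD-style."""
    for name in filter(None, os.environ.get("VLLM_PLUGINS", "").split(":")):
        # Safeguard 1: only accept modules with the vllm_ prefix.
        if not name.startswith("vllm_"):
            raise ValueError(f"plugin module {name!r} must start with 'vllm_'")
        # Safeguard 2: log every plugin so unexpected code is visible.
        logger.info("Loading vLLM plugin module: %s", name)
        importlib.import_module(name)
```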

Proposed Change.

See #7130 for the draft implementation.

Feedback Period.

No response

CC List.

No response

Any Other Things.

No response

youkaichao added the RFC label on Aug 4, 2024
@NadavShmayo
Contributor

This looks like a step in the right direction to me (:

I have 2 questions regarding this:

  1. Currently it seems like it wouldn't really be possible to replace the scheduler implementation, or to add the uneven tensor-parallel implementation, using these plugins, at least not in an intuitive way that I can see.
    Don't you think it would make more sense to have different types of plugins, each focused on a different purpose?
    For example, one plugin type would be a scheduler plugin, another would be a model architecture plugin, and perhaps a few more.
    This could come as an addition to your suggestion of "general-purpose" plugins, but in most cases it'd be simpler to implement against a well-defined plugin interface than against a general-purpose one.

  2. I suggest using Python's built-in entry points to implement plugins, similar to how pytest implements them (see [Misc] Logits processor plugins #4769, which I worked on, as an example; a sketch of the mechanism follows below). Are there any advantages you see in the environment-variable approach compared to this one?
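
For readers unfamiliar with entry points, here is a rough sketch of the mechanism; the group name vllm.plugins is a hypothetical placeholder, not a name from this RFC:

```python
# In the plugin's pyproject.toml (group name "vllm.plugins" is a placeholder):
#
#   [project.entry-points."vllm.plugins"]
#   my_plugin = "vllm_my_plugin:register"
#
# Discovery on the vLLM side via the standard library:
from importlib.metadata import entry_points

def discover_plugins(group: str = "vllm.plugins") -> None:
    """Load every installed plugin registered under `group`."""
    for ep in entry_points(group=group):  # Python 3.10+ selection API
        register_fn = ep.load()  # imports vllm_my_plugin, fetches register
        register_fn()            # lets the plugin hook into vLLM
```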

I really believe that implementing such a plugin system could make vLLM an even greater technology, and personally it could solve a lot of problems for me by allowing great modularity and customization.
I'd be more than happy to help in implementing it (:

@youkaichao
Member Author

Currently it seems like it wouldn't really be possible to replace the scheduler implementation, or to add the uneven tensor-parallel implementation, using these plugins, at least not in an intuitive way that I can see.

It might not be easy, but it should be possible. Once a plugin is loaded, it has total control and can do anything it wants; in the extreme case, it could swap the whole vLLM implementation for another one.

For example one plugin type would be a scheduler plugin, another would be a model architecture plugin, and perhaps a few more.

We can consider this a TODO. It requires cleaning up the interface of each component in vLLM, so that users can bring in their own implementations more easily.

In the beginning, we can reserve the space for them, e.g. use vllm_general_plugin for a general plugin that is executed blindly, and later introduce vllm_scheduler_plugin dedicated to replacing the scheduler.

Currently, we can have vllm_models_plugin to register out-of-tree models, and vllm_executor_plugin to register a user-specified executor.
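
As an illustration of the models case, a plugin's registration hook might use vLLM's ModelRegistry; the package and class names here are hypothetical:

```python
# vllm_my_models_plugin/__init__.py  (hypothetical plugin package)
from vllm import ModelRegistry

def register() -> None:
    # Deferred import keeps the plugin cheap to load.
    from .modeling_my_arch import MyArchForCausalLM  # hypothetical class
    # Map the architecture name found in the HF config to our implementation.
    ModelRegistry.register_model("MyArchForCausalLM", MyArchForCausalLM)
```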

I suggest using Python's built in entrypoints to implement plugins, similar to how pytest implements it

I think this is great! I didn't know about it before; it is much better than the env var, I think. The only concern: if users install many plugins for the same component, e.g. the scheduler, how can they select the one they want? We might need to design some config file format to determine which plugins to use.

@simon-mo
Collaborator

simon-mo commented Aug 8, 2024

I think both the LD_PRELOAD way and the Python entry-points way are proven patterns. At the current experimental stage, I have two concerns:

  • We should stress that this is an experimental feature and that the API can change without notice. Therefore, we should make sure the variable names are VLLM_EXPERIMENTAL_PLUGINS etc. Any documentation or examples should note that these are highly subject to change, and that any plugin can break across any version of vLLM.
  • Plugins work because they have a well-documented interface that is backward compatible. We cannot guarantee any of that at the current state of the project. Therefore, we should start thinking about which interfaces are exposed and which ones to stabilize.

Regarding the exact code being executed, I don't have much concern about security; my concern is rather how the plugins are called and invoked. Will a plugin swap in a class implementation for an abstract class, replace some function, or insert some callbacks? It does seem like it needs several use cases to prove out and design over time.

@youkaichao
Member Author

It does seem like it needs several use cases to prove out and design over time

Agreed. This RFC is just a start to explore how we interact with plugins. There are already two use cases now: out-of-tree model registration and user-specified executor registration.

Plugins work because they have a well-documented interface that is backward compatible.

That is the stable state of a plugin system; we don't need to guarantee it at the moment. It is the plugin author's responsibility to keep their plugin up to date. We can then see what the community makes of the plugins, and gradually make parts of the system pluggable with a stable API.

we should make sure the variable names are VLLM_EXPERIMENTAL_PLUGINS

I think we can directly call it VLLM_PLUGINS. Although the plugin system is immature for plugin developers, the usage is quite stable for end users: they will need to install some packages and select them.

@youkaichao
Member Author

Overall I got positive feedback for this RFC.

I will use the entry-point mechanism mentioned by @NadavShmayo to collect all the installed plugins, and use VLLM_PLUGINS to control which plugins are loaded.
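
A sketch of that combination, reusing the hypothetical vllm.plugins group from above; only plugins named in VLLM_PLUGINS are loaded, or all of them when the variable is unset:

```python
import os
from importlib.metadata import entry_points

def load_selected_plugins(group: str = "vllm.plugins") -> None:
    allowed = os.environ.get("VLLM_PLUGINS")
    # Unset means "load every installed plugin"; otherwise treat the
    # value as a comma-separated allowlist of entry-point names.
    allowlist = None if allowed is None else set(allowed.split(","))
    for ep in entry_points(group=group):  # Python 3.10+ selection API
        if allowlist is None or ep.name in allowlist:
            ep.load()()  # import the plugin and call its hook
```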

@NadavShmayo
Contributor

I see that you have already implemented the general-purpose plugin system, nice!
I implemented a basic version of the model architecture plugin system, which, as we discussed, I think comes as an addition to this.

Would be great if you could have a look at #7438 and give your feedback.

@youkaichao
Member Author

#7426 finished the framework.

TODOs:

  • doc update
  • modular plugins for specific purpose

@toslunar
Contributor

I use sitecustomize and PYTHONPATH to deploy custom models with parallelism. In other words, Python already has a plugin system to a certain extent. For security, it's nice that PYTHONPATH must be audited anyway, while we might miss VLLM_PLUGINS unless it's well documented. In my opinion, pytest cannot utilize sitecustomize because pytest and its plugin system should not disturb other packages much. vLLM doesn't have to work with arbitrary packages (for example, since #2155 many dependencies are pinned).
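
For illustration, the sitecustomize approach might look like this; the path and model class are hypothetical, and Python imports sitecustomize automatically at interpreter startup when it is on the path:

```python
# /opt/my_plugins/sitecustomize.py  (hypothetical; launch vLLM with
# PYTHONPATH=/opt/my_plugins so Python imports this file at startup)
from vllm import ModelRegistry
from my_models import MyArchForCausalLM  # hypothetical custom model

# Register the out-of-tree model before vLLM resolves architectures.
ModelRegistry.register_model("MyArchForCausalLM", MyArchForCausalLM)
```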
