Epic: Embedded Python VM, Plugins and UDFs #25537
Comments
@pauldix and @david (cc @jacksonrnewhouse) - Is there a specification for this? Looking at the list and thinking through security aspects, at first glance these are interesting from a security POV:
Other questions that come to mind:
I'll stop there as I suspect it's already too many questions (and it's not exhaustive).
@jdstrand The security model for all of this is pretty basic in the open source build. We already have a token setup where users make requests with tokens and they either get full access (i.e. they can do anything on the API) or no access. When users submit plugins to the database, that token is what will be checked, and the resulting plugin will run with full access to the local DB and server (as it's just a Python VM). We can make the plugin system something that can be turned off via configuration so that it won't run. The commercial Pro version will have finer grained controls, but we'll define that later based on customer requests and needs.
Plugin code can ultimately come from anywhere. We won't be gating what plugins exist. This is by design: we don't want people to have to submit a PR to a repo we own and then wait for review from us to create their own plugins or share code with others. If we end up setting up a service like Crates.io, it will be publicly accessible on the internet and anyone will be able to create an account and upload plugin code, which could be accessed by others. As with any similar service online, we provide no guarantees that plugins uploaded by random people don't contain malware. We will likely have a list of vetted plugins (or ones created by us) for our customers.
My expectation is that plugins will mostly be single files, so we may not need to bother with a service. A simple mechanism in the server that is able to pull from, say, a Gist or GH repo would suffice. I was thinking of a service for the added benefit of being able to have approved plugins and to have one place to go to search for user created plugins (that doesn't require us explicitly updating it).
We're too early in the process to have many of these things answered. The goal is to get an alpha of the functionality released to the community and then iterate based on feedback and use.
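To make the all-or-nothing model above concrete, here is a minimal sketch of how a plugin submission might be gated. Everything in it is an assumption for illustration: `ServerConfig`, `plugins_enabled`, `tokens`, and `may_submit_plugin` are hypothetical names, not anything from the actual server.

```python
import hmac
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical server settings; the field names are illustrative only."""
    plugins_enabled: bool = True            # the "turn it off via configuration" switch
    tokens: FrozenSet[str] = frozenset()    # tokens that grant full access


def may_submit_plugin(config: ServerConfig, token: str) -> bool:
    """Return True only if plugins are enabled and the token is known.

    There is no middle ground: a valid token means full access,
    anything else means no access.
    """
    if not config.plugins_enabled:
        return False
    presented = token.encode("utf-8")
    # compare_digest avoids leaking token contents through timing differences.
    return any(
        hmac.compare_digest(presented, known.encode("utf-8"))
        for known in config.tokens
    )
```

With plugins disabled, even a valid token is rejected, which matches the idea that the whole plugin system can be switched off.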
Note to self, answering my own question: looking at a very early in-progress PR, we are planning on (or exploring) using https://crates.io/crates/pyo3 to embed a Python interpreter. The docs on that site mention using an existing Python shared library from the system. I plan to watch how the implementation evolves (no need to answer now) with respect to what we are embedding (depend on the system? build it ourselves? grab it from somewhere official? etc.), as there is a security maintenance angle here.
Sharing this because it may save you some headache: you may have trouble running multiple interpreters in a single process, because many extension modules (written in C or in Rust) have global state (= static variables) that can exist only once (because the underlying Linux lib is runtime-linked only ONCE into the process). See PEP 489 for more details, esp. the "legacy init" section. PyO3, for example, doesn't support multi-phase init yet (ref PyO3/pyo3#2274). So I think you have the following options:
This has a nice property that it also allows for the opportunity to run the interpreter under another UID, which could be a meaningful security hardening measure (eg, database runs as one user (eg, |
I forgot one option in #25537 (comment) : Use a WASM VM like wasmtime and Pyodide. That however makes the package installation more difficult since |
This is an umbrella for many issues related to adding a Python VM to the database. This will require work in the API, the CLI, and the internals. There should be an easy way for users to define Python-based plugins that run inside the database and are able to receive data, process it, interact with third party APIs and services, and send data back into the database. Ideally, the runtime would be able to import libraries from the broader Python ecosystem and work with them.
This issue is by no means exhaustive, but can serve as a jumping-off point for further refinement and detail.
Here are the contexts under which we'd want to run:
/api/v3/plugins/<name>
and send request body and headers to the plugin)
Within the plugin context, the Python script should have some API automatically imported that allows it to make queries to the database, or write data out to the database. We'll also want to have an in-memory key/value store accessible by the script. Each script should have its own sandboxed store.
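As a rough sketch of what that plugin context could look like, the example below gives each plugin a context with `query`, `write`, and its own key/value store. All names here (`PluginStore`, `PluginContext`, `PluginHost`, `count_rows_plugin`) are hypothetical, and the "database" is just an in-memory list standing in for the real engine.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class PluginStore:
    """In-memory key/value store private to a single plugin script."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value


class PluginContext:
    """Hypothetical object handed to a plugin; not the real database API."""

    def __init__(self, name: str,
                 query_fn: Callable[[str], List[Row]],
                 write_fn: Callable[[Row], None]) -> None:
        self.name = name
        self._query_fn = query_fn
        self._write_fn = write_fn
        self.store = PluginStore()  # one store per plugin, never shared

    def query(self, sql: str) -> List[Row]:
        return self._query_fn(sql)

    def write(self, row: Row) -> None:
        self._write_fn(row)


class PluginHost:
    """Creates one context per plugin name and reuses it across runs."""

    def __init__(self, query_fn: Callable[[str], List[Row]],
                 write_fn: Callable[[Row], None]) -> None:
        self._query_fn = query_fn
        self._write_fn = write_fn
        self._contexts: Dict[str, PluginContext] = {}

    def context_for(self, name: str) -> PluginContext:
        if name not in self._contexts:
            self._contexts[name] = PluginContext(
                name, self._query_fn, self._write_fn)
        return self._contexts[name]


# Stand-in "database": a list of rows. The query function ignores the SQL text.
FAKE_TABLE: List[Row] = [{"host": "a", "usage": 0.5}]
host = PluginHost(query_fn=lambda sql: list(FAKE_TABLE),
                  write_fn=FAKE_TABLE.append)


def count_rows_plugin(ctx: PluginContext) -> None:
    """Example plugin: counts rows, remembers its run count, writes a summary."""
    runs = ctx.store.get("runs", 0) + 1
    ctx.store.set("runs", runs)
    rows = ctx.query("SELECT * FROM cpu")
    ctx.write({"plugin": ctx.name, "rows_seen": len(rows), "run": runs})
```

Because the host keeps one `PluginContext` per plugin name, state written by one script through `ctx.store` is invisible to every other script, while repeated runs of the same script see their earlier values.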
We'll need to have an API and CLI for submitting new Python scripts to the DB. We'll also want to collect logs from the scripts and make those accessible via system tables in the query API.
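One possible shape for the log collection, sketched with Python's standard `logging` module: each plugin gets its own logger, and a handler turns every record into a row that a system table could later expose. The names `PluginLogHandler`, `LOG_ROWS`, `plugin_logger`, and `rows_for` are assumptions made for this example, and the row layout is not a proposed schema.

```python
import logging
from typing import Dict, List

# Rows that a system table could be backed by (illustrative layout only).
LOG_ROWS: List[Dict[str, str]] = []


class PluginLogHandler(logging.Handler):
    """Appends each log record as a row instead of printing it."""

    def __init__(self, rows: List[Dict[str, str]]) -> None:
        super().__init__()
        self.rows = rows

    def emit(self, record: logging.LogRecord) -> None:
        self.rows.append({
            # Logger names look like "plugin.<name>"; keep only <name>.
            "plugin": record.name.split(".", 1)[1],
            "level": record.levelname,
            "message": record.getMessage(),
        })


def plugin_logger(name: str) -> logging.Logger:
    """Return the logger a plugin script would be given."""
    logger = logging.getLogger(f"plugin.{name}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep plugin output out of the server's own log
    if not any(isinstance(h, PluginLogHandler) for h in logger.handlers):
        logger.addHandler(PluginLogHandler(LOG_ROWS))
    return logger


def rows_for(name: str) -> List[Dict[str, str]]:
    """Stand-in for querying the system table filtered by plugin name."""
    return [row for row in LOG_ROWS if row["plugin"] == name]
```

A script would call something like `plugin_logger("downsample").info("processed %d rows", 3)`, and `rows_for("downsample")` would then return that entry. Messages below the configured level (here `INFO`) are dropped before they reach the handler.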
We'll need to have a method for storing and accessing secrets in plugins for connecting to services.
We ultimately want these scripts to be user defined plugins or functions. We'll run a service (like crates.io) for hosting these and will want a method to quickly and easily bring plugins or functions from that service into the database.
Some plugin ideas:
We should create these plugins ourselves to test drive the developer experience of plugin creation, operation, and debugging.