Add support for OpenDAL-backed remote stores (file://, initially) #19827
Conversation
Thanks @huonw ! Sorry that I didn't get to this this week: will review by EOD Monday.
All good, no rush. It's a big one!
Thanks a lot!
```rust
pub struct Provider {
    /// This is public for easier testing of the action cache provider
    // TODO: move all the providers into a single crate so that the pub isn't necessary
```
So, on the contrary: I think that it would be good to separate the `grpc`-based provider into a separate crate from the `opendal` provider, so that they can be compiled independently / in parallel. The `process_execution` crate had previously gotten unwieldy due to having `grpc`, `docker`, and `nailgun` clients all living in it, and I think that the `store` crate is likely to end up in the same position with another client implementation added to it (particularly one as potentially hefty as `opendal`).
Yeah, potentially a crate with the traits and then separate providers is better.
This source comment is a bit ambiguous, sorry, but it's more referencing that it'd be nice to have the store and action cache providers for each service together in one crate, not necessarily that the gRPC/REAPI and OpenDAL ones should be grouped too.
Getting into a bit of a distracting discussion, but separate crates for each provider might end up with 4 new crates, which is probably fine/manageable?
```mermaid
graph BT
    store("fs/store (existing)")
    remote("process_execution/remote (existing)")
    traits("remote_providers/traits (new)")
    grpc("remote_providers/reapi (new)")
    opendal("remote_providers/opendal (new)")
    selector("remote_providers (new)")
    grpc --> traits
    opendal --> traits
    selector --> grpc
    selector --> opendal
    store --> selector
    remote --> selector
```
Where the "top-level" `remote_providers` centralises things like the functions for choosing a provider based on options and, potentially, more of the validation code that's currently in `global_options.py`.
That layout looks good, yea.
```rust
// TODO: we shouldn't need to gin up a whole copy of this struct; it'd be better to have the two
// sets of remoting options managed together.
```
There is a lot of potential cleanup that can happen here, yea.
Ideally we could expose a struct all the way through to Python (with a very light Python wrapper around the inner type), such that it could be constructed directly from global options. Currently that's not the case -- in particular, see `ExecutionOptions`, which is copied into positional args to the scheduler.
I've filed #19902 to track the refactoring.
This adds support for remote caching to the GitHub Actions Cache, by setting `remote_store_address = "experimental:github-actions-cache+https://..."`. This is implemented as just another OpenDAL (#19827) provider.

As written, this requires the `ACTIONS_CACHE_URL` and `ACTIONS_RUNTIME_TOKEN` values to be massaged into the appropriate Pants configuration. I've included basic docs for how to do this. This set-up can/should be packaged into the `init-pants` action in future, but for now, we're in experimental-land.

There are various ways in which this can be improved, but I think it's helpful to have a "real world" example of a remote cache beyond the `file://` one to guide the refactoring, plus it's good to start testing this earlier. Thus, I'd like to land this, and do the improvements as follow-up:

- better docs/help (always!), particularly on the `using-pants-in-ci` docs page, which I haven't updated here
- now that I've implemented this, I don't like my `<provider>+<scheme>://...` proposal from #19827. I think it'd be better to have a separate `remote_store_provider = "experimental-github-actions-cache"` (or `"reapi"`, etc.) option, with `remote_store_address` being just the raw URL (and the provider validating that it can understand the URL, of course)
- it'd be good to be able to pass the token in in a simpler way: I'm thinking there could be a literal `remote_oauth_bearer_token` option, which is expected (and suggested in docs) to be passed as the `PANTS_REMOTE_OAUTH_BEARER_TOKEN` env var, in addition to `remote_oauth_bearer_token_path`

These are tracked in #19902 (all of the others in that issue would be good too, but weren't as acutely surfaced by this change; I'll start chipping away at them once this is in).

This is awkward to test. My plan, not implemented here, is to tee up a dedicated set of tests to run within https://github.com/nektos/act somehow. For now, we'll have to go with manual testing; e.g. this specific PR is validated by https://github.com/huonw/pants-cache-github-actions-poc/actions/runs/6298908094/job/17119702612#step:11:983

This is the first example of something useful and user-facing for #11149. This doesn't close it; I'll wait until we have a blob store like S3 too.

The commits are individually reviewable.

(Replaces #17840.)
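To make the wiring concrete, here's a minimal sketch of what the `pants.toml` side might look like, assuming a workflow step has already substituted the `ACTIONS_CACHE_URL` value into the address and written `ACTIONS_RUNTIME_TOKEN` to a file (the exact URL and token file path are illustrative, not prescribed):

```toml
[GLOBAL]
# Read from and write to the "remote" cache.
remote_cache_read = true
remote_cache_write = true
# Illustrative: the real URL comes from the ACTIONS_CACHE_URL env var.
remote_store_address = "experimental:github-actions-cache+https://..."
# Illustrative: a workflow step would write ACTIONS_RUNTIME_TOKEN to this file.
remote_oauth_bearer_token_path = ".pants/remote-cache-token"
```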
This does a no-functionality-change refactoring of the remote _store_ providers (backed by REAPI and OpenDAL). It splits the concept of a remote provider out into its own internal "surface" crate, `remote_provider`, plus three crates that are (mostly) implementation details of that one:

- `remote_provider/remote_provider_traits` for the traits (and structs!) that the various providers have to implement
- `remote_provider/remote_provider_reapi` for the gRPC/REAPI-backed provider
- `remote_provider/remote_provider_opendal` for the OpenDAL-backed provider

They're arranged like this:

```mermaid
graph BT
    remote("process_execution/remote (existing)")
    store("fs/store (existing)")
    traits("remote_provider_traits (new)")
    grpc("remote_provider_reapi (new)")
    opendal("remote_provider_opendal (new)")
    selector("remote_provider (new)")
    grpc --> traits
    opendal --> traits
    selector --> grpc
    selector --> opendal
    remote -- for remote execution --> grpc
    remote --> selector
    store --> selector
```

Theoretically, the new crates other than `remote_provider` are an implementation detail... except that there's a helper in `remote_provider_reapi` that's used for remote _execution_ in `process_execution/remote`, in addition to the byte store and action cache, hence the dependency there.

This is one point of #19902, following up on #19827 (comment). The commits are individually reviewable, although the overall PR view gives a slightly more useful picture of the file renames.
This replaces the over-complicated address URI scheme-based look-up from #19827 for deciding which provider to use with a normal option: `[GLOBAL].remote_provider`. This defaults to `reapi`, to preserve the current (stable) behaviour. That is, one can now set:

```toml
[GLOBAL]
remote_provider = "experimental-file"
remote_store_address = "file:///..."
```

Or similar. Another item in #19694.

The commits are somewhat usefully broken up for review.
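Under this new option, the GitHub Actions cache config sketched earlier would presumably become something like the following (hypothetical: it assumes the provider keeps the `experimental-github-actions-cache` name floated above):

```toml
[GLOBAL]
remote_provider = "experimental-github-actions-cache"
# Now just the raw URL, with no provider prefix; the provider validates it.
remote_store_address = "https://..."
```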
This is the final major step that unlocks #11149: `remote_store_address = "..."` can now be backed by providers other than the REAPI one.
This doesn't yet resolve that issue, as it doesn't implement any sort of useful "simple" truly-remote cache, but it sets up most of the infrastructure.
User facing change
This only adds support for one additional store for now: `experimental:file:///path/to/dir`, which writes the store to `/path/to/dir`. This store is enough to validate that this functionality (mostly) works, and even seems like it could be useful in its own right.
The user-facing functionality here is configured entirely through `remote_store_address`: that option now supports URIs with schemes other than `grpc` or `grpcs`; to start with, just `file`. It also explicitly supports these schemes being experimental, communicated through the `experimental:` scheme-prefix. This is to, hopefully, allow landing the feature earlier and with less risk.

For example, running `pants test ::` in the following example writes various blobs to subdirectories of `/tmp/remote-cache-example` (e.g. `action-cache/e3/2f/e32f034f...` and `byte-store/50/ce/50ce4a68...`), and a second `pants --no-pantsd --no-local-cache test ::` command is able to service it from cache, reporting `//:slow-test succeeded in 5.01s (cached remotely)`.
Details of this change
There are three main bits of work:

- the new OpenDAL-backed provider itself (the `base_opendal.rs` and tests files)
- wiring it into the byte store and action cache (`store/src/remote.rs` and `remote/src/remote_cache.rs`)
- the `experimental:` scheme handling (in `global_options.py`)

None of these are huge in isolation, but they add up to a fair chunk of code. I think each of those files can be reviewed somewhat in isolation, in that order.
Why OpenDAL?
It's used by sccache for similar remote-caching purposes (https://github.com/search?q=repo%3Amozilla%2Fsccache%20opendal&type=code), and supports a whole lot of services.
Pants doesn't/shouldn't support all of them, but blob stores and CI caches definitely seem exciting!
Next steps
This only adds support for caching "remotely" to the local file system, but I'll add support for GitHub Actions as follow-up work (#19831), and maybe S3 too. Hopefully, as part of doing that, I'll work through any authentication (etc.) issues, so that it becomes easier for others to add the backends they need too.
I've suggested we use the URI scheme for deciding on the service, rather than an explicit `remote_store_service = "reapi"` option or similar. In future, some services may use the same scheme: e.g. I imagine several services might conventionally use `http://` or `https://`. My current thinking is to solve this similarly to pip's VCS requirements (`vcs+transport://...`) or `sqlalchemy` database URLs (`dialect+driver://...`), e.g. `remote_store_address = "s3+http://..."` or `remote_store_address = "http+s3://..."`, where the provider would strip off the `s3` part as appropriate.

There's also a whole lot of TODOs that I don't think are critical to landing this code, and that I can chip away at once it's in: by landing this sooner, I'm hoping we can start having the user-facing parts of #11149 tested, validated and enjoyed sooner rather than later. I've already done a loooot of pre-factoring (see all the PRs referencing #11149), so it'd be nice to switch to having the feature in and just do some normal re-factoring. 😅
TODOs are mostly tracked in #19902.