-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request][Go]: Support Resource Hints #23893
Comments
Updating documentation for pipeline resource hints for the Go SDK will occur closer to the 2.44.0 cut date. The milestone will be moved down the road if I don't find the time for finishing the scoped level hints. |
What do you think now? Where is this at? |
The Pipeline level stuff is complete, but we aren't getting scoped hints in
for 2.44.0. bump it to 2.45.0.
…On Thu, Dec 1, 2022, 1:17 PM Kenn Knowles ***@***.***> wrote:
What do you think now? Where is this at?
—
Reply to this email directly, view it on GitHub
<#23893 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADKDOFOXWRE3FSJ65NDH5UTWLEIXBANCNFSM6AAAAAARRQEIJU>
.
You are receiving this because you were assigned.Message ID:
***@***.***>
|
@lostluck is this ready for 2.45, or should we bump it to 2.46? |
@johnjcasey I've dropped the milestone, due to changed plans. The replacement runner work and shoring up the IOs is higher priority than optimal RAM/GPU usage. |
This has been re-prioritized (admittedly, due to a need within Google), but I have an approach I like well enough for parity with both Environment Hints and Annotations on Composites. In particular, it's a mild bit of indirection through a context.Context, which do match against the Hints and annotations well enough. A new package called contextreg, where extractor functions can be added. Ones for ResourceHints on environments, and ones for Annotations for composites. Both kinds are take the "refinement" approach. The closest version is what applies to a leaf transform. eg. Whole Pipeline -> Composite A -> Composite B -> Leaf Transform C So if a given annotation is applied both Composite A and Leaf C, C will have it's own annotation attached, and should take priority for whatever purposes. This ultimately allows annotations to be attached to a Functional DoFn, which wasn't previously possible. Similarly, for Resource Hints, whole Pipeline hints adjust the default environment. But if a composite has hints sets, they form a new derived environment which has both the defaults, and the new overrides set. (At which point, runners (like dataflow) do what they need to do to union environments together.) Hanging on context avoids adding additional APIs for both Annotations and Resource Hints, but also allows for other bits to be added in as needed to the graph from context.Context in the future. The rest ends up in how runners treat the Annotations and Hints, which is not something the SDK should get in the way of, since runners are already free to ignore hints as they like. Conversely, but out of scope for now, would be to possibly re-hydrate these into context.Context that are available on execution time for access in user transforms. This would let the SDK access these in both Structural and Functional DoFns as the user sees fit. But it must be noted that neither should be used for dictating correct behavior of the DoFn if that ever comes to pass. |
I've settled on an approach that works for both PTransform metadata (Annotations, DisplayData) and for Environment metadata (ResourceHints, DisplayData, Dependencies). However, while plumbing environments isn't too bad, we need to do a lot of changes to determining the "base environment". Pipeline construction presently "back updates" the default environment after making everything a proto, by searching for the known environment id. That way we can defer cross compiling to afterwards, if necessary. But this doesn't work if the Go pipeline itself is generating multiple environments, that all use the same binary. The way I want to fix all this is to not do back updates after the fact, and instead pass the base Environment properly filled in ahead of marshalling. This avoids a bunch of wacky logic we have trying to defer doing things correctly until afterwards. We can't simply extend the existing GoWorkerRole scanning mechanism because in the future, we may have a Go Expansion Service which would naturally use that for it's own environments, which would likely not be the same binary, so we can't scan everything and make the substitutions. We might be able to add a "dummy" role that we're replacing, but that just feels like we're adding a hack to continue to support a hack. I'd rather clean it up. Some of this will end up being breaking changes around the runnerlib (Execute will no longer also do the cross compile), but we only have two runnerlib callers (universal and dataflow), and it's very unlikely that someone is silently depending on this API, as the correct approach is sending jobs in through the "universal" package, instead. But that's a discussion that can happen on that PR, not the PTransform one. |
* [#23893] Do Annotation plumbing to the graph. * fmt * Groundwork for DisplayData and Deps support * rm env change * Add plubming and simpler handling and test! * Docs and details. * Missed a rename. * pipeline doc --------- Co-authored-by: lostluck <13907733+lostluck@users.noreply.github.com>
What would you like to happen?
Support Resource Hints.
Specifically being able to apply min_ram and accelerator hints at minimum at the whole pipeline level. This would enable a certain amount of right fitting support on Google Cloud Dataflow.
While the intent is currently to support them only at the pipeline level, it makes sense for the initial infrastructure to be plumbed in at the scope level instead, to avoid a whole re-write for transform level support.
Resource Hints are part of the Environment Proto and have standard URNs and behavior (described on the site.)
The Standard urns don't indicate the format of their payloads however, so part of this work will be to see what Java and Python are doing, and documenting that in the protos. Specify your byte formats kids.
For reference, I wrote a doc on this earlier this year, where I had a few open questions: https://docs.google.com/document/d/1fHIIdY1pXrSc0GmQqdGlw3Y7UOuaVbusprTEtCUGS0I/edit#
Notably, it looks like space has stabilized for Beam at the moment.
Current implementation plan is as follows:
V0: Simple flag, and populate the "top level" graph.Scope with this information at pipeline Build time. Translate these hints to the sole Go environment in the correct format.
Current plan is "simple" where there's a hint interface.
With some underlying concrete implementations, notionally the "defaultHint" type which has the "most specific" merge behavior currently for the accelerators, and "maxHint" for currently the min_ram, where the maximum in a given scope is preserved. (See hint documentation for details). Those will likely have their own constructor helper methods for customization, and with specific functions for AcceleratorHint and MinRamHint in the same package. This would enable new standard hints to be added, or for supporting new non-standard hints if needed.
The only mechanism at this stage would be to set the hint with a flag, similar to the existing python flag.
Converting the hints into the map entry for the environment would simply be to call the URN and Payload methods.
The actual specifics for dealing with merges are moot until v1 however.
This approach implies that the resolution to how to deal with other "metadata" is to have additional packages for Annotations (for PTransforms) and for Display Data (for Pipeline, PTransforms, Environments, PCollections). At this stage, it would be best if the methods to apply those weren't added to the beam.Scope, but remain separate if possible at the moment. The Scope ends up as proxy for the environment (new usecase) and composite PTransforms (existing usecase), and we shouldn't simply be exposing analogs for the underlying protocol buffers.
V1: Needs to make choices on Go environment, since presently a single Go environment is assumed.
This additional complexity is why it's useful to split the "whole pipeline" effort from "Transforms" level efforts.
In particular, hints at different scopes requires supporting multiple environments in a single pipeline, which the Go SDK hasn't needed to do. This requires investigating the Python and Java behavior. Current expectation is that "it's the runner's job to figure them out" and consolidate compatible environments.
Shouldn't be too difficult, but it would best be verified with a thorough suite of proto inspecting code to ensure the new environments are broken up and populated correctly (eg. for any docker image swaps or similar).
It should also be said that this approach continues to hide the Environment from users of the Go SDK, as a beam implementation detail, allowing the framework to construct the protos as appropriate.
There's also a potential for a "modular" approach for dealing with hints, but we don't have sufficient complexity nor variety in hints to warrant a level of indirection that some registry system would enable. As such a single "beam/resources" package would be sufficient.
Issue Priority
Priority: 2
Issue Component
Component: sdk-go
The text was updated successfully, but these errors were encountered: