[WIP] protodoc: Don't use aspect #21579
Conversation
The branch was force-pushed from aa79a4b to b8156af.
/docs
Docs for this Pull Request will be rendered here: https://storage.googleapis.com/envoy-pr/21579/docs/index.html The docs are (re-)rendered each time the CI runs.
The branch was force-pushed from 4ab0a22 to 5519082.
Thinking on this, I think we can use aspects and make better use of descriptors.
@htuch experimenting with this further, I don't think this is possible - using aspects with protoc is, I think, endemically slow. In this PR it's really fast because protoc only has to parse the descriptor once - it is the parsing that is slow. Pushing this to an aspect that makes better use of the descriptor file still hits the same problem of repeatedly parsing the descriptor. I think this explains why I have seen people mention ditching aspects for protoc builds in other projects.
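For illustration only, a minimal sketch of the "parse once" point, assuming a combined descriptor set (descriptor_set.pb) has already been written by a single protoc invocation; the path and printout are placeholders:

```python
from google.protobuf import descriptor_pb2

# Parse the combined descriptor set exactly once...
fds = descriptor_pb2.FileDescriptorSet()
with open("descriptor_set.pb", "rb") as f:  # assumed output of a single protoc run
    fds.ParseFromString(f.read())

# ...after which every FileDescriptorProto (messages, enums, comments) is in
# memory, so per-file doc generation can fan out without re-running protoc.
for file_proto in fds.file:
    print(file_proto.name, len(file_proto.message_type), "messages")
```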
The branch was force-pushed from b69a276 to f4e0942.
The branch was force-pushed from f4e0942 to fee8819.
The branch was force-pushed from fee8819 to aec7998.
Nice find - let's focus the reviews on this work as the main win.
The branch was force-pushed from aec7998 to 3733ac3.
Making this WIP while I add parallel protoc procs and until #21559 has landed.
The branch was force-pushed from 3733ac3 to e7f32c9.
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to
I have added a parallel runner upstream (https://github.com/envoyproxy/pytooling/blob/main/envoy.base.utils/envoy/base/utils/parallel_runner.py) which divides jobs according to the number of CPUs. I have also added 2 further macros: I moved these to macros as they are common patterns and both could be used elsewhere immediately. I'll move the macros to a separate PR.
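As a rough sketch of that kind of job splitting (not the upstream parallel_runner, which is asyncio-based; the job list and echo command here are placeholders):

```python
import os
import subprocess
from concurrent.futures import ProcessPoolExecutor

# Placeholder job list; in practice these would be the protoc/protodoc jobs.
JOBS = [f"job-{i}" for i in range(100)]

def split(jobs, n):
    """Divide jobs into n roughly equal batches, one per available CPU."""
    return [jobs[i::n] for i in range(n)]

def run_batch(batch):
    # Placeholder: each batch would invoke the real tool once over its slice.
    return subprocess.run(["echo", *batch], check=True).returncode

if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=cpus) as pool:
        exit_codes = list(pool.map(run_batch, split(JOBS, cpus)))
```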
tools/base/parallel.py (outdated)
from aio.run import runner
...
@abstracts.implementer(IExecutive)
Please note this code is no longer part of this PR - the code was moved upstream here: https://github.com/envoyproxy/pytooling/blob/main/envoy.base.utils/envoy/base/utils/parallel_runner.py
I should also note that both the code.check and dependency.check tools effectively orchestrate parallel procs - it's this that has made them much faster.
Implementing this in bazel should be possible - but as commented elsewhere, I don't think you can get the number of available cores for tasks, only in tasks.
I think there's some practical balance between bazel parallelizing vs it exec'ing tools that then parallelize. If the common use case was to run N tools in bazel at the same time, and they also fanned out, that could cause issues, but I've mostly found in the common case you'll just exec 1 or 2 tools that can parallelize and I think that's alright.
... run N tools in bazel at the same time, and they also fanned out, that could cause issues
Yep, agreed that is a concern - the forking procs are essentially not using bazel's proc pool.
That said, I think it's mostly a theoretical concern; OSs are pretty good at managing simultaneous multicore procs these days.
In this case the amount of time spent in my current target (building //docs:api_rst) falls from ~7mins on 4 cores to ~1m20s running serially.
That suggests there is a lot less work being done overall, by a pretty big margin.
Pushing the new code onto 4 cores takes the build time down to <30s.
Well, it's not just CPU scheduling, it's also the ability to do incremental doc builds, right? If Bazel is aware of inputs to individual RST builds, we can avoid rebuilding, so the incremental iteration case becomes faster.
I have a reasonable implementation of a stdin <> stdout line receiver that I was working on previously (envoyproxy/toolshed#273) - a rough sketch of the receiver loop follows below.
I'm going to try:
- adding protobuf Worker parsing
- calling grpcio tools to run a "protoc"-like command in python
- setting the above up as a persistent worker
If that works I think we will get the best of all worlds - aspects, incremental builds, bazel-defined parallelism, efficient parsing of compile data.
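For illustration, a minimal sketch of the bare stdin <> stdout receiver loop, assuming one JSON job per line; it is not the Bazel persistent-worker protocol or the toolshed implementation:

```python
import json
import sys

def handle(job):
    # Placeholder for the real work, e.g. rendering RST for one proto file.
    return {"ok": True, "job": job}

def main():
    # One JSON job per input line; one JSON result per output line.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        sys.stdout.write(json.dumps(handle(json.loads(line))) + "\n")
        sys.stdout.flush()  # flush so the calling process sees each result immediately
    return 0

if __name__ == "__main__":
    sys.exit(main())
```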
I have a vaguely working protoc-plugin-compiler-as-a-service.
I wasn't able to get it working using protoc itself via grpc_tools - although they wrap the c++ lib, they call cli.Run(argc, etc), so afaict there is not a way to use it without hitting the same problem of reparsing the descriptor file on every invocation.
I think it does a lot of things we don't want anyway.
What I was able to do was read the proto file source (FileDescriptorProto) from a descriptor set and feed that directly to the api plugin.
I still need to hook up the bazel service, and the above needs a bit of work yet, but I think it's pretty doable.
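A minimal sketch of that last step, assuming a descriptor set already on disk; render_rst stands in for the actual api plugin entrypoint and is hypothetical here, as are the paths:

```python
from google.protobuf import descriptor_pb2
from google.protobuf.compiler import plugin_pb2

def load_descriptor_set(path):
    """Parse the descriptor set once, up front."""
    fds = descriptor_pb2.FileDescriptorSet()
    with open(path, "rb") as f:
        fds.ParseFromString(f.read())
    return fds

def plugin_request(fds, target):
    """Build the CodeGeneratorRequest a protoc plugin would normally receive
    on stdin, without shelling out to protoc again."""
    req = plugin_pb2.CodeGeneratorRequest()
    req.proto_file.extend(fds.file)      # every FileDescriptorProto, deps included
    req.file_to_generate.append(target)  # the file we actually want docs for
    return req

fds = load_descriptor_set("descriptor_set.pb")  # assumed path
req = plugin_request(fds, "envoy/config/bootstrap/v3/bootstrap.proto")  # assumed target
# render_rst(req) would be the (hypothetical) api plugin call producing the .rst
```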
The other downside to the approach here (so far) is that it is using the compiled protoc.
I haven't got the new plan quite working yet, but my initial r&d suggests that cutting this out - using the protoc in rules_proto to derive the initial descriptor, and then using python to generate the plugin/rst output - speeds things up considerably.
I need to confirm everything yet, but it's looking like we are going from ~7min -> ~20s for the api rst build!
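Roughly, the two-stage shape being described, sketched with an assumed protoc on PATH and assumed paths (in the build the protoc would come from rules_proto):

```python
import subprocess

# Stage 1: one protoc invocation that only writes a descriptor set.
PROTOC = "protoc"  # assumed to be on PATH for this sketch
subprocess.run(
    [PROTOC,
     "--include_imports",      # pull dependency descriptors into the set
     "--include_source_info",  # keep the comments needed for the docs
     "--descriptor_set_out=descriptor_set.pb",
     "-Iapi",
     "api/envoy/config/bootstrap/v3/bootstrap.proto"],  # assumed paths
    check=True)

# Stage 2: load descriptor_set.pb in python (as in the earlier sketch) and emit
# the .rst output directly, with no further protoc invocations.
```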
After getting the aspect + worker working, it takes what was originally a ~7min job (rebuilding api_rst after a python change, so all api files) down to ~15s.
Wrt re/compiling protoc: this is still an issue I think, but a separate one.
I'm going to close this PR in favour of #21671, given that it is both faster and preferable for the reasons discussed above.
Signed-off-by: Ryan Northey <ryan@synca.io>
The branch was force-pushed from 0110aa5 to 46eb531.
Builds docs an order of magnitude faster, removes code.
Signed-off-by: Ryan Northey <ryan@synca.io>
The branch was force-pushed from 46eb531 to f0e986b.
Builds api docs an order of magnitude faster, removes code, and uses far fewer resources (cpu/mem/disk/io/space).
Signed-off-by: Ryan Northey <ryan@synca.io>