BEAM-13939: Restructure Protos to fix namespace conflicts #16961
Codecov Report
@@            Coverage Diff             @@
##           master   #16961      +/-   ##
==========================================
- Coverage   74.12%   74.09%   -0.03%
==========================================
  Files         677      681       +4
  Lines       89069    89209     +140
==========================================
+ Hits        66019    66098      +79
- Misses      21899    21960      +61
  Partials     1151     1151
Flags with carried forward coverage won't be shown.
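As a sanity check on the report, its figures are consistent with coverage being computed as hits / lines, where lines = hits + misses + partials (66019 + 21899 + 1151 = 89069 on master), so partials count as not covered. That formula is inferred from these numbers rather than taken from Codecov's documentation; a minimal sketch under that assumption:

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Percentage of tracked lines that are hits; partials count as not covered.

    Assumed formula, inferred from the numbers in this report.
    """
    lines = hits + misses + partials
    return 100.0 * hits / lines


base = coverage_pct(hits=66019, misses=21899, partials=1151)  # master
head = coverage_pct(hits=66098, misses=21960, partials=1151)  # #16961
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
```

This prints `74.12% -> 74.09% (-0.03%)`, matching the diff: the 140 new lines were covered at a lower rate (79 hits) than the existing code.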
For Python, I'm thinking it might actually be better to make the structure of
Should the proto package name be used as the basis for the directory name?
(e.g. org.apache.beam.model.job_management.v1 -> org/apache/beam/model/job_management/v1,
and org.apache.beam.model.fn_execution.v1 -> org/apache/beam/model/fn_execution/v1)
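The package-to-directory mapping suggested here is mechanical: each dot in the package name becomes a path separator. The sketch below shows that mapping plus a check that a .proto file (given relative to its module root) sits where its package says it should. Both helper names are hypothetical, and the file name `beam_job_api.proto` is used only for illustration.

```python
import posixpath


def expected_dir(proto_package: str) -> str:
    """Directory implied by a proto package name: each dot becomes a slash."""
    return proto_package.replace(".", "/")


def directory_matches_package(rel_path: str, proto_package: str) -> bool:
    """True if a .proto file, given relative to its module root, lives in the
    directory implied by its `package` declaration (hypothetical check)."""
    return posixpath.dirname(rel_path) == expected_dir(proto_package)


job_dir = expected_dir("org.apache.beam.model.job_management.v1")
nested_ok = directory_matches_package(
    "org/apache/beam/model/job_management/v1/beam_job_api.proto",
    "org.apache.beam.model.job_management.v1")
flat_ok = directory_matches_package(
    "beam_job_api.proto", "org.apache.beam.model.job_management.v1")
```

With a flat layout the file's directory is empty, so the check fails; with the package-derived layout it passes.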
R: @lukecwik
Is there an easy way to run these CI tasks locally? I'm trying some Gradle tasks: some work, some don't, and some that used to work no longer do 😬. I want to speed up my iteration, so if there's a doc or README I can read somewhere, that would be great too!
I will start here
sdks/python/container/Dockerfile (outdated)
@@ -97,7 +97,7 @@ RUN rm /opt/apache/beam/third_party_licenses/golang/LICENSE

 COPY target/license_scripts /tmp/license_scripts/
 RUN if [ "$pull_licenses" = "true" ] ; then \
-    pip install 'pip-licenses<3.0.0' pyyaml tenacity && \
+    pip install 'pip-licenses>=3.5.3' pyyaml tenacity && \
retest this please
Run Java PreCommit |
Run GoPortable PreCommit |
Looks good to me for the proto file changes.
Milan can confirm, but since the changes are rooted in how the proto files import each other, I don't know how separable they are at the PR level. It might be possible to organize the changes as commits at the granularity of "proto file changes", "generator changes", "Go generated code", and "Python things", though. I'm ambivalent, since review of the Go side is finished at this point, and I don't understand the nature of the Python side of the change.
I sympathize with this; it's usually good advice to break large changes into their logical components and land them separately. In this case, though, I think that will be a challenge: I see the change set here as one logical component. The nature of building a polyglot SDK on top of protos is that if you change the protos, you have to change all the languages, and there's really no way around this without some long migration path. Still, I think the per-language changes here are isolated enough that you can hide the changes to other files and review just the Python files, much as you would if they lived in their own PR. Maybe a question to ask is what we do if something inexplicably goes wrong after merging this PR: is it easier to recover quickly if the change set is spread over multiple commits, or contained in a single commit? Reverting one commit is easy; reverting multiple commits with others interspersed is much harder. If you feel really strongly that the Python changes should be in their own change set, I am happy to oblige.
Let's chat about this; I'm not sure I understand what is being said here. Why would we have to change this back to a flat structure? I think that will be impossible, hence the extensive changes in the generation tooling.
It looks like the generated Go files are already in the RAT, though I agree with Robert that maintaining the ASF license header is the better avenue; it also reduces the diff on the PR.
Just to confirm, imports of the form
Correct, this PR changes the proto structure from a flat one to a hierarchical one, like so:
to
so the reason you can no longer do
    from apache_beam.portability.api.org.apache.beam.model.pipeline.beam_runner_api_pb2 import TestStreamPayload
In order to make this easier to work with, I updated the proto generator to also generate module bindings in the
    from .org.apache.beam.model import pipeline
    # ...
    external_transforms_pb2 = pipeline.external_transforms_pb2
    # ...
so that you don't need to provide the fully qualified path. If we didn't do this, this PR would be even larger, since we'd have to update every import of the generated bindings in the SDK.
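The binding trick described in this comment can be reproduced in isolation: a top-level package imports a nested subpackage and re-exports one of its modules as an attribute, so both the short and the fully qualified import resolve to the same module object. The sketch below builds a throwaway package on disk to show this; `demo_api` and its contents are hypothetical stand-ins, not Beam's generated code.

```python
import importlib
import os
import sys
import tempfile


def write(path: str, text: str = "") -> None:
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def build_demo_package(root: str) -> None:
    """Create a hypothetical package `demo_api` with a nested, generated-style tree."""
    current = root
    for part in ["demo_api", "org", "apache", "beam", "model", "pipeline"]:
        current = os.path.join(current, part)
        os.makedirs(current, exist_ok=True)
        write(os.path.join(current, "__init__.py"))  # make each level a package
    # Stand-in for a generated module; real *_pb2 files come from protoc.
    write(os.path.join(current, "beam_runner_api_pb2.py"),
          "class TestStreamPayload:\n    pass\n")
    # The nested package exposes its modules as attributes.
    write(os.path.join(current, "__init__.py"),
          "from . import beam_runner_api_pb2\n")
    # Top-level binding in the style quoted in the comment: alias the nested module.
    write(os.path.join(root, "demo_api", "__init__.py"),
          "from .org.apache.beam.model import pipeline\n"
          "beam_runner_api_pb2 = pipeline.beam_runner_api_pb2\n")


root = tempfile.mkdtemp()
build_demo_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()

# Short form, resolved through the attribute bound in demo_api/__init__.py.
from demo_api import beam_runner_api_pb2 as short  # noqa: E402
# The fully qualified form still works.
from demo_api.org.apache.beam.model.pipeline import beam_runner_api_pb2 as full  # noqa: E402
```

Both names refer to the same module object. One limitation of attribute-based aliasing: `import demo_api.beam_runner_api_pb2` would still fail, because the alias is an attribute of the package rather than a registered submodule.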
LOG = logging.getLogger()
LOG.setLevel(logging.INFO)

LICENSE_HEADER = """
I would just drop these two headers and skip the checks, rather than adding logic to prepend them to the auto-generated files.
If you're not strongly opposed, I suggest we keep these. My argument is less about the build system and more about messaging to consumers of the SDK: if they are exploring the SDK, it is a useful notice so that they know this is Beam's posture.
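The "logic to prepend them" under discussion can stay small if it is idempotent, so re-running the generator never stacks duplicate headers. A minimal sketch, assuming the header is plain text written at the very top of each generated file; `prepend_license_header` and the placeholder header text are hypothetical, not the generator's actual code.

```python
import os
import tempfile

# Placeholder text; the real LICENSE_HEADER in the generator is not reproduced here.
LICENSE_HEADER = "# Hypothetical license header (placeholder text).\n"


def prepend_license_header(path: str, header: str = LICENSE_HEADER) -> bool:
    """Prepend `header` to the file at `path` unless it already starts with it.

    Returns True if the file was rewritten; repeated calls are no-ops.
    """
    with open(path, encoding="utf-8") as f:
        content = f.read()
    if content.startswith(header):
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(header + content)
    return True


# Usage on a throwaway stand-in for a generated file.
fd, demo_path = tempfile.mkstemp(suffix="_pb2.py")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("x = 1\n")
first = prepend_license_header(demo_path)   # True: header added
second = prepend_license_header(demo_path)  # False: already present
with open(demo_path, encoding="utf-8") as f:
    final_text = f.read()
os.remove(demo_path)
```

Checking `startswith` keeps the step safe to run on every build, which matters if generation is triggered from `setup.py`.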
LGTM.
Unless there are any further objections, I can merge this in later this week.
I guess @lukecwik needs to validate/approve his requested changes first.
…structure. This is so that the proto files require use of an org/apache/beam/model namespace in their imports, and so that the generated files also include this namespace in their source file metadata.
…enerate all proto files for the Go SDK. This new tool adds any necessary options to the proto compiler and generates the proto files relative to the Go SDK root, ensuring that the generated files have a namespaced file path in their metadata. If you want to generate a proto file in the Go SDK, simply use this script in the go:generate directive; the script takes care of the rest.
…te proto bindings. Updates the README with instructions for generating the model proto bindings into the SDK.
…w namespaced structure of the Beam model. It does this by supporting arbitrary directory structures of proto files, computing relative imports and substituting them for the generated imports in the generated source. Additionally, it generates bindings that allow for imports of the form `from apache_beam.portability.api import beam_runner_api_pb2`, so that the SDK does not depend on the potentially changing structure of the generated bindings within `api`. Imports of the form `from apache_beam.portability.api.org.apache.beam.model import beam_runner_api_pb2` are still supported. setup.py now attempts to generate the proto bindings on invocation, since the package structure must exist before the wheel can be created.
…rder to support the new Python output structure.
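The import rewriting described in these commits can be illustrated with a small function that turns an absolute import between generated modules into the equivalent relative one. This is a hypothetical helper, not the code in this PR; it assumes generated files import each other with statements of the form `from org.apache.beam.model.pipeline.v1 import schema_pb2 as alias`, and the module paths used are illustrative.

```python
import re

# Assumed shape of a generated dependency import, e.g.
# "from org.apache.beam.model.pipeline.v1 import schema_pb2 as alias".
_ABSOLUTE_IMPORT = re.compile(r"^from ((?:\w+\.)*\w+) import (\w+)(.*)$")


def relativize_import(line: str, current_module: str, root: str = "org") -> str:
    """Rewrite an absolute import under `root` as a relative import.

    `current_module` is the dotted name of the module that contains `line`.
    Lines that are not absolute imports from under `root` are returned as-is.
    """
    match = _ABSOLUTE_IMPORT.match(line)
    if match is None or match.group(1).split(".")[0] != root:
        return line
    target, name, rest = match.groups()
    current_pkg = current_module.split(".")[:-1]  # package containing the module
    target_pkg = target.split(".")
    # Length of the shared package prefix.
    common = 0
    for ours, theirs in zip(current_pkg, target_pkg):
        if ours != theirs:
            break
        common += 1
    # One dot for the current package, plus one per level we climb out of it.
    dots = "." * (len(current_pkg) - common + 1)
    remainder = ".".join(target_pkg[common:])
    return f"from {dots}{remainder} import {name}{rest}"


same_pkg = relativize_import(
    "from org.apache.beam.model.pipeline.v1 import schema_pb2 as schema__pb2",
    "org.apache.beam.model.pipeline.v1.beam_runner_api_pb2")
cross_pkg = relativize_import(
    "from org.apache.beam.model.pipeline.v1 import beam_runner_api_pb2 as runner",
    "org.apache.beam.model.fn_execution.v1.beam_fn_api_pb2")
```

Here `same_pkg` becomes `from . import schema_pb2 as schema__pb2` and `cross_pkg` becomes `from ...pipeline.v1 import beam_runner_api_pb2 as runner`. Relative imports let the generated tree live under any parent package (such as `apache_beam.portability.api`) without the generated code needing to know where it was placed.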
Thanks @lostluck @lukecwik @tvalentyn @robertwb for the thoughtful reviews; I had fun on this one! 🙏🏽
Can one of the admins verify this patch?
Generated protobuf files contain additional information about the messages and services they were compiled from, such as the file path of the original source proto file. The Go protobuf runtime maintains a global registry of all protobufs in use, registering each file descriptor under the path of its source proto file. If multiple descriptors with the same source file path are registered in the global registry, the initialization code prevents startup by panicking and printing a message like this to stderr:
This behavior in the Go Protocol Buffer SDK is unlikely to go away:
This change brings the protobuf imports in Beam in line with the guidance from a comment on this issue filed with the protobuf repo:
As such, I've elected to place each protobuf package in a directory org/apache/beam/model relative to its respective module root, and have updated the build system where necessary.
P.S. This is still a work in progress, but I opened the PR to socialize it, since this will affect all of Beam.
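Why moving files fixes the conflict: protoc records each file under its path relative to the import root (`--proto_path`) that contains it, and the Go runtime's registry is keyed by that path. The toy model below sketches both halves in Python. `FileRegistry` only mimics the duplicate-path check (it raises `ValueError` where the Go runtime panics), and the roots, the `schema.proto` file name, and the `v1` directory are hypothetical, chosen to show a collision between two unrelated projects.

```python
import posixpath
from typing import Dict


def registered_name(proto_file: str, proto_root: str) -> str:
    """Name a .proto is known by: its path relative to the import root."""
    return posixpath.relpath(proto_file, proto_root)


class FileRegistry:
    """Toy stand-in for a process-wide registry keyed by source .proto path."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def register(self, source_path: str, owner: str) -> None:
        previous = self._owners.get(source_path)
        if previous is not None:
            raise ValueError(
                f"{source_path!r} registered by both {previous!r} and {owner!r}")
        self._owners[source_path] = owner

    def __len__(self) -> int:
        return len(self._owners)


# Hypothetical roots: a Beam module and some unrelated project.
BEAM_ROOT = "model/pipeline/src/main/proto"
OTHER_ROOT = "third_party/other/proto"

# Flat layout: both projects ship a file that resolves to "schema.proto".
flat = FileRegistry()
flat.register(registered_name(f"{BEAM_ROOT}/schema.proto", BEAM_ROOT), "beam")
try:
    flat.register(registered_name(f"{OTHER_ROOT}/schema.proto", OTHER_ROOT), "other")
    conflict = None
except ValueError as err:
    conflict = str(err)

# Namespaced layout: Beam's copy lives under org/apache/beam/model/...
namespaced = FileRegistry()
namespaced.register(
    registered_name(f"{BEAM_ROOT}/org/apache/beam/model/pipeline/v1/schema.proto",
                    BEAM_ROOT),
    "beam")
namespaced.register(registered_name(f"{OTHER_ROOT}/schema.proto", OTHER_ROOT), "other")
```

In the flat layout the second registration fails because both files resolve to the same name; in the namespaced layout Beam's file is known as `org/apache/beam/model/pipeline/v1/schema.proto`, so both registrations succeed.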