[BEAM-7819] Python - parse PubSub message_id into attributes property #9232
aaltay left a comment:
When you are ready, let's also run a test on Dataflow, since Dataflow and the Direct runner have different Pub/Sub implementations.
…or absent message_id attribute. Add message_id output to _from_protobuf method
FYI, there is #8370, which attempts to add the message id to the Beam Java SDK.
Note that in Dataflow, the Python SDK uses the FnAPI to read from Pub/Sub using a Java harness. In other words, the message id might be missing because the Java code somehow doesn't provide it. I'll look into it later. When running locally using DirectRunner, there's a different implementation that replaces
I'll try building the runner from this change to see if I can then access the message_id from the Dataflow runner. However, I'm not at all adept at Java, so I wouldn't be much help there.
The DirectRunner has been working OK with the above changes; I think the issue is on the Dataflow runner side. The DirectRunner uses the _from_message method, which parses correctly and returns the Pub/Sub message id in my testing so far.
Run Python Dataflow ValidatesRunner
I need to fix up some tests, and the PubSubMessage tests need amending to be in line with those. I haven't had a chance to test with the DataflowRunner yet.
OK, so the Pub/Sub timestamp is actually of type google.protobuf.timestamp_pb2.Timestamp. I have amended everything to expect this for the publish_time attribute.
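For reference, a google.protobuf.Timestamp stores a point in time as whole `seconds` since the Unix epoch plus a non-negative `nanos` remainder (the protobuf class itself also provides `ToDatetime()`). The sketch below is a stdlib-only illustration of what that conversion amounts to, assuming you only have the two raw field values; `timestamp_fields_to_datetime` is a hypothetical helper, not part of Beam or protobuf.

```python
from datetime import datetime, timedelta, timezone


def timestamp_fields_to_datetime(seconds, nanos):
    """Convert google.protobuf.Timestamp-style fields to an aware UTC datetime.

    Hypothetical helper: `seconds` is whole seconds since the Unix epoch and
    `nanos` is the nanosecond remainder in [0, 999999999]. datetime only has
    microsecond resolution, so sub-microsecond digits are truncated.
    """
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be in [0, 1e9)")
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=seconds, microseconds=nanos // 1000)


# 1564660800 seconds is 2019-08-01T12:00:00Z.
print(timestamp_fields_to_datetime(1564660800, 123456789))
# 2019-08-01 12:00:00.123456+00:00
```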
FYI, support for message_id and publish_time in Dataflow should be available in a future update of Dataflow. Until then, those fields will appear unset or blank.
…t import orders; remove pylint exclusion; correct message_id TODO comment for BEAM7819
Is that external to the Beam project?
Run Python Dataflow ValidatesRunner
Yes
Run Python Dataflow ValidatesRunner
Back to the isort error now: @aaltay, are you happy for the file to be excluded? If I make the change isort recommends, it then errors due to grouping on the google side. The issue is that if I leave the import in the try block, I get None exceptions all over the place for the timestamp object. Or is this simply because the try block for imports depends on successfully importing the first module in the block, and I have them in the wrong order?
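The ordering question can be illustrated with the optional-dependency guard pattern that Beam's GCP modules use (roughly `try: from google.cloud import pubsub` / `except ImportError: pubsub = None`). As soon as one import inside the `try` raises `ImportError`, every later import in that block is skipped, so an otherwise available module placed after a missing one also ends up as the `None` fallback. This is a minimal sketch; `hypothetical_gcp_dep` is a made-up module name standing in for an optional package that is assumed not to be installed, and whether this is the exact cause in this PR is not confirmed.

```python
try:
    from datetime import timezone        # stdlib: always importable
    import hypothetical_gcp_dep          # hypothetical optional dep: raises ImportError
    from json import dumps as to_json    # skipped once the line above fails
except ImportError:
    # Every name the try block was meant to bind needs a fallback here;
    # note that to_json becomes None even though json itself is available.
    hypothetical_gcp_dep = None
    to_json = None


def require_gcp_dep():
    """Fail loudly up front instead of hitting an opaque None error later."""
    if hypothetical_gcp_dep is None:
        raise ImportError("install the optional GCP dependencies to use this")
    return hypothetical_gcp_dep
```

Placing always-available imports outside the guarded block (or before any optional one) avoids them being caught up in the fallback.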
Run Python Dataflow ValidatesRunner
@matt-darwin - Re: isort - Sure, it is fine to exclude it from isort. Thank you for the explanation.
Run Python Dataflow ValidatesRunner
aaltay left a comment:
LGTM. Thank you @matt-darwin.
We can merge it once the other reviewers also complete their reviews.
@udim do you have additional comments? Can we merge this?
udim left a comment:
One comment in run_pylint.sh; otherwise LGTM.
The last comment was also addressed. Merging this.
Post-commits are failing; opened https://issues.apache.org/jira/browse/BEAM-8153
…apache#9232) [BEAM-7819] Python - parse PubSub message_id into attributes property (apache#9232)
…property (apache#9232)" This reverts commit 0b5beac.
According to the documentation, system-generated metadata should be in the attributes property of the PubSub message; however, this is not the case for message_id, which is a separate property of the protobuf PubSub message.
Parse it and add it to the attributes dictionary of the Beam PubSub message.
Amend tests to expect the message_id key/value pair to be in the attributes property.
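The change described above can be sketched without any Google or Beam dependencies. In this minimal illustration, `RawPubsubMessage` is a hypothetical stand-in for the protobuf message (only `data`, `attributes` and `message_id` are modelled) and `parse_with_message_id` is a hypothetical function, not the PR's actual code. Leaving a blank or absent message_id out of the attributes, rather than storing an empty string, is an assumption made here for runners that do not populate the field yet.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RawPubsubMessage:
    """Hypothetical stand-in for the protobuf PubsubMessage.

    message_id is a separate field here, not an entry in `attributes`.
    """
    data: bytes
    attributes: Dict[str, str] = field(default_factory=dict)
    message_id: str = ""


def parse_with_message_id(msg: RawPubsubMessage) -> Tuple[bytes, Dict[str, str]]:
    """Return (data, attributes) with message_id merged into attributes."""
    attributes = dict(msg.attributes)  # copy so the source message is untouched
    if msg.message_id:  # assumption: skip a blank/unset id instead of storing ''
        attributes["message_id"] = msg.message_id
    return msg.data, attributes


data, attrs = parse_with_message_id(
    RawPubsubMessage(b"payload", {"origin": "test"}, message_id="123"))
print(attrs)  # {'origin': 'test', 'message_id': '123'}
```

One design consequence of this sketch is that a user-supplied attribute literally named `message_id` would be overwritten by the server-assigned id.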