Fix(yaml): Handle missing optional fields in JSON parsing #35288
Conversation
@jonathaningram is it possible to validate this PR from your side? Feel free to review it as well.
Assigning reviewers: R: @claudevdm for label python.
The PR bot will only process comments in the main thread (not review comments).
The code looks right to me and I tested a Dataflow Beam YAML pipeline without this change and with it. I can confirm the key error goes away with this change. I tested a missing Pub/Sub message field but not a missing attribute. I assume it works for both though.
Maybe there could be a corresponding docs update to go with this PR, e.g., tell users what happens if they have missing fields, but leave that with you to decide on.
Good idea. Added this to CHANGES.md.
Drive by comment: it'd be nice if there was a test ensuring we still fail for non optional fields.
@robertwb is it even possible to define an optional or non-optional field? As described in the original issue #35179, I couldn't work out how to specify "required-ness" on my schema. I did notice in the tests in this PR that |
By default, all properties in a JSON schema are optional; to declare them otherwise, one uses the required field (https://json-schema.org/understanding-json-schema/reference/object#required), which we respect in Beam (https://github.com/apache/beam/blob/release-2.65/sdks/python/apache_beam/yaml/json_utils.py#L67). This function takes a schema_pb2.FieldType as input and should respect whether the types in question are optional (though I'm not saying it might not be too strict now).
Good point. The original PR indeed did not enforce this requirement. I updated the code to check the required fields.
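For readers following the thread, here is a minimal sketch of the required-versus-optional distinction described above. It is not the code in this PR or in json_utils.py; `check_required` and the sample `schema` are hypothetical names used only for illustration, and the choice of `ValueError` is an assumption.

```python
from typing import Any, Dict


def check_required(schema: Dict[str, Any], message: Dict[str, Any]) -> Dict[str, Any]:
    """Fill absent optional properties with None; reject absent required ones.

    Hypothetical helper: a property counts as required only if its name is
    listed under the schema's "required" key, as in JSON Schema.
    """
    required = set(schema.get("required", []))
    row: Dict[str, Any] = {}
    for name in schema.get("properties", {}):
        if name in message:
            row[name] = message[name]
        elif name in required:
            # A missing required field is still an error.
            raise ValueError(f"Missing required field: {name!r}")
        else:
            # A missing optional field becomes None instead of raising.
            row[name] = None
    return row


# Sample schema (an assumption for this sketch): "id" is required, "label" is optional.
schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "label": {"type": "string"}},
    "required": ["id"],
}
```

With this sketch, `check_required(schema, {"id": 1})` fills `label` with `None`, while `check_required(schema, {"label": "x"})` raises because `id` is listed as required.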
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff             @@
##             master   #35288     +/-   ##
===========================================
  Coverage     54.50%   54.51%
  Complexity     1559     1559
===========================================
  Files          1035     1036      +1
  Lines        161595   161782    +187
  Branches       1139     1139
===========================================
+ Hits          88084    88189    +105
- Misses        71380    71462     +82
  Partials       2131     2131
* Fix(yaml): Handle missing optional fields in JSON parsing
* updated the release doc
* check the required fields
* check the nullable at the beginning
* fixed the pickle error
Fixes #35179
When using ReadFromPubSub with a schema in Beam YAML, the pipeline would fail with a KeyError if a field specified in the schema was missing from the incoming JSON message.
This commit fixes the issue by modifying the `json_to_row` function in `apache_beam/yaml/json_utils.py`. The direct dictionary access `value[name]` is replaced with `value.get(name)` to safely handle missing keys, returning `None` instead of raising an error.

The converters for array, map, and row types have also been made robust to handle `None` values, which can occur for missing optional fields of these complex types.
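The change described in this PR can be illustrated with a small, standalone sketch. This is not the code from `json_utils.py`; the helpers `optional`, `array_of`, `map_of`, and `row_of` are hypothetical names, and the sketch uses lambdas for brevity without addressing pickling. It only shows the two ideas from the description: `value.get(name)` in place of `value[name]`, and converters that pass `None` through.

```python
from typing import Any, Callable, Dict

Converter = Callable[[Any], Any]


def optional(convert: Converter) -> Converter:
    """Wrap a converter so that None passes through untouched."""
    return lambda value: None if value is None else convert(value)


def array_of(element: Converter) -> Converter:
    """Convert each element of a list; a missing (None) list stays None."""
    return optional(lambda values: [element(v) for v in values])


def map_of(entry: Converter) -> Converter:
    """Convert each value of a dict; a missing (None) dict stays None."""
    return optional(lambda mapping: {k: entry(v) for k, v in mapping.items()})


def row_of(fields: Dict[str, Converter]) -> Converter:
    """Convert a JSON object field by field.

    value.get(name) yields None for an absent key, whereas value[name]
    would raise KeyError.
    """
    return optional(
        lambda value: {name: conv(value.get(name)) for name, conv in fields.items()}
    )


# Hypothetical schema: a scalar, an array, and a map field.
to_row = row_of({"id": optional(int), "tags": array_of(str), "attrs": map_of(str)})
result = to_row({"id": "7"})  # "tags" and "attrs" are absent from the message
```

Here the absent `tags` and `attrs` fields come back as `None` rather than triggering a `KeyError`, which mirrors the behaviour the description reports for missing optional fields.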