Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes CoGroupByKey typehint from List to Iterable #22984

Merged
merged 7 commits into from
Sep 27, 2022

Conversation

ryanthompson591
Copy link
Contributor

@ryanthompson591 ryanthompson591 commented Sep 1, 2022

Changes CoGroupByKey to output an Iterable instead of a List.

This will be a breaking change and I put together a document to describe it.

The documentation of CoGroupByKey is very specific that it returns iterable and not a lists.

fixes #21556


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@github-actions github-actions bot added the python label Sep 1, 2022
@ryanthompson591
Copy link
Contributor Author

R: tvalentyn

I want to talk about this before I submit. The coGroupByKey implementation specifically does make a dictionary with a Key to list pairing.

However, the documentation is very explicit that users should expect an iterable.

I want to double check one last time we are not making a breaking change unless we have to.

@codecov
Copy link

codecov bot commented Sep 1, 2022

Codecov Report

Merging #22984 (4420e58) into master (ebacef9) will decrease coverage by 0.08%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #22984      +/-   ##
==========================================
- Coverage   73.58%   73.50%   -0.09%     
==========================================
  Files         716      718       +2     
  Lines       95311    95438     +127     
==========================================
+ Hits        70138    70148      +10     
- Misses      23877    23989     +112     
- Partials     1296     1301       +5     
Flag Coverage Δ
python 83.18% <ø> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/transforms/util.py 96.06% <ø> (ø)
sdks/go/pkg/beam/io/databaseio/writer.go 0.00% <0.00%> (-31.43%) ⬇️
sdks/go/pkg/beam/core/state/state.go 80.85% <0.00%> (-2.75%) ⬇️
...s/python/apache_beam/io/gcp/bigquery_file_loads.py 87.24% <0.00%> (-0.47%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.42% <0.00%> (-0.13%) ⬇️
...thon/apache_beam/ml/inference/sklearn_inference.py 95.45% <0.00%> (-0.07%) ⬇️
sdks/go/pkg/beam/util/gcsx/gcs.go 27.41% <0.00%> (ø)
sdks/go/pkg/beam/artifact/stage.go 61.87% <0.00%> (ø)
sdks/go/pkg/beam/io/filesystem/util.go 96.29% <0.00%> (ø)
sdks/go/pkg/beam/io/databaseio/database.go 24.24% <0.00%> (ø)
... and 14 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2022

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @TheNeuralBit for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@TheNeuralBit
Copy link
Member

R: @tvalentyn
CC: @robertwb

@TheNeuralBit
Copy link
Member

TheNeuralBit commented Sep 1, 2022

@ryanthompson591 I can't comment on your doc so I'll mention here - it encourages users to materialize the iterable result to a list, which can lead to OOMs. I think it's ok to mention casting to a list as a last resort, but the doc should also encourage users to iterate over the iterable without materializing it if at all possible.

@TheNeuralBit
Copy link
Member

Run Python PreCommit

@ryanthompson591
Copy link
Contributor Author

@TheNeuralBit Thanks I'll update the doc to reflect that, discouraging changing the iterable to a list.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 1, 2022

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update CHANGES.md about this linking to public documentation stating how people are to fix their usage?

@ryanthompson591
Copy link
Contributor Author

Fixes issue #21556

CHANGES.md Outdated
@@ -66,7 +66,7 @@

## Breaking Changes

* X behavior was changed ([#X](https://github.com/apache/beam/issues/X)).
* Python SDK CoGroupByKey now outputs an iterable instead of a list. [#21556](https://github.com/apache/beam/issues/21556)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to link to your docs on how to fix upgrade issues here as well (https://docs.google.com/document/d/1RIzm8-g-0CyVsPb6yasjwokJQFoKHG4NjRUcKHKINu0/edit)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@damccorm
Copy link
Contributor

Fixes issue #21556

I think for this to take effect it has to be in the PR description (which you can edit to include it) - https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword

@ryanthompson591
Copy link
Contributor Author

@lukecwik I updated the CHANGES.md file. I think that since the 2.42 release is cut, it is now a good time to add this change.

@ryanthompson591
Copy link
Contributor Author

Run Python 3.9 PostCommit

Copy link
Member

@lukecwik lukecwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still a bunch of open comments on the doc that need resolution.

CHANGES.md Outdated Show resolved Hide resolved
Co-authored-by: Lukasz Cwik <lcwik@google.com>
@ryanthompson591
Copy link
Contributor Author

Thanks for the comments, I think I address the major concerns and added some new sample code.

Feel free to request edit access to that doc.

@ryanthompson591
Copy link
Contributor Author

Run Python 3.9 PostCommit

CHANGES.md Outdated Show resolved Hide resolved
Adds information about what exactly is broken as well as how to fix.
@ryanthompson591
Copy link
Contributor Author

Run Python PreCommit

CHANGES.md Outdated Show resolved Hide resolved
Simplified changes.md
@ryanthompson591
Copy link
Contributor Author

Run Portable_Python PreCommit

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM - thanks for working through all the changes/doc stuff.

@tvalentyn looks like you still have a couple open comments in https://docs.google.com/document/d/1RIzm8-g-0CyVsPb6yasjwokJQFoKHG4NjRUcKHKINu0/edit#heading=h.1thvihn3k3oi - none of them look blocking, but giving you a chance to disagree before merging this :)

@tvalentyn
Copy link
Contributor

LGTM

@tvalentyn tvalentyn merged commit 4499242 into apache:master Sep 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix Iterable todo In CoGroupByKey
5 participants