Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windowing Support for the DaskRunner #27618

Closed
wants to merge 176 commits into from

Conversation

alxmrs
Copy link
Contributor

@alxmrs alxmrs commented Jul 22, 2023

This CL adds basic Windowing support to this runner, including a few tests for side inputs.

Take two of #23913. It looks like some of the issues in CI causing this dask change to fail were resolved on their own with changes in master.

Reviewers: @jrmccluskey


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

alxmrs and others added 30 commits September 19, 2022 16:34
- CoGroupByKey is broken due to how tags are used with GroupByKey
- GroupByKey should output `[('0', None), ('1', 1)]`, however it actually outputs: [(None, ('1', 1)), (None, ('0', None))]
- Once that is fixed, we may have test pipelines work on Dask.
@cisaacstern
Copy link

cisaacstern commented Dec 23, 2023

A final note to self for now, which is simply that I am now seeing side input mapping probably needs to happen at the pipeline visitor level... where we have access to context like:

 /.../beam/sdks/python/apache_beam/runners/dask/dask_runner.py(150)visit_transform()
-> inputs = list(transform_node.inputs)
(Pdb) transform_node.inputs
(<PCollection[main.None] at 0x131e7ac50>,)
(Pdb) transform_node.side_inputs
[<apache_beam.pvalue._UnpickledSideInput object at 0x131e58130>]
(Pdb) transform_node
AppliedPTransform(Map(mult_by), ParDo)
(Pdb) op
ParDo(applied=AppliedPTransform(Map(mult_by), ParDo))
(Pdb) op_class
<class 'apache_beam.runners.dask.transform_evaluator.ParDo'>
(Pdb) 

Ok... to be continued soon...

Copy link
Contributor

Reminder, please take a look at this pr: @riteshghorse @damccorm

@cisaacstern
Copy link

cisaacstern commented Jan 4, 2024

Status:

  • Side inputs now work, with implementation based on Ray Runner
  • Added additional side inputs test based on Ray Runner tests
  • All side inputs tests pass except for 1, which I will take a look at next.

This finally feels like it's getting close. (There is still some upstream work on Dask that needs to go in before this, which I'll push forward as well, as mentioned above.)

@cisaacstern
Copy link

cisaacstern commented Jan 11, 2024

I have now gotten this as far as it can go without the above-referenced fix in dask going in, so going to shift focus back there, and will re-ping here once that goes in and this is ready for review.

Copy link
Contributor

Reminder, please take a look at this pr: @riteshghorse @damccorm

@cisaacstern
Copy link

After a hiatus, I am working on this again now.

Today I will complete the upstream fix dask/dask#10734 in dask.

Once that goes in, I'll complete this. Thanks all for your patience.

@cisaacstern
Copy link

The necessary upstream fix in Dask was merged! 🎉

Once we get a new Dask release that includes this change, I will finish this PR!

Copy link
Contributor

github-actions bot commented Jun 8, 2024

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 8, 2024
Copy link
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jun 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants