Initial DaskRunner for Beam (apache#22421)
* WIP: Created a skeleton dask runner implementation.
* WIP: Idea for a translation evaluator.
* Added overrides and a visitor that translates operations.
* Fixed a dataclass typo.
* Expanded translations.
* Core idea seems to be kinda working...
* First iteration on DaskRunnerResult (keep track of pipeline state).
* Added minimal set of DaskRunner options.
* WIP: Alllmost got asserts to work! The current status is:
  - CoGroupByKey is broken due to how tags are used with GroupByKey
  - GroupByKey should output `[('0', None), ('1', 1)]`, however it actually outputs: `[(None, ('1', 1)), (None, ('0', None))]`
  - Once that is fixed, we may have test pipelines work on Dask.
* With a great 1-liner from @pabloem, groupby is fixed! Now, all three initial tests pass.
* Self-review: Cleaned up dask runner impl.
* Self-review: Remove TODOs, delete commented out code, other cleanup.
* First pass at linting rules.
* WIP, include dask dependencies + test setup.
* WIP: maybe better dask deps?
* Skip dask tests depending on successful import.
* Fixed setup.py (missing `,`).
* Added an additional comma.
* Moved skipping logic to be above dask import.
* Fix lint issues with dask runner tests.
* Adding destination for client address.
* Changing to async produces a timeout error instead of stuck in infinite loop.
* Close client during `wait_until_finish`; rm async.
* Supporting side-inputs for ParDo.
* Revert "Close client during `wait_until_finish`; rm async." This reverts commit 09365f6.
* Revert "Changing to async produces a timeout error instead of stuck in infinite loop." This reverts commit 676d752.
* Adding -dask tox targets onto the gradle build
* wip - added print stmt.
* wip - prove side inputs is set.
* wip - prove side inputs is set in Pardo.
* wip - rm asserts, add print
* wip - adding named inputs...
* Experiments: non-named side inputs + del `None` in named inputs.
* None --> 'None'
* No default side input.
* Pass along args + kwargs.
* Applied yapf to dask sources.
* Dask sources passing pylint.
* Added dask extra to docs gen tox env.
* Applied yapf from tox.
* Include dask in mypy checks.
* Upgrading mypy support to python 3.8 since py37 support is deprecated in dask.
* Manually installing an old version of dask before 3.7 support was dropped.
* fix lint: line too long.
* Fixed type errors with DaskRunnerResult. Disabled mypy type checking in dask.
* Fix pytype errors (in transform_evaluator).
* Ran isort.
* Ran yapf again.
* Fix imports (one per line)
* isort -- alphabetical.
* Added feature to CHANGES.md.
* ran yapf via tox on linux machine
* Change an import to pass CI.
* Skip isort error; needed to get CI to pass.
* Skip test logic may favor better with isort.
* (Maybe) the last isort fix.
* Tested pipeline options (added one fix).
* Improve formatting of test.
* Self-review: removing side inputs. In addition, adding a more helpful property to the base DaskBagOp (transform).
* add dask to coverage suite in tox.
* Capture value error in assert.
* Change timeout value to 600 seconds.
* ignoring broken test
* Update CHANGES.md
* Using reflection to test the Dask client constructor.
* Better method of inspecting the constructor parameters (thanks @TomAugspurger!).

Co-authored-by: Pablo E <pabloem@apache.org>
Co-authored-by: Pablo <pabloem@users.noreply.github.com>
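
The entries "Skip dask tests depending on successful import" and "Moved skipping logic to be above dask import" describe guarding tests behind an optional dependency. The following is a minimal sketch of that general pattern, not the code from this commit: the helper `try_import` and the class `DaskDependentTest` are hypothetical names chosen for illustration, and the test body is a placeholder.

```python
import importlib
import unittest


def try_import(name):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration; not part of Beam or dask.
    """
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError
        return None


# Resolve the optional dependency before any code that needs it is defined,
# so a missing package leads to a skip rather than an import-time failure.
dask = try_import("dask")


@unittest.skipIf(dask is None, "dask must be installed to run these tests")
class DaskDependentTest(unittest.TestCase):
    def test_module_available(self):
        # Placeholder assertion; real tests would exercise the runner.
        self.assertIsNotNone(dask)


if __name__ == "__main__":
    unittest.main()
```

With this arrangement, an environment without dask reports the test case as skipped instead of failing at import time.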