Reading tables with a dask-cudf DataFrame #224

sarahyurick · 2021-08-23T17:17:39Z

Updated version of #219. Also tagging @ayushdg: if you have time, could you double-check the pandaslike.py changes specifically?

codecov-commenter · 2021-08-23T17:28:53Z

Codecov Report

Merging #224 (9aade51) into main (4dab949) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #224   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           64        64           
  Lines         2589      2590    +1     
  Branches       362       361    -1     
=========================================
+ Hits          2589      2590    +1

| Impacted Files | Coverage Δ |
|---|---|
| dask_sql/context.py | `100.00% <ø> (ø)` |
| dask_sql/input_utils/convert.py | `100.00% <ø> (ø)` |
| dask_sql/input_utils/hive.py | `100.00% <100.00%> (ø)` |
| dask_sql/input_utils/intake.py | `100.00% <100.00%> (ø)` |
| dask_sql/input_utils/location.py | `100.00% <100.00%> (ø)` |
| dask_sql/input_utils/pandaslike.py | `100.00% <100.00%> (ø)` |
| dask_sql/physical/rel/custom/create_table.py | `100.00% <100.00%> (ø)` |

nils-braun

Thanks @sarahyurick!
I have only two comments before we can merge:

- There are two additional input methods, hive.py and dask.py. The latter is trivial (I guess a Dask-cudf data frame is also a Dask data frame, so we can just keep the logic), but you should also add a check to hive.py, like the one in the intake plugin, that disallows GPUs (or we rewrite it to allow GPUs, but maybe that is something for the next step). I am actually wondering why the tests did not fail for hive...
- Can you make sure the coverage is 100% again? On the pandas-like PR I already asked how we can best test the CPU behaviour via GitHub Actions. I think for the beginning, we need `# pragma: no cover` comments in all GPU-only places. I would like to keep the 100% coverage if possible (even if this means we will need some coverage exceptions).
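
For illustration, the two points above might translate into something like the following sketch. The names reject_gpu and frame_library are hypothetical and not taken from dask-sql; the sketch only assumes that a boolean gpu flag is passed down to each input plugin and that dask_cudf is importable only on GPU machines.

```python
def reject_gpu(gpu: bool, source: str) -> None:
    """Fail early for input plugins that cannot produce GPU-backed frames.

    Hypothetical helper: the name and the error message are assumptions.
    """
    if gpu:
        raise NotImplementedError(f"{source} does not support gpu=True")


def frame_library(gpu: bool = False):
    """Return the module used to build Dask collections (hypothetical helper)."""
    # The GPU branch can only run where dask_cudf is installed, so it is
    # excluded from the coverage measurement on CPU-only CI runners.
    if gpu:  # pragma: no cover
        import dask_cudf

        return dask_cudf

    import dask.dataframe as dd

    return dd
```

The guard itself needs no GPU to exercise, so it can stay covered by a regular test; only the branch that imports dask_cudf carries the coverage exception.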

sarahyurick · 2021-08-24T17:43:41Z

Sounds good - I've updated hive.py and added some `# pragma: no cover` comments - let me know if I missed any!

ayushdg · 2021-08-25T04:35:09Z

dask_sql/input_utils/pandaslike.py

+        if gpu:  # pragma: no cover
+            import dask_cudf
+
+            return dask_cudf.from_cudf(
+                cudf.from_pandas(input_item), npartitions=npartitions, **kwargs,
+            )
+        else:
+            return dd.from_pandas(input_item, npartitions=npartitions, **kwargs)


Since this input util accepts both cudf and pandas DataFrames as valid inputs, you'd probably need an additional check here: test whether input_item is a pandas DataFrame, and call cudf.from_pandas only in that case.
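
The suggested check could look roughly like the sketch below. It is not the code that was merged: the function names is_pandas_frame and to_dask_frame are made up for illustration, and the GPU branch assumes cudf and dask_cudf are installed whenever gpu=True.

```python
def is_pandas_frame(obj) -> bool:
    """Return True only if obj is a pandas DataFrame (hypothetical helper)."""
    try:
        import pandas as pd
    except ImportError:
        # Without pandas installed, obj cannot be a pandas DataFrame.
        return False
    return isinstance(obj, pd.DataFrame)


def to_dask_frame(input_item, npartitions=1, gpu=False, **kwargs):
    """Wrap an in-memory frame in a Dask collection (hypothetical helper)."""
    if gpu:  # pragma: no cover
        import cudf
        import dask_cudf

        # Convert only pandas input; a cudf DataFrame is passed through as-is.
        if is_pandas_frame(input_item):
            input_item = cudf.from_pandas(input_item)
        return dask_cudf.from_cudf(input_item, npartitions=npartitions, **kwargs)

    import dask.dataframe as dd

    return dd.from_pandas(input_item, npartitions=npartitions, **kwargs)
```

Keeping the type check in its own small function means the decision can be tested on a CPU-only machine, while the conversion itself stays behind the coverage exception.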

nils-braun · 2021-08-25T15:30:33Z

I like this! LGTM!

add gpu param

a13110d

sarahyurick mentioned this pull request Aug 23, 2021

Creating tables with a dask-cudf DataFrame #219

Closed

nils-braun reviewed Aug 23, 2021

hive and code coverage

400b29d

ayushdg reviewed Aug 25, 2021

Update pandaslike.py

9aade51

nils-braun merged commit ece7ec7 into dask-contrib:main Aug 25, 2021

charlesbluca mentioned this pull request Oct 1, 2021

Add gpuCI support #240

Merged

sarahyurick deleted the read_with_gpu branch September 21, 2022 23:46
