Reading tables with a dask-cudf DataFrame #224

Merged: 3 commits merged from read_with_gpu into dask-contrib:main on Aug 25, 2021

Conversation

sarahyurick (Collaborator)

Updated version of #219. Also tagging @ayushdg if you have time to double check the pandaslike.py changes specifically?

codecov-commenter commented Aug 23, 2021

Codecov Report

Merging #224 (9aade51) into main (4dab949) will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##              main      #224   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           64        64           
  Lines         2589      2590    +1     
  Branches       362       361    -1     
=========================================
+ Hits          2589      2590    +1     
Impacted Files                                  Coverage Δ
dask_sql/context.py                             100.00% <ø> (ø)
dask_sql/input_utils/convert.py                 100.00% <ø> (ø)
dask_sql/input_utils/hive.py                    100.00% <100.00%> (ø)
dask_sql/input_utils/intake.py                  100.00% <100.00%> (ø)
dask_sql/input_utils/location.py                100.00% <100.00%> (ø)
dask_sql/input_utils/pandaslike.py              100.00% <100.00%> (ø)
dask_sql/physical/rel/custom/create_table.py    100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

nils-braun (Collaborator) left a comment

Thanks @sarahyurick!
I have only two comments before we can merge:

  • There are two additional input methods, hive.py and dask.py. The latter is trivial (a dask-cudf DataFrame is also a Dask DataFrame, so we can just keep the logic as it is), but hive.py should also get a check like the one in the intake plugin so that GPUs are not allowed there (or we rewrite it to support GPUs, but that is probably something for a next step; see the sketch after this list). I am actually wondering why the tests did not fail for Hive...
  • Can you make sure the coverage is back at 100%? On the pandas-like PR I already asked how we can best test the GPU behaviour via GitHub Actions. For the beginning, I think we need # pragma: no cover comments in all GPU-only places. I would like to keep the 100% coverage if possible (even if this means we will need some coverage exceptions).
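
For illustration only, a minimal sketch of what such a guard in hive.py could look like, mirroring the rejection approach described for the intake plugin; the method name and signature below are assumptions for this sketch, not dask-sql's actual code:

    def to_dc(self, input_item, table_name, gpu=False, **kwargs):  # hypothetical signature
        if gpu:  # pragma: no cover
            # Hive tables are currently read into CPU-backed Dask frames only,
            # so fail early with a clear message instead of erroring deeper down.
            raise NotImplementedError("Hive tables can not be loaded onto the GPU yet")
        ...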

sarahyurick (Collaborator, Author)

Sounds good - I've updated hive.py and added some # pragma: no covers - let me know if I missed any!

Comment on lines 30 to 37

if gpu:  # pragma: no cover
    import cudf
    import dask_cudf

    return dask_cudf.from_cudf(
        cudf.from_pandas(input_item), npartitions=npartitions, **kwargs,
    )
else:
    return dd.from_pandas(input_item, npartitions=npartitions, **kwargs)
Collaborator review comment:
Given that this input util accepts both cudf and pandas DataFrames as valid inputs, you'd probably need an additional check here for whether input_item is a pandas DataFrame, and only call the from_pandas conversion in that case (see the sketch below).
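
A minimal sketch of how that could look, purely as an illustration (it assumes pandas is imported as pd at module level and that a cudf DataFrame can be handed to dask_cudf.from_cudf directly):

    if gpu:  # pragma: no cover
        import cudf
        import dask_cudf

        if isinstance(input_item, pd.DataFrame):
            # only pandas inputs need the extra conversion step to cudf
            input_item = cudf.from_pandas(input_item)
        return dask_cudf.from_cudf(input_item, npartitions=npartitions, **kwargs)
    else:
        return dd.from_pandas(input_item, npartitions=npartitions, **kwargs)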

nils-braun (Collaborator)

I like this! LGTM!

nils-braun merged commit ece7ec7 into dask-contrib:main on Aug 25, 2021
charlesbluca mentioned this pull request on Oct 1, 2021
sarahyurick deleted the read_with_gpu branch on September 21, 2022