Creating tables with a dask-cudf DataFrame #219
Looks like the tests are failing everywhere else the new parameter would need to be added...
Thanks @sarahyurick !
I see two possibilities to work around the additional-keyword-argument problem:
- Either you add, to each input plugin that does not support GPU, a first line which throws an exception when `gpu` is set to true,
- or you do not use a dedicated `gpu` argument in the main functions and let only the location input plugin handle it. Additional arguments are passed via `kwargs`, so you could just extract it from there (for example, the intake plugin has an `intake_table_name`, which is only used for this class).
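To make the two options concrete, here is a minimal sketch. The classes below are simplified, hypothetical stand-ins and not the actual dask-sql plugin classes; the `to_dc` method name, its signature, and the returned dictionaries are assumptions for illustration only. The point is just how a `gpu` flag could travel through `**kwargs` without being added to every function signature.

```python
class CpuOnlyPlugin:
    """Hypothetical plugin without GPU support (option 1)."""

    def to_dc(self, input_item, table_name, **kwargs):
        # Fail early if the caller asks for a GPU-backed table.
        if kwargs.pop("gpu", False):
            raise NotImplementedError(
                f"{type(self).__name__} does not support gpu=True"
            )
        return {"source": input_item, "backend": "cpu", "read_kwargs": kwargs}


class LocationLikePlugin:
    """Hypothetical plugin that reads from a storage location (option 2)."""

    def to_dc(self, input_item, table_name, **kwargs):
        # Only this plugin knows about `gpu`; it is extracted from kwargs
        # so that it is not forwarded to the underlying reader.
        gpu = kwargs.pop("gpu", False)
        backend = "dask_cudf" if gpu else "dask.dataframe"
        # A real plugin would call the reader of the chosen backend here;
        # this sketch only records what would be used.
        return {"source": input_item, "backend": backend, "read_kwargs": kwargs}


plugin = LocationLikePlugin()
gpu_table = plugin.to_dc("data.csv", "my_table", gpu=True, sep=";")
cpu_table = plugin.to_dc("data.csv", "my_table", sep=";")
```

With the second option, callers that never pass `gpu` are unaffected, and the remaining keyword arguments still reach the reader unchanged.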
Co-authored-by: Nils Braun <nils-braun@users.noreply.github.com>
Okay, it looks like there were only 2 other places in the code (in |
Codecov Report
@@             Coverage Diff              @@
##              main     #219     +/-    ##
===========================================
- Coverage   100.00%   99.48%   -0.52%
===========================================
  Files           64       64
  Lines         2470     2500     +30
  Branches       340      350     +10
===========================================
+ Hits          2470     2487     +17
- Misses           0        8      +8
- Partials         0        5      +5
Sorry @sarahyurick - I just realised that the coverage was not shown correctly (because we moved to the new git space) and that the hive test was disabled.
Sounds good! There were some conflicts that required me to re-open this PR as #224.
Ok, thanks! Just as a note (but you might already know this): you do not need to create a new PR if there are conflicts - you can just rebase and force-push or (preferred) merge the new main branch into your branch, solve the merge conflicts and push again. But also opening a new PR is fine :-)
Dask-SQL can create tables directly from storage in SQL syntax:
As suggested by @randerzander, it would be nice to have a way to create such a table backed by a dask-cudf DataFrame instead of a CPU Dask DataFrame. This is my first step toward achieving that.
So far, I have just added a boolean parameter `gpu`, although perhaps something like `HOW='gpu'` vs. the default `HOW='cpu'` would be better. These changes also only cover reading in a single file. Mainly I want to get your thoughts about how to support this, @nils-braun :)
For example, something like this works with this PR: