GH-39769: [C++][Device] Fix Importing nested and string types for DeviceArray #39770
Do you have plans to add CUDA-based tests?
@pitrou I'll add explicit CUDA based tests for the device array interface as a separate PR for #39786 (comment)
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 26801f1. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.
…or DeviceArray (apache#39770)

### Rationale for this change
In my testing with libcudf and other GPU data, I discovered a deficiency in ImportDeviceArray and thus ImportDeviceRecordBatch where the device type and memory manager aren't propagated to child importers and it fails to import offset-based types such as strings.

### What changes are included in this PR?
These are relatively easily handled by first ensuring that `ImportChild` propagates the device_type and memory manager from the parent. Then for importing offset based values we merely need to use the memory manager to copy the final offset value to the CPU to use for the buffer size computation. This will work for any device which has implemented CopyBufferTo/From

### Are these changes tested?
A new test is added to test these situations.

* Closes: apache#39769

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
#41477)

### Rationale for this change
Currently `MemoryManager` objects define functionality to Copy or View entire buffers. Occasionally there is the need to only copy a single value or slice from a buffer to a piece of CPU memory (see #39770 (comment)). It's overkill to do a bunch of whole Buffer operations and manually slicing just to copy 4 or 8 bytes.

### What changes are included in this PR?
Add the `MemoryManager::CopyBufferSliceToCPU` function, which initially attempts to use memcpy for the specified slice. If this is not possible, it defaults to copying the entire buffer and then viewing/copying the slice. Update `ArrayImporter::ImportStringValuesBuffer` to use this function.

### Are these changes tested?
`ArrayImporter::ImportStringValuesBuffer` is tested as a part of `arrow-c-bridge-test`

* GitHub Issue: #39858

Lead-authored-by: Alan Stoate <alan.stoate@gmail.com>
Co-authored-by: Mac Lilly <maclilly45@gmail.com>
Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
In my testing with libcudf and other GPU data, I discovered a deficiency in ImportDeviceArray and thus ImportDeviceRecordBatch where the device type and memory manager aren't propagated to child importers and it fails to import offset-based types such as strings.
What changes are included in this PR?
These are relatively easily handled by first ensuring that `ImportChild` propagates the device_type and memory manager from the parent. Then, for importing offset-based values, we merely need to use the memory manager to copy the final offset value to the CPU and use it for the buffer size computation. This will work for any device which has implemented CopyBufferTo/From.
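To make the two steps concrete, here is a minimal, self-contained sketch. It does not use the Arrow API: `ToyMemoryManager`, `ToyImporter`, `MakeChild`, `ValuesBufferSize`, and `ToDeviceBytes` are hypothetical names invented for illustration, and "device" memory is simulated with a byte vector that may only be read through an explicit copy. It also assumes 32-bit offsets and that the final offset sits at index `array_offset + length`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device memory manager. The only way to read
// "device" bytes is an explicit copy into host memory.
struct ToyMemoryManager {
  int copies_to_host = 0;  // counts device-to-host transfers, for illustration

  void CopyToHost(const std::vector<uint8_t>& device_buf, size_t byte_offset,
                  void* out, size_t nbytes) {
    assert(byte_offset + nbytes <= device_buf.size());
    std::memcpy(out, device_buf.data() + byte_offset, nbytes);
    ++copies_to_host;
  }
};

enum class ToyDeviceType { kCpu, kCuda };

// Hypothetical importer holding the state that must reach child importers.
struct ToyImporter {
  ToyDeviceType device_type = ToyDeviceType::kCpu;
  std::shared_ptr<ToyMemoryManager> memory_manager;

  // Step 1: a child importer inherits the parent's device type and manager.
  ToyImporter MakeChild() const {
    ToyImporter child;
    child.device_type = device_type;
    child.memory_manager = memory_manager;
    return child;
  }

  // Step 2: copy only the final offset to the host and use it as the size of
  // the values buffer (assumption: int32 offsets, one byte per value unit).
  int64_t ValuesBufferSize(const std::vector<uint8_t>& offsets_buf,
                           int64_t length, int64_t array_offset) const {
    int32_t last_offset = 0;
    const size_t pos =
        sizeof(int32_t) * static_cast<size_t>(array_offset + length);
    memory_manager->CopyToHost(offsets_buf, pos, &last_offset,
                               sizeof(last_offset));
    return last_offset;
  }
};

// Helper that packs int32 offsets into a simulated device buffer.
std::vector<uint8_t> ToDeviceBytes(const std::vector<int32_t>& offsets) {
  std::vector<uint8_t> bytes(offsets.size() * sizeof(int32_t));
  if (!bytes.empty()) {
    std::memcpy(bytes.data(), offsets.data(), bytes.size());
  }
  return bytes;
}
```

For a string array holding "foo", "", and "hello", the offsets would be 0, 3, 3, 8, so a child importer created with `MakeChild()` would read the single value 8 through the manager and size the values buffer accordingly, without dereferencing device memory on the host.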
Are these changes tested?
A new test is added to test these situations.