Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle RaggedArray conversions to numpy arrays #1185

Merged
merged 2 commits into from
Feb 27, 2023
Merged

Correctly handle RaggedArray conversions to numpy arrays #1185

merged 2 commits into from
Feb 27, 2023

Conversation

ianthomas23
Copy link
Member

Fixes #1158.

This removes all warnings caused by numpy conversions of ragged arrays which will be errors in numpy 1.24. In fact there weren't any problems in the library code itself as if you follow the docstrings you will create ragged arrays correctly, but some of the tests used shortcuts instead of the recommended way and these have been changed in this PR.

Either of these are correct ways to create a DataFrame series that is a ragged array to use in datashader:

import pandas as pd
x = pd.array([[0, 1, 2], [4, 5, 6, 7, 8, 9]], dtype='Ragged[float32]')

from datashader.datatypes import RaggedArray
x = RaggedArray([[0, 1, 2], [4, 5, 6, 7, 8, 9]], dtype='float32')

The dtype is optional for RaggedArray as it is inferred.

The following worked in the past but are incorrect using numpy 1.24 onwards:

x = np.asarray([[0, 1, 2], [4, 5, 6, 7, 8, 9]])

x = np.asarray([[0, 1, 2], [4, 5, 6, 7, 8, 9]], dtype=object)

The first approach will immediately fail, telling you to use the second dtype=object approach. This works for some but not all codepaths in datashader as it drops important dtype information. Hence avoid both.

Eventually the RaggedArray pandas extension array within datashader will be replaced by awkward-array and will simplify our code and make it more robust to future changes.

@ianthomas23 ianthomas23 requested review from ablythed and jbednar and removed request for ablythed February 20, 2023 14:20
@codecov
Copy link

codecov bot commented Feb 20, 2023

Codecov Report

Merging #1185 (399b36c) into main (aed1760) will increase coverage by 0.00%.
The diff coverage is 85.71%.

@@           Coverage Diff           @@
##             main    #1185   +/-   ##
=======================================
  Coverage   85.39%   85.39%           
=======================================
  Files          35       35           
  Lines        8016     8023    +7     
=======================================
+ Hits         6845     6851    +6     
- Misses       1171     1172    +1     
Impacted Files Coverage Δ
datashader/datatypes.py 93.75% <85.71%> (-0.15%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@jbednar jbednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, once tests pass!

@ianthomas23
Copy link
Member Author

After rebasing this is passing CI.

@ianthomas23 ianthomas23 merged commit 229cea3 into holoviz:main Feb 27, 2023
@ianthomas23 ianthomas23 deleted the numpy_1.24_fixes branch February 27, 2023 14:37
@ianthomas23 ianthomas23 added this to the v0.14.5 milestone Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support numpy 1.24
2 participants