[BUG-REPORT] Expression.astype("uint") works for numpy but not arrow #2191

NickCrews · 2022-08-30T00:33:47Z

See added xfailing test: 538a5a6


JovanVeljanoski · 2022-10-01T19:21:25Z

Actually, I do not think this is a bug.

If you look at the Arrow documentation, there is no such type as uint.

So in your test, if you use .astype("uint64") for instance, things will work.

I guess we could make uint an alias for uint64 to account for this.
What do you think, @maartenbreddels @NickCrews?
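The alias idea could look roughly like the sketch below. This is not vaex code: `DTYPE_ALIASES` and `resolve_dtype_name` are hypothetical names, and the only mapping taken from this thread is "uint" to "uint64". The name would be normalized once, before being handed to either backend.

```python
# Hypothetical alias table; only the "uint" -> "uint64" entry comes from the proposal.
DTYPE_ALIASES = {
    "uint": "uint64",
}


def resolve_dtype_name(name: str) -> str:
    """Map a generic dtype name to an explicit one; pass other names through unchanged."""
    return DTYPE_ALIASES.get(name, name)


print(resolve_dtype_name("uint"))     # uint64
print(resolve_dtype_name("float32"))  # float32
```

With something like this in front of astype, both backends would receive the same explicit name regardless of where the data lives.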

NickCrews · 2022-10-02T06:55:27Z

Hmm, that explains why it doesn't work.

If we were starting from scratch, I might actually lean the opposite way: Make uint fail for BOTH numpy and arrow, and force users to be explicit with asking for uint64. But that would break people, so probably we can't change to that behavior now.
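For comparison, the stricter behavior could be sketched like this. Again, this is only an illustration: `AMBIGUOUS_DTYPES` and `require_explicit_dtype` are hypothetical names and nothing like them is confirmed to exist in vaex.

```python
# Hypothetical table of ambiguous names and the explicit name to suggest instead.
AMBIGUOUS_DTYPES = {
    "uint": "uint64",
}


def require_explicit_dtype(name: str) -> str:
    """Reject ambiguous dtype names regardless of backend; return explicit names as-is."""
    if name in AMBIGUOUS_DTYPES:
        suggestion = AMBIGUOUS_DTYPES[name]
        raise ValueError(
            f"dtype {name!r} is ambiguous; use an explicit width such as {suggestion!r}"
        )
    return name


print(require_explicit_dtype("uint64"))  # uint64

try:
    require_explicit_dtype("uint")
except ValueError as exc:
    print(exc)
```

The error would be the same for numpy-backed and arrow-backed columns, at the cost of breaking code that already relies on "uint".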

If vaex is trying to be a higher level abstraction that hides the differences between numpy and arrow (I think this would be a great goal, but IDK how attainable it actually is) then I would like the alias proposal. However, if there are other cases where I DO need to know which is the backend for my data (eg #2192), then I would prefer if vaex explicitly left things as is and didn't try to do something clever. So IDK, I think it depends on the larger goals.

I'm fine closing this as "not a bug" and just being more explicit in the docstring for astype().

JovanVeljanoski · 2022-10-02T12:01:29Z

I think we generally agree.

I think the main idea (as much as we can make it) is that an average user should not care or even know whether the data lives in arrow or numpy underneath it all, as long as it is handled via vaex. When you get it out of vaex (with .values or .to_numpy(), for example), that's a different story.

And we do want the most obvious things to work out of the box with safe general assumptions. I still think that many users are not so knowledgeable about (py)arrow yet, so it is nice to have some higher abstraction.

I am curious to hear @maartenbreddels' opinion on this, so let's keep this open for now, and thanks for reporting!
