Arrow format string for format_str in _dtype_from_vaexdtype() #54
@honno could you please verify that https://github.com/data-apis/dataframe-interchange-tests has a test checking that the format strings used in each dataframe library are indeed in Arrow format now? xref #62, which has a basic test that may be reusable in case you don't yet have one. |
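A check of this kind could be sketched roughly as follows. This is only an illustration, not the test in dataframe-interchange-tests or in #62: the helper name `looks_like_arrow_format` is hypothetical, and it only covers primitive, string/binary, and temporal format strings from the Arrow C Data Interface (no decimals, fixed-width binary, or nested types).

```python
import re

# Single-character format strings from the Arrow C Data Interface:
# null, boolean, (u)int8..64, float16/32/64, utf-8 strings, binary.
_SIMPLE_FORMATS = set("nbcCsSiIlLefg") | {"u", "U", "z", "Z"}

# Temporal formats: date32/date64, time32/time64, timestamps (with an
# optional timezone after the colon), and durations.
_TEMPORAL_FORMAT = re.compile(r"^(tdD|tdm|tt[smun]|ts[smun]:.*|tD[smun])$")


def looks_like_arrow_format(fmt: str) -> bool:
    """Hypothetical helper: True if `fmt` is a known Arrow format string.

    Deliberately not exhaustive; see the lead-in for what is left out.
    """
    return fmt in _SIMPLE_FORMATS or bool(_TEMPORAL_FORMAT.match(fmt))
```

With this sketch, NumPy-style descriptors such as `'<f4'` or `'|b1'` are rejected, while `'f'` and `'b'` are accepted.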
I had left it as a TODO, but just now updated |
Thanks @honno. I think we are good here then. I double checked for Pandas, and that looks as expected:
>>> import pandas as pd
>>> pd.__version__
'1.5.0rc0'
>>> df = pd.DataFrame({"A": [True, False, False, True]})
>>> df.__dataframe__().get_column_by_name('A').dtype
(<DtypeKind.BOOL: 20>, 8, 'b', '|') |
When demoing the dataframe protocol at EuroScipy, I actually ran into this, seeing that Vaex is returning a wrong value:
gives
And so the third entry in the dtype tuple, "<f4", is wrong; it should be "f" instead (it seems this is just the NumPy descriptor). So while this might now be tested in |
Ayup, issue filed in vaexio/vaex#2139, and it's failing in the test suite (well, skipped due to flakiness issues). |
Thanks! |
The code now uses NumPy format strings, while the docs for Column.dtype specify it must use the format string from the Apache Arrow C Data Interface (similar but slightly different). So we need a utility to map NumPy to Arrow format here.
Example - should say 'b', not '|b1':
Source:
https://arrow.apache.org/docs/format/CDataInterface.html#data-type-description-format-strings
https://numpy.org/doc/stable/reference/arrays.interface.html#arrays-interface
https://numpy.org/doc/stable/reference/generated/numpy.dtype.itemsize.html
https://numpy.org/doc/stable/reference/generated/numpy.dtype.byteorder.html
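A minimal sketch of such a mapping utility, assuming the input is a NumPy array-interface typestr such as `'<f4'` or `'|b1'`. The function name `typestr_to_arrow_format` and the lookup table are hypothetical, not the code in this repository, and only fixed-width boolean, integer, and float types are covered. The byte-order character is dropped here on the assumption that byte order is reported separately in the dtype tuple.

```python
import re

# Arrow C Data Interface format strings, keyed by
# (NumPy kind character, item size in bytes).
_ARROW_FORMATS = {
    ("b", 1): "b",  # boolean
    ("i", 1): "c", ("i", 2): "s", ("i", 4): "i", ("i", 8): "l",  # signed ints
    ("u", 1): "C", ("u", 2): "S", ("u", 4): "I", ("u", 8): "L",  # unsigned ints
    ("f", 2): "e", ("f", 4): "f", ("f", 8): "g",  # floats
}

# Optional byte-order character, one kind character, then the item size.
_TYPESTR = re.compile(r"^([<>|=]?)([A-Za-z])(\d+)$")


def typestr_to_arrow_format(typestr: str) -> str:
    """Hypothetical helper: map a NumPy typestr to an Arrow format string."""
    match = _TYPESTR.match(typestr)
    if match is None:
        raise ValueError(f"Not a recognised NumPy typestr: {typestr!r}")
    _byteorder, kind, size = match.groups()
    try:
        return _ARROW_FORMATS[(kind, int(size))]
    except KeyError:
        raise NotImplementedError(
            f"No Arrow format string known for {typestr!r}"
        ) from None
```

For the cases discussed in this issue, `typestr_to_arrow_format('|b1')` gives `'b'` and `typestr_to_arrow_format('<f4')` gives `'f'`.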