File .venv/lib/python3.11/site-packages/pandas/core/internals/blocks.py:2253, in EABackedBlock.get_values(self, dtype)
[2251](.venv/lib/python3.11/site-packages/pandas/core/internals/blocks.py:2251) values = values.astype(object)
[2252](.venv/lib/python3.11/site-packages/pandas/core/internals/blocks.py:2252) # TODO(EA2D): reshape not needed with 2D EAs
-> [2253](.venv/lib/python3.11/site-packages/pandas/core/internals/blocks.py:2253) return np.asarray(values).reshape(self.shape)
ValueError: cannot reshape array of size 9 into shape
Issue Description
I first reported this issue in the pandas repository, but was referred here to verify that it is not a bug in awkward_pandas (see pandas issue 58927).
When calling iterrows() on a DataFrame which contains an awkward array as a column, a ValueError occurs (see stacktrace example). This error only occurs when all rows of the awkward array are of equal length. In this case the calls to values.astype(object) and/or np.asarray(values) in the get_values function in the pandas/core/internals/blocks.py module result in a 2D array, instead of a 1D array with nested lists.
When the awkward array is actually jagged, the call results in the correct format of the array (see commented line in code example) and iterrows() works as intended.
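The mechanism described above can be reproduced with NumPy alone, which may help narrow down where the problem lies. This is only a sketch under assumptions: the row data and the `(1, 3)` block shape are made up for illustration, plain Python lists stand in for the awkward array, and `to_1d_object_array` is a hypothetical helper, not something pandas or awkward_pandas provides.

```python
import numpy as np

# Hypothetical data: three rows of equal length (not jagged).
regular_rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Hypothetical jagged data: rows of different lengths.
jagged_rows = [[1, 2, 3], [4, 5], [6]]

# Equal-length nested lists become a 2D array, even with dtype=object.
regular = np.asarray(regular_rows, dtype=object)
# Rows of different lengths cannot form a rectangle, so the result stays 1D.
jagged = np.asarray(jagged_rows, dtype=object)

# Assumed block shape for a single column holding three rows.
block_shape = (1, 3)

# Works: 3 elements fit into shape (1, 3).
jagged_block = jagged.reshape(block_shape)

# Fails: 9 elements do not fit into shape (1, 3).
try:
    regular.reshape(block_shape)
    reshape_error = None
except ValueError as exc:
    reshape_error = str(exc)


def to_1d_object_array(rows):
    """Build a 1D object array holding one nested list per element."""
    out = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        out[i] = row  # scalar assignment stores the list itself
    return out


# Filling a preallocated object array keeps the rows nested, so reshape works.
fixed_block = to_1d_object_array(regular_rows).reshape(block_shape)
```

Filling a preallocated object array element by element is one common way to stop NumPy from expanding equal-length rows into an extra dimension. Whether that is the right fix inside pandas or awkward_pandas is a separate question.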
Expected Behavior
I would expect iterrows() to iterate over the DataFrame rows without raising an error, returning for each row a Series in which the awkward-array value for that row is set correctly.
On main, we have changed this library quite substantially to make it simpler while supporting more dataframe libraries. As a result, the pandas "awkward" dtype will disappear, and only the .ak accessor (on series and dataframes) will remain as the way to get awkward's vectorised nested/ragged operations. The data columns themselves will tend to be stored in arrow layout, which is becoming the pandas standard.
That's a rather long way of saying that iterrows() will "just work", as it does for any other data type that pandas already knows about.
Exactly how to get your data stored as arrow is another matter, and one that pandas seems a bit confused about (see here). With #56, which I just posted, you could do
df["numbers"] = df.numbers.ak.to_output()
(note that you don't need your data to be in arrow storage before using .ak)
Reproducible Example