BUG: Setting column on empty DataFrame with loc / avoiding SettingWithCopyWarning for potentially empty DataFrames/copies/views #41891
Comments
What do you want to achieve with |
While setting everything to
Let me know if I'm missing anything. No matter the context though, I think that the |
Note: I've updated my last comment a bit. Indeed, the problem doesn't occur when assigning Series or arrays, because they'd have to be empty themselves. The issue with scalar values is still an annoying one, though, and it causes many bugs because empty dataframes are usually an edge case. The relevant code: pandas/pandas/core/indexing.py, lines 1648 to 1659 in 499ef8c
I will see if I can come up with a possible solution myself, but do let me know if there are reasons that this shouldn't be fixed. I see that this behaviour is currently explicitly tested, but I wonder why: pandas/pandas/tests/indexing/test_partial.py Lines 385 to 387 in 499ef8c
Let me add @jreback, who added the check, to the conversation :)
This also makes updating columns more complicated. For example, I want to update an OHLC dataframe so that rows where volume = 0 take the close value, since the open, high, and low values might be 0 there.

import pandas as pd

df = pd.DataFrame({
    'open':   [ 100,  0, 200],
    'high':   [ 105,  0, 200],
    'low':    [  95,  0, 200],
    'close':  [  95, 95, 200],
    'volume': [1000,  0, 500],
})
# this used to work
df.loc[df['volume'] == 0, ['open', 'high', 'low']] = df['close']  # <- ValueError: shape mismatch if the loc result is empty
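One possible way to sidestep the shape mismatch in the OHLC example above is to update each column separately with `Series.mask`, which aligns on the index and simply does nothing when no row matches. This is only a sketch: the helper name `fill_from_close` is hypothetical, and it returns a copy rather than modifying the frame in place.

```python
import pandas as pd


def fill_from_close(df, columns=("open", "high", "low")):
    """Hypothetical helper: return a copy where zero-volume rows take the close value."""
    out = df.copy()
    no_trades = out["volume"] == 0
    for col in columns:
        # Series.mask replaces values where the condition is True;
        # with an all-False condition the column is left unchanged.
        out[col] = out[col].mask(no_trades, out["close"])
    return out


df = pd.DataFrame({
    "open":   [100, 0, 200],
    "high":   [105, 0, 200],
    "low":    [95, 0, 200],
    "close":  [95, 95, 200],
    "volume": [1000, 0, 500],
})

filled = fill_from_close(df)
# Same call on a frame with no zero-volume rows (the empty-selection case).
no_match = fill_from_close(df[df["volume"] > 0])
```

The per-column loop is less compact than a single `loc` assignment, but it does not depend on how many rows the condition selects.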
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
Let's consider a function add_column that adds a column.
If it uses df[column] = value (Option 1), then the function will trigger the SettingWithCopyWarning whenever it is called on a copy/view (even if we don't care about propagating the change to the original dataframe).
The alternative is df.loc[:, column] = value (Option 2). However, this throws as soon as the dataframe is empty, i.e. doesn't contain any rows.
This then requires ugly workarounds whenever we might be dealing with dataframes or their copies/views that are possibly empty.
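For comparison, here is a minimal sketch of one way to write add_column that avoids choosing between the two options, assuming it is acceptable to return a new dataframe instead of modifying the input in place. It relies on `DataFrame.assign`, which works on a copy of the frame; the body shown here is an assumption for illustration, not the code from the original report, and it only handles string column names because the name is passed as a keyword argument.

```python
import pandas as pd


def add_column(df, column, value):
    """Hypothetical variant: return a new frame with `column` set to `value`.

    DataFrame.assign copies the frame before adding the column, so the
    input (which may be a copy/view of another frame, or empty) is untouched.
    """
    return df.assign(**{column: value})


full = pd.DataFrame({"a": [1, 2, 3]})
subset = full[full["a"] > 1]    # a filtered selection of `full`
empty = full[full["a"] > 10]    # same columns, zero rows

subset_with_b = add_column(subset, "b", 0)
empty_with_b = add_column(empty, "b", 0)
```

The trade-off is that callers have to use the returned frame, so this does not help when in-place modification is actually required.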
INSTALLED VERSIONS
commit : 2cb9652
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-64-generic
Version : #58-Ubuntu SMP Fri Jul 10 19:33:51 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.4
numpy : 1.18.1
pytz : 2019.2
dateutil : 2.7.3
pip : 20.3.3
setuptools : 41.1.0
Cython : None
pytest : 5.3.2
hypothesis : None
sphinx : 3.0.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 5.8.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.17.0
xlrd : 2.0.1
xlwt : None
numba : 0.51.2