BUG: Replacing NaN with None in Pandas 1.3 does not work #42423
Comments
That was apparently changed on purpose (#39761), but it is indeed not very convenient. For example, we often want to replace NaN with None just before converting the dataframe back to a dict (df.to_dict) or another data structure. I'm pretty sure that this change alone breaks many existing code bases.
One can still convert to None, however you need to convert to object dtype first:
gives
This behavior is similar to the result of assigning using .loc/.iloc:
gives (on both 1.3.x and 1.2.x):
I can understand this may be somewhat surprising, but it is certainly more consistent. It also allows assignment without conversion to object, and conversion to object can involve significant performance degradation. cc @jbrockmendel for any other thoughts.
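A minimal sketch of the two behaviors described in this comment, assuming pandas 1.3 or later. The data and variable names are made up for illustration and are not taken from the original report. Casting to object dtype first lets `where` keep None, while assigning None into a float column with .loc stores NaN and keeps the float dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; not taken from the original report.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

# Cast to object first so the column can hold arbitrary Python objects,
# then replace the missing entries with None.
as_object = df.astype(object)
with_none = as_object.where(as_object.notna(), None)
print(with_none["a"].tolist())

# Assigning None into a float column via .loc keeps the float dtype;
# the missing value is stored as NaN rather than None.
s = pd.Series([1.0, 2.0, 3.0])
s.loc[1] = None
print(s.dtype, s.loc[1])
```

The object cast is the expensive part, so it is probably best applied as late as possible, for example right before export.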
When you're working with JSON, JSON doesn't recognize NaNs. Conversion to object dtype would be mandatory in this scenario.
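To illustrate the JSON concern, here is a rough sketch using the standard-library `json` module and a made-up frame. It assumes the object-dtype workaround discussed in this thread. By default `json.dumps` writes a float NaN as the bare token `NaN`, which strict JSON parsers reject, whereas None is written as `null`.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1.0, np.nan]})

# Serializing the float column directly leaves a NaN token in the output.
raw = json.dumps(df.to_dict(orient="records"))

# Converting to object dtype and swapping NaN for None yields JSON null.
obj = df.astype(object)
records = obj.where(obj.notna(), None).to_dict(orient="records")
payload = json.dumps(records)

print(raw)
print(payload)
```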
@yashs97 It would be helpful to understand how the workaround I posted is not sufficient.
I was asking for a workaround without converting to object dtype since, like you said, it involves performance degradation.
This is the right way to do this.
The 1.2 behavior referenced by the OP also converts to object, so if you want None in your Series, there is no way to do this without object dtype.
Is there a heads-up for this in the migration guide? That would be helpful.
@dylancaponi - By migration guide, do you mean the whatsnew (release notes)?
This silently broke code that was the top hit for "pandas replace NaN with None" on Stack Overflow for many years, and was shared on this Git repo: If at all possible, please produce a warning if you see `df.where(pd.notnull(df), None)`.
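For anyone wanting to check which behavior their installed pandas exhibits, the pattern mentioned above can be probed with a small sketch like the following. The frame is hypothetical, and the outcome depends on the pandas version, so no particular output is claimed here.

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1.0, np.nan]})

# The long-standing idiom discussed in this thread.
replaced = df.where(pd.notnull(df), None)
value = replaced["a"].iloc[1]

# Report the version, the resulting dtype, and whether the missing
# entry ended up as None or stayed NaN.
print(pd.__version__, replaced["a"].dtype, "None" if value is None else "NaN")
```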
Code Sample, a copy-pastable example
Pandas 1.2
Pandas 1.3
Problem description
Replacing NaN values with None (or any other Python object) should work as in previous Pandas versions.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit : f00ed8f
python : 3.8.10.final.0
python-bits : 64
OS : Darwin
OS-release : 21.0.0
Version : Darwin Kernel Version 21.0.0: Sun Jun 20 18:43:49 PDT 2021; root:xnu-8011.0.0.121.4~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : es_ES.UTF-8
LOCALE : es_ES.UTF-8
pandas : 1.3.0
numpy : 1.18.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 57.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.3.3
sqlalchemy : 1.4.20
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None