Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
 
Reproducible Example
import pandas as pd
import numpy as np
d = {'name': ["bob", "todd", "sarah", "john"],
     'gp': [1, 2, np.nan, 2],  # np.nan: the np.NaN alias was removed in NumPy 2.0
     'score': [90, 40, 80, 98]}
df = pd.DataFrame(d)
df.name = df.name.astype("category")
df.gp = df.gp.astype("Int16")
df.gp = df.gp.astype("category")
print('-pandas version: ', pd.__version__)
print('-df dtypes:\n', df.dtypes)
print('-df.gp:\n', df.gp)
df.to_csv("test.csv")
Issue Description
This bug is similar to #46297. Running the example above raises the error below in pandas 1.4.2; the same code works on older pandas versions (e.g. 1.2.4). The problem occurs when saving a DataFrame to a .csv file when a column is categorical and its categories use a nullable "Int" dtype (Int16 in the example), as opposed to the default NumPy "int" dtype.
-pandas version:  1.4.2
-df dtypes:
 name     category
gp       category
score       int64
dtype: object
-df.gp:
 0      1
1      2
2    NaN
3      2
Name: gp, dtype: category
Categories (2, Int16): [1, 2]
Traceback (most recent call last):
  File "<ipython-input-28-995d10a4922f>", line 15, in <module>
    df.to_csv("test.csv")
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\core\generic.py", line 3551, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\io\formats\format.py", line 1180, in to_csv
    csv_formatter.save()
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\io\formats\csvs.py", line 261, in save
    self._save()
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\io\formats\csvs.py", line 266, in _save
    self._save_body()
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\io\formats\csvs.py", line 304, in _save_body
    self._save_chunk(start_i, end_i)
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\io\formats\csvs.py", line 311, in _save_chunk
    res = df._mgr.to_native_types(**self._number_format)
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\core\internals\managers.py", line 473, in to_native_types
    return self.apply("to_native_types", **kwargs)
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\core\internals\managers.py", line 304, in apply
    applied = getattr(b, f)(**kwargs)
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\core\internals\blocks.py", line 634, in to_native_types
    result = to_native_types(self.values, na_rep=na_rep, quoting=quoting, **kwargs)
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\core\internals\blocks.py", line 2163, in to_native_types
    values = take_nd(
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\core\array_algos\take.py", line 114, in take_nd
    return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\core\arrays\masked.py", line 653, in take
    return type(self)(result, mask, copy=False)
  File "C:\Users\Sade\anaconda3\envs\asdf\lib\site-packages\pandas\core\arrays\integer.py", line 315, in __init__
    raise TypeError(
TypeError: values should be integer numpy array. Use the 'pd.array' function instead
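To narrow this down, the following is a minimal sketch that compares a categorical column built on the nullable "Int16" dtype with one built on the NumPy "int16" dtype. The helper name `csv_write_ok` is made up for illustration and is not part of pandas. The script only reports whether `to_csv` raises a `TypeError` on the installed version; it does not assume a particular outcome.

```python
import io

import numpy as np
import pandas as pd


def csv_write_ok(series: pd.Series) -> bool:
    """Return True if writing the series to CSV succeeds, False on TypeError.

    Hypothetical helper for this report; it writes to an in-memory buffer
    so no file is created.
    """
    try:
        series.to_frame().to_csv(io.StringIO())
    except TypeError:
        return False
    return True


# Categorical whose categories use the nullable extension dtype "Int16"
nullable_cat = (
    pd.Series([1, 2, np.nan, 2], name="gp").astype("Int16").astype("category")
)

# Categorical whose categories use the plain NumPy dtype "int16" (no missing values)
numpy_cat = pd.Series([1, 2, 2, 2], name="gp").astype("int16").astype("category")

print("pandas version:", pd.__version__)
print("Int16 categories written OK:", csv_write_ok(nullable_cat))
print("int16 categories written OK:", csv_write_ok(numpy_cat))
```

If the first check prints `False` while the second prints `True`, the installed version is likely affected by the behavior described here.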
Expected Behavior
I'd expect the example above to write the DataFrame to a .csv file, as it does in older pandas versions, instead of raising a TypeError.
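Until this is fixed, one possible workaround is to cast the affected categorical columns to `object` before writing. This is only a sketch and has not been verified on 1.4.2 itself: the function name `to_csv_plain_categories` is hypothetical, and it assumes column names are unique and that losing the categorical dtype in the written copy is acceptable.

```python
import io

import numpy as np
import pandas as pd


def to_csv_plain_categories(df: pd.DataFrame, path_or_buf=None, **kwargs):
    """Write df to CSV after casting categoricals with extension-dtype
    categories (e.g. Int16) to object. Hypothetical helper, not a pandas API.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        dtype = out[col].dtype
        if isinstance(dtype, pd.CategoricalDtype) and (
            pd.api.types.is_extension_array_dtype(dtype.categories.dtype)
        ):
            out[col] = out[col].astype(object)
    return out.to_csv(path_or_buf, **kwargs)


df = pd.DataFrame(
    {
        "name": ["bob", "todd", "sarah", "john"],
        "gp": [1, 2, np.nan, 2],
        "score": [90, 40, 80, 98],
    }
)
df["name"] = df["name"].astype("category")
df["gp"] = df["gp"].astype("Int16").astype("category")

# With path_or_buf=None, to_csv returns the CSV text instead of writing a file
csv_text = to_csv_plain_categories(df)
print(csv_text)
```

The original `df` keeps its categorical columns; only the temporary copy passed to `to_csv` is changed.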
Installed Versions
INSTALLED VERSIONS
commit           : 4bfe3d0
python           : 3.8.5.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19041
machine          : AMD64
processor        : AMD64 Family 23 Model 8 Stepping 2, AuthenticAMD
byteorder        : little
LC_ALL           : None
LANG             : en
LOCALE           : English_United States.1252
pandas           : 1.4.2
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.2.4
setuptools       : 50.3.1.post20201107
Cython           : 0.29.21
pytest           : 6.1.1
hypothesis       : None
sphinx           : 3.2.1
blosc            : None
feather          : None
xlsxwriter       : 1.3.7
lxml.etree       : 4.6.1
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.19.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
brotli           :
fastparquet      : None
fsspec           : 0.8.3
gcsfs            : None
markupsafe       : 1.1.1
matplotlib       : 3.3.2
numba            : 0.51.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.5
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.2
snappy           : None
sqlalchemy       : 1.3.20
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
zstandard        : None