Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import io
import pandas as pd
csv = io.StringIO(f"""\
value,description
42,A small int
18446744073709551616.5,This is 2.0^64 + 0.5
""")
data = pd.read_csv(csv)
print(data.dtypes)
print(data["value"].values)
value          object
description    object
dtype: object
['42' '18446744073709551616.5']
Issue Description
If a CSV column contains a number of magnitude $2^{64}$ or larger and no float precedes it in that column, pd.read_csv() (without dtype=) parses the whole column as str with dtype object. The first two examples below work because the data stays below $2^{64}$.
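The $2^{64}$ threshold that separates the working and failing examples is the first integer that no longer fits in uint64, while float64 can still represent it (with reduced precision). A minimal sketch of that arithmetic in plain Python, independent of pandas; the variable names are illustrative only:

```python
import math

# Largest value representable as an unsigned 64-bit integer.
UINT64_MAX = 2**64 - 1
print(UINT64_MAX)                 # 18446744073709551615

# One past the uint64 range: no 64-bit integer dtype can hold this.
first_overflow = UINT64_MAX + 1   # 2**64
as_float = float(first_overflow)  # exactly representable as float64
print(as_float == 2.0**64)        # True

# Near 2**64 adjacent float64 values are 4096 apart, so the
# fractional part of 2**64 + 0.5 is lost when parsed as a float.
print(math.ulp(2.0**64))          # 4096.0
rounded = float("18446744073709551616.5")
print(rounded == 2.0**64)         # True
```

So a float64 result for these columns is lossy, but it is still numeric, which is what the expected behavior below asks for.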
Known work-arounds: One could use dtype={"value": "float64"}, but in our case we'd really prefer to not build the machinery and a type information database. Also engine="pyarrow" seems to work correctly, but we'd prefer to avoid that dependency.
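For reference, a minimal sketch of the first work-around, assuming the affected column name ("value" here) is known in advance. It forces the dtype for that one column and leaves inference on for the rest:

```python
import io

import pandas as pd

csv = io.StringIO("""\
value,description
42,A small int
18446744073709551616.5,This is 2.0^64 + 0.5
""")

# Passing dtype= for the column bypasses type inference for it,
# so every field in "value" is parsed directly as a float.
data = pd.read_csv(csv, dtype={"value": "float64"})
print(data.dtypes)
print(data["value"].values)
```

This sidesteps the problem but requires knowing the column types up front, which is exactly the bookkeeping we would like to avoid.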
OK, ex. 1: int smaller than $2^{64}$, interpreted as uint64:
csv = io.StringIO(f"""\
value,description
42,A small integer
18446744073709551615,This is 2^64 - 1
""")
data = pd.read_csv(csv)
print(data.dtypes)
print(data["value"].values)
value          uint64
description    object
dtype: object
[                  42 18446744073709551615]
OK, ex. 2: float smaller than $2^{64}$, interpreted as float64:
csv = io.StringIO(f"""\
value,description
42,A small integer
18446744073709551615.0,This is 2.0^64 - 1
""")
data = pd.read_csv(csv)
print(data.dtypes)
print(data["value"].values)
value          float64
description     object
dtype: object
[4.20000000e+01 1.84467441e+19]
OK, ex. 3: small float followed by $2^{64}$ as an int, interpreted as float64:
csv = io.StringIO(f"""\
value,description
4.2,A small float
18446744073709551616,This is 2^64
""")
data = pd.read_csv(csv)
print(data.dtypes)
print(data["value"].values)
value          float64
description     object
dtype: object
[4.20000000e+00 1.84467441e+19]
FAIL, ex. 4: small int followed by $2^{64}+0.5$ as a float, interpreted as a str:
csv = io.StringIO(f"""\
value,description
42,A small int
18446744073709551616.5,This is 2.0^64 + 0.5
""")
data = pd.read_csv(csv)
print(data.dtypes)
print(data["value"].values)
value          object
description    object
dtype: object
['42' '18446744073709551616.5']
FAIL, ex. 5: small int followed by $2^{64}$ as an int, interpreted as a str:
csv = io.StringIO(f"""\
value,description
42,A small int
18446744073709551616,This is 2.0^64
""")
data = pd.read_csv(csv)
print(data.dtypes)
print(data["value"].values)
value          object
description    object
dtype: object
['42' '18446744073709551616']
Expected Behavior
A small int followed by $2^{64}+0.5$ as a float should be interpreted as float64:
csv = io.StringIO(f"""\
value,description
42,A small int
18446744073709551616.5,This is 2.0^64 + 0.5
""")
data = pd.read_csv(csv)
print(data.dtypes)
print(data["value"].values)
value          float64
description     object
dtype: object
[4.20000000e+01 1.84467441e+19]
A small int followed by $2^{64}$ as an int should be interpreted as float64:
csv = io.StringIO(f"""\
value,description
42,A small int
18446744073709551616,This is 2.0^64
""")
data = pd.read_csv(csv)
print(data.dtypes)
print(data["value"].values)
value          float64
description     object
dtype: object
[4.20000000e+01 1.84467441e+19]
Installed Versions
INSTALLED VERSIONS
commit           : 2e218d1
python           : 3.11.1.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.15.85
Version          : #1-NixOS SMP Wed Dec 21 16:36:38 UTC 2022
machine          : x86_64
processor        :
byteorder        : little
LC_ALL           : None
LANG             : fi_FI.UTF-8
LOCALE           : fi_FI.UTF-8
pandas           : 1.5.3
numpy            : 1.24.1
pytz             : 2022.7.1
dateutil         : 2.8.2
setuptools       : 65.5.0
pip              : 20.3.4
Cython           : None
pytest           : 7.2.1
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 8.8.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
matplotlib       : None
numba            : None
numexpr          : 2.8.4
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 11.0.0
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : 1.10.0
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None
tzdata           : None