When using `to_sql` with `mssql+pyodbc` and `fast_executemany=True`, uploading a DataFrame with a single row containing a tz-aware datetime into a `datetimeoffset` column causes the timezone offset to be lost. Doing the same thing with a DataFrame containing more than one row produces the correct result.
```python
from pprint import pprint
import datetime
import sys
import urllib.parse

import pandas as pd
import pyodbc
import sqlalchemy as sa

print(sys.version)
# 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 22:39:24) [MSC v.1916 32 bit (Intel)]
print(f'SQLAlchemy {sa.__version__}, pandas {pd.__version__}, pyodbc {pyodbc.version}')
# SQLAlchemy 1.3.12, pandas 0.25.3, pyodbc 4.0.28

connection_string = (
    r'DRIVER=ODBC Driver 17 for SQL Server;'
    r'SERVER=localhost,49242;'
    r'DATABASE=myDb;'
    r'Trusted_Connection=Yes;'
    r'UseFMTONLY=Yes;'
)
connection_uri = 'mssql+pyodbc:///?odbc_connect=' + urllib.parse.quote_plus(connection_string)
engine = sa.create_engine(connection_uri, fast_executemany=True)

# test environment
table_name = 'DateTimeOffset_Test'
engine.execute(sa.text(f"DROP TABLE IF EXISTS [{table_name}]"))
engine.execute(sa.text(f"CREATE TABLE [{table_name}] (id int primary key, dto datetimeoffset)"))

# test data
my_tz = datetime.timezone(datetime.timedelta(hours=-7))
dto_value = datetime.datetime(2020, 1, 1, 0, 0, 0, tzinfo=my_tz)
print(dto_value)
# 2020-01-01 00:00:00-07:00
#                    ^

# test code
num_rows = 1
row_data = [(x, dto_value) for x in range(num_rows)]
df = pd.DataFrame(row_data, columns=['id', 'dto'])
print(df)
#    id                       dto
# 0   0 2020-01-01 00:00:00-07:00
#                           ^ - good

df.to_sql(table_name, engine, if_exists='append', index=False)
result = engine.execute(sa.text(f"SELECT id, CAST(dto as varchar(50)) AS foo FROM [{table_name}]")).fetchall()
pprint(result)
# [(1, '2020-01-01 00:00:00.0000000 +00:00')]
#                                   ^ - bad
```
I can confirm that this is still an issue with the latest version of pandas. It appears that `to_sql()` calls `.execute()` if the DataFrame contains a single row, and it calls `.executemany()` if the DataFrame contains multiple rows. One possible fix would be to always call `.executemany()`, even if the DataFrame only has one row. If there are concerns about performance or backwards compatibility, then an argument like `method="executemany"` could be used to control this, e.g.,

- `method=None` (the default) could still call `.execute()` for a one-row DataFrame, while
- `method="executemany"` would tell pandas to always use `.executemany()` (a workaround along these lines is sketched below).
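In the meantime, the `method` parameter of `DataFrame.to_sql` already accepts a callable with the signature `(table, conn, keys, data_iter)`, so a caller can force the multi-row path today without waiting for a new keyword. A minimal sketch, assuming the pyodbc engine from the repro above; the helper name `force_executemany` is hypothetical:

```python
def force_executemany(table, conn, keys, data_iter):
    # Always insert via the DBAPI's .executemany(), even when
    # data_iter yields only a single row.
    dbapi_conn = conn.connection  # raw pyodbc connection behind SQLAlchemy
    cursor = dbapi_conn.cursor()
    try:
        # A cursor obtained this way does not inherit the engine's
        # fast_executemany setting, so set it explicitly.
        cursor.fast_executemany = True
        columns = ", ".join(f"[{k}]" for k in keys)
        placeholders = ", ".join("?" for _ in keys)
        sql = f"INSERT INTO [{table.name}] ({columns}) VALUES ({placeholders})"
        cursor.executemany(sql, list(data_iter))
    finally:
        cursor.close()

df.to_sql(table_name, engine, if_exists='append', index=False,
          method=force_executemany)
```

Whether this sidesteps the bug depends on pyodbc using the same array-binding path for a one-element parameter list, but it at least makes the dispatch explicit instead of row-count-dependent.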
Problem description
(Prompted by this Stack Overflow question.)
As described at the top: with `mssql+pyodbc` and `fast_executemany=True`, uploading a single-row DataFrame containing a tz-aware datetime into a `datetimeoffset` column loses the timezone offset, while a DataFrame with more than one row produces the correct result. Simply changing `num_rows = 1` to `num_rows = 2` in the script above produces correct results, as the sketch below illustrates.
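A minimal sketch of that change; the multi-row output shown in the comment is inferred from the description (the issue reports only that the offset is preserved, so the exact strings are an assumption):

```python
num_rows = 2  # the only change from the failing repro
row_data = [(x, dto_value) for x in range(num_rows)]
df = pd.DataFrame(row_data, columns=['id', 'dto'])
df.to_sql(table_name, engine, if_exists='append', index=False)
# pprint(result) is then expected to keep the -07:00 offset, e.g.:
# [(0, '2020-01-01 00:00:00.0000000 -07:00'),
#  (1, '2020-01-01 00:00:00.0000000 -07:00')]
```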
Output of `pd.show_versions()`:
```
INSTALLED VERSIONS
commit : None
python : 3.8.1.final.0
python-bits : 32
OS : Windows
OS-release : 8.1
machine : AMD64
processor : AMD64 Family 18 Model 1 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 0.25.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext)
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : 1.3.12
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
```