
ENH: full SQL support for datetime64 values #7103


Closed
jorisvandenbossche opened this issue May 13, 2014 · 5 comments · Fixed by #8208
Labels
IO SQL to_sql, read_sql, read_sql_query

@jorisvandenbossche
Member

See the tests I added for datetime, NaT and NaN values in PRs #7082 and #7100.

PostgreSQL (psycopg2)

  • writing and reading works

SQLite

  • writing and reading works with sqlalchemy
  • reading works with read_sql_table; with read_sql_query you get strings back (SQLite has no native datetime type, so nothing can be done about that; a workaround with parse_dates is sketched after this list)
  • writing fails with fallback mode 'Unsupported type' (datetime64 columns not supported with sqlite fallback #7567)
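
Not part of the fix itself, but as a workaround for the read_sql_query case the strings can be parsed back with the parse_dates argument. A minimal sketch, assuming an in-memory sqlite database and a made-up table name:

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite:///:memory:')
df = pd.DataFrame({'A': pd.date_range('2013-01-01 09:00:00', periods=3)})
df.to_sql('test_datetime', engine, index=False)

# read_sql_table goes through sqlalchemy and gives datetime64 back
pd.read_sql_table('test_datetime', engine)

# read_sql_query returns plain strings (sqlite stores them as text),
# but parse_dates converts them back to datetime64
pd.read_sql_query('SELECT * FROM test_datetime', engine, parse_dates=['A'])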

MySQL (pymysql / MySQLdb / mysql.connector)

  • Reading works with all drivers
  • Writing datetime64 columns depends on driver:
    • pymysql: fails with "KeyError: <class 'pandas.tslib.Timestamp'>"

      In [65]: import pymysql
      In [66]: engine_mysql = sqlalchemy.create_engine('mysql+pymysql://root@localhost/pandas_nosetest')
      In [67]: df = pd.DataFrame({'A': pd.date_range('2013-01-01 09:00:00', periods=3),
      ...:                    'B': np.arange(3.0)})
      In [68]: df.to_sql('test_datetime', engine_mysql, if_exists='replace')
      Traceback (most recent call last):
      File "<ipython-input-68-95961c7ce232>", line 1, in <module>
      df.to_sql('test_datetime', engine_mysql, if_exists='replace')
      ...
      File "C:\Anaconda\envs\devel\lib\site-packages\pymysql\converters.py", line 24, in escape_item
      encoder = encoders[type(val)]
      KeyError: <class 'pandas.tslib.Timestamp'>
      
    • MySQLdb: works

    • mysql.connector: fails with "ProgrammingError: Failed processing pyformat-parameters; Python 'timestamp' cannot be converted to a MySQL type" or with "'MySQLConverter' object has no attribute '_timestamp_to_mysql'" (see #7936 for more details)

The data are fed to the database using df.itertuples() and then np.asscalar(), which yields values of the Timestamp class. Most drivers can handle this (presumably because Timestamp is a subclass of datetime.datetime), but pymysql and mysql.connector cannot. A quick check of this is sketched below.
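
For reference, a standalone check (my own sketch, not taken from the PRs) that the values coming out of itertuples are indeed Timestamp objects and thus datetime.datetime instances:

import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': pd.date_range('2013-01-01 09:00:00', periods=3),
                   'B': np.arange(3.0)})

row = next(df.itertuples())                    # (index, A, B)
value = row[1]                                 # the datetime64 column

print(type(value))                             # pandas Timestamp
print(isinstance(value, datetime.datetime))    # True, so most drivers accept it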

For now, all the tests are skipped for MySQL / pymysql.

Oracle / SQL Server ?


There are also some issues with NaT handling (see the tests I added for datetime, NaT and NaN values in PRs #7082 and #7100), but there is also a separate issue to track NaN handling specifically (#2754).

@jreback jreback added this to the 0.14.1 milestone May 16, 2014
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.15.0, 0.14.1 Jul 1, 2014
@maxgrenderjones
Contributor

Note that this is similar to (the same as?) the NaN/NaT conversion issues (#2754, #4199, PR #4200) referred to in #4163

@jorisvandenbossche
Member Author

I suppose we can solve this by creating a custom Timestamp type, backed by sqlalchemy.types.DateTime (also mentioned by @jreback in #2754 (comment)), with something like:

import sqlalchemy

class Timestamp(sqlalchemy.types.TypeDecorator):
    """Convert to/from the pandas.tslib.Timestamp type."""

    impl = sqlalchemy.types.DateTime

    def process_bind_param(self, value, dialect):
        # Timestamp objects expose to_datetime(); anything else
        # (plain datetimes, None) passes through unchanged
        f = getattr(value, 'to_datetime', None)
        if f is not None:
            return f()
        return value

And then use this in _sqlalchemy_type instead of DateTime (https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L787).
I tested it with pymysql, and it works (a standalone sketch of how the type behaves is below the list). Questions remain:

  • does this slow it down a lot for drivers where it already worked now (eg for postgresql)
  • how to handle timezones in this case.
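
For illustration, a standalone sketch of how such a TypeDecorator behaves outside of pandas' internals. The table and column names are made up, sqlite is used as a stand-in engine, and engine.execute is assumed to be available (it is on SQLAlchemy versions of that era):

import pandas as pd
import sqlalchemy

metadata = sqlalchemy.MetaData()
table = sqlalchemy.Table('test_datetime', metadata,
                         sqlalchemy.Column('A', Timestamp()),   # the TypeDecorator above
                         sqlalchemy.Column('B', sqlalchemy.Float))

engine = sqlalchemy.create_engine('sqlite:///:memory:')
metadata.create_all(engine)

# process_bind_param converts the pandas Timestamp to a plain datetime before binding
engine.execute(table.insert(), [{'A': pd.Timestamp('2013-01-01 09:00:00'), 'B': 0.0}])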

@jreback There are both Timestamp.to_datetime() and Timestamp.to_pydatetime() methods (I used to_datetime above). Any reason not to use to_datetime (which is slightly faster than to_pydatetime)?
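
For comparison, a quick check (assuming a pandas version of that era, where both methods still exist):

import pandas as pd

ts = pd.Timestamp('2013-01-01 09:00:00')

# both return a plain datetime.datetime; to_pydatetime matches the
# DatetimeIndex.to_pydatetime naming
ts.to_datetime()
ts.to_pydatetime()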

@jreback
Contributor

jreback commented Sep 6, 2014

@jorisvandenbossche hmm, they are virtually the same (one is on the shadow class _Timestamp, but it doesn't matter, you can call them similarly). Not really sure why they both exist.

I think Timestamp.to_datetime (and Timedelta.to_timedelta) are pretty clear ways of doing this.

@jorisvandenbossche
Member Author

However, if we pick one, I think the name to_pydatetime is clearer, and also in line with the DatetimeIndex method.

@jreback
Contributor

jreback commented Sep 6, 2014

ok, sure (and maybe that's why there are 2)!
