Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite3 timestamp converter cannot handle timezone #73285

Closed
bozokopic mannequin opened this issue Dec 28, 2016 · 35 comments
Closed

sqlite3 timestamp converter cannot handle timezone #73285

bozokopic mannequin opened this issue Dec 28, 2016 · 35 comments
Labels
3.8 (EOL) end of life stdlib Python modules in the Lib dir topic-sqlite3 type-feature A feature request or enhancement

Comments

@bozokopic
Copy link
Mannequin

bozokopic mannequin commented Dec 28, 2016

BPO 29099
Nosy @malemburg, @abalkin, @serhiy-storchaka, @zhangyangyu, @hassanbot
Files
  • sqlite3.patch
  • sqlite3-2.patch
  • sqlite3-2.patch: Regenerated for review
  • sqlite3-3.patch: Based on -2, addressing comments.
  • timestamptz.patch
  • timestamptz-2.patch
  • timestamptz-3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-12-28.23:04:29.635>
    labels = ['3.8', 'type-feature', 'library']
    title = 'sqlite3 timestamp converter cannot handle timezone'
    updated_at = <Date 2020-09-18.07:14:50.133>
    user = 'https://bugs.python.org/bozokopic'

    bugs.python.org fields:

    activity = <Date 2020-09-18.07:14:50.133>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2016-12-28.23:04:29.635>
    creator = 'bozo.kopic'
    dependencies = []
    files = ['46067', '46081', '46082', '46254', '46265', '46277', '46279']
    hgrepos = []
    issue_num = 29099
    keywords = ['patch']
    message_count = 34.0
    messages = ['284205', '284231', '284240', '284241', '284243', '284245', '284252', '284291', '284318', '285211', '285227', '285229', '285230', '285231', '285232', '285243', '285245', '285287', '285288', '285289', '285291', '285371', '285372', '285375', '285377', '285381', '285385', '285985', '285989', '285990', '285992', '285993', '377068', '377090']
    nosy_count = 7.0
    nosy_names = ['lemburg', 'ghaering', 'belopolsky', 'serhiy.storchaka', 'xiang.zhang', 'bozo.kopic', 'hassanbot']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue29099'
    versions = ['Python 3.8']

    @bozokopic
    Copy link
    Mannequin Author

    bozokopic mannequin commented Dec 28, 2016

    Current convert_timestamp function raises exception when parsing timestamps that contain timezone information and have microsecond set to zero.

    Patch for this behavior is included.

    Example script that demonstrates this problem:

    import sys
    import datetime
    import sqlite3
    import traceback
    import contextlib
    
    
    def main():
        db = sqlite3.connect('test.db', detect_types=sqlite3.PARSE_DECLTYPES)
        db.executescript("""
            CREATE TABLE IF NOT EXISTS test (
                timestamp TIMESTAMP)
            """)
    
        t = datetime.datetime.now(tz=datetime.timezone.utc)
        t = t.replace(microsecond=0)
        db.executemany("INSERT INTO test VALUES (:timestamp)", [{'timestamp': t}])
        db.commit()
    
        with contextlib.closing(db.cursor()) as c:
            try:
                c.execute('SELECT * FROM test')
                c.fetchall()
                print('original implementation success')
            except Exception as e:
                print('original implementation failed')
                traceback.print_exc()
            c.close()
    
        sqlite3.register_converter("timestamp", _sqlite_convert_timestamp)
    
        with contextlib.closing(db.cursor()) as c:
            try:
                c.execute('SELECT * FROM test')
                c.fetchall()
                print('patched implementation success')
            except Exception as e:
                print('patched implementation failed')
                traceback.print_exc()
            c.close()
    
    
    def _sqlite_convert_timestamp(val):
        datepart, timepart = val.split(b" ")
    
        # this is the patch
        timepart = timepart.split(b'+', 1)[0].split(b'-', 1)[0]
    year, month, day = map(int, datepart.split(b"-"))
    timepart_full = timepart.split(b".")
    hours, minutes, seconds = map(int, timepart_full[0].split(b":"))
    if len(timepart_full) == 2:
        microseconds = int('{:0<6.6}'.format(timepart_full[1].decode()))
    else:
        microseconds = 0
    val = datetime.datetime(year, month, day, hours, minutes, seconds,
                            microseconds)
    return val
    
    if __name__ == '__main__':
        sys.exit(main())

    @bozokopic bozokopic mannequin added type-crash A hard crash of the interpreter, possibly with a core dump 3.7 (EOL) end of life stdlib Python modules in the Lib dir labels Dec 28, 2016
    @zhangyangyu
    Copy link
    Member

    Thanks for your report Bozo.

    I think this is due to timestamp converter doesn't handle timezone now. Besides this ValueError, if I understand correctly, even if you pass a datetime object with tzinfo, with microsecond set, the returned datetime object is a wrong one since the tzinfo is discarded.

    @zhangyangyu zhangyangyu added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Dec 29, 2016
    @zhangyangyu zhangyangyu changed the title sqlite3 timestamp converter ValueError sqlite3 timestamp converter cannot handle timezone Dec 29, 2016
    @bozokopic
    Copy link
    Mannequin Author

    bozokopic mannequin commented Dec 29, 2016

    Yes, that is correct. Timezone information is discarded. Submitted patch only resolves occurrence of ValueError.

    Is additional patch that enables parsing of timezone required?

    @zhangyangyu
    Copy link
    Member

    I would prefer add the support for timezone since the current behaviour seems not correct to me. If you'd like to work on it, don't forget to add a test case and sign the CLA.

    But I am not sure this should be treated as a bug or new feature.

    @serhiy-storchaka
    Copy link
    Member

    Could you provide a unittest?

    @zhangyangyu
    Copy link
    Member

    import sqlite3, datetime
    c = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
    cur = c.cursor()
    cur.execute('create table test(t timestamp)')
    t = datetime.datetime.now(tz=datetime.timezone.utc)
    cur.execute("insert into test(t) values (?)", (t,))
    cur.execute('select t from test')
    l = cur.fetchone()[0]
    t == l   # the result not equal to the original one

    @bozokopic
    Copy link
    Mannequin Author

    bozokopic mannequin commented Dec 29, 2016

    I'm providing new patch that adds support for parsing timezone information.

    Tests are included.

    @serhiy-storchaka
    Copy link
    Member

    In general the patch LGTM. Added comments on Rietveld.

    @zhangyangyu
    Copy link
    Member

    LGTM generally. :-)

    @serhiy-storchaka
    Copy link
    Member

    LGTM except that arguments of assertEqual() should be swapped.

    @zhangyangyu
    Copy link
    Member

    LGTM except that arguments of assertEqual() should be swapped.

    Thanks Serhiy! I'll take care of that. One problem remained is what to do with 2.7. There is no timezone object in 2.7. My preference is just pointing out the limitation that only naive datetime object is supported in the doc.

    @serhiy-storchaka
    Copy link
    Member

    Agreed. And maybe raise more informative error for datetimes with a timezone.

    I'm also not sure about 3.5 and 3.6. Datetimes with timezones were never supported. Currently all returned ditetime objects are naive. Getting an aware datetime can confuse applications.

    @malemburg
    Copy link
    Member

    I don't think this can be considered a bug fix, since it changes behavior for applications that read data from SQLite databases which were not created by Python.

    Those application may now see datetime values with tz infos and will likely not be prepared to handle all the problems associated with this.

    Is it possible to make the support optional for 3.5 and 3.6 and only enable it for 3.7 (with the possibility of disabling it again) ?

    @zhangyangyu
    Copy link
    Member

    I asked whether this should be treated as an enhancement or bug fix. Now with your concerns I think it fits as enhancement more.

    Is it possible to make the support optional for 3.5 and 3.6 and only enable it for 3.7 (with the possibility of disabling it again) ?

    How about just make it just into 3.7 and just document only naive datetime objects are supported in 3.5, 3.6 and 2.7.
    I am not sure it's worth to make it even optional in 3.7. Discarding tzinfo sounds weird, and worse, the behaviour is not deterministic, it could also fail with an exception.

    @serhiy-storchaka
    Copy link
    Member

    The naive datetime converter is registered under the name "timestamp". The aware datetime converter or the universal datetime converter (if it is needed) can be registered under different names.

    @malemburg
    Copy link
    Member

    On 11.01.2017 17:08, Serhiy Storchaka wrote:

    The naive datetime converter is registered under the name "timestamp". The aware datetime converter or the universal datetime converter (if it is needed) can be registered under different names.

    This sounds like a good idea.

    Perhaps use "timestamptz" or something similar.

    @serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Jan 11, 2017
    @malemburg
    Copy link
    Member

    On 11.01.2017 17:04, Xiang Zhang wrote:

    I am not sure it's worth to make it even optional in 3.7. Discarding tzinfo sounds weird, and worse, the behaviour is not deterministic, it could also fail with an exception.

    Best practice is to store all date/time values using UTC (or some
    other fixed timezone) in databases and to manage the timezone/locale
    information elsewhere.

    The TZ info then becomes redundant and only results in more
    database space being used as well as slower queries (due to the extra
    TZ calculations being done; many databases internally store the
    values as UTC to avoid having to do TZ calculations at query time).

    Of course, there are use cases, where you'd still want to work
    with TZ values in the database, so an extra converter for this
    sounds like a good plan.

    @zhangyangyu
    Copy link
    Member

    timestamptz.patch implements a new converter that's able to convert aware datetime objects.

    @serhiy-storchaka
    Copy link
    Member

    I think the timestamptz converter should either interpret strings without timezone as UTC (and perhaps understand the "Z" suffix as sqlite3 date() function) or raises an error. It should never return naive datetime.

    The timestamp converter needs better error reporting when get an input with a timezone.

    @zhangyangyu
    Copy link
    Member

    I think the timestamptz converter should either interpret strings without timezone as UTC (and perhaps understand the "Z" suffix as sqlite3 date() function) or raises an error. It should never return naive datetime.

    Currently timestamptz just convert back what the user passed in, no matter naive or aware objects. What to do with them is left to the app. If we raise an error, could users use naive and aware objects together? And interpreting strings without timezone as UTC seems will mistranslate the object. For example, pass in datetime.datetime.now() and translate it back as UTC may not be right.

    The timestamp converter needs better error reporting when get an input with a timezone.

    I thought about it but our intention to introduce a new converter is not to break old code. Doesn't add error reporting violate the intention? Users' code may not catch the error now.

    @serhiy-storchaka
    Copy link
    Member

    Naive and aware objects should not be mixed. Their actually are different types. The converter doesn't return str or bytes when can't parse an input as a datetime, and it shouldn't return a naive datetime if can't find a timezone.

    SQLite date and time functions imply UTC if timezone is omitted. This is a reason for returning aware UTC datetime objects. On other side, Python is more powerful programming language, it distinguish naive and aware datetime objects, and it is unlikely that these two types are mixed in one database column. It is better to raise an error that silently return possible wrong result. It exceptional case user can write special converter or just call SQLite datetime() for unifying data format.

    I think the old code unlikely will be broken if preserve an exception type and don't change conditions for raising an error. New error message can contain full input string and suggest to use the timestamptz converter if it looks as a datetime with timezone.

    @zhangyangyu
    Copy link
    Member

    I think the old code unlikely will be broken if preserve an exception type and don't change conditions for raising an error.

    Currently there could be no error when the object gets a timezone. The timezone is simply discarded and only when microseconds comes to 0 there is a ValueError. I don't think the user code would prepare to catch the ValueError. I don't oppose make timestamp more strict but just not sure if it's suitable.

    @zhangyangyu
    Copy link
    Member

    timestamptz-2.patch make timestamp and timestamptz specilized for their own objects.

    @serhiy-storchaka
    Copy link
    Member

    I think ProgrammingError is not correct type of an exception. It is used for programming errors such as using uninitialized or closed objects or supplying incorrect number of bindings. But this error can be caused by invalid data in a database created not by Python. Currently converters raise ValueError or TypeError for invalid data. ValueError is raised for datetime without microseconds but with timezone offset. I think ValueError is more appropriate type for such errors.

    Perhaps even TypeError should be converted to ValueError.

    try:
        # parsing date or datetime
    except (ValueError, TypeError):
        raise ValueError(...)
    

    @serhiy-storchaka
    Copy link
    Member

    Perhaps even TypeError should be converted to ValueError.

    But this is different issue of course.

    @zhangyangyu
    Copy link
    Member

    I am okay with ValueError(actually I use it in the patch originally) but I am not in favour of catching the errors of the parser. The parser raises errors due to invalid data, not timezone. I think propagate it to users could make the reason more obvious.

    @zhangyangyu
    Copy link
    Member

    timestamptz-3.patch uses ValueError instead of ProgrammingError and treat suffix Z as UTC timezone.

    @zhangyangyu
    Copy link
    Member

    Ping for review for timestamptz-3.patch.

    @serhiy-storchaka
    Copy link
    Member

    Added few minor comments on Rietveld. Technically the patch LGTM. But the behavior change of the timestamp converter for input without timezone offset perhaps needs wider discussion. We have several options:

    1. Current behavior. Silently drop the timezone if there are microseconds, and raise an error when there are no microseconds. This is bad for many reasons, but some user code can depend on this.

    2. Just drop the timezone, don't raise an error when there are no microseconds. The converter will become returning incorrect result instead of just failing on unexpected input. This will fix the user code that is aware of this, but currently fails when input doesn't have microseconds.

    3. Return aware datetime for input with timezone offset (as in original patch). Returning aware or naive datetime depending on input can break a user code that is not prepared for this.

    4. Raise an error for input with timezone offset, always return naive datetime objects from this converter. This can break the user code that depends on current behavior and works with the input containing the same timezone offsets.

    Any option can break some code. I prefer option 3, but want to hear thoughts of other core developers. Maybe discuss this on Python-Dev?

    @zhangyangyu
    Copy link
    Member

    Maybe discuss this on Python-Dev?

    It's fine. Could you compose a mail instead of me? I am not good at that. :-(

    @serhiy-storchaka
    Copy link
    Member

    I'm even worse at that. :-(

    @zhangyangyu
    Copy link
    Member

    I'm even worse at that. :-(

    LoL. Okay I'll do that. :-)

    @hassanbot
    Copy link
    Mannequin

    hassanbot mannequin commented Sep 17, 2020

    This still isn't fixed as of 3.8 (or in master I think).

    I can understand why you wouldn't want to allow serializing and deserializing time zones, since tzinfo objects cannot be accurately serialized with a simple UTC offset, but you should at least get an error when trying to insert an aware object. Anything is better than it is now, where you get no warning or error when inserting the object, and get a hard to interpret error ("invalid literal for int() with base 10") when trying to retrieve it.

    For deserialization, the datetime class now (since 3.7) includes a fromisoformat() method that could be used as a counterpart to the isoformat() method used when serializing. At least it would be consistent then.

    @hassanbot hassanbot mannequin added 3.8 (EOL) end of life and removed 3.7 (EOL) end of life labels Sep 17, 2020
    @serhiy-storchaka
    Copy link
    Member

    Xiang Zhang, was there a discussion on Python-Dev?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @erlend-aasland
    Copy link
    Contributor

    As per discussion on Discourse, we will deprecate the built-in, default adapters and converters. Thus, this issue is now superseded by gh-90016.

    @erlend-aasland erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale May 21, 2022
    @erlend-aasland erlend-aasland moved this from Done to Discarded in sqlite3 issues May 21, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life stdlib Python modules in the Lib dir topic-sqlite3 type-feature A feature request or enhancement
    Projects
    Status: Discarded
    Development

    No branches or pull requests

    4 participants