-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(elasticsearch): time_zone setting does not work for cast datetime expressions #17048
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17048 +/- ##
==========================================
+ Coverage 76.86% 76.95% +0.08%
==========================================
Files 1042 1042
Lines 56254 56271 +17
Branches 7785 7785
==========================================
+ Hits 43242 43301 +59
+ Misses 12754 12712 -42
Partials 258 258
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
cls, target_type: str, dttm: datetime, **kwargs: Any | ||
) -> Optional[str]: | ||
|
||
if target_type.upper() != utils.TemporalType.DATETIME: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the previous logic of not short circuiting is clearer, i.e., if DATETIME
then …
.
|
||
if es_version and StrictVersion(es_version) >= StrictVersion("7.8"): | ||
datetime_formatted = dttm.isoformat(sep=" ", timespec="seconds") | ||
return f"""DATETIME_PARSE('{datetime_formatted}', 'yyyy-MM-dd HH:mm:ss')""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the f-string as previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was first written according to the way of not short-circuiting, so there will be nested if. If the f-string is written in one line, then the length of this line will be too long, so the expression is extracted. This way of writing refers to drill .py#convert_dttm method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! First pass observation: one concern regarding the changed signature of convert_dttm
, other than that the changes here make sense. Curious to hear your thoughts on the signature, happy to take a second pass + do some testing later.
superset/db_engine_specs/base.py
Outdated
@@ -686,13 +685,14 @@ def df_to_sql( | |||
|
|||
@classmethod | |||
def convert_dttm( # pylint: disable=unused-argument | |||
cls, target_type: str, dttm: datetime, | |||
cls, target_type: str, dttm: datetime, **kwargs: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this is **kwargs: Any
and not db_extra: Optional[Dict[str, Any]]
like has been done in get_column_spec
? Unpacking into kwargs will make more difficult to add new parameters to this method going forward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the writing here is not strict enough, for db_extra
, it is better to show the declaration type, I think, the method signature can be like this, def convert_dttm(cls, target_type: str, dttm: datetime, db_extra: Optional[Dict[str, Any]], **kwargs: Any)
also continue to retain kwargs
, so that later if a data source needs non-db_extra information, this way is also compatible, what do you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aniaan I'd prefer not to add **kwargs
unless it's currently needed (it's easy enough to add later when the need comes up). And thinking more closely, I'd personally prefer to keep all arguments named so that all parameters in the method are explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can, then we will not add it for now, and we will consider it later if we have this situation.
I'll correct the PR later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@villebro I have updated it, you can review it when you have time
…GH16726-merge-test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional comments, but in general LGTM
cls, target_type: str, dttm: datetime, db_extra: Optional[Dict[str, Any]] = None | ||
) -> Optional[str]: | ||
|
||
db_extra = db_extra if db_extra else {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: db_extra = db_extra or {}
should do the job, too
if es_version and StrictVersion(es_version) >= StrictVersion("7.8"): | ||
datetime_formatted = dttm.isoformat(sep=" ", timespec="seconds") | ||
return ( | ||
f"""DATETIME_PARSE('{datetime_formatted}', 'yyyy-MM-dd HH:mm:ss')""" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should have a fallback/more clear error message if the version isn't parseable by StrictVersion
(Since this is to be populated by the user, we can expect to bump into unparseable values here):
>>> from distutils.version import StrictVersion
>>> StrictVersion("7.8.0.0")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ville/.pyenv/versions/3.8-dev/lib/python3.8/distutils/version.py", line 40, in __init__
self.parse(vstring)
File "/Users/ville/.pyenv/versions/3.8-dev/lib/python3.8/distutils/version.py", line 137, in parse
raise ValueError("invalid version number '%s'" % vstring)
ValueError: invalid version number '7.8.0.0'
Same for non-string values (I expect someone may try to enter it as a number, not string):
>>> StrictVersion(7.8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ville/.pyenv/versions/3.8-dev/lib/python3.8/distutils/version.py", line 40, in __init__
self.parse(vstring)
File "/Users/ville/.pyenv/versions/3.8-dev/lib/python3.8/distutils/version.py", line 135, in parse
match = self.version_re.match(vstring)
TypeError: expected string or bytes-like object
Just in case, perhaps we could do something as simple as
supports_dttm_parse = False
try:
if es_version:
supports_dttm_parse = StrictVersion(es_version) >= StrictVersion("7.8")
except ValueError:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@villebro I have updated it, you can review it when you have time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your patience with all the iterations!
SUMMARY
The elasticsearch
CAST
function does not take effect for thetime zone
setting. In ES7.8 and above, we can use theDATETIME_PARSE
function to solve this problem.In the current master code, the
extra
dict in the database contains theversion
key. We can use version to solve this problem. According to the instructions, version is set forpresto
at the beginning, and ES should also reuse this field.TESTING INSTRUCTIONS
pip install elasticsearch-dbapi==0.2.6
ADDITIONAL INFORMATION