-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce configs for some timezone handling aspects: epoch_s and relative end time #6721
Conversation
superset/connectors/sqla/models.py
Outdated
@@ -175,10 +175,11 @@ def dttm_sql_literal(self, dttm): | |||
if self.database_expression: | |||
return self.database_expression.format(dttm.strftime('%Y-%m-%d %H:%M:%S')) | |||
elif tf: | |||
seconds_since_epoch = int(dttm.timestamp()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that int(dttm.timestamp()) == (dttm - datetime(1970, 1, 1)).total_seconds()
? Absolutely and always? Timezones included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't equal and that's the right thing: dttm can be either TZ aware or unaware. Its aware when either the user specifies the TZ in the freeform input, and unaware otherwise. Epoch is defined as the number of seconds since UTC 1/1/1970. If its TZ aware, then it calculates based off the UTC, if its unaware it assumes that the current TZ is UTC.
So to answer your question: the equivalence only holds when dttm itself is unaware, ie when it is not freeform input in the ISO8601 format by the user. Otherwise, we don't want to disregard the timezone that the user has provided.
On the subject of timezones, there is a bit of a fundamental issue here: The timestamp column can be in one TZ, while superset-webserver itself is running in another TZ. And then the UI also allows entering timestamp filters, which could be in another TZ.
My thought was that perhaps we should allow the user to specify what the TZ of the timestamp column is, particularly if the timestamp is in a numeric format like epoch_s. And then honor that.
I will open an issue about better TZ support in Superset, and then this code can be revised. Until then, it simply fixes a bugfix related to not honoring the user specified TZ input in the freeform input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we need a revamp of timezone handling in Superset, though we shouldn't break how current charts display time on the way there.
I know when using datasources with timestamp, that stuff is fairly clear/straightforward, meaning there's really just the time of the event and the time of the user's browser to take into consideration. For other things like a truncated ds
or for example our "baby birth names" dataset, where the data points are on the first day of the year, we often don't want any form of conversion. Like I don't what the chart to adjust for timezones at all or people in different timezones will see something like 1/1/x 9AM PST
.
There are thousands of charts in the wild that have adapted (perhaps using time expressions, or the datasource time lag feature, or even altered pipelines and source data) to whatever the current behavior is now in Superset and we should not make changes that will change the way charts look. This requires some thinking and a "Superset Improvement Proposal", and change management. Probably a per-chart feature flag around using the "new and improved time handling" that would be False for existing charts and True for new ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method does not affect anything about how the timestamp values are rendered in the viz. It only controls how the filter is rendered. Ie how do we want to convert the "since"/"until" values in the form to the SQL timestamp comparison filter.
The current behavior is as follows:
- If the user gives the start/end with timestamp: (a) If the DB column is epoch_s, then this will throw an error since we are doing aware-dttm minus naive-dttm, which is an error. (b) If the DB column is not epoch, then the user specified time will be converted into the timestamp string as given by the column's timestring format.
- If the user entered start/end is without timestamp, as in the majority of cases: (a) If the DB columns is epoch_s, then we will return the number of seconds since 1/1/1970 (local time). (b) Otherwise, the timestamp would be converted using python's strftime.
epoch_s/epoch_ms lack information about what timezone is the "epoch" calculation done using. Technically it should be UTC, but users could be subtracting it from 1/1/1970 in their local time zone (and using the time-shift feature instead).
Incorporating timezone in the database expression is easy: The user can provide an '%z' if the underlying DB supports timezones in literals.
I agree with you on the need to preserve absolute backward compatibility and as such I will redo my implementation:
- I will introduce a epoch_ms_tz/epoch_s_tz: Which means that the DB column is in number of milliseconds/seconds since 1/1/1970 (given TZ). If tz is not given, then the default is to make it behave like today: time delta since 1/1/1970 (local TZ).
[ My current implementation was changing the cases 1.a and 2.a above ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think dttm.timestamp() is the right thing to do here:
The database functions epoch_to_dttm already convert the expression to the unix timestamp. So the assumption is already that epoch_s means UTC-epoch_s. So this function dttm_sql_literal converts the control input start/end timestamps to epoch-time. And I would argue that it too should behave the same way.
I can feature flag both these changes since they are "breaking" but I think the right thing to do.
I decided against introducing a new kind of epoch_s_tz because the db_engine_specs.py file already assumes that epoch_s is in UTC epoch (like it should).
3e8ab31
to
394565f
Compare
Codecov Report
@@ Coverage Diff @@
## master #6721 +/- ##
==========================================
+ Coverage 56.07% 56.12% +0.05%
==========================================
Files 526 521 -5
Lines 23247 23174 -73
Branches 2784 2769 -15
==========================================
- Hits 13035 13006 -29
+ Misses 9802 9759 -43
+ Partials 410 409 -1
Continue to review full report at Codecov.
|
superset/utils/core.py
Outdated
@@ -946,7 +946,7 @@ def get_since_until(time_range: Optional[str] = None, | |||
|
|||
""" | |||
separator = ' : ' | |||
today = parse_human_datetime('today') | |||
today = parse_human_datetime('now') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagging @betodealmeida as he wrote most of this method and did the original dynamic time implementation.
There are clear challenges here. First depending on using a ts
vs ds
, a user may want different behaviors, and that's not mentioning the issues around time zones.
Looks like the code here assumes a ds
and tries to get 7 full days, and @agrawaldevesh , it looks like you are using ts
and want true relative time.
It seems like Superset needs to know more about the grain of your time column, and maybe needs to offer better/smarter options around relative time filters.
In the meantime, you can use the relative time expressions as shown here:
I think 7 days ago may truncate though. Also we've been talking about improving this component so that it shows what the relative expression compute to, right there in that popover.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think having users remember to use 'now' (instead of default 'today') would be a problem for my deployment of superset. So I will try to make this be configurable instead such that it preserves backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree this should be now
, and left bound should probably be relative to now as well, @betodealmeida what do you think?
87a9eca
to
3394f9e
Compare
@mistercrunch, @betodealmeida ... the diff is ready to re-reviewed. All of its changes are guarded behind feature flags so it would cause no backward incompatibilities. |
3394f9e
to
b04c9ba
Compare
superset/config.py
Outdated
@@ -188,7 +188,9 @@ | |||
# --------------------------------------------------- | |||
# Feature flags that are on by default go here. Their | |||
# values can be overridden by those in super_config.py | |||
FEATURE_FLAGS = {} | |||
FEATURE_FLAGS = { | |||
'IS_EPOCH_S_TRULY_UTC': False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xtinec , please review this use of the feature flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to have a description of the feature flag here as a comment. Also, @xtinec how is one supposed to set a single entry in the feature flag in superset_config.py
? Seems like someone would have to either override all of the FEATURE_FLAGS
dict or none.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xtinec said she would be fixing this behavior. Basically the proper merging needs to be done. For now, we can work around it by assuming the default value in the code proper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mistercrunch, is there anything else you would like me to improve in this PR ? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrawaldevesh @mistercrunch I have a PR coming up shortly to allow merging of feature flags in superset_config.py
and those set by default in config.py
.
b04c9ba
to
50e8d9c
Compare
superset/connectors/sqla/models.py
Outdated
@@ -125,12 +125,15 @@ def datasource(self): | |||
return self.table | |||
|
|||
def get_time_filter(self, start_dttm, end_dttm): | |||
is_epoch_in_utc = config\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll expose a utility function for you to make this easier on server side in my PR to fix the merging issue.
FYI @mistercrunch
…h-tz Introduce a config DEFAULT_RELATIVE_END_TIME which is used when computing the "Last X days". The default behavior (as currently) is to let that be 'today', which actually means the 0th second of today. We can also let it be 'now' which means the data as of now (ie at query time). Secondly, also introduce another config IS_EPOCH_S_TRULY_UTC, which means that the logged time in epoch_s format is actually in UTC. Currently (as the default) is that it it is in the local (superset webserver) timezone. ** There is no backward incompatibility with thes config features since the default behavior hasn't changed. **
50e8d9c
to
cdb785f
Compare
@mistercrunch, I have reworked this PR to not use the FEATURE_FLAGS framework, since its still being perfected/worked upon. I am looking forward to using the FEATURE_FLAGS stuff in my next PR when its more baked. So I think you can safely accept this PR now since it does not change any behavior. cc: @xtinec |
LGTM |
@agrawaldevesh @mistercrunch apologies for chiming in late here but regarding the epoch time, epoch should always be in reference to UTC hence the old logic (seconds since 1970-01-01 00:00:00 in the local timezone) is a bug and thus probably should be deprecated rather than supporting both options. Note the current default |
One of the issues with using the beginning of midnight of today is that the logging time-column's timezone can be different from the timezone of the superset flask server.
This is also important for my usecase of running Superset primarily with Pinot, wherein you are interested in showing what has happened right now as opposed to from the beginning of the day