Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(mssql): week time grain should respect datefirst setting #10811

Merged
merged 7 commits into from
Nov 16, 2021

Conversation

binome74
Copy link
Contributor

@binome74 binome74 commented Sep 7, 2020

SUMMARY

In MS SQL Server specifying SET DATEFIRST has no effect on DATEDIFF. DATEDIFF always uses Sunday as the first day of the week to ensure the function operates in a deterministic way. This is not desirable for locales where weeks start on Monday. To get the first day of a calendar week for the current locale use DATEPART which does respect the DATEFIRST setting.

TEST PLAN

  1. Create a test table in SQL Server (any version).
SELECT cast(d AS DATE) d, n 
INTO p1w_test
FROM (VALUES
     ('2020-08-01',  1), ('2020-08-02',  2), ('2020-08-03',  3), ('2020-08-04',  4), ('2020-08-05',  5), ('2020-08-06',  6)
    ,('2020-08-07',  7), ('2020-08-08',  8), ('2020-08-09',  9), ('2020-08-10', 10), ('2020-08-11', 11), ('2020-08-12', 12)
    ,('2020-08-13', 13), ('2020-08-14', 14), ('2020-08-15', 15), ('2020-08-16', 16), ('2020-08-17', 17), ('2020-08-18', 18)
    ,('2020-08-19', 19), ('2020-08-20', 20), ('2020-08-21', 21), ('2020-08-22', 22), ('2020-08-23', 23), ('2020-08-24', 24)
) T(d, n)
;
  1. Add p1w_test table as a source in Superset.
  2. Explore the table, set "Time Column" to "d", "Time Grain" to "week". For the query pick SUM(n).
  3. For North American locales (@@datefirst == 7) the resulting table should be
2020-07-26   1
2020-08-02  35
2020-08-09  84
2020-08-16 133
2020-08-23  47
  1. For European locales (@@datefirst == 1) the resulting table should be
2020-07-27   3
2020-08-03  42
2020-08-10  91
2020-08-17 140
2020-08-24  24

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@@ -46,7 +46,7 @@ class MssqlEngineSpec(BaseEngineSpec):
"PT0.5H": "DATEADD(minute, DATEDIFF(minute, 0, {col}) / 30 * 30, 0)",
"PT1H": "DATEADD(hour, DATEDIFF(hour, 0, {col}), 0)",
"P1D": "DATEADD(day, DATEDIFF(day, 0, {col}), 0)",
"P1W": "DATEADD(week, DATEDIFF(week, 0, {col}), 0)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The P1W time grain tends to refer to the official weekly date truncation, in this case WEEK. There are a few time grains that target specific weekday starts that can be seen here: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L94-L95 . Do you think it would make sense to leave P1W unchanged, and add two new time grains, specifically 1969-12-28T00:00:00Z/P1W to also refer to the regular WEEK interval, and 1969-12-29T00:00:00Z/P1W to refer to the proposed monday starting week definition?

Copy link
Contributor Author

@binome74 binome74 Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we want see more or less consistent behaviour of Superset across all the supported engines we should check how other engines calculate P1W interval i.e. what "the official weekly date truncation" actually means: respecting or not respecting the current locale. If they do respect the locale then we should change it for MS SQL server accordingly (as proposed). Else we surely can introduce these 2 explicit "starting Monday"/"starting Sunday" intervals, but I think it also worth mentioning as a remark in the documentation to avoid confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With "official weekly date truncation" I was referring to what the database by default means by WEEK, which can mean different things for different databases. Personally, I tend to use the weekly grain to see what 7 day aggregates are; I usually don't so much care if it's a week starting on Sunday or Monday. For the use case where this is important, I suggest using the dedicated time grains where supported. Going forward, I do agree it would be great to start introducing more consistency in what these terms mean, and we encourage and gladly accept contributions to that end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With regard to the proposed change here, and given the ambiguity of what P1W means, I think this change makes sense as a "best effort" weekly time grain (for the lack of a better term). But if you could add the explicit Sunday and Monday starting grains, then users could have the option of using either the local week definition, or a fixed one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, finally

  1. As of "P1W" I still believe that Superset should behave expectedly i.e. if the underlying database/connection has certain locale settings Superset should not override them unless the user wants to specify it explicitly. I also have added the DATEADD(day, DATEDIFF(day, 0, {col}), 0) wrap around the {col} in order to correctly truncate the time part in case of MS SQL 2005/2008. Not the least is that the previous version has returned a wrong (T+1) value for the axis' labels / cells text, though is hasn't affected the grouping itself.
  2. I added the support of "1969-12-28T00:00:00Z/P1W" and "1969-12-29T00:00:00Z/P1W" grains for the user to have a choice to explicitly indicate which start of the week he or she prefers for the query. For the "week_start_sunday" I used the previous version of "P1W" with subtraction of one day. The "week_start_monday" is calculated in similar fashion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@binome74 I'm ok with defaulting to the default locale of the database. In the latest commit I didn't see the original DATEPART call; did I understand correctly that the current syntax works similarly, but supports older versions of SQL Server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes
"P1W": "DATEADD(day, 1 - DATEPART(weekday, {col}), DATEADD(day, DATEDIFF(day, 0, {col}), 0))"
here DATEADD(day, 1 - DATEPART(weekday, {col}) does the job
and DATEADD(day, DATEDIFF(day, 0, {col}), 0)) just truncates the time part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds good. Stylistic request: can we change the reserved names (day, weekday etc) to all caps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to make the changes proposed by the linter:

superset/db_engine_specs/mssql.py:54:0: C0301: Line too long (99/88) (line-too-long)
superset/db_engine_specs/mssql.py:49:0: C0301: Line too long (102/88) (line-too-long)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Did it so.

@stale
Copy link

stale bot commented Nov 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added inactive Inactive for >= 30 days and removed inactive Inactive for >= 30 days labels Nov 26, 2020
…usted the "week" grain for better backward compatibility with MS SQL 2005/2008.
@codecov-io
Copy link

codecov-io commented Dec 3, 2020

Codecov Report

Merging #10811 (0d12be1) into master (5d3e1b5) will decrease coverage by 0.20%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10811      +/-   ##
==========================================
- Coverage   77.04%   76.84%   -0.21%     
==========================================
  Files        1041     1041              
  Lines       56068    56126      +58     
  Branches     7742     7742              
==========================================
- Hits        43198    43129      -69     
- Misses      12612    12739     +127     
  Partials      258      258              
Flag Coverage Δ
hive ?
mysql 81.95% <ø> (+<0.01%) ⬆️
postgres 81.96% <ø> (+0.01%) ⬆️
presto ?
python 82.05% <ø> (-0.41%) ⬇️
sqlite 81.64% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/db_engine_specs/mssql.py 95.83% <ø> (ø)
superset/db_engines/hive.py 0.00% <0.00%> (-85.19%) ⬇️
superset/db_engine_specs/hive.py 69.49% <0.00%> (-16.99%) ⬇️
superset/db_engine_specs/presto.py 83.47% <0.00%> (-6.49%) ⬇️
superset/views/database/mixins.py 81.03% <0.00%> (-1.73%) ⬇️
superset/connectors/sqla/models.py 86.79% <0.00%> (-1.43%) ⬇️
superset/models/core.py 89.26% <0.00%> (-0.74%) ⬇️
superset/db_engine_specs/base.py 88.20% <0.00%> (-0.39%) ⬇️
superset/utils/core.py 89.61% <0.00%> (-0.36%) ⬇️
superset/charts/schemas.py 100.00% <0.00%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d3e1b5...0d12be1. Read the comment docs.

@villebro villebro changed the title "P1W" grain should respect DATEFIRST setting in MS SQL Server fix(mssql): week time grain should respect datefirst setting Dec 3, 2020
@pull-request-size pull-request-size bot added size/S and removed size/XS labels Dec 3, 2020
@stale
Copy link

stale bot commented Jun 26, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Jun 26, 2021
@binome74
Copy link
Contributor Author

Is there anything else I need to do to have my PR accepted?

@stale stale bot removed the inactive Inactive for >= 30 days label Jul 12, 2021
@eschutho
Copy link
Member

@binome74 this looks good, but the tests have expired. Can you close and then just reopen this PR to retrigger all the test runs?

@binome74 binome74 closed this Jul 20, 2021
@binome74 binome74 reopened this Jul 20, 2021
@villebro
Copy link
Member

@binome74 I'm terribly sorry this PR got overlooked. I rebased it, fixed the conflict and will merge once CI passes. Thanks again so much for the contribution, looking forward to getting this in!

@pull-request-size pull-request-size bot added size/M and removed size/S labels Nov 15, 2021
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@villebro villebro merged commit 211b32a into apache:master Nov 16, 2021
AAfghahi pushed a commit that referenced this pull request Jan 10, 2022
* "P1W" grain should respect DATEFIRST setting in MS SQL Server

* Added "week_start_sunday" and "week_start_monday" grains support. Adjusted the "week" grain for better backward compatibility with MS SQL 2005/2008.

* Stylistic and linter-requested changes

* fix test

Co-authored-by: Valeriy Aleksashkin <v.aleksashkin@gmail.com>
Co-authored-by: Ville Brofeldt <ville.v.brofeldt@gmail.com>
@mistercrunch mistercrunch added the 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels label Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 1.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants