Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Time shifts with different granularity for ECharts #24176

Merged
merged 15 commits into from
Jun 8, 2023

Conversation

michael-s-molina
Copy link
Member

@michael-s-molina michael-s-molina commented May 22, 2023

SUMMARY

There's an option in the Advanced Analytics section of Explore called Time Shift that allows an user to compare the query results with the same query dislocated in time by the amount specified in the control. The way this is implemented, is to fire a query for each specified Time Shift and later join the results with the original query. To make the join work the algorithm modifies the temporal column of the time shift results to match the one in the original query. The problem is that the algorithm is not taking into account that the time granularity and time shifts can influence on the temporal column and affect the resulting timestamps. For example, if the Time Grain is Quarter and Time Shift is 28 days, the results could have different timestamps but still belong to the same quarter. Another example is when the Time Grain is Week and after applying the time shifts, the results fall in a different year which may produce different week numbers for the timestamps. The objective of this PR is to fix these problems by generating an artificial join key that takes into consideration the Time Grain when joining the results.

The following table illustrates how the join keys are calculated depending on the Time Grain:

Time Grain Join key
Week year-week
Month year-month
Quarter year-quarter
Year year

This join key is only used when joining the results and is discarded in the end.

This PR also improves the Python codebase by creating a TimeGrain class and replacing all string literals with references to the new class.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Screenshot 2023-05-22 at 13 24 36 Screenshot 2023-05-22 at 13 26 02

TESTING INSTRUCTIONS

1 - Choose a dataset with plenty of temporal data available
2 - Create a Line Chart (ECharts)
3 - Play with different Time Grains and Time Shifts
4 - Make sure the offsets are correctly calculated

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's still in draft, but a few initial thoughts

superset/common/query_context_processor.py Outdated Show resolved Hide resolved
superset/common/query_context_processor.py Outdated Show resolved Hide resolved
@michael-s-molina michael-s-molina force-pushed the fix-time-shifts branch 2 times, most recently from 4e4d95c to 7a5a2ac Compare May 31, 2023 14:38
@michael-s-molina michael-s-molina marked this pull request as ready for review May 31, 2023 14:39
@codecov
Copy link

codecov bot commented May 31, 2023

Codecov Report

Merging #24176 (559b370) into master (1d9a761) will increase coverage by 0.15%.
The diff coverage is 97.12%.

❗ Current head 559b370 differs from pull request most recent head 13ed830. Consider uploading reports for the commit 13ed830 to get more accurate results

@@            Coverage Diff             @@
##           master   #24176      +/-   ##
==========================================
+ Coverage   68.29%   68.45%   +0.15%     
==========================================
  Files        1957     1951       -6     
  Lines       75624    75528      -96     
  Branches     8223     8218       -5     
==========================================
+ Hits        51649    51699      +50     
+ Misses      21867    21718     -149     
- Partials     2108     2111       +3     
Flag Coverage Δ
hive 53.53% <66.41%> (+0.13%) ⬆️
postgres 79.14% <92.36%> (+0.14%) ⬆️
presto 53.45% <66.41%> (+0.13%) ⬆️
python 82.97% <98.47%> (+0.19%) ⬆️
sqlite 77.69% <92.36%> (+0.15%) ⬆️
unit 53.92% <77.09%> (+0.48%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...s/plugin-chart-pivot-table/src/PivotTableChart.tsx 0.00% <0.00%> (ø)
...rset-frontend/src/components/ReportModal/index.tsx 79.03% <ø> (ø)
...src/dashboard/components/PropertiesModal/index.tsx 64.14% <ø> (+0.10%) ⬆️
...ponents/controls/VizTypeControl/VizTypeGallery.tsx 88.23% <ø> (ø)
...-frontend/src/features/alerts/AlertReportModal.tsx 54.44% <ø> (ø)
...-frontend/src/visualizations/presets/MainPreset.js 100.00% <ø> (ø)
superset/charts/post_processing.py 92.59% <ø> (+3.23%) ⬆️
superset/examples/birth_names.py 70.00% <ø> (ø)
superset/examples/supported_charts_dashboard.py 0.00% <ø> (ø)
superset/examples/world_bank.py 30.66% <ø> (ø)
... and 44 more

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@zhaoyongjie
Copy link
Member

zhaoyongjie commented May 31, 2023

@michael-s-molina

  1. IMO, the renaming/refining of TimeGrain should be put in a separate chore pull request rather than this one.
  2. I can't get result of your screenshot. (on this PR branch)
image

BTW, If you could provide a dataset for testing, it would be helpful in testing this PR. We should use a same dataset to test the accuracy of the result.

@zhaoyongjie
Copy link
Member

Additionally, I believe that the time grain and time offset granularity should be consistent. We might use same dataset to discuss this topic.

@michael-s-molina
Copy link
Member Author

  1. IMO, the renaming/refining of TimeGrain should be put in a separate chore pull request rather than this one.

I could do that but I guess I was excited to fix the problem and ended up doing all at once. Sorry if this makes it more difficult to review but I think the important part is that I took the time to improve the code base.

  1. I can't get result of your screenshot. (on this PR branch)

It was not working indeed with GENERIC_CHART_AXES enabled because the query object didn't contain a time_grain_sqla field. I submitted a fix to get this from the form_data and now it should be working.

@michael-s-molina michael-s-molina force-pushed the fix-time-shifts branch 3 times, most recently from 06116f7 to c87c24d Compare June 1, 2023 15:01
Comment on lines 491 to 493
offset_metrics_df[index] = offset_metrics_df[index] - DateOffset(
**normalize_time_delta(offset)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the index of offset_metrics_df seems constructing an appropriate index for the result of dataframe, if the align of the results incorrect, I think we should find root cause from here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand what you're suggesting. Can you explain more?

@zhaoyongjie zhaoyongjie dismissed their stale review June 2, 2023 18:10

Thanks for provide more information. If I have more time, will continue review.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few second pass comments

superset/common/query_context_processor.py Outdated Show resolved Hide resolved
superset/common/query_context_processor.py Outdated Show resolved Hide resolved
superset/common/query_context_processor.py Outdated Show resolved Hide resolved
superset/common/query_context_processor.py Outdated Show resolved Hide resolved
superset/db_engine_specs/base.py Show resolved Hide resolved
tests/unit_tests/common/test_get_aggregated_join_column.py Outdated Show resolved Hide resolved
@michael-s-molina
Copy link
Member Author

@villebro I addressed your comments.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small optional stylistic comments

superset/constants.py Outdated Show resolved Hide resolved
superset/db_engine_specs/base.py Show resolved Hide resolved
tests/unit_tests/common/test_get_aggregated_join_column.py Outdated Show resolved Hide resolved
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@zhaoyongjie
Copy link
Member

@villebro @michael-s-molina I'll take another look this PR, give me some time. Thanks!

@zhaoyongjie zhaoyongjie mentioned this pull request Jun 8, 2023
9 tasks
@zhaoyongjie
Copy link
Member

zhaoyongjie commented Jun 8, 2023

@michael-s-molina @villebro

Draft a PR to refine the join key generator. please review. Thanks both!

@michael-s-molina michael-s-molina merged commit e5b7f7c into apache:master Jun 8, 2023
@michael-s-molina
Copy link
Member Author

Draft a PR to refine the join key generator. please review. Thanks both!

@zhaoyongjie That's not the best way to help reviewing a PR. I spent a lot of time debugging the issue, addressing comments and executing tests using Airbnb charts to get to a solution. If you think the solution can be improved, you can make suggestions, provide more context, contribute code to the PR, etc.

Completely ignoring the PR and opening your own is not cool 😕

@zhaoyongjie
Copy link
Member

@michael-s-molina
I've made the following suggestions during the code review:

  1. It would be better to keep this pull request focused on the necessary changes rather than including refactorings and unrelated fixes. Let's aim to minimize the scope of the PR.
  2. We should avoid hard coding values without considering the maintainability for other engineers.

@michael-s-molina
Copy link
Member Author

michael-s-molina commented Jun 8, 2023

@zhaoyongjie I thought I have addressed both your suggestions with my latest commits. You even removed the Request Changes status. Then you asked time to review the PR and suddenly opened your own which was really confusing to me.

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XXL 🚢 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants