Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(druid): Remove legacy Druid NoSQL logic #23997

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented May 9, 2023

SUMMARY

This PR addresses Superset 3.0 task #11 by removing the remaining legacy Druid NoSQL logic which was removed in Superset 2.0. Specifically it primarily removes form-data keys including:

  • druid_time_origin
  • having_druid
  • having_filters*
  • granularity**

* This represents SIMPLE, i.e., non-SQL, having filters which were only supported with the Druid NoSQL connector.

** This term is used inappropriately and represents either the Druid NoSQL time granularity or the SQL time column (an alias for granularity_sqla used in the query context and frontend). Differentiating between the two is somewhat of a challenge.

Note this PR leaves the Slice.datasource_type column (and associated logic) as is—which was primarily used to differentiate between the SQL and Druid NoSQL connectors—as it isn't officially deprecated and maybe used to currently (or in the future) for ad-hoc query based datasets (or similar).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

  • Grepped the code (in a case insensitive way) for the term DRUID and the various form-data keys.
  • CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch from d6ea0db to c7cf36a Compare May 9, 2023 23:49
@@ -230,8 +230,6 @@ def pivot_table_v2(
Pivot table v2.
"""
verbose_map = datasource.data["verbose_map"] if datasource else None
if form_data.get("granularity_sqla") == "all" and DTTM_ALIAS in df:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely a bug (see like #259), i.e., it likely should have been granularity rather than granularity_sqla as this represents the Druid NoSQL time grain (as opposed to the SQL time column) where all is a Druid NoSQL concept which represents a single time grain/bucket to encapsulate the entire time range.

# [TimeZone List]
# See: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
# other tz can be overridden by providing a local_config
DRUID_TZ = tz.tzutc()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were all unused.

@@ -372,9 +372,7 @@ def query_obj(self) -> QueryObjectDict: # pylint: disable=too-many-locals
del groupby[groupby_labels.index(DTTM_ALIAS)]
is_timeseries = True

granularity = self.form_data.get("granularity") or self.form_data.get(
Copy link
Member Author

@john-bodley john-bodley May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat concerning as these terms mean different things in the form-data. Clearly a byproduct of extremely poorly named and confusing variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, good reminder to NEVER have dual meanings for variables.

@@ -1188,11 +1180,7 @@ def query_obj(self) -> QueryObjectDict:
"month": "P1M",
"year": "P1Y",
}
time_grain = mapping[self.form_data.get("subdomain_granularity", "min")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now inlined.

@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch 4 times, most recently from cd2ccaf to 2bfa10f Compare May 10, 2023 00:15
@john-bodley john-bodley marked this pull request as ready for review May 10, 2023 00:16
@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch 4 times, most recently from 9a15940 to 97d27f8 Compare May 10, 2023 00:46
@codecov
Copy link

codecov bot commented May 10, 2023

Codecov Report

Merging #23997 (5d77073) into master (c4242a3) will increase coverage by 0.00%.
The diff coverage is 57.14%.

❗ Current head 5d77073 differs from pull request most recent head ec1b1ea. Consider uploading reports for the commit ec1b1ea to get more accurate results

@@           Coverage Diff           @@
##           master   #23997   +/-   ##
=======================================
  Coverage   69.08%   69.08%           
=======================================
  Files        1905     1905           
  Lines       74848    74810   -38     
  Branches     8108     8107    -1     
=======================================
- Hits        51705    51684   -21     
+ Misses      21031    21015   -16     
+ Partials     2112     2111    -1     
Flag Coverage Δ
hive 53.68% <46.15%> (+0.01%) ⬆️
javascript 55.62% <0.00%> (+<0.01%) ⬆️
mysql 79.32% <61.53%> (+0.02%) ⬆️
postgres 79.39% <61.53%> (+0.02%) ⬆️
presto 53.61% <46.15%> (+0.01%) ⬆️
python 83.29% <61.53%> (+0.02%) ⬆️
sqlite 77.92% <61.53%> (+0.01%) ⬆️
unit 54.09% <38.46%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...d/packages/superset-ui-chart-controls/src/types.ts 100.00% <ø> (ø)
...ges/superset-ui-core/src/query/buildQueryObject.ts 100.00% <ø> (ø)
...packages/superset-ui-core/src/query/types/Query.ts 100.00% <ø> (ø)
...-frontend/src/components/AlteredSliceTag/index.jsx 88.57% <0.00%> (ø)
...src/dashboard/util/getFilterConfigsFromFormdata.js 100.00% <ø> (+4.54%) ⬆️
superset-frontend/src/explore/constants.ts 100.00% <ø> (ø)
...et-frontend/src/explore/controlPanels/Separator.js 0.00% <ø> (ø)
...rontend/src/visualizations/FilterBox/FilterBox.jsx 50.40% <ø> (+0.78%) ⬆️
...end/src/visualizations/FilterBox/transformProps.ts 7.14% <ø> (ø)
superset/charts/post_processing.py 93.23% <ø> (+0.64%) ⬆️
... and 10 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch 3 times, most recently from f9bb97e to 7a3b045 Compare May 10, 2023 01:34
@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch from 7a3b045 to 963f436 Compare May 10, 2023 02:05
@john-bodley john-bodley requested review from eschutho and villebro May 10, 2023 02:05
@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch from 963f436 to e489a1b Compare May 10, 2023 03:23
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few first pass comments. My comments regarding granularity may be baseless, but if needed I can double check by doing some testing, as I think a few of these should not be removed.

Comment on lines 1361 to 1256
"granularity": {
"description": "Name of temporal column used for time filtering. For legacy Druid datasources this defines the time grain.",
"nullable": true,
"type": "string"
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have trouble wrapping my head around these different granularities and when they're in the main payload vs extras, but shouldn't this still be here, only without the reference to legacy Druid? I looked at the QueryObject class, and granularity still seems to be featured in the constructor/property list.

Alternatively, do we need to update line 1367 which states that granularity_sqla "is deprecated, use granularity instead."?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro you're right and I've reverted the deletion. There is already a comment stating that granularity_sqla is deprecated.

@@ -34,7 +34,6 @@

from superset.common.chart_data import ChartDataResultFormat
from superset.utils.core import (
DTTM_ALIAS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm always thrilled to see the removal of a DTTM_ALIAS reference 🙂

@@ -372,9 +372,7 @@ def query_obj(self) -> QueryObjectDict: # pylint: disable=too-many-locals
del groupby[groupby_labels.index(DTTM_ALIAS)]
is_timeseries = True

granularity = self.form_data.get("granularity") or self.form_data.get(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, good reminder to NEVER have dual meanings for variables.

@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch from e489a1b to 6af08eb Compare May 10, 2023 17:20
@john-bodley
Copy link
Member Author

Thanks @villebro for the review. I've addressed your comments.

@michael-s-molina and @villebro will the whole granularity/granularity_sqla be obsolete once the generic x-axis feature is rolled out and no-longer configurable behind a feature flag? I'll be stoked once this wart—and the other temporal misnomers—have been removed.

@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch from 6af08eb to 9d83f20 Compare May 10, 2023 20:58
@john-bodley john-bodley requested review from rusackas and villebro May 16, 2023 01:58
@john-bodley
Copy link
Member Author

@villebro I've addressed your comments. Would you mind re-reviewing the PR?

@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch from 9d83f20 to a36e933 Compare June 7, 2023 20:23
@john-bodley john-bodley force-pushed the john-bodley--remove-legacy-druid-logic branch from a36e933 to 51b6af6 Compare June 7, 2023 20:25
Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic job @john-bodley! Really reduced the complexity. I left just one comment for a field that may be removed.

@@ -1159,10 +1151,7 @@ class Meta: # pylint: disable=too-few-public-methods
)
filters = fields.List(fields.Nested(ChartDataFilterSchema), allow_none=True)
granularity = fields.String(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't granularity be removed?

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this! 🔥 🔥 🔥 🔥 🔥

"description": "Name of temporal column used for time filtering. "
"For legacy Druid datasources this defines the time grain."
},
metadata={"description": "Name of temporal column used for time filtering. "},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michael-s-molina I believe your question is somewhat akin to the conversation with @villebro where the term "granularity" is overloaded. Per the augmented description you can see it referred to the time gran for Druid NoSQL connector (which is now obsolete), however it also is used to represent the temporal time column for the SQL connector.

@michael-s-molina michael-s-molina merged commit 9adb023 into apache:master Jun 9, 2023
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants