-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(thumbnails): add support for user specific thumbs #22328
Conversation
Codecov Report
@@ Coverage Diff @@
## master #22328 +/- ##
==========================================
+ Coverage 66.89% 66.90% +0.01%
==========================================
Files 1847 1850 +3
Lines 70611 70677 +66
Branches 7749 7749
==========================================
+ Hits 47233 47285 +52
- Misses 21362 21376 +14
Partials 2016 2016
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
8636996
to
48e90c3
Compare
superset/models/slice.py
Outdated
@classmethod | ||
def get(cls, id_: int) -> Slice: | ||
session = db.session() | ||
qry = session.query(Slice).filter_by(id=id_) | ||
return qry.one_or_none() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a similar method in Dashboard
that bypasses the base filter
superset/tasks/utils.py
Outdated
def get_executor( | ||
executor_types: List[ExecutorType], | ||
model: Union[Dashboard, ReportSchedule, Slice], | ||
initiator: Optional[str] = None, | ||
) -> Tuple[ExecutorType, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why this function returns the username as a string, and not the User
object, is because this function will be called frequently (e.g. when getting all charts/dashboards). Since we only know the selenium username in the config, we would otherwise have to fetch it from the metastore, causing unnecessary round trips to the metastore.
superset/thumbnails/tasks.py
Outdated
@@ -0,0 +1,101 @@ | |||
# Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is really the same as superset/tasks/thumbnails.py
before, just updated with the new logic (fetching the executor and overriding the username etc)
AlertQueryError(), | ||
), | ||
(["gamma"], None, [ExecutorType.INITIATOR], AlertQueryError()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really just one added test for the new INITIATOR
case, other than that it's just updating the test cases to conform to the new sig of get_executor
@@ -0,0 +1,323 @@ | |||
# Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was mostly moved from the previous tests/unit_tests/reports/test_utils.py
file, just updating to new return types + adding new relevant test cases.
f419302
to
8a67d59
Compare
@villebro Thank you for implementation of this feature. Just sharing our usecase. But this implementation will force the thumbnail generation for all the dashboards for each user, which is not ideal for us. |
47abf41
to
14de647
Compare
@kamalkeshavani-aiinside we can certainly consider adding this in a future PR. All it would really require is adding a Would you be open to working on this feature? I'm happy to provide guidance and review help if needed. |
from enum import Enum | ||
|
||
|
||
class ExecutorType(str, Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same type as the previous ReportScheduleExecutor
, but with the added INITIATOR
enum.
c7aa129
to
10a7ce0
Compare
feb651d
to
298705b
Compare
dashboard = db.session.query(Dashboard).all()[0] | ||
self.login(username="admin") | ||
uri = f"api/v1/dashboard/{dashboard.id}/thumbnail/{dashboard.digest}/" | ||
rv = self.client.get(uri) | ||
_, thumbnail_url = self._get_id_and_thumbnail_url(DASHBOARD_URL) | ||
rv = self.client.get(thumbnail_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the digest may now be affected by who is logged in, all tests are updated to fetch the thumbnail URL via the API after login.
f6bdf66
to
5d2ce4e
Compare
@villebro Thank you for the suggestion. Sure, I can try to work on this with your help. |
) | ||
|
||
unique_string = _adjust_string_for_executor(unique_string, executor_type, executor) | ||
return md5_sha_from_str(unique_string) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
our current dashboard digest is:
@property
def digest(self) -> str:
"""
Returns a MD5 HEX digest that makes this dashboard unique
"""
unique_string = f"{self.position_json}.{self.css}.{self.json_metadata}"
return md5_sha_from_str(unique_string)
Adding an executor will invalidate all current computed dashboard thumbnails.
I would also prefer to bring these changes back to /superset/tasks/thumbnails
and avoid introducing the deprecation and underlying breaking change on the task structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding an executor will invalidate all current computed dashboard thumbnails.
Oof, that was an unintended mistake, the executor was only supposed to be added in the case of CURRENT_USER
.
I would also prefer to bring these changes back to /superset/tasks/thumbnails and avoid introducing the deprecation and underlying breaking change on the task structure.
Makes sense - I'll revert the move
b9ecbb7
to
67759ed
Compare
67759ed
to
77d32e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely @michael-s-molina 👍 As I also found a bug today in the PR I'm going to let it simmer over the weekend as I feel there's room for improvement in the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good, since this change will invalidate thumbnails cache it would be nice to add a note on UPDATING.md
return func(dashboard, executor_type, executor) | ||
|
||
unique_string = ( | ||
f"{dashboard.id}\n{dashboard.charts}\n{dashboard.position_json}\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will dashboard.charts
generate a N+1 issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR shouldn't change current performance, as this property is already present on the current request payload.
SUMMARY
This PR adds support for generating user-specific thumbnails. This is a typical requirement in environments where some form of user impersonation is being used, and sharing thumbnails across all users with access to the same dashboards/charts could leak sensitive data.
This PR does the following:
ReportScheduleExecutor
toExecutorType
so that it can be reused for thumbnails. Also moves the utils from thereports
package to thetasks
package, as it's shared with thumbnails now.thumbnails
to contain the thumbs-specific types etc. Also move the thumbnail task module here and deprecate the old one (it will still work, but will now emit a deprecation warning).CURRENT_USER
which corresponds to the logged-in user that initiated the request. This user will be undefined for Alerts & Reports (=Celery has initiated those), but for thumbnails, this is the user that requested the thumbnail.THUMBNAIL_EXECUTE_AS
- similar config as for Alerts & ReportsTHUMBNAIL_DASHBOARD_DIGEST_FUNC
: callback for generating custom digests for dashboards. This is handy if a deployment wants to use different hashing functions or use advanced logic for deciding if a thumbnail can be shared across a larger user pool or not.THUMBNAIL_CHART_DIGEST_FUNC
: callback for generating custom digests for chartsBy default the digests will stay unchanged, as the new default value for
THUMBNAIL_EXECUTE_AS = [ExecutorType.SELENIUM]
. However, when setting it to[ExecutorType.CURRENT_USER]
, the username will be added to theunique_string
prior to hashing to make it unique per user.AFTER
This chart thumbnail was cached using the following query on a Trino database connection using user impersonation with the following virtual dataset:
As can be seen, both the
current_user
(rendered by Trino) and{{ current_username() }}
(rendered by Superset) both show the user as beingv_brofeldt
, i.e. not a service account.BEFORE
When I changed the chart to reference a postgres database with basic auth using the username
postgres
andTHUMBNAIL_EXECUTE_AS = [ExecutorType.SELENIUM]
(=same as current behavior), the result was as follows:TESTING INSTRUCTIONS
ADDITIONAL INFORMATION