Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce hashids permalink keys #19324

Merged
merged 10 commits into from
Mar 24, 2022
Merged

Conversation

villebro
Copy link
Member

@villebro villebro commented Mar 23, 2022

SUMMARY

This PR implements hashids based permalinks on both Dashboard and Explore, and deprecates the currently supported integer and UUID based permalinks that were introduced here: #19078.

Old UUID format (default): https://host/dashboard/p/035ed596-546d-4563-a482-02d5dd4649d8/
Old numeric format (optional): https://host/dashboard/p/1/
New format: https://host/dashboard/p/dVK7AbavkPy/

The motivation for this change is this discussion. TL;DR: hashids provide shorter and less hideous permalinks thanks to being base62 (vs UUID which uses a regular base16 hex alphabet) and obfuscate numeric ids in a way that makes it infeasibly difficult to guess the actual id.

When creating a permalink for the first time, the app checks if a "salt" has been created for the resource in question (separate for Dashboard and Explore): if not, it creates a 48 character string using secrets.token_urlsafe (similar to how the temporary cache keys are currently constructed) and persists it in the key-value store under the "app" resource. By automatically generating the salt we avoid the need for introducing new config parameters. By creating separate salts for Dashboard and Explore we ensure that

  • the user can't guess permalink ids for Explore based on Dashboard permalinks and vice versa
  • No two deployments will have the same permalink salt

Using SECRET_KEY was not possible for the following reasons:

  • It would be shared by both resources (the point made above)
  • Rotating the SECRET_KEY would have required rotating all permalink keys, which is impossible, as they're "out in the wild"
  • If a vulnerability were to be exposed in hashids compromising the salt, the SECRET_KEY could be exposed

The minimum length of the hashids key is set at 11 characters: this was mostly an arbitrary decision. However, here's a few reasons that support using 11 chars:

  • Relatively short while still retaining good protection against guessability: only after creating 2*10^16 key-value pairs will we need to resort to 12 characters long permalinks
  • Youtube appears to have 11 characters long keys. If it works for Youtube, it probably works for the average Superset deployment

Note that this will invalidate permalinks created on deployments that have been running with this cherry: #19078. However, since this PR is fairly recent and hasn't been released in an official Apache release, the risk is minimal that this would have been in production use anywhere. Needless to say, this cherry will be included in the forthcoming 1.5.0 version.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro villebro changed the title feat: introduce hashids permalink keys [WIP] feat: introduce hashids permalink keys Mar 23, 2022
@villebro villebro changed the title [WIP] feat: introduce hashids permalink keys feat: introduce hashids permalink keys Mar 23, 2022
@codecov
Copy link

codecov bot commented Mar 23, 2022

Codecov Report

Merging #19324 (f32601b) into master (3313530) will increase coverage by 0.02%.
The diff coverage is 98.74%.

@@            Coverage Diff             @@
##           master   #19324      +/-   ##
==========================================
+ Coverage   66.65%   66.67%   +0.02%     
==========================================
  Files        1675     1676       +1     
  Lines       64654    64687      +33     
  Branches     6503     6503              
==========================================
+ Hits        43092    43127      +35     
+ Misses      19876    19874       -2     
  Partials     1686     1686              
Flag Coverage Δ
hive 52.74% <60.62%> (+0.04%) ⬆️
mysql 81.67% <98.74%> (+0.02%) ⬆️
postgres 81.72% <98.74%> (+0.02%) ⬆️
presto 52.59% <60.62%> (+0.04%) ⬆️
python 82.14% <98.74%> (+0.02%) ⬆️
sqlite 81.49% <98.74%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/config.py 91.63% <ø> (-0.06%) ⬇️
superset/dashboards/permalink/schemas.py 100.00% <ø> (ø)
superset/temporary_cache/utils.py 100.00% <ø> (ø)
superset/views/core.py 75.98% <0.00%> (+0.10%) ⬆️
...uperset/dashboards/filter_state/commands/create.py 100.00% <100.00%> (ø)
...uperset/dashboards/filter_state/commands/update.py 100.00% <100.00%> (ø)
superset/dashboards/permalink/api.py 87.27% <100.00%> (-0.45%) ⬇️
superset/dashboards/permalink/commands/base.py 100.00% <100.00%> (ø)
superset/dashboards/permalink/commands/create.py 92.59% <100.00%> (ø)
superset/dashboards/permalink/commands/get.py 82.35% <100.00%> (ø)
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3313530...f32601b. Read the comment docs.

@villebro villebro force-pushed the villebro/hashids branch 2 times, most recently from f5a43e4 to b6d36d5 Compare March 23, 2022 15:35
@villebro
Copy link
Member Author

@michael-s-molina @ktmud @graceguo-supercat @betodealmeida reviews and comments for the new permalink format would be much appreciated!

@michael-s-molina
Copy link
Member

I noticed that when we paste a copied link, it will quickly update the URL with permalink_key and then native_filters_key. Is the permalink_key publishing necessary?

Screen.Recording.2022-03-24.at.11.51.57.AM.mov

@ktmud
Copy link
Member

ktmud commented Mar 24, 2022

Thank you for the quick iteration and thoughtful implementation. Will take a look later today!

@ktmud
Copy link
Member

ktmud commented Mar 24, 2022

Haven't looked into the code in details, but based on the PR description, I'm wondering whether we can just generate a new salt and a new uuid for every permalink and save the hashid itself directly in the KeyValue table.

We will add a new string column "key" in KeyValueEntry table with index=true, unique=true and nullable=false.

When creating a permalink:

  1. We generate a uuid + a new salt
  2. Generate a hashid with the salt + uuid
  3. Save KeyValueEntry(uuid=uuid, key=hashid).

When querying a permalink, you just filter the table by the key string.

Yes, this will increase the size of the KeyValueEntry table but it should also greatly reduce code complexity. It saves the additional query for getting the shared salt and the need to decode a permalink.

@ktmud
Copy link
Member

ktmud commented Mar 24, 2022

By saving the key string itself, we also open the possibility for users to configure custom short urls in the future.

@villebro
Copy link
Member Author

villebro commented Mar 24, 2022

I'm wondering whether we can just generate a new salt and a new uuid for every permalink and save the hashid itself directly in the KeyValue table.

Do you mean regenerate the salt each time? This would open up the theoretical risk of collisions, and would in practice be the same as using token_urlsafe(11), thus removing the need for hashids, which is what @michael-s-molina originally proposed here.

We will add a new string column "key" in KeyValueEntry table with index=true, unique=true and nullable=false.

I'm not opposed to introducing a new field in the table, but if it becomes very specific to permalinks, we may be better off with a dedicated table for them.

  1. We generate a uuid + a new salt

Again, I may be missing something, but since hashids only supports basing off of positive integers, we'd still need that serially generated id to generate the key, which in turn would require an update operation after creating a new entry. So if I understand what you're proposing correctly, for this use case it would be much simpler to just use token_urlsafe(11) and persist the UUID based on the token and namespace, after which it becomes directly queryable.

@villebro
Copy link
Member Author

villebro commented Mar 24, 2022

By saving the key string itself, we also open the possibility for users to configure custom short urls in the future.

In the case of permalinks, I don't think having the ability to generate "vanity ids" is as useful as for e.g. dashboards (=slugs), as the state contained in permalinks will probably usually tend to go stale over time.

@ktmud
Copy link
Member

ktmud commented Mar 24, 2022

So if I understand what you're proposing correctly, for this use case it would be much simpler to just use token_urlsafe(11) and persist the UUID based on the token and namespace, after which it becomes directly queryable.

Indeed, what I described was just a more complex version of token_urlsafe. In that case, current implementation looks good to me. Although maybe simply using token_urlsafe like @michael-s-molina original prosed may also work.

CreateKeyValueCommand(resource=RESOURCE, value=value, key=uuid_key).run()


@memoized
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@ktmud
Copy link
Member

ktmud commented Mar 24, 2022

Approving in case you want to merge as is and address @michael-s-molina 's question in another PR.

@villebro
Copy link
Member Author

villebro commented Mar 24, 2022

I noticed that when we paste a copied link, it will quickly update the URL with permalink_key and then native_filters_key. Is the permalink_key publishing necessary?

Yes, I agree completely - this is a horrible flicker. I would have addressed this here, but it would have required doing a lot of slightly unrelated changes. Since this can be changed later, I'd prefer to get this PR merged first and happy to add more polish to this once it's done.

Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the improvement!

@villebro
Copy link
Member Author

Thanks for the reviews! I'm going to merge as is, and like I said, happy to keep improving this and related features!

@villebro villebro merged commit f4b71ab into apache:master Mar 24, 2022
@villebro villebro deleted the villebro/hashids branch March 24, 2022 19:53
villebro added a commit that referenced this pull request Apr 3, 2022
* feat: introduce hashids permalink keys

* implement dashboard permalinks

* remove shorturl notice from UPDATING.md

* lint

* fix test

* introduce KeyValueResource

* make filterState optional

* fix test

* fix resource names

(cherry picked from commit f4b71ab)
philipher29 pushed a commit to ValtechMobility/superset that referenced this pull request Jun 9, 2022
* feat: introduce hashids permalink keys

* implement dashboard permalinks

* remove shorturl notice from UPDATING.md

* lint

* fix test

* introduce KeyValueResource

* make filterState optional

* fix test

* fix resource names
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.0.0 and removed 🚢 2.0.1 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels lts-v1 preset-io size/XL 🍒 1.5.0 🍒 1.5.1 🍒 1.5.2 🍒 1.5.3 🚢 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants